
Topology Beats Noise: Entity-Centric Detection of SSLVPN Abuse

February 18, 2026

Introduction

ElasticSearch is rarely the ‘first tool’ I would grab when performing any sort of analysis or IR. While Elastic has expanded the feature set substantially in the last year, it still falls short of Splunk and Kusto in terms of raw query-language features. However, necessity is the mother of invention, and most patterns should still be detectable with some weaponized ES|QL.

So without further ado, a blog is born. I’m going to call this Entity-Centric Authentication Anomaly Detection. In this blog, we’ll discuss the trials and tribulations of developing more statistically advanced detection mechanisms: where they’re weak, and where they excel in my day-to-day.

A Primer on ES|QL

I’m going to step through this blog anticipating that readers have at least a cursory knowledge of ElasticSearch’s ES|QL language. If you feel like you’re lost at any point, I’d strongly advocate you consult the documentation and go experiment with a discrete concept yourself. Don’t let this preamble scare you; the concepts I’m using aren’t particularly complex, but it’s easy to get lost in the sauce with query languages.

Further to this, I’ll be mentioning some columns that are unique to the organization I work at. While we (try to) abide by the Elastic Common Schema (ECS), there are internal warehousing mechanisms that come with performing security at scale, which individual user environments may not face. While the methods discussed in this blog are highly transposable, you may find that certain characteristics simply do not apply to your organization.

What’s important is that we acknowledge that ES|QL is simply a delivery mechanism for the broader concept of anomaly detection (and specifically, the different ways we can apply this.)

Starting Out

Every good hunt and methodology starts out with a hypothesis. I’m going to establish four pillars that I know to be generally true based on my experience, and kind of expand our scope from there. These pillars aren’t about volume. They’re about structure— how identities, infrastructure, and organizations intersect in time.

Pillar 1 - Infrastructure Reuse

The overwhelmingly vast majority of adversary authentications are non-discrete.

Let’s take a moment to muse on this. Most compromise is “happenstance” when dealing with the business segments that my day job occupies. Rarely is something a coordinated, targeted attack with truly dedicated infrastructure. Even if an adversary walks right in through the VPN, there’s a high likelihood the victims were chosen purely through vulnerability exposure or credential theft.

As a result, adversaries cannot afford to (or elect not to) spin up infrastructure for individual compromises, instead favoring a Virtual Private Server (VPS) for some time before discarding it. This is the first pillar we will exploit in our “Entity-Centric Authentication Anomaly Detection”. It’s also, conveniently, our most important.

Pillar 2 - Authentication Anomalies

Users do not tend towards authenticating from shared infrastructure.

The point I’m lining up may be painfully obvious now that you’ve seen our second hypothesis, but it also comes with the most investigative burden. We discussed the infrastructure’s behavior in the first pillar; now we’re looking specifically at user behavior.

There are trends that refute this, notably globally used infrastructure such as AWS, Azure, and even shared offices and VPNs. For the most part, though, users within an organization will all use the same or similar IP addresses. Remote work has, strangely, made this both harder and easier. Because IP addresses are either generally unique per user or globally shared by most users in a given organization, we have two very extreme patterns which we can exploit for our detection.

Now, you may be thinking “Isn’t a business IP address shared infrastructure?” You’d be generally correct, but when we examine these at a global scale, it’s much easier to spot that most of an organization’s users authenticate from a given IP and attribute that IP as malicious or benign. More on this later.

Pillar 3 - Cross-Boundary Anomalies

Infrastructure tends to be associated with unique organizations.

This is the primary detection mechanism— a singular IP address should rarely cross organizational boundaries. Again, we have some refuting data: VPNs, global shared infrastructure, etc. But in general— IP address “x” should not be seen across organizations “a”, “b”, and “c”. This means that tracking entities which cross these boundaries becomes a useful tool in weighting our detection methodology.

Pillar 4 - Authentication Velocity

Benign cross-boundary authentications tend to occur with long time deltas.

When cross-boundary/cross-organization authentications do occur due to shared infrastructure, they tend to be somewhat sporadic. If you’re using a VPN service, statistically, you’re likely sharing the specific tunnel exit node with hundreds if not thousands of other people. However, the odds of them performing the same authentications against the same interfaces as you in a given time window are vanishingly small.

Without this pillar, we lose the context of authentication velocity. In other words, without the concept of time, we would simply be building a VPN detector.

Building Upon the Pillars

Let’s put all our pillars together and examine what we might conclude with this context:

  • The overwhelmingly vast majority of adversary authentications are non-discrete.
  • Users do not tend towards authenticating from shared infrastructure.
  • Infrastructure tends to be associated with unique organizations.
  • Benign cross-boundary authentications tend to occur with long time deltas.

Jam that all together and we have our thesis: Adversaries authenticate from dedicated shared infrastructure in a manner that users do not, and we can delineate this because the authentications occur over a short time delta with shared attributes such as AS/IP across organizational boundaries.

The translation from thesis to ES|QL was less elegant than the idea.

Forming the Query

We’re going to look specifically at SonicWall SSLVPN telemetry here; in particular, I want to examine the authentications associated with this telemetry source. We’re going to establish some preamble, then never mention it again.

FROM alerts-security.alerts-siem
| WHERE @timestamp >= NOW() - 90 days 
| WHERE kibana.alert.rule.name LIKE "*SonicWall*" 
| WHERE source.ip IS NOT NULL 
| WHERE source.as.number IS NOT NULL

To avoid this becoming an ES|QL tutorial, this simply establishes patterns I do/do not care to examine. (SonicWall authentications in the last 90 days where I have data to pivot on.)

Time Bucketing

Now, we mentioned the concept of velocity. I delineate this from volume because we’re referring to the quantity of something in relation to time. Where acceleration would be the rate of change of the authentication rate over time, velocity is the authentication rate at a point in time. I understand that’s a bit to chew on, but consider it the ‘volume at a point in time’ rather than the raw volume.

How do we do this in query languages? Time buckets! But it would be too easy if we could do this ergonomically with ElasticSearch’s Bucket feature, so we need a way to manually define a bucket.

| EVAL hour_bucket = DATE_TRUNC(1 hour, @timestamp)

Fortunately, we can do that! All we’re doing here is trimming the timestamp down to the hour it occurred.
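If you want to see exactly what this does, here’s a minimal, standalone sketch you can paste into Discover. ROW just fabricates a single row with a made-up timestamp to experiment on:

ROW ts = TO_DATETIME("2026-02-18T17:59:59.000Z")  // hypothetical timestamp near the top of the hour
| EVAL hour_bucket = DATE_TRUNC(1 hour, ts)       // hour_bucket becomes 2026-02-18T17:00:00.000Z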

If user A authenticated at 17:00:00 UTC, and user B authenticated at 17:59:59 UTC, they would both be trimmed down to hour bucket 17:00 following this evaluation statement. I’d be remiss if I didn’t acknowledge the caveat here— malicious activity crossing over the ‘hour’ boundary is less likely to get detected. There is, in theory, a pattern of compromise that I will discuss later that this hunt is completely blind to.

It’s important we acknowledge these shortfalls and layer defenses around them.

| STATS 
	auth_count     = COUNT(*), 
	accounts_count = COUNT_DISTINCT(account.id), 
	users_count    = COUNT_DISTINCT(user.name),
	org_names      = VALUES(source.as.organization.name),
	user_names     = VALUES(user.name),
	account_ids    = VALUES(account.id) 
	BY hour_bucket, source.ip, source.as.number

The second half of our self-implemented time-bucketing logic is the above statement. This is where I anticipate regular users of ES|QL for basic pivots may get a little lost, but the logic reads almost exactly as written.

If you’ve understood EVAL statements, you can pretend that an EVAL precedes every assignment (indicated by "="). Essentially, we’re saying: give me the number of authentications, the DISTINCT number of accounts and users involved, and then the autonomous system (AS) organization names, user names, and account IDs for internal warehousing purposes.

After that, our “hour_bucket” makes a reappearance— we’re essentially saying “group all that data by” the hour it occurred, the source IP it occurred from, and the AS it occurred from. This gives us the ‘time’ aspect of authentications. Instead of grouping by simple IP, AS, etc., we can combine these factors to start to make some decisions about the rate that activity occurs at (our velocity).
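If you want to sanity-check the bucketed rows before bolting on any scoring, you can simply cap the output and eyeball it (the column names are the ones defined in the STATS above):

| SORT auth_count DESC
| LIMIT 25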

Scoring System

We need a way to sort through this data— as it stands right now, we’re given 3 distinct values of interest:

  • auth_count - The number of authentications in a given hour from an IP address.
  • accounts_count - The number of distinct organizational accounts authenticated to in a given hour from an IP address.
  • users_count - The number of distinct users authenticated in a given hour from an IP address.

So we can set about deciding how much each of these factors into our overall decision.

| EVAL malicious_score = (auth_count / 5.0) + ((accounts_count - 1) * 3) + (users_count - 1)

This was my initial stab at weights, and it took a lot of refining to get something I was pleased with. Do note these change per environment. Your weights, or the weights of your global/regional/local SOC, will differ dramatically.

Your first question as you unpack this might be “Are these values arbitrary?” Yes, yes they are. We need to start somewhere.

We are taking the sum of these three parameters with weight modifiers (multiplication or division operations applied to each); a quick worked example follows the list below.

  • auth_count / 5.0
    • This is a modifier to the authentication velocity. Recall our hypothesis: benign cross-boundary authentications tend to occur with long time deltas.
    • If a single IP address authenticates 400 times over an hour, this will yield a score of 80 for that given row.
  • (accounts_count - 1) * 3
    • This is a modifier to the unique accounts accessed. Once again, refer to our hypothesis— IP addresses are reused by adversaries and rarely cross boundaries in typical use. In other words— we can heavily weight this parameter in global SOC operations.
    • Of note, we see the subtraction here; this is a tuning lever. What do we consider benign cross-boundary authentication? Well, at 1, they haven’t crossed any boundary. We can turn this up to start to exclude things like shared infrastructure. Perhaps we’re uninterested in IPs that authenticate to two environments; then we subtract 2.
    • To keep with our example, perhaps an IP address authenticates to 3 environments. It would yield a score here of (3 - 1) * 3 = 6
    • Editor’s note: You can probably see now why this initial pass was insufficient for weight tuning.
  • users_count - 1
    • The number of unique users authenticating from an IP. Simple enough, we’ll resort again to our pillars. Users do not tend towards authenticating from shared infrastructure. We need to account for legitimate business use, but we have some tuning tricks that can help make this a useful parameter without drowning us in noise.
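Putting numbers to those weights as a quick sanity check, here’s a throwaway ES|QL sketch with entirely made-up values for a single hour bucket:

ROW auth_count = 400, accounts_count = 3, users_count = 5  // hypothetical values for one IP in one hour
| EVAL malicious_score = (auth_count / 5.0) + ((accounts_count - 1) * 3) + (users_count - 1)
// 80.0 + 6 + 4 = 90.0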

Finishing Up

All that’s left now is to display this data. We’ll go back to tuning in a bit.

| KEEP 
	hour_bucket,
	malicious_score,
	auth_count,
	accounts_count,
	users_count,
	source.ip,
	source.as.number,
	org_names,
	user_names,
	account_ids
| SORT 
	malicious_score DESC,
	auth_count DESC,
	accounts_count DESC 

Again, we can mostly gloss over this: we sort so that the highest malicious score appears at the top, then by the number of authentications when scores are tied, then by the number of accounts.

This helps us prioritize and triage multi-organization compromise, which is patently a higher priority than a single organization being compromised in isolation, if only due to time investment. “Hey, shut off your SonicWall, something bad is happening,” is a lot quicker to get out to one person than to 30.

Playing with Levers

I discussed previously that my initial stab at weights was wrong; in fact, it’s still probably not perfect, because these systems are fickle. These weights (levers) are tied to our hypothesis pillars, and as we turn one lever, the others are inversely affected. One compromise may involve hundreds of users but only a few accounts; another may involve hundreds of accounts but only target the ‘admin’ user of each organization. (Which, quite annoyingly, is not surfaced by our query properly. More on how to avert this in another blog post. Shoutout Anton for identity hashing though.)

The obvious truth was that I was improperly weighting things initially, and had to iterate. So how’d we do this?

The easiest way: look for known compromise. You see, when I said that without some of our pillars we build a VPN detector, that’s not actually far from what I did initially. I simply looked at IP addresses that authenticated across boundaries, and that’s it. This itself worked shockingly well, aside from the slew of false positives I had to deal with. It did, however, give me some working data. I’ll discuss the result of all this hunting later, but the reality is I knew what I was looking for before I started.
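For the curious, a naive pass of that shape is only a few lines. This is a rough sketch of the idea rather than the exact query I ran, reusing the same fields as before:

FROM alerts-security.alerts-siem
| WHERE kibana.alert.rule.name LIKE "*SonicWall*" AND source.ip IS NOT NULL
| STATS accounts_count = COUNT_DISTINCT(account.id) BY source.ip
| WHERE accounts_count > 1
| SORT accounts_count DESC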

Because of this, I can take the clusters this is designed to highlight and play with the tuning until those are pushed disproportionately up the rankings. In theory, my system should additionally drag up clusters I may not know about because it is less naïve, or at minimum quell false positives so that only malicious clusters remain.

Multi-Account Compromise

The first and most obvious lever was to weight IP addresses that authenticate across boundaries much higher. How’d we go about this?

Recall that our formula was (accounts_count - 1) * 3. First, I observed an annoying but convenient pattern: rarely did adversaries access only two environments from shared infrastructure. Less rarely, environments were co-managed. This means there were very benign and usual phenomena in which these boundaries were crossed.

Quelling this is actually pretty simple: we just increment the amount we’re subtracting so that the accounts_count term becomes 0 more often. We’ll use 2 here. Adversaries that authenticate across 1 trust boundary (two separate organizational accounts) are thereby weighted less, but administrative activity is quelled too. Truth be told, this was a fair trade, and of the known-compromise cases, it affected very few. Again, defense in depth.

We can also weight this significantly higher. In fact, we’re going to weight it by 10, so that (paired with some other changes) we start to surface boundary-crossing compromise more frequently.

So now our weight looks like this: (accounts_count - 2) * 10, and multi-account compromise is now weighted much heavier!
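As a quick sense check with a hypothetical accounts_count of 3, the old and new terms compare like this:

ROW accounts_count = 3  // made-up value: an IP touching three organizational accounts in one hour
| EVAL old_term = (accounts_count - 1) * 3, new_term = (accounts_count - 2) * 10
// old_term => 6, new_term => 10; at accounts_count = 2 the new term falls to 0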

Dealing with Volume

The number of automated systems hosted on AWS that rapidly authenticate to VPNs to perform some action is insanely high. Ludicrously so for some organizations. We need to quell the velocity parameter a bit. We can make use of our lever in auth_count.

Instead of auth_count / 5.0, we’ll pursue something a little more ambitious. I found auth_count / 15.0 to be tolerable. This takes our 400 auths in an hour from an IP from a score of 80 to a score of 26.6666 (repeating of course.)

This prevents us from drowning in velocity without requiring we ignore it entirely.
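Dropped back into the pipeline in place of the original scoring line, the tuned version reads:

| EVAL malicious_score = (auth_count / 15.0) + ((accounts_count - 2) * 10) + (users_count - 1)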

The Fruits of Our Labors

Now we have a simple query which examines authentications over time and applies the formula (auth_count / 15.0) + ((accounts_count - 2) * 10) + (users_count - 1).

What did we find? Well, I alluded to the fact that we already knew about a cluster of compromises, but this query further cemented it in a manner that was immediately brought to the top of our list, easily actionable, and trivially maintainable.

The query uncovered over 200 malicious SonicWall SSLVPN authentications, often from shared infrastructure. It reached back 90 days, surfacing a swathe of cases that were trivially attributed to shared infrastructure and a small group of TTPs and authentication patterns, which undeniably highlighted:

  • Autonomous access to SSLVPN accounts
  • Using alphabetized user lists
  • Where authentications rarely failed
  • And shared infrastructure that was being intermittently rotated between low-reputation service providers.

An impressive threat picture for an afternoon playing with ES|QL.

Discussing Shortfalls

Before we conclude this article, I think it’s important to be aware of our blindspots.

First, we make concessions in our scoring system. We exclude compromise that crosses a single trust boundary. We can help shore up detections here by looking at organizational behaviors and anomalous or first-time authentications from an AS within a given organization.
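That idea can be roughed out in ES|QL as well. The sketch below is a companion built on the same assumed fields, not part of the scoring query: it surfaces autonomous systems whose first observed authentication against a given account falls inside the last week.

FROM alerts-security.alerts-siem
| WHERE kibana.alert.rule.name LIKE "*SonicWall*" AND source.as.number IS NOT NULL
| STATS first_seen = MIN(@timestamp), auth_count = COUNT(*) BY account.id, source.as.number
| WHERE first_seen >= NOW() - 7 days
| SORT first_seen DESC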

Second, we suffer due to bucketing. All of our parameters depend on occurrences per hour, which means that a theoretical pattern exists where:

  • An ‘admin’ account is compromised (1 username)
  • Across two different ‘accounts’ (1 boundary)
  • At ~17:59:59 UTC.

Suppose, then, that this pattern repeats at 18:01:00 UTC. Does our detection capture it? Technically, it would, but it would not be properly weighted. What might be a strong score (44) would instead be reduced by half because it spans two time buckets.

Again, there are levers and techniques to tune around this, but it’s important to acknowledge blind spots.

Key Takeaways

Authentication detection is often treated as a rate problem. It isn’t.

It’s a topology problem.

We can form hypotheses based on real world experience and intrusions such as:

  • The overwhelmingly vast majority of adversary authentications are non-discrete.
  • Users do not tend towards authenticating from shared infrastructure.
  • Infrastructure tends to be associated with unique organizations.
  • Benign cross-boundary authentications tend to occur with long time deltas.

These can help inform hunts and detections which are designed to exploit adversarial patterns, and tuned in a manner that causes malicious behavior to stand out.

With this approach, we can hunt through telemetry such as SSLVPN authentications to reveal compromise with much higher fidelity and confidence, and use this data to prioritize efforts and identify clusters of activity worth tracking.

ES|QL is but one way to do this; any modern query language is capable of similar tactics, and by thinking with an analytical mindset, we can start to write detections more advanced than simple pattern recognition.