Musings and confusings. All things DFIR.

Category: Uncategorized

Windows RDP-Related Event Logs: Identification, Tracking, and Investigation

Early in my DFIR career, I struggled with understanding how exactly to identify and understand all the RDP-related Windows Event Logs. I would read a few things here and there, think I understood it, then move on to the next case – repeating the same loop over and over again and never really acquiring full comprehension. That is until one day I finally got tired of repeating the same questions/research and just made a cheat sheet laying out the most common RDP-related Event ID’s that I’d encountered along with their relevance and descriptions. From that point on, as I sporadically encountered related questions/confusion from others in the community, I would simply refer to my cheat sheet to provide an immediate response or clarification – saving them from the hours of repeated questioning and research I had already done.

However, it seems the community continues to encounter the same struggle in identifying and understanding RDP-related Windows Event Log ID’s, where each is located, and even what some of them mean (no thanks to some of Microsoft’s very confusing documentation and descriptions). As such, I recently set out to try and find an easy route to the solution for this problem (i.e. hopefully find a single website to point to with all this information). Though I’ve found parts of the answer in posts here and there, each of them were missing parts of the puzzle (either missing ID’s, descriptions, explanations, and/or overall how they fit together in a chronological fashion). I will say JPCERTCC did an awesome job capturing a ton of information here, I just can’t quite decipher or discern the clear order of events and some appear out of order (at least how I have encountered them, but maybe I’m reading it wrong…). At any rate, as they say, necessity is the mother of invention.

So, I decided to create a blog post that I hope can serve as a succinct one-stop shop for understanding and identifying the most commonly encountered and empirically useful* RDP-related Windows Event Log ID’s/entries for tracking and investigating RDP usage on a Windows Vista+ endpoint. The Windows Event ID’s in the XP days were different than those in Vista+ Operating Systems. So, I decided to leave those out for now, but perhaps I will add them in the future.

*Yes, there are Event ID’s like 1146, 1147, and 1148 which look great in Microsoft’s documentation as a very useful source of information. However, I’ve yet to see (m)any of these commonly occurring in the wild.

I debated back and forth on the best way to sort/group these. Ultimately, in truly pragmatic fashion, I figured it would likely be most useful to sort them in the (chronological) order in which you might expect to find them. Ergo, the flow/section breakup is the following:

Network Connection
->-> Authentication
->-> Logon
->-> Session Disconnect/Reconnect
->-> Logoff

Network Connection

This section covers the first indications of an RDP logon – the initial network connection to a machine.

Log: Microsoft-Windows-Terminal-Services-RemoteConnectionManager/Operational

Log Location: %SystemRoot%\System32\Winevt\Logs\Microsoft-Windows-TerminalServices-RemoteConnectionManager%4Operational.evtx

Event ID: 1149
Provider Name: Microsoft-Windows-Terminal-Services-RemoteConnectionManager
Description: “User authentication succeeded”
Notes: Despite this seemingly clear-cut description, this event actually DOES NOT indicate a successful user authentication in the sense that many might expect (e.g., successful input and acceptance of a username and password). Instead, “authentication” in this sense is referring to successful network authentication, as in someone successfully executed an RDP network connection to the target machine and it successfully responded and displayed a login window for the next step of entering credentials. For example, if I launched the RDP Desktop Connection program on my computer, input a target IP, and hit enter, it would simply display the target system’s screen and produce an 1149 Event ID indicating I had successfully connected to the target, WELL BEFORE I even entered any credentials. So, repeat after me, “An Event ID 1149 DOES NOT indicate successful authentication to a target, simply a successful RDP network connection”.
TL;DR: NOT AN AUTHENTICATION. Someone launched an RDP client, specified the target machine (possibly with a username and domain), and hit enter to make a successful network connection to the target. Nothing more, nothing less.


Authentication

This section covers the authentication portion of the RDP connection – whether or not the logon is allowed based on success/failure of username/password combo.

Log: Security

Log Location: %SystemRoot%\System32\Winevt\Logs\Security.evtx

Event ID: 4624
Provider Name: Microsoft-Windows-Security-Auditing
LogonType: Type 3 (Network) when NLA is Enabled (and at times even when it’s not) followed by Type 10 (RemoteInteractive / a.k.a. Terminal Services / a.k.a. Remote Desktop) OR Type 7 from a Remote IP (if it’s a reconnection from a previous/existing RDP session)
Description: “An account was successfully logged on”
Notes: I thought this one was pretty straight forward – just look for Type 10 logons for RDP. However, in a bit more research, I discovered that often a Type 3 logon (for NLA) will occur prior to the Type 10 logon. In addition, I also discovered that RDP’ing to a system of which you’d previously RDP’ed and not formally logged off/out would instead yield a Logon Type 7 logon versus the Logon Type 10 we’d expect. This makes sense in a way in that a Logon Type 7 (“This workstation was unlocked”) is essentially what is happening. However, to delineate this from non-RDP Type 7 logons in which a person was sitting at the machine and just unlocked the machine, we can look for remote non-local IP’s in the IpAddress field.
TL;DR: User successfully logged on to this system with the specified TargetUserName and TargetDomainName from the specified IpAddress.


Event ID: 4625
Provider Name: Microsoft-Windows-Security-Auditing
LogonType: Type 3 (Network) when NLA is Enabled (and at times even when it’s not) and/or Type 10 (RemoteInteractive / a.k.a. Terminal Services / a.k.a. Remote Desktop)
Description: “An account failed to log on”
Notes: Why do we care about failures? Well, this is helpful in identifying (brute force) failure attempts and seeing when/where an attacker may be attempting stolen/compromised credentials. The Status/Sub Status Code will also be helpful in delineating legitimate failures (e.g. “expired password”) as well as possibly providing insight into attacker activity (e.g. repetitive “user name does not exist” codes could indicate brute force guessing by a tool and/or a more targeted lack of username knowledge/awareness in the environment by the attacker).
TL;DR: User failed to log on to this system with the specified TargetUserName and TargetDomainName from the specified IpAddress.

#ProTip(s):

1) When NLA is enabled, a failed RDP logon (due to wrong username, password, etc.) will result in a 4625 Type 3 failure. When NLA is not enabled, you *should* see a 4625 Type 10 failure.

2) Both of these entries also contain a “SubjectLogonID” or a “TargetLogonID” field. This ID is unique for each logon session and is also present in various other Event Log entries, making it theoretically useful for tracking/delineating a specific user’s activities, particularly on systems allowing multiple logged on users. However, do take note that a unique *LogonID is assigned for each session, meaning if a user connects, then disconnects (without logging out, thus simply ending the current session), then reconnects (i.e. starting a new session), they will be assigned a different unique *LogonID. All to say that a single user(name) may have multiple unique *LogonID’s to track depending how many sessions they’ve instantiated, not to mention Windows makes it very confusing sometimes with multiple 4624’s with different *LogonID’s for the same session. So, YMMV.

Additional References:

David Cowen’s Forensic Lunch Test Kitchen – RDP Testing (1 , 2 , 3)
Microsoft Forum Answer Re: RDP 4624 Type 3 Logons (link)

Logon

This section covers the ensuing (post-authentication) events that occur upon successful authentication and logon to the system.

Log: Microsoft-Windows-TerminalServices-LocalSessionManager/Operational

Log Location: %SystemRoot%\System32\Winevt\Logs\Microsoft-Windows-TerminalServices-LocalSessionManager%4Operational.evtx

Event ID: 21
Provider Name: Microsoft-Windows-TerminalServices-LocalSessionManager
Description: “Remote Desktop Services: Session logon succeeded:”
Notes: This typically immediately precedes an Event ID 22 when the “Source Network Address” contains a remote IP address. Note that a “Source Network Address” of “LOCAL” simply indicates a local logon and does NOT indicate a remote RDP logon. this event with a “Source Network Address” of “LOCAL” will also be generated upon system (re)boot/initialization (shortly before the proceeding associated Event ID 22) . For remote RDP logons, take note of the SessionID as a means of tracking/associating additional Event Log activity with this user’s RDP session.
TL;DR: Indicates successful RDP logon and session instantiation, so long as the “Source Network Address” is NOT “LOCAL”.

Event ID: 22
Provider Name: Microsoft-Windows-TerminalServices-LocalSessionManager
Description: “Remote Desktop Services: Shell start notification received:”
Notes: This typically immediately proceeds an Event ID 21. Note that a “Source Network Address” of “LOCAL” simply indicates a local logon and does NOT indicate a remote RDP logon. This event with a “Source Network Address” of “LOCAL” will also be generated upon system (re)boot/initialization (shortly after the preceding associated Event ID 21).
TL;DR: Indicates successful RDP logon and shell (i.e. Windows GUI Desktop) start, so long as the “Source Network Address” is NOT “LOCAL”.

Session Disconnect/Reconnect

This section covers the various session disconnect/reconnect events that might occur due to either system (idle), network (network disconnect), or purposeful user (X out of the RDP window, Start -> Disconnect, Kicked off by another user, etc.) action.

Log: Microsoft-Windows-TerminalServices-LocalSessionManager/Operational

Log Location: %SystemRoot%\System32\Winevt\Logs\Microsoft-Windows-TerminalServices-LocalSessionManager%4Operational.evtx

Event ID: 24
Provider Name: Microsoft-Windows-TerminalServices-LocalSessionManager
Description: “Remote Desktop Services: Session has been disconnected:”
Notes: The user has disconnected from an RDP session, when the “Source Network Address” contains a remote IP address. A “Source Network Address” of “LOCAL” simply indicates a local session disconnection and does NOT indicate a remote RDP disconnection. Note the “Source Network Address” for the source of the RDP connection. This is typically paired with an Event ID 40. Also take note of the SessionID as a means of tracking/associating additional Event Log activity with this user’s RDP session.
TL;DR: The user has disconnected from an RDP session, so long as the “Source Network Address” is NOT “LOCAL”.


Event ID: 25
Provider Name: Microsoft-Windows-TerminalServices-LocalSessionManager
Description: “Remote Desktop Services: Session reconnection succeeded:”
Notes: The user has reconnected to an RDP session, when the “Source Network Address” contains a remote IP address. A “Source Network Address” of “LOCAL” simply indicates a local session reconnection and does NOT indicate a remote RDP session reconnection. Note the “Source Network Address” for the source of the RDP connection. This is typically paired with an Event ID 40. Take note of the SessionID as a means of tracking/associating additional Event Log activity with this user’s RDP session.
TL;DR: The user has reconnected to an existing RDP session, so long as the “Source Network Address” is NOT “LOCAL”.


Event ID: 39
Provider Name: Microsoft-Windows-TerminalServices-LocalSessionManager
Description: “Session <X> has been disconnected by session <Y>”
Notes: This indicates that a user has formally disconnected from an RDP session via purposeful Disconnect (e.g., via the Windows Start Menu Disconnect option) versus simply X’ing out of the RDP window. Cases where the Session ID of <X> differs from <Y> may indicate a separate RDP session has disconnected (i.e. kicked off) the given user.
TL;DR: The user formally disconnected from the RDP session.


Event ID: 40
Provider Name: Microsoft-Windows-TerminalServices-LocalSessionManager
Description: “Session <X> has been disconnected, reason code <Z>”
Notes: In true Microsoft fashion, although the description is always “Session has been disconnected”, these events also indicate/correlate to reconnections, not just disconnections. The most helpful information here is the Reason Code (a function of the IMsRdpClient::ExtendedDisconnectReason property), the list of which can be seen here (and this pairs it with the codes to make it easier to read). Below are some examples of codes I encountered during my research.
0 – “No additional information is available.” (Occurs when a user informally X’es out of a session, typically paired with Event ID 24)
5 – “The client’s connection was replaced by another connection.” (Occurs when a user reconnects to an RDP session, typically paired with an Event ID 25)
11 – “User activity has initiated the disconnect.” (Occurs when a user formally initiates an RDP disconnect, for example via the Windows Start Menu Disconnect option.)
TL;DR: The user disconnected from or reconnected to an RDP session.


Log: Security

Log Location: %SystemRoot%\System32\Winevt\Logs\Security.evtx

Event ID: 4778
Provider Name: Microsoft-Windows-Security-Auditing
Description: “A session was reconnected to a Window Station.”
Notes: Occurs when a user reconnects to an existing RDP session. Typically paired with Event ID 25. The SessionName, ClientAddress, and LogonID can all be useful for identifying the source and associated activity.
TL;DR: The user reconnected to an existing RDP session.


Event ID: 4779
Provider Name: Microsoft-Windows-Security-Auditing
Description: “A session was disconnected from a Window Station.”
Notes: Occurs when a user disconnects from an RDP session. Typically paired with Event ID 24 and likely Event ID’s 39 and 40. The SessionName, ClientAddress, and LogonID can all be useful for identifying the source and associated activity.
TL;DR: The user disconnected from from an RDP session.


Logoff

This section covers the events that occur after a purposeful (Start -> Disconnect, Start -> Logoff) logoff.

Log: Microsoft-Windows-TerminalServices-LocalSessionManager/Operational

Log Location: %SystemRoot%\System32\Winevt\Logs\Microsoft-Windows-TerminalServices-LocalSessionManager%4Operational.evtx

Event ID: 23
Provider Name: Microsoft-Windows-TerminalServices-LocalSessionManager
Description: “Remote Desktop Services: Session logoff succeeded:”
Notes: The user has initiated a logoff. This is typically paired with an Event ID 4634 (logoff). Take note of the SessionID as a means of tracking/associating additional Event Log activity with this user’s RDP session. This event with a will also be generated upon a system shutdown/reboot.
TL;DR: The user initiated a formal system logoff (versus a simple session disconnect).


Log: Security

Log Location: %SystemRoot%\System32\Winevt\Logs\Security.evtx

Event ID: 4634
Provider Name: Microsoft-Windows-Security-Auditing
LogonType: 10 (RemoteInteractive / a.k.a. Terminal Services / a.k.a. Remote Desktop) OR Type 7 from a Remote IP (if it’s a reconnection from a previous/existing RDP session)
Description: “An account was logged off.”
Notes: These occur whenever a user simply disconnects from an RDP session or formally logs off (via Windows Start Menu Logoff). This is typically paired with an Event ID 21 (RDP Session Logoff). I’ve also discovered these will also be paired (i.e. occur at the same time) with successful authentications (Event ID 4624). Why, I have no idea.
TL;DR: A user disconnected from, or logged off, an RDP session.


Event ID: 4647
Provider Name: Microsoft-Windows-Security-Auditing
Description: “User initiated logoff:”
Notes: Occurs when a user initiates a formal system logoff and is not necessarily RDP specific. You will need to use some reasoning and temporal analysis to understand if/when it is related to a system logoff via an RDP session or is from a local interactive session as there is no LogonType associated specify which it is.
TL;DR: The user initiated a formal logoff (NOT a simple disconnect).


Log: System

Log Location: %SystemRoot%\System32\Winevt\Logs\System.evtx

Event ID: 9009
Provider Name: Desktop Window Manager
Description: “The Desktop Window Manager has exited with code (<X>).”
Notes: Occurs when a user formally closes an RDP connection and indicates the RDP desktop GUI has been shut down as a result. This is useful to identify a closed/finalized RDP connection. Though, this event is not always produced for reasons I do not know.
TL;DR: A user has closed out an RDP connection.


Wrap-Up

Hopefully that provides a little better insight into some of the most common and (IME) most empirically useful RDP-related Event logs, when/where you might encounter them, what they mean, what they look like, and (most importantly) how they all fit together.

As a result of this post, Richard Davis (@richarddavisg, @13CubedDFIR) of 13Cubed on YouTube has also put together an RDP flow chart that is very helpful in visualizing the expected (though, not guaranteed) flow of these logs. Feel free to check out his short video walkthrough as well.

The Importance of Incident Scoping/Assessment

In consulting, all engagements begin with what we refer to as “scoping” in order to, at a very high level, determine if/how we may be able to help a client. Sure, they can sometimes be arduous or monotonous and often involve a comedy of conference call errors, but they are absolutely CRITICAL to the success of the engagement. So critical, in fact, that I wanted to take some purposeful time to address this often-overlooked, admittedly very unsexy, but nonetheless integral piece of performing effective DFIR.

Though I’m approaching this from a consulting point of view, this is in no way unique nor applicable only to consulting. So, don’t tune out now just because, “I don’t have clients.” You do, actually, whether they are external entities or people within your own organization. Each of us has “clients” in various forms that rely (often heavily) on us as DFIR professionals to ask the right questions and respond with the appropriate guidance and/or actions to mitigate threats and protect them from harm. As such, every DFIR professional will find themselves in situations where they must first comprehensively understand the problem(s) and issue(s) at hand before it can be effectively addressed in response and analysis. In fact, this is often where many of us find ourselves at the onset of an alert or suspected compromise that requires a well-formulated and concerted DFIR response. Though it can be very difficult to take step back and perform a high level assessment instead of diving right into action (especially when you may have external entities and/or higher level management wanting answers LIKE YESTERDAY), doing so will pay you back ten-fold.

At a lower level, scoping (or an initial assessment) must be performed as a due diligence effort to acquire all appropriate/pertinent information, addressing a variety of aspects that may affect the success, efficacy, or efficiency of the investigation. From my private sector experience dealing with a wide array of clients from all types of verticals, industries, and organizational sizes, I’ve attempted to unjustly distill the scoping topic and question set down a few (ok, a few more than a few) important bullets provided below.

However, do keep in mind that this is not intended to be all-inclusive set of questions to ask by any means. Rather, the below set of topics and specific questions are intended to serve as a solid baseline in facilitating comprehensive response and should be augmented or modified as needed. In addition, the below questions are from the perspective of me/us asking an external entity (i.e. client). So, feel free to change/replace the pronouns as well as instances of “the client” as needed for your use and application.

  • Purpose: Understand the background of the situation
    • How did we (the client/organization) get here?
    • When and how did they first notice any sort of issue?
    • Can they think of any legitimate (non-malicious) causes for the current issue?
    • Have there been any other anomalies either previous to or during the timeframe of concern that may be related to the current issue?
  • Purpose: Understand what responsive actions have been performed to date
    • Are the systems still running or have they been powered down?
    • Have firewall blocks been put in place?
    • (Tons more questions I ask…)
    • Have any external entities been notified (particularly if under certain regulations)?
  • Purpose: Understand the set of artifacts available to assist us in our response
    • What type(s) of machines are involved (workstations/servers, OS)?
    • Are they virtual or physical?
    • Where are they hosted, and do they span disparate physical locations?
    • What specific logging is enabled (host, network, and appliance)?
    • …and which of the above are actually collected and available?
    • …and do the available logs contain the right (useful) content?
      • Let’s just say it is not uncommon to get “VPN logs” that simply show syslog interface up/down, “Web Logs” that contain only the load balancer as the “client-ip”, or “DNS logs” that aren’t logging query/response and logging only the IP of the DNS server as the “client”.
    • …and do the available logs span the time frame(s) of concern/interest?
  • Purpose: Identify the goals (in priority order) for the engagement
    • Ask the question, “What would a successful engagement look like to you?”
    • Is the client interested in simply getting back to business?
    • Would they like a root cause analysis (how exactly did this get in/past their defenses)?
      • If so, are they looking to specifically prove/disprove something?
    • Are they simply looking to check a box (fulfilling some sort of requirement)?
      • Often this will not be stated, but you can determine it by asking certain questions and gauging certain entities’ investment in the response.
    • What would they like us to do, specifically, in priority order?
      • Now, we don’t take this verbatim and slap it into a SoW, but we do use it to guide our response as best as possible to achieve their priority goals in tandem with an appropriate, comprehensive, and best practice response.

Though it is not uncommon for a situation to merit even further questioning and level of excruciating detail, depending on complexity, the above set of topics are what I would consider the minimum requirements for a comprehensive initial assessment. Suffice to say that misunderstandings or simple lack of required information in any one of these areas can lead to a variety of undesirable consequences, ranging from that of a simply sub-optimal outcome to that of a completely disastrous one for both the company and the client.

The DFIR industry is a “pay now or pay later” industry, and this is no exception. As such, I highly encourage all of us to be purposeful in spending our time performing due diligence on the front end instead of paying the consequences of not doing so throughout our investigations.

/JP

Know Your Tools: Linux (GNU) vs. Mac (BSD) Command Line Utilities

Welcome to first post in the “Know Your Tools” series!

Without further ado…

Have you ever wondered if/how *nix command line utilities may differ across distributions? Perhaps it never even occurred to you that there was even a possibility the tools were any different. I mean, they’re basic command line tools. How and why could/would they possibly differ?

Well, I’m here to say… thy basic command line utilities art not the same across different distributions. And, the differences can range from those that can cause a simple nuisance to those that can cause oversight of critical data.

Rather than going into aspects of this discussion that have already been covered such as how Linux and BSD generally differ, I would instead like to focus on a few core utilities commonly used in/for DFIR artifact analysis and some caveats that may cause you some headache or even prevent you from getting the full set of results you’d expect. In highlighting the problems, I will also help you identify some workarounds I’ve learned and developed over the years in addressing these issues, along with an overarching solution at the end to install GNU core utilities on your Mac (should you want to go that route).

Let’s get to it.

Grep

Grep is one of the most useful command-line utilities for searching within files/content, particularly for the ability to use regular expressions for searching/matching. To some, this may be the first time you’ve even heard that term or “regex” (shortened version of it). Some of you may have been using it for a while. And, nearly everyone at some point feels like…

Amirite?

Regardless of whether this is your first time hearing about regular expressions or if you use them regularly albeit with some level of discomfort, I HIGHLY suggest you take the time to learn and/or get better at using them – they will be your most powerful and best friend for grep. Though there is a definite regex learning curve (it’s really not that bad), knowing how to use regular expressions translates directly to performing effective and efficient searches for/of artifacts during an investigation.

Nonetheless, even if you feel like a near master of regular expressions, equally critical to an expression’s success is how it is implemented within a given tool. Specifically for grep, you may or may not be aware that it uses two different methods of matching that can highly impact the usefulness (and more important, validity) of results returned – Greedy vs. Lazy Matching. Let’s explore what each of these means/does.

At a very high level, greedy matching attempts to find the last (or longest) possible match, and lazy matching attempts to find the first possible match (and stops there). More specifically, greedy matching employs what is called backtracking and look-behind’s but that is a separate discussion. Suffice to say, using an incorrect, unintended, and/or unexpected matching method can completely overlook critical data or at the very least provide an inefficient or invalid set of results.

Now having established some foundational knowledge about how grep searches can work, we will drop the knowledge bomb – the exact same grep expression on Linux (using GNU grep) may produce completely different or no results on Mac (using BSD grep), especially when using these different types of matching.

…What? Why?

The first time I found this out I spent an inordinate and unnecessary amount of time banging my head against a wall typing and re-typing the same expression across systems but seeing different results. I didn’t know what I didn’t know. And, well, now I hope to let you know what I didn’t know but painfully learned.

While there is an explanation of why, it doesn’t necessarily matter for this discussion. Rather, I will get straight to the point of what you need to know and consider when using this utility across systems to perform effective searches. While GREEDY searches execute pretty much the same across systems, the main difference comes when you are attempting to perform a LAZY search with grep.

We’ll start with GREEDY searches as there is essentially little to no difference between the systems. Let’s perform a greedy search (find the last/longest possible match) for any string/line ending in “is” using grep’s Extended Regular Expressions option (“-E”).

(Linux GNU)$ echo “thisis” | grep -Eo ‘.+is'
thisis
(Mac BSD)$ echo “thisis” | grep -Eo ‘.+is'
thisis

Both systems yield the same output using a completely transferrable command. Easy peasy.

Note: When specifying Extended Regular Expressions, you can (and I often do) just use “egrep” which implies the “-E” option.

Now, let’s look at LAZY searches. First, how do we even specify a lazy search? Well, to put it simply, you append a “?” to your matching sequence. Using the same search as before, we’ll instead use lazy matching (find the first/shortest match) for the string “is” on both the Linux (GNU) and Mac (BSD) versions of grep and see what both yield.

(Linux GNU)$ echo “thisis” | grep -Eo ‘.+?is'
thisis
(Mac BSD)$ echo “thisis” | grep -Eo ‘.+?is'
this

Here the fun begins. We did the exact same command on both systems and it returned different results.

Well, for LAZY searches, Linux (GNU) grep does NOT recognize lazy searches unless you specify the “-P” option (short for PCRE, which stands for Perl Compatible Regular Expressions). So, we’ll supply that this time:

(Linux GNU)$ echo “thisis” | grep -Po ‘.+?is'
this

There we go. That’s what we expected and hoped for.

*Note: You cannot use the implied Extended expression syntax of “egrep” here as you will get a “conflicting matchers specified” error. Extended regex and PCRE are mutually exclusive in GNU grep.

Note that Mac (BSD), on the other hand, WILL do a lazy search by default with Extended grep. No changes necessary there.

While not knowing this likely won’t lead to catastrophic misses of data, it can (and in my experience will very likely) lead to massive amounts of false positives due to greedy matches that you have to unnecessarily sift through. Ever performed a grep search and got a ton of very imprecise and unnecessarily large (though technically correct) results? This implementation difference and issue could certainly have been the cause. If only you knew then what you know now…

So, now that we know how these searches differ across systems (and what we need to modify to make them do what we want), let’s see a few examples where using lazy matching can significantly help us (note: I am using my Mac for these searches, thus the successful use of Extended expressions using “egrep” to allow for both greedy and lazy matching)…

User-Agent String Matching
Let’s say I want to identify and extract the OS version from Mozilla user-agent strings from a set of logs, the format of which I know starts with “Mozilla/“ and then contains the OS version in parenthesis. The following shows some examples:

  • Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36
  • Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2226.0 Safari/537.36
  • Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.0 Safari/537.36
  • Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.1 Safari/537.36

Greedy Matching (matches more than we wanted – fails)
(Mac BSD)$ echo "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.1 Safari/537.36" | egrep -o 'Mozilla.+\)'
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko)

Lazy Matching
(Mac BSD)$ echo "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.1 Safari/537.36" | egrep -o 'Mozilla.+?\)'
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1)

Searching for Malicious Eval Statements
Let’s say I want to identify and extract all of the base64 eval statements from a possibly infected web page for analysis, so that I can then pipe it into sed to extract only the base64 element and decode it for plaintext analysis.

Greedy Matching (matches more than we wanted – fails)
(Mac BSD)$ echo "date=new Date(); eval(base64_decode(\"DQplcnJvcl9yZ=\")); var ua = navigator.userAgent.toLowerCase();" | egrep -o 'eval\(base64_decode\(.+\)'
eval(base64_decode("DQplcnJvcl9yZ=")); var ua = navigator.userAgent.toLowerCase()

Lazy Matching (matches exactly what we want)
(Mac BSD)$ echo "date=new Date(); eval(base64_decode(\"DQplcnJvcl9yZ=\")); var ua = navigator.userAgent.toLowerCase();" | egrep -o 'eval\(base64_decode\(.+?\)'
eval(base64_decode("DQplcnJvcl9yZ=“))

There you have it. Hopefully you are now a bit more informed not only about the differences between Lazy and Greedy matching, but also about the difference in requirements across systems.

Strings

Strings is an important utility for use in extracting “human-readable” strings from files/binaries. It is particularly useful in extracting strings from (suspected) malicious binaries/files to attempt to acquire some insight into what may be contained within the file, its capabilities, hard-coded domains/URL’s, commands, … the list goes on.

However, not all strings are created equal. Sometimes, Unicode strings exist within a file/program/binary for various reasons, those of which are also important to identify and extract. By default, the GNU (Linux) strings utility searches for simple ASCII encoding, but also allows you to specify additional encodings for which to search, to include Unicode. Very useful.

By default, the Mac (BSD) strings utility also searches for simple ASCII encoding; however, I regret to inform you that the Mac (BSD) version of strings does NOT have the native capability to search for Unicode strings. Do not ask why. I highly encourage you to avoid the rabbit hole of lacking logic that I endured when I first found this out. Instead, we should move on and instead just be asking ourselves, “What does this mean to me?” Well, if you’ve only been using a Mac to perform string searches using the native BSD utility, you have been MISSING ALL UNICODE STRINGS. Of all the pandas, this is a very sad one.

So, what are our options?

There are several options, but I personally use one of the following (depending no the situation and my mood) when I need to extract both Unicode and ASCII strings from a file using a Mac (BSD) system:
1. Willi Ballenthin’s Python strings tool to extract both ASCII and Unicode strings from a file
2. FireEye’s FLOSS tool (though intended for binary analysis, it can also work against other types of files)
3. GNU strings*

*Wait a minute. I just went through saying how GNU strings isn’t available as a native utility on a Mac. So, how can I possibly use GNU strings on it? Well, my friends, at the end of this post I will revisit exactly how this can be achieved using a nearly irreplaceable third-party package manager.

Now, go back and re-run the above tools against various files and binaries from your previous investigations you performed from the Mac command line. You may be delighted at what new Unicode strings are now found 🙂

Sed

Sed (short for “Stream editor”) is another useful utility to perform all sorts of useful text transformations. Though there are many uses for it, I tend to use it mostly for substitutions, deletion, and permutation (switching the order of certain things), which can be incredibly useful for log files with a bunch of text.

For example, let’s say I have a messy IIS log file that somehow lost all of its newline separators and I want to extract just the HTTP status code, method, and URI from each line and output into its own separate line (restoring readability):

…2016-08-0112:31:16HTTP200GET/owa2016-08-0112:31:17HTTP200GET/owa/profile2016-08-0112:31:18HTTP404POST/owa/test…

Looking at the pattern, we’d like to insert a newline before each instance of the date, beginning with “2016-…”. Lucky for us, we’re on a Linux box with GNU sed and it can easily handle this:

(Linux GNU)$ sed 's/ \(.+\?\)2016/\1\n2016/g' logfile.txt
2016-08-0112:31:16HTTP200GET/owa
2016-08-0112:31:17HTTP200GET/owa/profile
2016-08-0112:31:18HTTP404POST/owa/test
...

You can see that it not only handles lazy matching, but also handles ANSI-C escape sequences (e.g., \n, \r, \t, …). This statement also utilizes sed variables, the understanding of which I will leave to the reader to explore.

Sweet. Let’s try that on a Mac…

(Mac BSD)$ sed 's/\(.+\?\)\(.+\)/\1\n2016/g' logfile.txt
2016-08-0112:31:16HTTP200GET/owa2016-08-0112:31:17HTTP200GET/owa/profile2016-08-0112:31:18HTTP404POST/owa/test

… Ugh. No luck.

Believe it or not, there are actually two common problems here. The first is the lack of interpretation of ANSI-C escape sequences. BSD sed simply doesn’t recognize any (except for \n, but not within the replacement portion of the statement), which means we have to find a different way of getting a properly interpreted newline into the statement.

Below are a few options that will work around this issue (and there are more clever ways to do it as well).

1. Use the literal (i.e., for a newline, literally insert a new line in the expression)
(Mac BSD)$ sed ’s//\*Press Enter*
> /g'

2. Use bash ANSI-C Quoting (I find this the easiest and least effort, but YMMV)
(Mac BSD)$ sed 's//\'$'\n/g’
3. Use Perl
(Mac BSD)$ perl -pe ‘s||\n|g'

Unfortunately, this only solves the first of two problems, the second being that BSD sed still does not allow for lazy matching (from my testing, though I am possibly just missing something). So, even if you use #1 or #2 above, it will only match the last found pattern and not all the patterns we need it to.

“So, should I bother with using BSD sed or not?”

Well, I leave that up to your judgment. Sometimes yes, sometimes no. In cases like this where you need to use both lazy matching and ANSI-C escape sequences, it may just be easier to skip the drama and use Perl (or perhaps you know of another extremely clever solution to this issue). Options are always good.

Note: There are also other issues with BSD sed like line numbers and using the “-i” parameter. Should you be interested beyond the scope of this post, this StackExchange thread actually has some useful information on the differences between GNU and BSD sed. Though, I’ve found that YMMV on posts like this where the theory and “facts” may not necessarily match up to what you find in testing. So, when in doubt, always test for yourself.

Find

Of all commands, you might wonder how something so basic as find could differ across *nix operating systems. I mean, what could possibly differ? It’s just find, the path, the type, the name… how or why could that even be complicated? Well, for the most part they are the same, except in one rather important use case – using find with regular expressions (regex).

Let’s take for example a regex to find all current (non-archived/rotated) log files.

On a GNU Linux system this is somewhat straight forward:

(Linux GNU)$ find /var/log -type f -regextype posix-extended -regex "/var/log/[a-zA-Z\.]+(/[a-zA-Z\.]+)*"

You can see here that rather than using the standard “-name” parameter, we instead used the “-regextype” flag to enable extended expressions (remember egrep from earlier?) and then used the “-regex” flag to denote our expression to utilize. And, that’s it. Bless you, GNU!

Obviously, Mac BSD is not this straight forward, otherwise I wouldn’t be writing about it. It’s not exactly SUPER complicated, but it’s different enough to cause substantial frustration as your Google searches will show that the internet is very confused about how to do this properly. I know. Shocking. Nonetheless, there is value in traveling down the path of frustration here so that you don’t have to when it really matters. So, let’s just transfer the command verbatim over to a Mac and see what happens.

(Mac BSD)$ find /var/log -type f -regextype posix-extended -regex "/var/log/[a-zA-Z\.]+(/[a-zA-Z\.]+)*"
find: -regextype: unknown primary or operator

Great, because why would BSD find use the same operators, right? That would be too easy. By doing a “man find” (on the terminal, not in Google, as that will produce very different results from what we are looking for here) you will see that BSD find does not use that operator. Though, it still does use the “-regex” operator. Easy enough, we’ll just remove that bad boy:

(Mac BSD)$ find /var/log -type f -regex "/var/log/[a-zA-Z\.]+(/[a-zA-Z\.]+)*
(Mac BSD)$

No results. Ok. Let’s look at the manual again… ah ha, to enable extended regular expressions (brackets, parenthesis, etc.), we need to use the “-E” option. Easy enough:

(Mac BSD)$ find /var/log -E -type f -regex "/var/log/[a-zA-Z\.]+(/[a-zA-Z\.]+)*"
find: -E: unknown primary or operator

Huh? The manual says the “-E” parameter is needed, yet we get the same error message we got earlier about the parameter being an unknown option. I’ll spare you a bit of frustration and tell you that it is VERY picky about where this flag is put – it must be BEFORE the path, like so:

(Mac BSD) $> find -E /var/log -type f -regex "/var/log/[a-zA-Z\.]+(/[a-zA-Z\.]+)*"
/var/log/alf.log
/var/log/appfirewall.log
/var/log/asl/StoreData
/var/log/CDIS.custom
/var/log/corecaptured.log
/var/log/daily.out
/var/log/DiagnosticMessages/StoreData
/var/log/displaypolicyd.log
/var/log/displaypolicyd.stdout.log
/var/log/emond/StoreData
/var/log/install.log
/var/log/monthly.out
/var/log/opendirectoryd.log
/var/log/powermanagement/StoreData
/var/log/ppp.log
/var/log/SleepWakeStacks.bin
/var/log/system.log
/var/log/Tunnelblick/tunnelblickd.log
/var/log/vnetlib
/var/log/weekly.out
/var/log/wifi.log

Success. And, that’s that. Nothing earth shattering here, but different and unnecessarily difficult enough to be aware of in your switching amongst systems.

So, now what?

Are you now feeling a bit like you know too much about these little idiosyncrasies? Well, there’s no going back now. If for no other reason, maybe you can use them to sound super smart or win bets or something.

These are just a few examples relevant to the commands and utilities often used in performing DFIR. There are still plenty of other utilities that differ as well that can make life a pain. So, now that we know this, what can we do about it? Are we doomed to live in constant translation of GNU <—> BSD and live without certain GNU utility capabilities on our Macs? Fret not, there is a light at the end of the tunnel…

If you would like to not have to deal with many of these cross-platform issues on your Mac, you may be happy to know that the GNU core utilities can be rather easily installed on OS X. There are a few options to do this, but I will go with my personal favorite method (for a variety of reasons) called Homebrew.

Homebrew (or brew) has been termed “The missing package manager for OS X”, and rightfully so. It allows simple command-line installation of a huge set of incredibly useful utilities (using Formulas) that aren’t installed by default and/or easily installed via other means. And, the GNU core utilities are no exception.

As a resource, Hong’s Technology Blog provides a great walk-through of installation and considerations.

You may already be thinking, “Great! But wait… how will the system know which utility I want to run if both the BSD and GNU version are installed?” Great question! By default, homebrew installs the binaries to /usr/local/bin. So, you have a couple options, depending on which utility in particular you are using. Some GNU utilities (such as sed) are prepended with a “g” and can be run without conflict (e.g., “gsed” will launch GNU sed). Others may not have the “g” prepended. In those cases, you will need to make sure that /usr/local/bin is in your path (or has been added to it) AND that it precedes those of the standard BSD utilities’ locations of /usr/bin, /bin, etc. So, your path should look something like this:

$ echo $PATH
/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin

With that done, it will by default now launch the GNU version installed in /usr/local/bin instead of the standard system one located in /usr/bin. And, to use the native system utilities when there is a GNU version installed with the same name, you will just need to provide their full path (i.e., “/usr/bin/<utility>”).

Please feel free to sound off in the comments with any clever/ingenious solutions not covered here or stories of epic failure in switching between Linux and Mac systems 😃

/JP

Powered by WordPress & Theme by Anders Norén