Musings and confusings. All things DFIR.

Month: January 2017

Mac Dumpster Diving – Identifying Deleted File References in the Trash (.DS_Store) Files – Part 1

If you have ever plugged a USB drive into a Mac, done some things, then plugged it into a Windows system, you have no doubt seen (if you have viewing of hidden files enabled) various “.DS_Store” files (among others) strewn throughout the folders on the drive. Though essentially useless to a Windows system, they do in fact serve a particular purpose on an HFS+ file system.

While I won’t re-invent the wheel on describing “What is a .DS_Store File?” (here as well), I would like to highlight its potential usefulness to DFIR: these files can contain or reference artifacts useful to investigations – traces of deleted files, with filenames and sometimes full paths!

In a nutshell, the .DS_Store file stores metadata used by Finder for folder-specific display options such as window placement, layout, custom icons, background, etc. They are created in the parent folder of any folder that is viewed using the “Icons”, “List”, or “Gallery” views within Finder. Note that no .DS_Store file is created when viewing a folder in the “Columns” view. For example, if you opened your ~/Music/iTunes/ folder in Finder in “Gallery” view, a .DS_Store file would be created at ~/Music/.DS_Store.

Thus, these .DS_Store files are (theoretically) created in every folder that Finder accesses, including remote network shares and external devices. Are those annoying .DS_Store files you see in Windows on your FAT32-formatted thumb drive making more sense now?

Part of this metadata is the filename, which got me thinking… do any traces get left behind when a file is moved or deleted?

For this post/research, I focused solely on what happens when a user deletes a file through Finder.

In testing on my systems (OS X 10.10.5 and macOS Sierra 10.12.2), when a file gets “deleted” through Finder (not via “rm” on the command line, that’s a very different story), it first gets moved to the user’s ~/.Trash/ folder. If at least one file already exists within the user’s Trash, an entry for the yet-to-be-deleted file is added to the existing ~/.Trash/.DS_Store file denoting the full path on disk where the file resided before being moved to the Trash. This entry is part of how the “Put Back” feature works. If no files currently exist in the Trash (due to the user previously emptying the trash), I assumed (more on this in a bit) a new .DS_Store file would be created (“new” meaning a clear/empty file) to again begin storing entries for “Put Back”. Upon emptying the trash (via either the “Empty Trash” or “Secure Empty Trash” option in Finder for pre-Sierra systems), the files are deleted (according to the deletion method associated with each action) from the ~/.Trash/ folder and the ~/.Trash/.DS_Store file is also “deleted” (stay tuned for why I put this in quotes). Here is a great little writeup on the HFS+ volume structure and what happens “When Mac deletes it!”.

At this point, since all of the Trash source files are deleted upon emptying the Trash, we would assume that the .DS_Store file and all of its entries would be deleted as well. But, is this the case?

Answer: Not Quite!

In my testing, while the source data files within the ~/.Trash/ folder appear to be reliably deleted (short of carving the disk), various file and path entries within the ~/.Trash/.DS_Store file do not appear to be deleted! In fact, when you move another file to the trash, the ~/.Trash/.DS_Store file is re-created and historical entries* are re-populated into the file! Even if you “Put Back” the file(s), the associated .DS_Store file and entries remain. WIN!


*Note: These appeared to only be files I’ve deleted since the last reboot of my machine. Rebooting the machine seems to finally remove all historical entries. Various hypotheses of why/how this happens and where these entries come from will be tested later in this post.
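
If you want to reproduce this for yourself, here is a rough sketch of the spot checks involved – nothing but built-in tools and the default per-user Trash location (adjust paths as needed):

# Move a file or two to the Trash via Finder, then look at what's there
$ ls -la ~/.Trash/
# Peek at the raw contents of the Trash's .DS_Store
$ xxd ~/.Trash/.DS_Store | head -n 20
# Now empty the Trash via Finder, send another file to it, and re-run the
# two commands above to see which "historical" entries reappear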

We now have the opportunity to identify references to historical file deletions (sometimes with full path)! This doesn’t just apply to the Trash’s .DS_Store files, either. This applies to any given directory’s .DS_Store file that may contain (or have contained) references to files that existed within it.

Pretty AWESOME, right? How many of you are already putting together the “find” command to identify all the .DS_Store files on your systems?

*Hint: # find / -name .DS_Store
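
If you’re not running as root (or just want to skip the noise of permission errors), a slightly more targeted variant might look like this – adjust the starting path to taste:

$ find /Users -type f -name .DS_Store 2>/dev/null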

But we kinda started this whole story at the end, well after I had finished muddling my way through researching and experimenting to figure out how to actually parse these .DS_Store files. So, let’s rewind a bit…

Upon first look, .DS_Store files aren’t exactly straightforward, and they apparently can’t be opened with any native system tool or application. There is no native “ds_store_viewer” utility that simply parses the file information from the command line. So, how would we even go about figuring out how to parse this thing?

Well, it turns out the .DS_Store format is documented here. Given that the format is published, a parser likely already exists for it. But sometimes I just like to see what I can find myself before taking the easier route. So, how should we start exploring what’s inside these files?

Your initial thought may be “strings!” That’s a solid idea to start, let’s see what that yields…

[jp@jp-mba (:) ~]$ strings -a ~/.Trash/.DS_Store
Bud1
pptbNustr
gptbLustr
xptbLustr
xptbNustr
gptbNustr
...
DSDB
gptbNustr
gptbLustr
gptbNustr
gptbLustr
gptbNustr
fptbLustr

Well, that was less than useful. Oh, wait… maybe they’re Unicode strings instead of ASCII. Let’s see what the option is for Unix strings to search for Unicode strings instead of ASCII:

[jp@jp-mba (:) ~]$ man strings

At this point you may already know what I’m about to say – the BSD strings utility does NOT have the capability to search for Unicode strings. See my post “Know Your Tools: Linux (GNU) vs. Mac (BSD) Command Line Utilities” for more about all of that and why.

Fail.

So, you can go a few different ways here:

  1. Stick with native utilities
  2. Install/use a third-party utility that can identify Unicode strings (particularly big-endian Unicode)
  3. Install/use a third-party utility that can directly read .DS_Store format files

Native Utilities

So, what else might exist that we can use to view strings?

When in doubt, Hex it out!

I typically use two native hex viewers – hexdump and xxd. They are both useful in different ways, but we’ll start with hexdump.

Using hexdump, you can dump hex+ASCII by doing the following:

$ hexdump -C

[jp@jp-mba (:) ~]$ hexdump -C ~/.Trash/.DS_Store
00000000 00 00 00 01 42 75 64 31 00 00 38 00 00 00 08 00 |....Bud1..8.....|
00000010 00 00 38 00 00 00 10 0c 00 00 02 09 00 00 20 0c |..8........... .|
00000020 00 00 30 0b 00 00 00 00 00 00 00 00 00 00 08 00 |..0.............|
00000030 00 00 08 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000040 00 00 00 00 00 00 00 03 00 00 00 01 00 00 00 4e |...............N|
00000050 00 00 00 04 00 00 10 00 00 65 00 61 00 73 00 65 |.........e.a.s.e|
00000060 00 5f 00 44 00 00 00 00 00 00 00 00 00 00 00 00 |._.D............|
00000070 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000200 00 00 00 00 00 00 00 02 00 00 00 02 00 00 00 04 |................|
00000210 00 00 00 30 00 50 00 6c 00 65 00 61 00 73 00 65 |...0.P.l.e.a.s.e|
00000220 00 5f 00 44 00 6f 00 63 00 75 00 53 00 69 00 67 |._.D.o.c.u.S.i.g|
00000230 00 6e 00 5f 00 74 00 68 00 69 00 73 00 5f 00 64 |.n._.t.h.i.s._.d|

Here we see the notable “Bud1” header followed by readable text. Score! But, how do we extract JUST the readable text in some effective way? You can mess around with hexdump to try to make sense of the output formats, or you can do what I did – get so overwhelmed at one point that you just use xxd to create this incredibly unpretty, certainly less-than-efficient, convoluted, but “working” one-liner:

$ xxd -p <path/to/.DS_Store> | sed 's/00//g' | tr -d '\n' | sed 's/\([0-9A-F]\{2\}\)/0x\1 /g' | xxd -r -p | strings | sed 's/ptb[LN]ustr//g'

Voilà. Strings output from Unicode strings only using the built-in utilities. It is very ugly and it is certainly separating at points/lines where it should not, but hey… you get what you get. At least you can more legibly make out filenames and paths that could get you somewhere.

This is an ugly hack. I do not recommend it, but sometimes ugly is better than nothing. YMMV.

Note: I would be very interested if someone who is WAY more versed in hexdump output formatting would create a much simpler way of doing the above solely using the hexdump utility.
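
For what it’s worth, another native-only angle worth experimenting with (consider this an untested sketch rather than a recommendation) is to let iconv handle the UTF-16 big-endian conversion and then hand the result to plain old strings; the -c flag tells iconv to drop byte sequences that aren’t valid UTF-16 rather than bailing out:

$ iconv -c -f UTF-16BE -t UTF-8 ~/.Trash/.DS_Store | strings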

Third-Party Utilities

GNU Strings

Believe it or not, you can actually install various GNU utilities on your Mac via a handy little thing called Homebrew. It takes just a command-line one-liner to install and opens your Mac up to a world of new and useful utilities installed via “formulas”. Note that Xcode is a pre-req for installing Homebrew.

For our purposes, we want to install GNU strings, which ships as part of the GNU binutils package (not coreutils, as you might expect). With Homebrew installed, all it takes is a “brew install binutils” and we’re up and running. Do note that various GNU utilities are prepended with a “g” due to naming conflicts with the BSD versions. For example, the GNU strings utility must be called/run as “gstrings” (yeah, I laugh a little each time I see that).
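
If you’re starting from scratch, the sequence looks something like the following (the Homebrew install one-liner changes over time, so grab the current one from https://brew.sh, and the exact formula name is worth confirming with “brew search” before you trust anything pasted here):

$ brew install binutils
$ gstrings --version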

Once installed, we now have full GNU strings capabilities, namely for searching big-endian Unicode text, a la the following:

$ gstrings -a -eb

You don’t necessarily need the “-a” option that tells strings “I don’t care whether or not you think it’s a searchable file, do it anyway”, but I add it out of habit of searching files that the system likes to gripe about.

Using FDB

https://digi.ninja/projects/fdb.php

  1. Enter CPAN shell
    1. $ perl -MCPAN -e shell
  2. Install DSStore
    1. $ cpan[1] > install Mac::Finder::DSStore
  3. Install Switch
    1. $ cpan[1] > install Switch
  4. Run FDB
    1. $ ./fdb.pl --type ds --filename /Users//.Trash/.DS_Store --base_url /Users//

Using ds_store Go Parser

https://github.com/gehaxelt/ds_store

  1. Download and Install Go
    1. Download OS X Package from here: https://golang.org/dl/
  2. Set Go Path in shell
    1. One-time (I set mine as the following but it’s up to you)
      1. $ export GOPATH=~/Projects/Go
    2. Permanent
      1. Place above line in /etc/bashrc
      2. Reload shell “source /etc/bashrc” or close and relaunch terminal
  3. Download ds_store go files
    1. $ go get github.com/gehaxelt/ds_store
  4. Change to the directory of the go project
    1. $ cd $GOPATH/src/github.com/gehaxelt/ds_store
  5. Make a directory for the new project/files (I opted to name mine “dsdump”, but feel free to alter yours) and cd to it
    1. $ mkdir -p bin/dsdump && cd "$_"
  6. (If not already done) Create a .go file (I named mine dsdump.go) and copy/paste the Example Code from https://github.com/gehaxelt/ds_store
    1. $ nano dsdump.go
    2. Copy/paste the Example Code into this file and save it
  7. Build the Go binary
    1. $ go build
  8. Run dsdump
    1. $ ./dsdump -i <path/to/.DS_Store>

**Note: One of the awesome things about Go is its ability to build static binaries (no additional files needed) for a variety of operating systems. For example, if you wanted to build a binary for a Windows x64 system, you would simply run “GOOS=windows GOARCH=amd64 go build -o dsdump.exe”. Then, just copy that to whatever Windows x64 system and run it. Pretty sweet, huh?

(Shout out to Slavik at Demisto for quickly getting me up and running with Go before I spent any time looking at documentation.)

— Update 7/31/19 —

Using DSStoreParser

Nicole Ibrahim recently presented at the SANS DFIR Summit on .DS_Store files and pointed us all to a parser she built.

https://github.com/nicoleibrahim/DSStoreParser

Using it is as simple as downloading it and running it (with Python2.7).

  1. Download the source
    1. $ git clone https://github.com/nicoleibrahim/DSStoreParser.git
  2. Change into the directory
    1. $ cd DSStoreParser
  3. Install the requirements (unicodecsv), if needed
    1. $ pip2.7 install unicodecsv --user
  4. Run it by pointing it to the source folder containing the .DS_Store file(s) you’d like to parse, and provide the output folder for the results
    1. $ python2.7 DSStoreParser.py -s /path/to/source/ -o output_dir/

Comparing the .DS_Store Parsing Solutions

As you can see, there are a variety of useful tools, both native and third-party, that can assist in analyzing .DS_Store files. A hex viewer is an invaluable tool for so many reasons, namely for assisting in identifying unknown structures, artifacts, or items within a given file. Gstrings offers an easy way to search for the appropriate strings with an easily installable pseudo-native utility. Fdb allows the option to specify the “base_url” to prepend its results with the appropriate path, based on the given .DS_Store file’s location. The ds_store Go parser does the job as well and it can be compiled to be portable to any major OS, which can be very handy in a Mac Forensics go-kit of sorts. And, Nicole’s DSStoreParser is a nice, clean Python-based solution that provides a variety of output reports to better assist in seeing/understanding the information contained within the files.

Wrapping It All Up

Regardless of why/how this ~/.Trash/.DS_Store file re-creation occurs (which we’ll address in Part 2 of this post) and what option(s) you choose to parse/extract these items, you may now at least have an additional DFIR investigation method and artifact(s) to identify previously deleted files that are no longer resident on (allocated) disk.

Though we focused solely on .DS_Store files in this post, do note that they are not the only files that can assist in identifying deleted files on a system. There are several other files/areas that should be searched in such investigations; however, I wanted to home in on these files because their analysis is possibly lesser known (at least in my research and experience).

At any rate, I hope this can be somehow useful in your investigations moving forward! As usual, YMMV, so I’m interested to hear feedback and stories of if/how this works in the field for everyone.

/JP

A Response to “The Cloud is Evil…”

In this post, I would like to respond to the SANS post titled “The Cloud is Evil…“, authored by John McCash. In reading it, I felt compelled to address the myths/misstatements behind the article’s five stated reasons that the cloud is evil. I can certainly understand that line of thinking, I definitely cannot argue with personal experiences that may have led to such conclusions (I have personally had very lackluster experiences with a lot of lower-tier cloud providers), and I am fairly certain he does not think the cloud is really “evil” per se (he actually clarifies that later in the post). Still, I feel a duty to respond with my experiences and the facts I’ve compiled from my interest in, and dedicated development of, IR within AWS (and to some extent Azure).

1/18/17 Update: The link actually appears to be down right now. I wonder if it got taken down or if the site as a whole is just experiencing difficulty. Regardless, the page is Google cached here:

http://webcache.googleusercontent.com/search?q=cache:8YKTxP_iEsQJ:digital-forensics.sans.org/blog/2017/01/17/the-cloud-is-evil+&cd=1&hl=en&ct=clnk&gl=us

*Important Note*: Please PLEASE know that I mean zero personal attack, harm, or ill intent toward the author of the blog post this one references! If it somehow comes off that way, please know that it is not the case and attribute it solely to my passion for IR in the cloud and its usefulness and power to responders. I actually hope to hear from the author to discuss this and learn more about his position and how he came to the stated conclusions, as the priority for the DFIR community is education and healthy discussion, not pointing fingers and/or putting each other down. I would much rather someone author a contentious post and take the time to feed the community’s discussion than author nothing at all, and I am sure I will do the same at some point (if I haven’t already). So, it is important that, regardless of outcome, we continue to foster participation and discussion within the DFIR community. We’re in this together.

Without further ado…

The referenced post is sub-titled “A Forensicator’s Perspective”, but I have to say my perspective pretty much disagrees with most of the content. Maybe that just means I’m not a “forensicator”? I would like to give the benefit of the doubt and assume the author simply has little exposure to, or experience with, AWS or Azure, which would explain why such claims are made without awareness of their flaws. Regardless, I’d like to address each of the claims in the article from my own perspective and extensive DFIR experience with AWS (and to some extent Azure).

To begin (before the numbered statements/claims), I found the author’s below statement rather incongruent with my experience, in reference to Troy Larson’s SANS DFIR Summit presentation on “Defending A Cloud“:

“The Cloud (Specifically MS Azure), and describes numerous features that can be greatly advantageous to security professionals. Unfortunately, as I discovered upon submitting a question at the end of this talk, most of these features are currently only available to the Provider of the Cloud Service, and not to the Customer?. “

I’m not sure what was asked or discussed, but I know for certain this is not the case for AWS. A ton of features useful to IR are available to nearly anyone/everyone. It would be worthwhile knowing a bit more context here, and perhaps whether Azure mirrors the capabilities AWS provides. From my experience, they both offer very similar features and capabilities, just different ways of going about them. So, I would be very surprised to learn that Azure provides minimal features useful to security professionals.

At any rate, on to the numbered statements/claims.

“1. You can’t Forensicate the Virtual Hosts.”

  • “Even in cases of Infrastructure as a Service, if the customer desires these capabilities, he will typically have to build the associated forensic imaging support directly into his host configurations, rather than leveraging the intrinsic rapid access & direct snapshotting capabilities available to the Cloud provider.”

You absolutely can “forensicate” virtual hosts within AWS, and very easily. In fact, I contend that it is actually easier to acquire and analyze a system within AWS than it is in most any on-premises environment. Specific to AWS, all that is involved in acquiring a system is creating a Snapshot of its EBS volume(s), which is a block-level copy of the volume contents. From there, you create a new volume from that snapshot, attach it to your forensic host, and mount it read-only. It then appears as any other physical drive would to a forensic host, and you can go on your merry way with analysis right within the cloud. All of this typically takes only minutes to perform, with no immediate need to download or transfer data outside of the cloud. No waiting hours for dd to complete and for the image to transfer to another system (maybe even a day or so, depending on your network speed). It’s literally as simple as point, click (or execute a couple of commands), and begin analysis.
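
For the CLI-inclined, the whole acquisition flow can be scripted as well. Here is a rough sketch using the AWS CLI – every ID, the availability zone, and the device names below are placeholders, so substitute your own:

# 1. Snapshot the suspect instance's EBS volume
$ aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 --description "IR acquisition - case 1234"
# 2. Create a new volume from that snapshot in the same AZ as your forensic instance
$ aws ec2 create-volume --snapshot-id snap-0123456789abcdef0 --availability-zone us-east-1a
# 3. Attach the new volume to the forensic instance
$ aws ec2 attach-volume --volume-id vol-0aaaabbbbcccc1111 --instance-id i-0123456789abcdef0 --device /dev/sdf
# 4. From within the forensic instance, mount it read-only and get to work
$ sudo mkdir -p /mnt/evidence && sudo mount -o ro,noexec /dev/xvdf1 /mnt/evidence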

“2. You typically have no access to logs of administrative access to the Virtual Environment.”

  • “The customer may, or may not have access to logs of the activity performed using administrative accounts provided to him, and those logs may or may not contain sufficient detail to determine what actions were performed, and from what originating IP addresses.”

AWS, by default, logs API access to/from most (if not all) of its resources, namely the console and EC2 Instances (the systems/machines/hosts you provision and use), which are in the scope of this discussion. You have access to all of those logs (assuming your IAM role/policy grants you the privileges to do so), not to mention the capability to log further specifics about the machine’s performance, actions performed within/on the host itself, and a multitude of other items otherwise not easily achievable (or almost unachievable) in many other environments. These systems are in no way black boxes. I actually contend that more logging is enabled by default within AWS than in most on-premises solutions I’ve seen.
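
As a concrete (if simplified) example of pulling that API history yourself, CloudTrail’s lookup API is exposed right in the CLI – the username below is obviously a placeholder:

$ aws cloudtrail lookup-events --lookup-attributes AttributeKey=Username,AttributeValue=suspect-user --max-results 50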

“3. You have no way to monitor traffic to and from the Virtual Environment.”

  • “Similarly to #1 above, in the case of IaaS, it’s possible to set up netflow or full packet-capture architecture, internal to your virtual hosts, but such configurations are a pain to maintain, scale poorly, and can use significant resources.”

It’s actually very easy to set up netflow captures within AWS – they’re called VPC Flow Logs. In fact, it’s so scarily simple that I’d recommend alerting on VPC Flow Log creations, as an attacker could also harness this capability to snoop on what’s going on within your network (again, with great power comes great responsibility and accountability!). Here is a nice little writeup by Jeff Barr from a while back on just how easy it is to begin monitoring all traffic over an interface (or multiple) with a few simple clicks. It scales, it stores, and it uses very minimal resources. I tend to set these up during response to monitor for and identify specific known IOC’s/network indicators, a process that is often much more arduous (or outright impossible) in many on-premises networks and architectures. They’re also great for network/endpoint baselining!
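
To give you a sense of just how little it takes, enabling flow logs for an entire VPC is a single call; here is a sketch with placeholder IDs/ARNs and a CloudWatch Logs destination:

$ aws ec2 create-flow-logs --resource-type VPC --resource-ids vpc-0123456789abcdef0 --traffic-type ALL --log-group-name ir-vpc-flow-logs --deliver-logs-permission-arn arn:aws:iam::123456789012:role/flow-logs-role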

“4. Because of items two and three, an attacker (who might break into your internal network, but eventually be discovered after operating for a typical period of only 18 months or so) need only perform a single, brief, undetected intrusion. He just has to obtain Cloud access credentials or implant a backdoor in your Cloud software, and then get back out, deleting most of his tracks in the process. Subsequently there is only a miniscule chance of the existence or usage of this access ever being noticed, and he effectively has access to all Cloud-stored data in perpetuity.”

Sure, an attacker can compromise your network with elevated privilege credentials, but this is really no different than any other deployment or network. The reason DFIR sustains as a huge business is that attackers do this all the time, and it’s nothing at all unique to the cloud. As to deleting their tracks, I contend that (with the right monitoring and logging) your AWS environment can be instrumented to make this not only incredibly difficult and noisy for an attacker to perform, but effectively impossible. See Daniel Grzelak’s article on “Disrupting AWS Logging” for tips on using the same tactics to protect against attacker log modification and deletion.

Daniel Grzelak has done a lot of awesome research and blogging on the attacker/PenTester side of the coin for AWS. I highly recommend following his stuff!

“5. The provider’s boilerplate contract agreement will typically absolve them from any possible responsibility for customer damages or losses due to data compromise.”

This is one point on which I totally agree, presuming we are talking about data compromise that occurred due to customer or user fault. However, I look at it in a positive light: it seems reasonable not to expect that a provider owes me anything, or bears responsibility, when I use what they’ve provided in an irresponsible, uneducated, or otherwise poor manner.

If the above statement is in reference to data compromise due to AWS operation that is outside of customer/user control, I’m not sure I’ve ever seen or heard that contract clause. So, I would be keen on getting that reference to read myself.

My last point of address is the following quote:

“Cloud providers are not necessarily evil, in and of themselves, but they enable appallingly self-destructive behaviors on the part of their customers. While there are many useful IR capabilities they could make available, there’s an unfortunate lack of demand. The businessperson customers continue to drone the mantra “Cheaper! Cheaper! Cheaper!”, and so the providers acquiesce to these demands, and streamline their offerings, reducing (or failing to develop in the first place) ‘frills’ such as security & IR capabilities.”

To the first sentence, this logic appears to sidestep accountability and essentially (in my personal view) takes the victim stance of “They let me do this.” To me, allowing a customer the freedom to do what they want is not a detriment, nor cause for finger pointing when a poor decision is made; rather, it is the basis of what has made, and continues to make, AWS and Azure so successful and attractive to consumers. It is this very freedom that, when handled properly, promotes self-empowerment and drives accountability. Regardless, I don’t feel you can fault providers in earlier points for tying responders’ hands and then fault them here for giving customers too much freedom to do bad things.

To the second and third sentences of this claim, I’ve worked very closely with the AWS security team in developing our AWS IR service line. AWS provides a plethora (and, if you’ll please excuse me, I dare to say a metric sh*t ton) of incredibly useful tools for security and incident response (the baseline of which is better than most on-premises deployments I’ve encountered). However, they make one thing very clear – AWS does not perform security. Rather, AWS gives you the tools and resources you need to either perform it yourself or make an educated decision not to. If you choose “appallingly self-destructive” behavior, it is just that – a choice you have made. Further, it is a choice you have made despite the tools and resources available to you, not because of them. This is why and where accountability is so important.

Ultimately, cloud providers like AWS and Azure do so very much to enable and facilitate all levels of operation and technical (or non) capability with great freedom. They give YOU the tools to be successful (or shoot your eye out).

I highly encourage everyone to take some time to learn and understand the offerings available from whichever top-tier cloud provider suits you best. And, if beneficial to your organization, use it and take the utmost advantage. But, whatever you do or do not do, be accountable for your own actions.

“So: Is The Cloud Evil?”

I contend that it is not, not even in the least, with the right provider.

Know Your Tools: Linux (GNU) vs. Mac (BSD) Command Line Utilities

Welcome to the first post in the “Know Your Tools” series!

Without further ado…

Have you ever wondered if/how *nix command line utilities may differ across distributions? Perhaps it never even occurred to you that there was even a possibility the tools were any different. I mean, they’re basic command line tools. How and why could/would they possibly differ?

Well, I’m here to say… thy basic command line utilities art not the same across different distributions. And, the differences can range from those that can cause a simple nuisance to those that can cause oversight of critical data.

Rather than going into aspects of this discussion that have already been covered such as how Linux and BSD generally differ, I would instead like to focus on a few core utilities commonly used in/for DFIR artifact analysis and some caveats that may cause you some headache or even prevent you from getting the full set of results you’d expect. In highlighting the problems, I will also help you identify some workarounds I’ve learned and developed over the years in addressing these issues, along with an overarching solution at the end to install GNU core utilities on your Mac (should you want to go that route).

Let’s get to it.

Grep

Grep is one of the most useful command-line utilities for searching within files/content, particularly for the ability to use regular expressions for searching/matching. To some, this may be the first time you’ve even heard that term or “regex” (shortened version of it). Some of you may have been using it for a while. And, nearly everyone at some point feels like…

Amirite?

Regardless of whether this is your first time hearing about regular expressions or you use them regularly albeit with some level of discomfort, I HIGHLY suggest you take the time to learn and/or get better at using them – they will be your best friend and most powerful asset when using grep. Though there is a definite regex learning curve (it’s really not that bad), knowing how to use regular expressions translates directly into performing effective and efficient searches for artifacts during an investigation.

Nonetheless, even if you feel like a near master of regular expressions, equally critical to an expression’s success is how it is implemented within a given tool. Specifically for grep, you may or may not be aware that it uses two different methods of matching that can highly impact the usefulness (and more important, validity) of results returned – Greedy vs. Lazy Matching. Let’s explore what each of these means/does.

At a very high level, greedy matching grabs the longest possible match (the quantifier matches as much as it can), while lazy matching grabs the shortest possible match (the quantifier matches as little as it can and stops there). Under the hood this comes down to how the regex engine backtracks, but that is a separate discussion. Suffice it to say, using an incorrect, unintended, and/or unexpected matching method can completely overlook critical data or, at the very least, produce an inefficient or invalid set of results.

Now having established some foundational knowledge about how grep searches can work, we will drop the knowledge bomb – the exact same grep expression on Linux (using GNU grep) may produce completely different or no results on Mac (using BSD grep), especially when using these different types of matching.

…What? Why?

The first time I found this out I spent an inordinate and unnecessary amount of time banging my head against a wall typing and re-typing the same expression across systems but seeing different results. I didn’t know what I didn’t know. And, well, now I hope to let you know what I didn’t know but painfully learned.

While there is an explanation of why, it doesn’t necessarily matter for this discussion. Rather, I will get straight to the point of what you need to know and consider when using this utility across systems to perform effective searches. While GREEDY searches execute pretty much the same across systems, the main difference comes when you are attempting to perform a LAZY search with grep.

We’ll start with GREEDY searches as there is essentially little to no difference between the systems. Let’s perform a greedy search (find the last/longest possible match) for any string/line ending in “is” using grep’s Extended Regular Expressions option (“-E”).

(Linux GNU)$ echo "thisis" | grep -Eo '.+is'
thisis
(Mac BSD)$ echo "thisis" | grep -Eo '.+is'
thisis

Both systems yield the same output using a completely transferable command. Easy peasy.

Note: When specifying Extended Regular Expressions, you can (and I often do) just use “egrep” which implies the “-E” option.

Now, let’s look at LAZY searches. First, how do we even specify a lazy search? Simply put, you append a “?” to the quantifier (e.g., “.+” becomes “.+?”). Using the same search as before, we’ll instead use lazy matching (find the first/shortest match) for the string “is” on both the Linux (GNU) and Mac (BSD) versions of grep and see what each yields.

(Linux GNU)$ echo "thisis" | grep -Eo '.+?is'
thisis
(Mac BSD)$ echo "thisis" | grep -Eo '.+?is'
this

Here the fun begins. We did the exact same command on both systems and it returned different results.

Well, Linux (GNU) grep does NOT honor lazy quantifiers unless you specify the “-P” option (short for PCRE – Perl Compatible Regular Expressions). So, we’ll supply that this time:

(Linux GNU)$ echo "thisis" | grep -Po '.+?is'
this

There we go. That’s what we expected and hoped for.

*Note: You cannot use the implied Extended expression syntax of “egrep” here as you will get a “conflicting matchers specified” error. Extended regex and PCRE are mutually exclusive in GNU grep.

Note that Mac (BSD), on the other hand, WILL do a lazy search by default with Extended grep. No changes necessary there.

While not knowing this likely won’t lead to catastrophic misses of data, it can (and in my experience will very likely) lead to massive amounts of false positives due to greedy matches that you have to unnecessarily sift through. Ever performed a grep search and got a ton of very imprecise and unnecessarily large (though technically correct) results? This implementation difference and issue could certainly have been the cause. If only you knew then what you know now…

So, now that we know how these searches differ across systems (and what we need to modify to make them do what we want), let’s see a few examples where using lazy matching can significantly help us (note: I am using my Mac for these searches, thus the successful use of Extended expressions using “egrep” to allow for both greedy and lazy matching)…

User-Agent String Matching
Let’s say I want to identify and extract the OS version from Mozilla user-agent strings in a set of logs, the format of which I know starts with “Mozilla/“ and then contains the OS version in parentheses. The following shows some examples:

  • Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36
  • Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2226.0 Safari/537.36
  • Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.0 Safari/537.36
  • Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.1 Safari/537.36

Greedy Matching (matches more than we wanted – fails)
(Mac BSD)$ echo "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.1 Safari/537.36" | egrep -o 'Mozilla.+\)'
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko)

Lazy Matching (matches exactly what we want)
(Mac BSD)$ echo "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.1 Safari/537.36" | egrep -o 'Mozilla.+?\)'
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1)

Searching for Malicious Eval Statements
Let’s say I want to identify and extract all of the base64 eval statements from a possibly infected web page for analysis, so that I can then pipe it into sed to extract only the base64 element and decode it for plaintext analysis.

Greedy Matching (matches more than we wanted – fails)
(Mac BSD)$ echo "date=new Date(); eval(base64_decode(\"DQplcnJvcl9yZ=\")); var ua = navigator.userAgent.toLowerCase();" | egrep -o 'eval\(base64_decode\(.+\)'
eval(base64_decode("DQplcnJvcl9yZ=")); var ua = navigator.userAgent.toLowerCase()

Lazy Matching (matches exactly what we want)
(Mac BSD)$ echo "date=new Date(); eval(base64_decode(\"DQplcnJvcl9yZ=\")); var ua = navigator.userAgent.toLowerCase();" | egrep -o 'eval\(base64_decode\(.+?\)'
eval(base64_decode("DQplcnJvcl9yZ="))

There you have it. Hopefully you are now a bit more informed not only about the differences between Lazy and Greedy matching, but also about the difference in requirements across systems.

Strings

Strings is an important utility for extracting “human-readable” strings from files/binaries. It is particularly useful for extracting strings from (suspected) malicious binaries/files to gain some insight into what may be contained within the file: its capabilities, hard-coded domains/URLs, commands, … the list goes on.

However, not all strings are created equal. Sometimes, Unicode strings exist within a file/program/binary for various reasons, those of which are also important to identify and extract. By default, the GNU (Linux) strings utility searches for simple ASCII encoding, but also allows you to specify additional encodings for which to search, to include Unicode. Very useful.

By default, the Mac (BSD) strings utility also searches for simple ASCII encoding; however, I regret to inform you that the Mac (BSD) version of strings does NOT have the native capability to search for Unicode strings. Do not ask why. I highly encourage you to avoid the rabbit hole of lacking logic that I endured when I first found this out. Instead, let’s move on and just ask ourselves, “What does this mean to me?” Well, if you’ve only been using a Mac to perform string searches using the native BSD utility, you have been MISSING ALL UNICODE STRINGS. Of all the pandas, this is a very sad one.

So, what are our options?

There are several options, but I personally use one of the following (depending on the situation and my mood) when I need to extract both Unicode and ASCII strings from a file on a Mac (BSD) system (see the combined one-liner after the list):
1. Willi Ballenthin’s Python strings tool to extract both ASCII and Unicode strings from a file
2. FireEye’s FLOSS tool (though intended for binary analysis, it can also work against other types of files)
3. GNU strings*

*Wait a minute. I just went through saying how GNU strings isn’t available as a native utility on a Mac. So, how can I possibly use GNU strings on it? Well, my friends, at the end of this post I will revisit exactly how this can be achieved using a nearly irreplaceable third-party package manager.
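
When I want ASCII and both flavors of UTF-16 in a single pass, something along these lines does the trick (a sketch that assumes GNU strings is installed as “gstrings” via the Homebrew approach covered at the end of this post; “suspect.bin” is just a placeholder):

$ (strings -a suspect.bin; gstrings -a -el suspect.bin; gstrings -a -eb suspect.bin) | sort -u > suspect_strings.txt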

Now, go back and re-run the above tools against various files and binaries from your previous investigations you performed from the Mac command line. You may be delighted at what new Unicode strings are now found 🙂

Sed

Sed (short for “Stream editor”) is another useful utility to perform all sorts of useful text transformations. Though there are many uses for it, I tend to use it mostly for substitutions, deletion, and permutation (switching the order of certain things), which can be incredibly useful for log files with a bunch of text.

For example, let’s say I have a messy IIS log file that somehow lost all of its newline separators and I want to extract just the HTTP status code, method, and URI from each line and output into its own separate line (restoring readability):

…2016-08-0112:31:16HTTP200GET/owa2016-08-0112:31:17HTTP200GET/owa/profile2016-08-0112:31:18HTTP404POST/owa/test…

Looking at the pattern, we’d like to insert a newline before each instance of the date, beginning with “2016-…”. Lucky for us, we’re on a Linux box with GNU sed and it can easily handle this:

(Linux GNU)$ sed 's/\(.+\?\)2016/\1\n2016/g' logfile.txt
2016-08-0112:31:16HTTP200GET/owa
2016-08-0112:31:17HTTP200GET/owa/profile
2016-08-0112:31:18HTTP404POST/owa/test
...

You can see that it not only handles the lazy-style matching, but also interprets ANSI-C escape sequences (e.g., \n, \r, \t, …) in the replacement. This statement also utilizes a backreference (\1), the understanding of which I will leave to the reader to explore.

Sweet. Let’s try that on a Mac…

(Mac BSD)$ sed 's/\(.+\?\)2016/\1\n2016/g' logfile.txt
2016-08-0112:31:16HTTP200GET/owa2016-08-0112:31:17HTTP200GET/owa/profile2016-08-0112:31:18HTTP404POST/owa/test

… Ugh. No luck.

Believe it or not, there are actually two common problems here. The first is the lack of interpretation of ANSI-C escape sequences. BSD sed simply doesn’t recognize any (except for \n, but not within the replacement portion of the statement), which means we have to find a different way of getting a properly interpreted newline into the statement.

Below are a few options that will work around this issue (and there are more clever ways to do it as well).

1. Use a literal newline (i.e., type the backslash, press Enter, and continue the replacement on the next line)
(Mac BSD)$ sed 's//\*Press Enter*
> /g'

2. Use bash ANSI-C quoting (I find this the easiest and least effort, but YMMV)
(Mac BSD)$ sed 's//\'$'\n/g'

3. Use Perl
(Mac BSD)$ perl -pe 's||\n|g'

Unfortunately, this only solves the first of two problems, the second being that BSD sed still does not allow for lazy matching (from my testing, though I am possibly just missing something). So, even if you use #1 or #2 above, it will only match the last found pattern and not all the patterns we need it to.

“So, should I bother with using BSD sed or not?”

Well, I leave that up to your judgment. Sometimes yes, sometimes no. In cases like this where you need to use both lazy matching and ANSI-C escape sequences, it may just be easier to skip the drama and use Perl (or perhaps you know of another extremely clever solution to this issue). Options are always good.

Note: There are also other issues with BSD sed like line numbers and using the “-i” parameter. Should you be interested beyond the scope of this post, this StackExchange thread actually has some useful information on the differences between GNU and BSD sed. Though, I’ve found that YMMV on posts like this where the theory and “facts” may not necessarily match up to what you find in testing. So, when in doubt, always test for yourself.
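
The “-i” difference in particular bites people constantly. The short version (worth verifying on your own systems before relying on it) is that BSD sed requires a backup-extension argument – an empty string means “no backup” – while GNU sed treats the suffix as optional:

(Linux GNU)$ sed -i 's/foo/bar/g' logfile.txt
(Mac BSD)$ sed -i '' 's/foo/bar/g' logfile.txt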

Find

Of all commands, you might wonder how something so basic as find could differ across *nix operating systems. I mean, what could possibly differ? It’s just find, the path, the type, the name… how or why could that even be complicated? Well, for the most part they are the same, except in one rather important use case – using find with regular expressions (regex).

Let’s take for example a regex to find all current (non-archived/rotated) log files.

On a GNU Linux system this is somewhat straightforward:

(Linux GNU)$ find /var/log -type f -regextype posix-extended -regex "/var/log/[a-zA-Z\.]+(/[a-zA-Z\.]+)*"

You can see here that rather than using the standard “-name” parameter, we instead used the “-regextype” flag to enable extended expressions (remember egrep from earlier?) and then used the “-regex” flag to denote our expression to utilize. And, that’s it. Bless you, GNU!

Obviously, Mac BSD is not this straightforward, otherwise I wouldn’t be writing about it. It’s not exactly SUPER complicated, but it’s different enough to cause substantial frustration; your Google searches will show that the internet is very confused about how to do this properly. I know. Shocking. Nonetheless, there is value in traveling down the path of frustration here so that you don’t have to when it really matters. So, let’s just transfer the command verbatim over to a Mac and see what happens.

(Mac BSD)$ find /var/log -type f -regextype posix-extended -regex "/var/log/[a-zA-Z\.]+(/[a-zA-Z\.]+)*"
find: -regextype: unknown primary or operator

Great, because why would BSD find use the same operators, right? That would be too easy. By doing a “man find” (on the terminal, not in Google, as that will produce very different results from what we are looking for here) you will see that BSD find does not use that operator. Though, it still does use the “-regex” operator. Easy enough, we’ll just remove that bad boy:

(Mac BSD)$ find /var/log -type f -regex "/var/log/[a-zA-Z\.]+(/[a-zA-Z\.]+)*"
(Mac BSD)$

No results. Ok. Let’s look at the manual again… aha, to enable extended regular expressions (brackets, parentheses, etc.), we need to use the “-E” option. Easy enough:

(Mac BSD)$ find /var/log -E -type f -regex "/var/log/[a-zA-Z\.]+(/[a-zA-Z\.]+)*"
find: -E: unknown primary or operator

Huh? The manual says the “-E” parameter is needed, yet we get the same error message we got earlier about the parameter being an unknown option. I’ll spare you a bit of frustration and tell you that it is VERY picky about where this flag is put – it must be BEFORE the path, like so:

(Mac BSD)$ find -E /var/log -type f -regex "/var/log/[a-zA-Z\.]+(/[a-zA-Z\.]+)*"
/var/log/alf.log
/var/log/appfirewall.log
/var/log/asl/StoreData
/var/log/CDIS.custom
/var/log/corecaptured.log
/var/log/daily.out
/var/log/DiagnosticMessages/StoreData
/var/log/displaypolicyd.log
/var/log/displaypolicyd.stdout.log
/var/log/emond/StoreData
/var/log/install.log
/var/log/monthly.out
/var/log/opendirectoryd.log
/var/log/powermanagement/StoreData
/var/log/ppp.log
/var/log/SleepWakeStacks.bin
/var/log/system.log
/var/log/Tunnelblick/tunnelblickd.log
/var/log/vnetlib
/var/log/weekly.out
/var/log/wifi.log

Success. And, that’s that. Nothing earth shattering here, but different and unnecessarily difficult enough to be aware of in your switching amongst systems.

So, now what?

Are you now feeling a bit like you know too much about these little idiosyncrasies? Well, there’s no going back now. If for no other reason, maybe you can use them to sound super smart or win bets or something.

These are just a few examples relevant to the commands and utilities often used in performing DFIR. There are still plenty of other utilities that differ as well that can make life a pain. So, now that we know this, what can we do about it? Are we doomed to live in constant translation of GNU <—> BSD and live without certain GNU utility capabilities on our Macs? Fret not, there is a light at the end of the tunnel…

If you would like to not have to deal with many of these cross-platform issues on your Mac, you may be happy to know that the GNU core utilities can be rather easily installed on OS X. There are a few options to do this, but I will go with my personal favorite method (for a variety of reasons) called Homebrew.

Homebrew (or brew) has been termed “The missing package manager for OS X”, and rightfully so. It allows simple command-line installation of a huge set of incredibly useful utilities (using Formulas) that aren’t installed by default and/or easily installed via other means. And, the GNU core utilities are no exception.

As a resource, Hong’s Technology Blog provides a great walk-through of installation and considerations.

You may already be thinking, “Great! But wait… how will the system know which utility I want to run if both the BSD and GNU versions are installed?” Great question! By default, Homebrew installs its binaries to /usr/local/bin. So, you have a couple of options, depending on which utility in particular you are using. Some GNU utilities (such as sed) are prepended with a “g” and can be run without conflict (e.g., “gsed” will launch GNU sed). Others may not have the “g” prepended. In those cases, you will need to make sure that /usr/local/bin is in your path (or has been added to it) AND that it precedes the standard BSD utilities’ locations (/usr/bin, /bin, etc.). So, your path should look something like this:

$ echo $PATH
/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin

With that done, it will by default now launch the GNU version installed in /usr/local/bin instead of the standard system one located in /usr/bin. And, to use the native system utilities when there is a GNU version installed with the same name, you will just need to provide their full path (i.e., “/usr/bin/<utility>”).
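
If you want that ordering to survive new terminal sessions, one simple way (assuming bash and a ~/.bash_profile – adjust for your shell of choice) is:

$ echo 'export PATH="/usr/local/bin:$PATH"' >> ~/.bash_profile
$ source ~/.bash_profile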

Please feel free to sound off in the comments with any clever/ingenious solutions not covered here or stories of epic failure in switching between Linux and Mac systems 😃

/JP

Knowing Your Tools

We all have our favorite tools – whether it’s a simple python script to parse/search a specific type of log or a mainstream commercial forensics tool to ingest a bunch of evidence and search in parallel across a multitude of artifacts. We typically use tools to help us by performing some set of actions that:

  • We are not able to perform due to lack of knowledge (e.g., parsing proprietary format objects, identifying all available information within an object, scanning for anomalies in an obscure technology/file, etc.)
  • We could perform on our own, but are better (more efficiently/effectively) performed by a tool (e.g., parallelized searching, parsing of artifacts for specific information, etc.)

Regardless of the tool, one of the main benefits of using them is to save time. Time saved in performing our job as Incident Responders and Forensic Experts means less time between compromise and mitigation/recovery/return to safe operations. This is one of the core responsibilities of our line of work and why we have these jobs – to utilize our expertise and tools to reduce the impact to the business.

With this responsibility to find answers and find them fast comes both the responsibility and the propensity to find/use a tool whenever possible. And, given this awesome, increasingly collaborative DFIR field of ours, that is quite easily done a large majority of the time. However, with this responsibility also comes an incredibly important onus on each and every one of us not just to be able to use a tool to get quick answers, but also to ensure that the tool(s) being utilized are both well understood and performing as expected/intended. After all, time saved getting a wrong answer is not time saved at all. In fact, it often creates additional costs – not just in time, but in monetary and non-monetary losses as well.

Do you actually know what each of your tools does?

This is a VERY important question in our line of work. Frankly, it’s an important question in most any line of work in which people rely on an expert to provide them with assistance that is accurate, effective, and efficient during times of which the wrong decision or information can yield substantial consequences. The onus is on you/us, as experts in the field, to not just provide answers spit out from a tool, but to be able to know how that information was processed and how it arrived as output from each tool you used. The effects of not understanding this can range from simply looking unintelligent when queried about what the tool does and why it does it to a catastrophic result of providing incorrect information and losing key insight into a compromise.

As an example, what if you used a tool that misrepresented an executable’s create/modify time on disk? You may have still found it via other methods, or you may have missed that insight completely. This may not necessarily be of substantial consequence to, say, a single-source ransomware case. But, what if an entire decision of possible data exfiltration for a multi-billion dollar company hinged on when, specifically, the executable was created on the system and you incorrectly reported a date/time that was MONTHS OFF because you didn’t know (or your tool misstated) that the dates/times were extracted from the NTFS $SI attribute* (easily timestomped) versus the $FN attribute? I think (hope) you get the point here. THESE ARE THINGS YOU NEED TO KNOW!

*Note: It’s not just small tools susceptible to this poor practice of displaying an NTFS $SI attribute for date/time information during forensic analysis. I know of one very large commercial forensic tool that does this by default, which is not only poor practice for such a broadly utilized and expensive tool but something of which I’ve learned many people are not aware! Gah. But, I digress…
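
Sticking with the NTFS example, a quick way to check yourself is The Sleuth Kit’s istat, which prints both the $STANDARD_INFORMATION and $FILE_NAME timestamps for an MFT entry so you can compare them side by side (the image name and entry number below are placeholders):

$ istat -f ntfs disk_image.dd 12345

If the $SI timestamps look suspiciously tidy compared to the $FN ones, it’s time to start asking timestomping questions.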

Feeling like I’ve appropriately berated and re-stated the “why” of needing to know how your tools work, let’s move on to formulating an approach for “how” to understand them. As someone who researches, peruses, tests, utilizes, and submits bugs/improvements for a wide variety of both FOSS and commercial tools (mostly the former, I’m a true FOSS junky), my quest for understanding each tool to the fullest extent possible comes down to asking three questions in a variety of different ways:

  • Why does it do or not do <X>?
  • What does it mean when <Y>?
  • How would I know if <Z>?

These base questions can be broken down into many more granular ones – the sky is the limit. However, as a starting baseline, below is a set of questions I typically ask myself when using a tool that someone else has built (and I encourage you all to add/modify as appropriate):

  • How does it acquire the specific information it is parsing?
  • How, specifically, is it parsing the data?
  • What does it mean if something is not present that I would expect?
  • What does it mean when something is present that I would not expect? 
  • Do I fully understand the intent of the tool?
  • Am I using the tool for its intended usage?
  • How would I know if it was not successfully achieving its intent/goal(s)?
  • Would I know if the tool produced an error?
  • Upon error, how do I find out whether the error may be in the source data and/or the tool’s processing of the data?

… and I could go on, but you get the point. These are all questions you should be asking yourself when you use each of your tools (yes, this includes BOTH commercial and FOSS tools). From these questions come scenarios in which you should test the tool for expected results against all the types of data you expect to encounter. I would also suggest testing it with data you do not expect to encounter, as that can yield an even better understanding of the tool.

So, we’ve covered “why” you need to understand your tools and “how” you might go about doing that, but what if you do encounter an error with a tool? Should you just give up and try other tools? I would encourage you not to give up and move on, as that helps no one (not the author, not you, and not others in the community who are likely experiencing the same issue or issues you are). If there is an issue, try to find an answer! This is a community. Someone has taken the time to create a tool to save you time and effort; the least you can do is return the favor and invest your time to understand the tool and help make it better. Too often I hear about issues with tools only to find out that the issue was never investigated, posed, or posted to gather feedback from others. And, even if the issue was posed/posted somewhere, the person never attempted to contact the author(s).

ASK THE AUTHOR.

They are often the primary source of information for the tool, not to mention doing so reinforces the fact that people appreciate the time they took to create a tool and are invested in it enough to contact them about how to make it better. Just a simple question about how/why something works can be invaluable to an author. I continue to be amazed at how responsive the authors of various tools are within our community for bug reports and/or improvement requests.

As you can see, I’m very passionate about, and a huge proponent of, investing time into understanding each and every one of your tools (not to mention building your own and contributing back to the community if you are so inclined). It not only makes you better, it also makes the community better. Pointing and clicking without understanding what is going on behind the scenes is a disservice not only to yourself but to the community. Don’t be someone that can be replaced by another tool. Add value to your company/clients by going above and beyond and taking the time to understand the tools for your profession from the ground up. If you do, the investment will come back to you and to the community multiple times over.

If this topic interests you and/or you are simply interested in getting to know your Linux/Mac command line and/or FOSS tools (better), stay tuned for the future “Know Your Tools” series/posts where I will be covering various tools used in my DFIR investigations, to include basic tool usage, lesser known (and useful) capabilities, as well as tips/tricks I’ve collected over the years in refining various tools and processes.

Happy 2017!

/JP
