Musings and confusings. All things DFIR.

Category: Linux

Generating File System Listings from the Command Line (with Full MACB Timestamps and Hashes)

Before you go testing/implementing the commands that are described in this article, PLEASE ensure you first understand the following major caveat of performing certain actions/commands against files on a live system:

Reading a file changes its atime eventually requiring a disk write, which has been criticized as it is inconsistent with a read only file system.”

You DO NOT WANT TO DO THIS on a target of which you are attempting to perform forensic analysis.

Further reading on the matter

When in doubt and/or fear of possibly affecting a target system’s access timestamps, you should ensure the following is true before running the below commands:

  • The target file system (or whatever directory you are running this against) has been (re)mounted read-only and/or with the “noatime” and/or “relatime” mount parameters.

If you’re planning to run these commands against a physical disk (image), you can mount the target disk’s filesystem read-only via the following:

$ sudo mount -o ro,... </src/disk> </mount/point>

If you’re planning to run these commands against a live system, you can remount the live root filesystem/directory using the mount command’s --bind option via the following:

$ mkdir /mnt/remount

$ sudo mount --bind / /mnt/remount

Now, while the root filesystem/directory will be re-mounted to a new mount point, it will still be mounted read-write by default. So, before you go accessing anything on it, you need to re-mount it (yes, again) read-only, like so:

$ sudo mount -o remount,ro,bind,noatime /mnt/remount

Note that you don’t necessarily need the noatime attribute given that you’re already mounting the system read-only (and, in theory, should not be modifying any of the file timestamps upon access). However, I’m a “belt and suspenders” kind of guy. So, I’d rather have redundancy, even if unneeded, for the peace of mind.


Disclaimer: I did not search the internet for a solution to this article’s challenge as I wanted to come up with one myself. Thus, a solution may already exist that is similar (or not). However, the point of the below article was not to just find a solution and move on. Rather, I wanted to walk readers through a problem statement, step-by-step piecing together a solution, thoroughly documenting and “teaching a man to fish” versus just giving out a fish. That said, I am in no way guaranteeing the below commands to work perfectly in ensuring it finds and properly processes every single file on the filesystem. In fact, when running this live, we are actively avoiding certain areas of the filesystem that are actively changing/ephemeral to minimize the error outputs. The only thing I can guarantee, in true *nix ad-hoc one-liner development, is (dis)function in ways beyond the imagination. ‘Tis a fact we just live with. This post simply describes *options* you can add to your toolkit that can always very much benefit from testing, troubleshooting, and improving.

In addition, while I attempted to identify and explain various aspects of each of my commands, I recognize that there are still improvements that can be done to this command. I attempted to find the balance of thorough explanation and efficiency while not bleeding over into the esoteric.


*The first Y stands for “Yuuuuuuge”

Enough with the caveats and disclaimers, let’s get down to it…


Recently, a teammate posed a request to be able to generate a file listing of a directory in Linux showing the size and hash of each file in the output format of “ls -lhS” (list files in long format, with human-readable sizes, in decreasing size output).

As I hit Reply to the email, my initial thoughts were “Why don’t you use FLS?” as that is essentially the de-facto standard for producing a file system listing from an image. However, I got to thinking… FLS doesn’t really provide a comprehensive solution here for a few reasons:

  1. We need a command to run against a LIVE system and FLS only runs against a dead system image*
  2. FLS requires a second step to convert its output to bodyfile format for human-readable timestamps
  3. FLS doesn’t perform any file hashing

*Actually, this is not true. As one of my colleagues ever-so-graciously reminded me… Although it is not well documented, FLS can run against live systems. You can run it against a live Windows system by using named pipes, a la “fls [options] \\.\<X>:“, where <X> is a logical drive letter like C:, D:, etc.  And, a September 2011 SANS blog post here describes it in operation for Windows. To run it against a live Linux or Mac/OS X system, you may do so as such “fls [options] /dev/sd<X><Y>“, where <X> is the physical drive letter like /dev/sda, /dev/sdb, etc. and <Y> is the partition number like /dev/sda1, /dev/sda2, etc.

At any rate, the last two points remain, so it’s good thing I waited to hit send before looking like a dummy.

Instead, I took the challenge in attempting to come up with a command line one-liner to provide what was requested. Initially, I came up with the following:

$ find /path/to/dir -maxdepth 1 -type f -print0 | xargs -0 -r ls -lh | awk '{cmd="md5deep -q "$9; cmd | getline md5; close(cmd); cmd="sha1sum "$9; cmd | getline sha1; close(cmd); print $1","$2","$3","$4","$5","$6" "$7" "$8","$9","md5","sha1}' | awk '{$NF=""}1' | sed 's/ ,/,/g’ | sort -t',' -hr -k5

*You may need to Right-Click and Open/View Image in New Tab to see these inserted screenshots in full resolution. Sorry about that.

However, as we can see here, the timestamps produced from a simple “ls -lh” were rather lacking in both what was provided (solely last modification time by default) as well as precision (only precise to the second by default*, and a LOT can happen on a system in a singular second that we’d need to distinguish during an investigation).

== Sidebar 1 ==
You might be wondering why I am piping find’s output to xargs to execute the “ls -lh” against the results versus simply using find’s built-in “-exec” parameter that ostensibly does the same thing. In short, this is for performance reasons which you can read about at the below links.
== /Sidebar 1 ==

== Sidebar 2 ==
Also note that all timestamps will be in the system’s local time. So, it would behoove you to collect that information from the system as well for future reference during analysis. This can be done a few different ways, as shown below:

== /Sidebar 2 ==

In light of the aforementioned issues (lacking additional timestamps and precision), I worked through a few different solutions and came up with the following which included not only timestamps with much greater precision (now with full nanosecond precision*) but also included all of the GNU “find” command’s printable timestamps (i.e., Last Modified, Last Accessed, and Inode Changed).

$ find /root -maxdepth 1 -type f -printf '%i,%M,%n,%g,%u,%s,%TY-%Tm-%Td %TT,%AY-%Am-%Ad %AT,%CY-%Cm-%Cd %CT,”%p"\n' | awk -F"," '{cmd="md5deep -q "$9; cmd | getline md5; close(cmd); cmd="sha1sum "$9; cmd | getline sha1; close(cmd); print $1","$2","$3","$4","$5","$6","$7","$8","$9","md5","sha1}' | awk '{$NF=""}1' | sed 's/ ,/,/g’ | sort -t',' -hr -k5

**Note that printf’s time field format is precise to 10 digits, while nanoseconds are (by definition) only precise to 9 digits. Thus, it is appending a 0 in the 10-th digit spot. Why? I frankly don’t know. I mean… uhh… “the reason of which will be left as an exercise to the reader.” 🙂

== Sidebar 3 ==
*I later discovered that you can show timestamps with full nanosecond resolution in ls via the “–full-time” parameter as I will show below.

$ ls -l --full-time
total 55088
drwxr-xr-x 2 root root 4096 2017-11-22 14:22:30.165725454 -0800 Desktop

== Sidebar 3 ==

At any rate, we’re making progress, but we’re still missing something. What about Inode (File) Creation? Is that not recorded in Linux? In short, Ext3 filesystems only record Last Modified (mtime, Last Accessed (atime), and Inode Changed (ctime), while Ext4 filesystem (on which a large majority of Linux distress operate) fortunately include the additional Inode Creation time (crtime). Lucky for us, I am doing this on an Ext4 filesystem, so we should be seeing those times if they’re implemented and recorded, right? You’d think so… but you’d be wrong.

Unfortunately, Linux decided not to implement an easy way (aka a natively integrated API) to view/include these (crtime) timestamps in various tools’ output (as seen here in the “find” command, and shortly in the “stat” command). Alas, FRET NOT, as there is a way to extract this timestamp using the debugfs utility. Intended as a “ext2/ext3/ext4 file system debugger”, it provides a “-R” option to execute a given command for debugging purposes. We will (ab)use this option to extract more information (i.e. the crtime timestamp) from the “stat” command than is originally provided by running the command on its own.

First, we will run “stat” against a file:

$ stat /root/VMwareTools-10.1.15-6627299.tar.gz
File: /root/VMwareTools-10.1.15-6627299.tar.gz
Size: 56375699 Blocks: 110112 IO Block: 4096 regular file
Device: fe01h/65025d Inode: 405357 Links: 1
Access: (0444/-r--r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2018-01-12 15:16:03.524147591 -0800
Modify: 2017-11-24 17:38:44.799279489 -0800
Change: 2017-11-24 17:38:44.799279489 -0800
Birth: -

Now, we will use the “debugfs” command to get the Inode/File Birth (crtime) timestamp. Keep in mind, you will need to provide the volume/partition on which the referenced file resides as a parameter to the command, otherwise the command will not work (namely yielding a “No such file or directory while opening filesystem” error). For my example below, my system is using LVM volumes and the file we’re querying resides on my root “k2–vg-root” LVM volume/partition.

$ debugfs -R 'stat /root/VMwareTools-10.1.15-6627299.tar.gz' /dev/mapper/k2--vg-root
Inode: 405357 Type: regular Mode: 0444 Flags: 0x80000
Generation: 763788199 Version: 0x00000000:00000001
User: 0 Group: 0 Project: 0 Size: 56375699
File ACL: 0 Directory ACL: 0
Links: 1 Blockcount: 110112
Fragment: Address: 0 Number: 0 Size: 0
ctime: 0x5a18c9a4:be902604 -- Fri Nov 24 17:38:44 2017
atime: 0x5a5941b3:7cf76e1c -- Fri Jan 12 15:16:03 2018
mtime: 0x5a18c9a4:be902604 -- Fri Nov 24 17:38:44 2017
crtime: 0x5a18c9a4:5740e928 -- Fri Nov 24 17:38:44 2017
Size of extra inode fields: 32
Inode checksum: 0x53c3b2b6
(0-10239):1859584-1869823, (10240-12287):1873920-1875967, (12288-13763):1902149-1903624

There’s actually a lot of great output here that can be very useful to us as forensic analysts, but we really only need the crtime for our purposes today. So, we can do a little command-line fu to just extract the human readable portion of the crtime timestamp we care about.

$ debugfs -R 'stat /root/VMwareTools-10.1.15-6627299.tar.gz' /dev/mapper/k2--vg-root |& sed -n 's/^crtime.*- \(.*\)$/\1/p'
Fri Nov 24 17:38:44 2017

To go a bit further and match stat’s default timestamp formatting, we can do a bit more command-line fu to yield the following:

$ date +"%Y-%m-%d %H:%M:%S.%N %z" -d "$(debugfs -R 'stat /root/VMwareTools-10.1.15-6627299.tar.gz' /dev/mapper/k2--vg-root |& sed -n 's/^crtime.*- \(.*\)$/\1/p')"
2017-11-24 17:38:44.000000000 -0800

Great, now we have a crtime (Inode/File Creation) timestamp we know and love. But, wait… anyone else noticing something here? The nanoseconds are all zeroes. Hmmm. Well, if we trace our process back a bit, we can see that this is because we are attempting to produce a nanosecond-precision datetime object from a source that obviously doesn’t include it. Obviously, we can’t extract nanosecond precision from an input that doesn’t contain it. So, where do we go from here?

Well, if we look back at the crtime output (crtime: 0x5a18c9a4:5740e928 -- Fri Nov 24 17:38:44 2017) we can see that the second column there contains two sets of hex digits (0x5a18c9a4:5740e928) delineated by a colon. Could it be that this is simply a hex version of the decimal and nanosecond epoch timestamps? Oh, it could, and it is. It turns out the first entry (previous to the colon) is the epoch seconds and the second entry (after the colon) is the nanoseconds. So, we’ll need to go back to our command and alter it to extract, convert, and construct the nanosecond epoch timestamp we’re looking to produce.

The below command extracts both the first and second set of hex digits (epoch seconds and epoch nanoseconds, respectively), converts both of the hex sets to decimal, converts the epoch seconds to a human-readable datetime object using Awk’s strftime formatting, and then divides the nanoseconds portion by four (essentially performing a two-bit shift) as is necessary per Hal Pomeranz’s article on EXT4 Timestamps here.

**Big thanks to Dan (aka @4n6k) for his assist here in leading me to Hal’s article as I was banging my head on this last portion for a bit until discovering this bitwise shift needed to be done. Also, of course, huge thanks to Hal (@hal_pomeranz) as well for this monumental efforts in painstakingly documenting EXT4 Timestamps and these nuances.**

$ debugfs -R 'stat /root/VMwareTools-10.1.15-6627299.tar.gz' /dev/mapper/k2--vg-root |& sed -n 's/^ mtime: \(0x[0-9a-f]+\):\([0-9a-f]+\).*/\1.0x\2/p' | awk -F'.' '{n = strtonum($2) / 4; print strftime("%Y-%m-%d %T",strtonum($1))"."n}'
2017-11-24 17:38:44.799279489

AWESOME. We can now successfully extract the crtime file timestamps programmatically.

Now, let’s put it alllll together and build our one-liner that’s going to help us reach our original goal here of outputting a file listing with all the available timestamps (in MACB order) as well as file hashes (MD5 and SHA1). We will be using the largely native md5sum and sha1sum utilities to produce our hashes so as to avoid the need to install any additional third-party tools.

And, here it is. I give the ugliest (most epic?) command to date to output everything we’ve been looking for:

# find /path/to/dir -maxdepth 1 -type f -printf '%i#%M#%n#%g#%u#%s#%TY-%Tm-%Td %TT#%AY-%Am-%Ad %AT#%CY-%Cm-%Cd %CT#"%p"\n' | awk -F"#" '{cmd="debugfs -R '\''stat <"$1">'\'' /dev/mapper/k2--vg-root 2>/dev/null | grep -Po \"(?<=crtime: 0x)[0-9a-f]+:[0-9a-f]+(?=.*)\" | tr \":\" \" \" | { read e n; echo \"$(date +\"%Y-%m-%d %H:%M:%S\" -d @$(printf %d 0x$e)).$(printf %09d $(( $(printf %d 0x$n) / 4 )) )\";}"; cmd | getline crt; close(cmd); cmd=“md5sum "$10" | cut -d \" \" -f1"; cmd | getline md5; close(cmd); cmd="sha1sum "$10" | cut -d \" \" -f1"; cmd | getline sha1; close(cmd); print $1","$2","$3","$4","$5","$6","$7","$8","$9","crt","$10","md5","sha1}' | sed 's/ *,/,/g’ | sort -t',' -hr -k6

Note that we had to do a few things to deal with various unsavory characters that may occur within filenames (e.g., spaces, parentheses, comma’s, etc.). First, we can’t use comma’s as our print output delimiter as filenames with comma’s would then screw up our Awk parsing. So, we needed to use a non-standard character (i.e. one we would never expect to see in our output). In this case I chose “#”, but you could use whatever you’d like. To get our debugfs stat output, as well as MD5 and SHA1 hashes, we utilize Awk’s ability to execute commands and retrieve the output with its getline function. You may notice that the debugfs stat command one-liner strings together a RegEx with a Lookbehind assertion, along with some bash read/print/date functions in order to translate the hex -> decimal -> formatted human-readable datetime for us.

So… How ‘bout them apples?? WE’VE DONE IT!! All of that painstaking work has paid off in spades. We’ve put together a command that is essentially FLS on steroids (with hashes) that we can run against BOTH a live and dead system! THIS IS WHAT DREAMS ARE MADE OF!

If you’d like to use this* as an FLS replacement against a (dead) system image, simply mount the image’s file system (Read-Only, of course), adjust the command to point to the root of the file system, remove the last “sort” command (as we can do that later during analysis as needed), and simply output to CSV. Like so:

*AGAIN, I PROVIDE NO GUARANTEES HERE, only a best effort here and initial pass on doing this. For example, in one of my test VM’s I kept getting what appeared to be random “sh: 1: printf: 0x: not completely converted” errors that output a default crtime date of “1969-12-31 16:00:00.000000000” which makes no sense as I’ve verified that the crtimes on these files are present and properly output via stat/debugfs and a manual conversion of the values yields success. Yet, it did not happen in other VM’s. So, just a heads up in case something goes awry on your end.

# echo "Inode,Permissions,HardLinks,GroupName,UserName,Size(Bytes),LastModified,LastAccess,Changed,Birth,Filename,MD5,SHA1" > FS_Listing.csv

# find / -xdev ! -path '/var/run/*' ! -path '/run/*' ! -path '/proc/*' -type f -printf '%i#%M#%n#%g#%u#%s#%TY-%Tm-%Td %TT#%AY-%Am-%Ad %AT#%CY-%Cm-%Cd %CT#"%p"\n' | awk -F"#" '{cmd="debugfs -R '\''stat <"$1”>'\'' /dev/mapper/k2--vg-root 2>/dev/null | grep -Po \”(?<=crtime: 0x)[0-9a-f]+:[0-9a-f]+(?=.*)\" | tr \":\" \" \" | { read e n; echo \"$(date +\"%Y-%m-%d %H:%M:%S\" -d @$(printf %d 0x$e)).$(printf %09d $(( $(printf %d 0x$n) / 4 )) )\";}"; cmd | getline crt; close(cmd); cmd="md5sum "$10" | cut -d \" \" -f1"; cmd | getline md5; close(cmd); cmd="sha1sum "$10" | cut -d \" \" -f1"; cmd | getline sha1; close(cmd); print $1","$2","$3","$4","$5","$6","$7","$8","$9","crt","$10","md5","sha1}' | sed 's/ *,/,/g' >> FS_Listing.csv

Note that we are:

  1. First writing a “header” line to the CSV file for easier reference during analysis
  2. Now operating from a ROOT prompt (e.g. the leading “#” denoting a root prompt versus the “$” denoting a standard user prompt) as we will need root privileges to access/read the entire filesystem
  3. Avoiding traversal of external mounted filesystems (i.e. network shares, external media, etc.) via the “-xdev” parameter, and
  4. Specifically avoiding a few directories via the “! -path /path/to/avoid/*” as the aforementioned paths store ephemeral process information we aren’t interested in collecting (at least not for our purposes here).

Excel ProTips: If you are using Excel to review the CSV file, be aware that Excel only displays time precision down to the milliseconds (and no further). Alas, you will be missing everything beyond the 3 digits past the decimal place. In order to display this millisecond precision, you will want to highlight all the MACB timestamp cells, right-click, select Format Cells, select Custom under the Number tab, input the Type as mm/dd/yyyy hh:mm:ss.000 (or whatever you like, the important part is the timestamp’s trailing .000), then click OK. And, Voila!, millisecond timestamp precision. Obviously, it is most valuable to be able to actually see the full nanosecond precision but at least it’s something for those who are die-hard Excel fans.

Also, for whatever reason, Excel does something weird with displaying some of the leading permissions entries by prepending a “=“ to them. Why, I have no idea. Maybe Excel gets confused and sometimes tries to interpret “-“ text as an intended negative or minus sign and thus attempts to “fix” it for us (in true Microsoft fashion) by denoting it as a formula and prepending the “=”? ¯\_(ツ)_/¯ For whatever reason, it’s happening (see below). Just be aware that this is something Excel is adding and that it is NOT present in the original CSV if you’re using any other tools for analysis.

Now… how about doing this on a Mac? OF COURSE we’re going to translate this over…


If you’ve been reading my blog (and/or working between Linux and Mac systems for a while), you’ll know that things do not often translate directly to from Linux (GNU) to Mac (BSD) as the core utilities seem to always differ just enough to make your life a pain when working between systems. And, this situation is no different.

As you might assume, we are going to use the “stat” command again as the basis for extracting all of our timestamps. However, we will of course be using the BSD stat command and not the GNU version as used in Linux. Below is the default BSD “stat” output (the format of which is of course different from GNU “stat” because… why not):

$ stat .vimrc
16777220 1451064 -rw-r--r-- 1 jp staff 0 54 "Sep 22 12:08:02 2017" "Dec 26 10:36:32 2016" "Dec 26 10:36:32 2016" "Dec 26 10:12:15 2016" 4096 8 0 .vimrc

The upside here is that, by default, BSD “stat” outputs all 4 HFS+ filesystem timestamps we care about! Great, but which are what? Saving you some time and research, BSD “stat” outputs timestamps in the following order by default:

Last Accessed, Last Modified, Inode Changed, Inode Birth - (A,M,C,B)

Just as we discussed earlier, these reflect the time file was last accessed, the time the file was last modified, the time the inode was last changed, and birth time of the Inode. So, in order to get them into an order we like (okay, the order that I like) such as MACB (because this is how we most often see the timestamp acronym), we can perform the following:

$ stat -t "%b %-d %Y %T %Z" .vimrc | awk -F'"' '{print "Modified: "$4; print "Accessed: "$2; print "Changed: "$6; print "Birth: "$8}'
Modified: Dec 12 2016 10:36:32 PST
Accessed: Sep 9 2017 12:08:02 PDT
Changed: Dec 12 2016 10:36:32 PST
Birth: Dec 12 2016 10:12:15 PST

And, there we have it, full timestamp information in the order we (I) like it. Do note that HFS+ timestamp precision is only down to the second as it does not implement nanosecond resolution like some other filesystems. And, for that, we do a hearty ¯\_(ツ)_/¯. Fortunate for us in the future, APFS has implemented nanosecond timestamp resolution. But, that’s a separate discussion you can read about here.

Now that we’ve taken care of that timestamp acquisition and formatting issue, let’s move on to building the command line statement we’re going to run. While GNU’s find utility provides a “-printf” option to format and customize find’s output, BSD’s find lacks such an option. Alas, we will need to be a bit more creative here. What I ended up doing here was piping find’s output to BSD’s “stat” command which DOES provide a formatting option “-f” that we can utilize. But, again, it’s not as straight forward as just copy/paste of the previous formatting we used on Linux because OF COURSE the print delimiters don’t directly translate over either.

So, first we need to translate over the previous GNU print formatting string ('%i#%M#%n#%g#%u#%s#%TY-%Tm-%Td %TT#%AY-%Am-%Ad %AT#%CY-%Cm-%Cd %CT#”%p”\n') into the correlated BSD values, which end up being the following:


I’m using “^” as a delimiter this time instead of “#” as I ended up actually having files with hash/pound signs in their name on my system (THANKS, ATOM APP). Also, note that I’m using single-tick’s for the print statement and using full quote encapsulation for the filename. I’m doing this in order to avoid issues with dollar signs ($) in filenames. Again, no, using such delimiters is not very pretty, but it’s required. And, if you for some reason have files with “^” in their names, it will break this as well. So, YMMV.

$ find /Users/jp -maxdepth 1 -type f -print0 | xargs -0 stat -t "%Y-%m-%d %H:%M:%S" -f '%i^%Sp^%l^%Sg^%Su^%z^%Sm^%Sa^%Sc^%SB^"%N"' | sort -t'^' -nr -k6

Note that I also needed to specify stat’s “-t” argument to format the datetime output in the printf statement.

So, there we have it, listing directory output in decreasing file size.

Now, on to calculating and appending our MD5 and SHA1 hashes to the output. For this, we will use BSD’s native md5 and shasum utilities. Using much of the same structure from our Linux one-liner, we then come up with the following:

# find /Users/jp -maxdepth 1 -type f -print0 | xargs -0 stat -t "%Y-%m-%d %H:%M:%S" -f '%i^%Sp^%l^%Sg^%Su^%z^%Sm^%Sa^%Sc^%SB^"%N"' | awk -F"^" '{cmd="md5 "$11" | cut -d \" \" -f4"; cmd | getline md5; close(cmd); cmd="shasum "$11" | cut -d \" \" -f1"; cmd | getline sha1; close(cmd); print $1","$2","$3","$4","$5","$6","$7","$8","$9","$10","$11","md5","sha1}' | sort -t',' -nr -k6

And there we have it, a directory listing with hashes sorted in decreasing file size. Note that were are now again in a root shell to avoid file access permissions.

Now, on to the final one-liner to do a full filesystem listing:

# echo "Inode,Permissions,HardLinks,GroupName,UserName,Size(Bytes),LastModified,LastAccess,Changed,Birth,Filename,MD5,SHA1" > OSX_Listing.csv

# find -x / -type f -print0 | xargs -0 stat -t "%Y-%m-%d %H:%M:%S" -f '%i^%Sp^%l^%Sg^%Su^%z^%Sm^%Sa^%Sc^%SB^"%N"' | awk -F"^" '{cmd="md5 "$11" | cut -d \" \" -f4"; cmd | getline md5; close(cmd); cmd="shasum "$11" | cut -d \" \" -f1"; cmd | getline sha1; close(cmd); print $1","$2","$3","$4","$5","$6","$7","$8","$9","$10","$11","md5","sha1}' >> OSX_Listing.csv

Note that OS X find’s “-x” parameter is equivalent to GNU’s “-xdev”, meaning not to enumerate external disks/mounted filesystems.

When I ran this against my full system, I realized it choked on files containing “$”. So, I needed to add in some Awk substitution to escape the dollar sign with a leading “\” so that the shell wouldn’t attempt to interpret the “$” as (mis)indication of a variable when it was simply a dollar sign in a file name. Full disclosure: it may also choke on other files with special characters, but I’ve shown you how you can use Awk substitution as a way around it. So, update/augment this as needed.


Sooooo, Wow. That was a bit of hard work. Actually, it was A LOT of hard work, much of which was not captured in the blog post for the sake of brevity and everyone’s sanity, as it surely tested mine many a time. However, hopefully you can see the value of spending the time building effective and efficient processes on the front end so you are not always paying for it on the back end. Suffice to say, IMHO, sometimes it is ok to work harder and not smarter, when the process will help you become more of the latter.

If you wanted to run any of the above commands against a mounted evidence image, you’d simply specify its mount point in the find command, like so:

# find /mnt/point/ ...

Note that we don’t use the “-xdev” or “-x” parameter here as we do actually want it to enumerate an external filesystem (i.e. our mounted evidence image’s filesystem which is likely from an external disk or network share).

And, now that we’ve walked through doing all of that the hard way using native Linux utilities, I will say that another filesystem enumeration capability to include hashes has also been built in Python in Jim Clausing’s script. However, due to Python’s os.stat call limitations, this script does not/cannot pull the btime (aka crtime) attributes that we are able to identify and extract through our commands here. Nonetheless, it is another option, which is always great.

Thanks to everyone for hanging in there through this whole post. It obviously takes way more time to painstakingly walk through every step of a process; however, I feel it is well worth my time to teach people to fish, and hopefully you all do too.

Quick(er) Mounting and Dismounting of LVM’s on Forensic Images

I recently came across Int’l Man of Leisure’s blog posts here and here on “Mounting and Imaging Logical Volume Manager (LVM2)”. He does a great job of defining the problem statement (dealing with LVM’s in their various image formats in a DFIR investigation) and how to work through getting a set of logical images back into their intended LVM layout for appropriate mounting and analysis.

IMOL begins by going through the background of LVM, what it is, and how to install it to prepare your system for dealing with LVM’s. Once prepared, IMOL begins by presenting a set of two VMDK images that must be merged or “stitched together” in order to be interpreted and parsed by the Linux LVM. However, VMDK files are not something natively readable/mountable by a linux system. So, before we can even begin stitching these back together, these VMDK files must be converted into something natively readable by the system, such as a raw image/block device. In IMOL’s testing, he found that FTK Imager was one of a few tools that was able to read the VMDK files in order to convert (image) them to raw files. He then used FTK Imager to image the VMDK files into respective Raw DD format files for continued use. However, here is where I would like to branch into my process for mounting LVM’s that completely eliminates the need for converting an image to raw using a tool called “QEMU”, and thus saving you (potentially) hours of time.

We all know when dealing with forensic imaging/conversion that even the slightest hiccup can render an entire image useless and long-spent time wasted. The less time we spend imaging/converting, the faster we can get to analysis and toward our goals for the investigation. Enter QEMU, specifically “qemu-nbd”. I could go into a lot of detail of all the types of images it can convert and how useful it can be in various capacities (in fact, I may do another blog post about it). However, for this post, I will just stick to specifically how you can use it to perform on-the-fly image format translation (in real-time) between various formats – no need to spend time converting to another image file.

QEMU has a utility called “qemu-nbd” (nbd stands for Network Block Device) that essentially performs real-time translation betwixt (I have waited so long to use that word in a serious tone) various image formats. It’s as easy as the following:

Ensure you have an available NBD
# ls -l /dev/nbd*

If no device files (or not enough) exist as /dev/nbd*, create as many as needed
# for i in {0..<z>}; do mknod /dev/nbd$i b 43 $i; done
*Where <z> is the number of devices you need, minus one

Mount the image file
# qemu-nbd -r -c /dev/nbd<x> image.<extension>
* -r: read-only
* -c: connect image file to NBD device
*Where <x> is the appropriate block device number (typically starting at 0) and <extension> is a supported QEMU Image Type (raw, cloop, cow, qcow, qcow2, vmdk, vdi, vhdx, vpc)

Note that this will need to be done for each image that is a part of the LV group. For example, if there are 3 different VMDK files that together comprise one or more LV groups, you would do the following (ensuring the associated /dev/nbd devices have already been created before issuing the below commands):

# qemu-nbd -r -c /dev/nbd0 image_0.vmdk
# qemu-nbd -r -c /dev/nbd1 image_1.vmdk
# qemu-nbd -r -c /dev/nbd2 image_2.vmdk

That’s it. Each /dev/nbd<x> is immediately translated and available as a raw block device to be queried/mounted just as if it were a raw image to begin with. Pretty awesome, huh?

Now, if you were lucky enough to start with raw/DD images, you don’t need to perform any of the above. Instead, you can just skip to the below instructions for mounting and mapping the LVM’s.

At this point, my process to identify and load the LVM(s) mostly mirrors that as described by IMOL, with a few subtle differences. I won’t go into great detail of it all as IMOL gives great descriptions of each step in his walk-through. However, I will lay out my commands below for those who are looking for an easy copy/paste method to stick into their cheat sheets.

Keep in mind that the order of the below commands is critical to successful mounting of LVM’s.

Load the LVM module if not already loaded
# modprobe dm_mod

Ensure you have enough available loopback devices (one for each nbd device)
# ls -l /dev/loop*

If not enough loopback devices exist, create as many as needed
# for i in {0..<z>}; do mknod /dev/loop$i b 7 $i; done
*Where <z> is the number of devices you need, minus one

Set up a loopback device for each image that is part of the LV group
# losetup -r [-o <byte_offset>] -f [/dev/loop<x>] /dev/nbd<x>

Read partition tables from each loopback device to create mappings
# kpartx -a -v /dev/loop<x>

List the available Physical Volume Groups (VG’s)
# pvdisplay

List the available Logical Volume Groups (VG’s)
# lvdisplay

(Optional) If not listed, re-scan the mounted volumes to identify the associated VG’s
# pvscan
# lvscan
# vgscan

Activate the appropriate VG’s
# vgchange -a y <VG>
** The recombined LVM volume(s) will now be available at /dev/mapper/<VolumeGroup>-<VolumeName>

Mount the LVM Volume(s)
# mount [options] /dev/mapper/<VolumeGroup>-<VolumeName> /mnt/point

Congratulations. You (should) now have filesystem access to the given LVM(s)!

== Sidebar ==

Interested in why we use the given numbers of “43” and “7” for the mknod command?

The mknod command is structured like the following: mknod <device> <type> <major_#> <minor_#>

For our uses, we are creating device files of type “b” (block device), with major #’s of “43” (nbd) and “7” (loopback). The major number tells the system what type of device to expect and operate as. For a list of devices that your system is aware of and can dynamically assign when a major number is not specified, check out your /proc/devices file. IBM does a rather good job of explaining it all here. The minor number is more for reference and thus should probably, as best practice, reference/relate to the device number as well. Though, there is nothing requiring it to be that way.

For further information about the mknod command structure, just check out the man page.

== /Sidebar ==

Once you’re done with the images, the next logical step is to dismount the images which can at times become unnecessarily and illogically troublesome. To properly dismount LVM’s, perform the following steps (again, order is critical here!):

Dismount each mounted filesystem
# umount /mnt/point

De-activate each activated Volume Group
# vgchange -a n <VG>

Remove partition mappings for each loop device
# kpartx -d -v /dev/loop<x>

Remove each loopback device
# losetup -d /dev/loop<x>

(Optional) Force remove an LVM
# dmsetup remove -f <VG>

Keep in mind the dmsetup command above is considered deprecated and not suggested for use. However, I provide it as I have had to use it at times in the past when a VG simply would not detach using the appropriate commands. That said, if all else fails, reboot 🙂

Hopefully, this post will help with the often convoluted process of mounting LVM’s, especially when split across multiple images/devices.


Know Your Tools: Linux (GNU) vs. Mac (BSD) Command Line Utilities

Welcome to first post in the “Know Your Tools” series!

Without further ado…

Have you ever wondered if/how *nix command line utilities may differ across distributions? Perhaps it never even occurred to you that there was even a possibility the tools were any different. I mean, they’re basic command line tools. How and why could/would they possibly differ?

Well, I’m here to say… thy basic command line utilities art not the same across different distributions. And, the differences can range from those that can cause a simple nuisance to those that can cause oversight of critical data.

Rather than going into aspects of this discussion that have already been covered such as how Linux and BSD generally differ, I would instead like to focus on a few core utilities commonly used in/for DFIR artifact analysis and some caveats that may cause you some headache or even prevent you from getting the full set of results you’d expect. In highlighting the problems, I will also help you identify some workarounds I’ve learned and developed over the years in addressing these issues, along with an overarching solution at the end to install GNU core utilities on your Mac (should you want to go that route).

Let’s get to it.


Grep is one of the most useful command-line utilities for searching within files/content, particularly for the ability to use regular expressions for searching/matching. To some, this may be the first time you’ve even heard that term or “regex” (shortened version of it). Some of you may have been using it for a while. And, nearly everyone at some point feels like…


Regardless of whether this is your first time hearing about regular expressions or if you use them regularly albeit with some level of discomfort, I HIGHLY suggest you take the time to learn and/or get better at using them – they will be your most powerful and best friend for grep. Though there is a definite regex learning curve (it’s really not that bad), knowing how to use regular expressions translates directly to performing effective and efficient searches for/of artifacts during an investigation.

Nonetheless, even if you feel like a near master of regular expressions, equally critical to an expression’s success is how it is implemented within a given tool. Specifically for grep, you may or may not be aware that it uses two different methods of matching that can highly impact the usefulness (and more important, validity) of results returned – Greedy vs. Lazy Matching. Let’s explore what each of these means/does.

At a very high level, greedy matching attempts to find the last (or longest) possible match, and lazy matching attempts to find the first possible match (and stops there). More specifically, greedy matching employs what is called backtracking and look-behind’s but that is a separate discussion. Suffice to say, using an incorrect, unintended, and/or unexpected matching method can completely overlook critical data or at the very least provide an inefficient or invalid set of results.

Now having established some foundational knowledge about how grep searches can work, we will drop the knowledge bomb – the exact same grep expression on Linux (using GNU grep) may produce completely different or no results on Mac (using BSD grep), especially when using these different types of matching.

…What? Why?

The first time I found this out I spent an inordinate and unnecessary amount of time banging my head against a wall typing and re-typing the same expression across systems but seeing different results. I didn’t know what I didn’t know. And, well, now I hope to let you know what I didn’t know but painfully learned.

While there is an explanation of why, it doesn’t necessarily matter for this discussion. Rather, I will get straight to the point of what you need to know and consider when using this utility across systems to perform effective searches. While GREEDY searches execute pretty much the same across systems, the main difference comes when you are attempting to perform a LAZY search with grep.

We’ll start with GREEDY searches as there is essentially little to no difference between the systems. Let’s perform a greedy search (find the last/longest possible match) for any string/line ending in “is” using grep’s Extended Regular Expressions option (“-E”).

(Linux GNU)$ echo “thisis” | grep -Eo ‘.+is'
(Mac BSD)$ echo “thisis” | grep -Eo ‘.+is'

Both systems yield the same output using a completely transferrable command. Easy peasy.

Note: When specifying Extended Regular Expressions, you can (and I often do) just use “egrep” which implies the “-E” option.

Now, let’s look at LAZY searches. First, how do we even specify a lazy search? Well, to put it simply, you append a “?” to your matching sequence. Using the same search as before, we’ll instead use lazy matching (find the first/shortest match) for the string “is” on both the Linux (GNU) and Mac (BSD) versions of grep and see what both yield.

(Linux GNU)$ echo “thisis” | grep -Eo ‘.+?is'
(Mac BSD)$ echo “thisis” | grep -Eo ‘.+?is'

Here the fun begins. We did the exact same command on both systems and it returned different results.

Well, for LAZY searches, Linux (GNU) grep does NOT recognize lazy searches unless you specify the “-P” option (short for PCRE, which stands for Perl Compatible Regular Expressions). So, we’ll supply that this time:

(Linux GNU)$ echo “thisis” | grep -Po ‘.+?is'

There we go. That’s what we expected and hoped for.

*Note: You cannot use the implied Extended expression syntax of “egrep” here as you will get a “conflicting matchers specified” error. Extended regex and PCRE are mutually exclusive in GNU grep.

Note that Mac (BSD), on the other hand, WILL do a lazy search by default with Extended grep. No changes necessary there.

While not knowing this likely won’t lead to catastrophic misses of data, it can (and in my experience will very likely) lead to massive amounts of false positives due to greedy matches that you have to unnecessarily sift through. Ever performed a grep search and got a ton of very imprecise and unnecessarily large (though technically correct) results? This implementation difference and issue could certainly have been the cause. If only you knew then what you know now…

So, now that we know how these searches differ across systems (and what we need to modify to make them do what we want), let’s see a few examples where using lazy matching can significantly help us (note: I am using my Mac for these searches, thus the successful use of Extended expressions using “egrep” to allow for both greedy and lazy matching)…

User-Agent String Matching
Let’s say I want to identify and extract the OS version from Mozilla user-agent strings from a set of logs, the format of which I know starts with “Mozilla/“ and then contains the OS version in parenthesis. The following shows some examples:

  • Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36
  • Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2226.0 Safari/537.36
  • Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.0 Safari/537.36
  • Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.1 Safari/537.36

Greedy Matching (matches more than we wanted – fails)
(Mac BSD)$ echo "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.1 Safari/537.36" | egrep -o 'Mozilla.+\)'
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko)

Lazy Matching
(Mac BSD)$ echo "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.1 Safari/537.36" | egrep -o 'Mozilla.+?\)'
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1)

Searching for Malicious Eval Statements
Let’s say I want to identify and extract all of the base64 eval statements from a possibly infected web page for analysis, so that I can then pipe it into sed to extract only the base64 element and decode it for plaintext analysis.

Greedy Matching (matches more than we wanted – fails)
(Mac BSD)$ echo "date=new Date(); eval(base64_decode(\"DQplcnJvcl9yZ=\")); var ua = navigator.userAgent.toLowerCase();" | egrep -o 'eval\(base64_decode\(.+\)'
eval(base64_decode("DQplcnJvcl9yZ=")); var ua = navigator.userAgent.toLowerCase()

Lazy Matching (matches exactly what we want)
(Mac BSD)$ echo "date=new Date(); eval(base64_decode(\"DQplcnJvcl9yZ=\")); var ua = navigator.userAgent.toLowerCase();" | egrep -o 'eval\(base64_decode\(.+?\)'

There you have it. Hopefully you are now a bit more informed not only about the differences between Lazy and Greedy matching, but also about the difference in requirements across systems.


Strings is an important utility for use in extracting “human-readable” strings from files/binaries. It is particularly useful in extracting strings from (suspected) malicious binaries/files to attempt to acquire some insight into what may be contained within the file, its capabilities, hard-coded domains/URL’s, commands, … the list goes on.

However, not all strings are created equal. Sometimes, Unicode strings exist within a file/program/binary for various reasons, those of which are also important to identify and extract. By default, the GNU (Linux) strings utility searches for simple ASCII encoding, but also allows you to specify additional encodings for which to search, to include Unicode. Very useful.

By default, the Mac (BSD) strings utility also searches for simple ASCII encoding; however, I regret to inform you that the Mac (BSD) version of strings does NOT have the native capability to search for Unicode strings. Do not ask why. I highly encourage you to avoid the rabbit hole of lacking logic that I endured when I first found this out. Instead, we should move on and instead just be asking ourselves, “What does this mean to me?” Well, if you’ve only been using a Mac to perform string searches using the native BSD utility, you have been MISSING ALL UNICODE STRINGS. Of all the pandas, this is a very sad one.

So, what are our options?

There are several options, but I personally use one of the following (depending no the situation and my mood) when I need to extract both Unicode and ASCII strings from a file using a Mac (BSD) system:
1. Willi Ballenthin’s Python strings tool to extract both ASCII and Unicode strings from a file
2. FireEye’s FLOSS tool (though intended for binary analysis, it can also work against other types of files)
3. GNU strings*

*Wait a minute. I just went through saying how GNU strings isn’t available as a native utility on a Mac. So, how can I possibly use GNU strings on it? Well, my friends, at the end of this post I will revisit exactly how this can be achieved using a nearly irreplaceable third-party package manager.

Now, go back and re-run the above tools against various files and binaries from your previous investigations you performed from the Mac command line. You may be delighted at what new Unicode strings are now found 🙂


Sed (short for “Stream editor”) is another useful utility to perform all sorts of useful text transformations. Though there are many uses for it, I tend to use it mostly for substitutions, deletion, and permutation (switching the order of certain things), which can be incredibly useful for log files with a bunch of text.

For example, let’s say I have a messy IIS log file that somehow lost all of its newline separators and I want to extract just the HTTP status code, method, and URI from each line and output into its own separate line (restoring readability):


Looking at the pattern, we’d like to insert a newline before each instance of the date, beginning with “2016-…”. Lucky for us, we’re on a Linux box with GNU sed and it can easily handle this:

(Linux GNU)$ sed 's/ \(.+\?\)2016/\1\n2016/g' logfile.txt

You can see that it not only handles lazy matching, but also handles ANSI-C escape sequences (e.g., \n, \r, \t, …). This statement also utilizes sed variables, the understanding of which I will leave to the reader to explore.

Sweet. Let’s try that on a Mac…

(Mac BSD)$ sed 's/\(.+\?\)\(.+\)/\1\n2016/g' logfile.txt

… Ugh. No luck.

Believe it or not, there are actually two common problems here. The first is the lack of interpretation of ANSI-C escape sequences. BSD sed simply doesn’t recognize any (except for \n, but not within the replacement portion of the statement), which means we have to find a different way of getting a properly interpreted newline into the statement.

Below are a few options that will work around this issue (and there are more clever ways to do it as well).

1. Use the literal (i.e., for a newline, literally insert a new line in the expression)
(Mac BSD)$ sed ’s//\*Press Enter*
> /g'

2. Use bash ANSI-C Quoting (I find this the easiest and least effort, but YMMV)
(Mac BSD)$ sed 's//\'$'\n/g’
3. Use Perl
(Mac BSD)$ perl -pe ‘s||\n|g'

Unfortunately, this only solves the first of two problems, the second being that BSD sed still does not allow for lazy matching (from my testing, though I am possibly just missing something). So, even if you use #1 or #2 above, it will only match the last found pattern and not all the patterns we need it to.

“So, should I bother with using BSD sed or not?”

Well, I leave that up to your judgment. Sometimes yes, sometimes no. In cases like this where you need to use both lazy matching and ANSI-C escape sequences, it may just be easier to skip the drama and use Perl (or perhaps you know of another extremely clever solution to this issue). Options are always good.

Note: There are also other issues with BSD sed like line numbers and using the “-i” parameter. Should you be interested beyond the scope of this post, this StackExchange thread actually has some useful information on the differences between GNU and BSD sed. Though, I’ve found that YMMV on posts like this where the theory and “facts” may not necessarily match up to what you find in testing. So, when in doubt, always test for yourself.


Of all commands, you might wonder how something so basic as find could differ across *nix operating systems. I mean, what could possibly differ? It’s just find, the path, the type, the name… how or why could that even be complicated? Well, for the most part they are the same, except in one rather important use case – using find with regular expressions (regex).

Let’s take for example a regex to find all current (non-archived/rotated) log files.

On a GNU Linux system this is somewhat straight forward:

(Linux GNU)$ find /var/log -type f -regextype posix-extended -regex "/var/log/[a-zA-Z\.]+(/[a-zA-Z\.]+)*"

You can see here that rather than using the standard “-name” parameter, we instead used the “-regextype” flag to enable extended expressions (remember egrep from earlier?) and then used the “-regex” flag to denote our expression to utilize. And, that’s it. Bless you, GNU!

Obviously, Mac BSD is not this straight forward, otherwise I wouldn’t be writing about it. It’s not exactly SUPER complicated, but it’s different enough to cause substantial frustration as your Google searches will show that the internet is very confused about how to do this properly. I know. Shocking. Nonetheless, there is value in traveling down the path of frustration here so that you don’t have to when it really matters. So, let’s just transfer the command verbatim over to a Mac and see what happens.

(Mac BSD)$ find /var/log -type f -regextype posix-extended -regex "/var/log/[a-zA-Z\.]+(/[a-zA-Z\.]+)*"
find: -regextype: unknown primary or operator

Great, because why would BSD find use the same operators, right? That would be too easy. By doing a “man find” (on the terminal, not in Google, as that will produce very different results from what we are looking for here) you will see that BSD find does not use that operator. Though, it still does use the “-regex” operator. Easy enough, we’ll just remove that bad boy:

(Mac BSD)$ find /var/log -type f -regex "/var/log/[a-zA-Z\.]+(/[a-zA-Z\.]+)*
(Mac BSD)$

No results. Ok. Let’s look at the manual again… ah ha, to enable extended regular expressions (brackets, parenthesis, etc.), we need to use the “-E” option. Easy enough:

(Mac BSD)$ find /var/log -E -type f -regex "/var/log/[a-zA-Z\.]+(/[a-zA-Z\.]+)*"
find: -E: unknown primary or operator

Huh? The manual says the “-E” parameter is needed, yet we get the same error message we got earlier about the parameter being an unknown option. I’ll spare you a bit of frustration and tell you that it is VERY picky about where this flag is put – it must be BEFORE the path, like so:

(Mac BSD) $> find -E /var/log -type f -regex "/var/log/[a-zA-Z\.]+(/[a-zA-Z\.]+)*"

Success. And, that’s that. Nothing earth shattering here, but different and unnecessarily difficult enough to be aware of in your switching amongst systems.

So, now what?

Are you now feeling a bit like you know too much about these little idiosyncrasies? Well, there’s no going back now. If for no other reason, maybe you can use them to sound super smart or win bets or something.

These are just a few examples relevant to the commands and utilities often used in performing DFIR. There are still plenty of other utilities that differ as well that can make life a pain. So, now that we know this, what can we do about it? Are we doomed to live in constant translation of GNU <—> BSD and live without certain GNU utility capabilities on our Macs? Fret not, there is a light at the end of the tunnel…

If you would like to not have to deal with many of these cross-platform issues on your Mac, you may be happy to know that the GNU core utilities can be rather easily installed on OS X. There are a few options to do this, but I will go with my personal favorite method (for a variety of reasons) called Homebrew.

Homebrew (or brew) has been termed “The missing package manager for OS X”, and rightfully so. It allows simple command-line installation of a huge set of incredibly useful utilities (using Formulas) that aren’t installed by default and/or easily installed via other means. And, the GNU core utilities are no exception.

As a resource, Hong’s Technology Blog provides a great walk-through of installation and considerations.

You may already be thinking, “Great! But wait… how will the system know which utility I want to run if both the BSD and GNU version are installed?” Great question! By default, homebrew installs the binaries to /usr/local/bin. So, you have a couple options, depending on which utility in particular you are using. Some GNU utilities (such as sed) are prepended with a “g” and can be run without conflict (e.g., “gsed” will launch GNU sed). Others may not have the “g” prepended. In those cases, you will need to make sure that /usr/local/bin is in your path (or has been added to it) AND that it precedes those of the standard BSD utilities’ locations of /usr/bin, /bin, etc. So, your path should look something like this:

$ echo $PATH

With that done, it will by default now launch the GNU version installed in /usr/local/bin instead of the standard system one located in /usr/bin. And, to use the native system utilities when there is a GNU version installed with the same name, you will just need to provide their full path (i.e., “/usr/bin/<utility>”).

Please feel free to sound off in the comments with any clever/ingenious solutions not covered here or stories of epic failure in switching between Linux and Mac systems 😃


Powered by WordPress & Theme by Anders Norén