Evolve Version 1.6

evolve-logo

New features in #EvolveTool!

API Doc

The new part of this feature is the HTML to outline all the various URLs that can be used to interact with Evolve. They have mostly been there in the background already, although a couple of the URLs are new. These URLs give the ability to have Evolve work in a sort of headless mode. You can use any scripting language that can GET or POST. The return data is in JSON format.

evolve-api

Plugin Search

The plugin list for Volatility commands keeps growing thanks to the great support by the core dev team and all the gracious developers in the community. I’m sure the annual contest giving away money had no part in it either. Anyways, I figured it would be helpful to have the ability to run a quick search over the plugin list where you can type a part of what you are looking for. It doesn’t support any fancy matching though, and just puts wildcards in front and behind whatever you type. It searches while you type and you can use [ESC] or click the X to the right to quickly clear the box.

Try typing ‘dump’ and you will get a list of those plugins that Evolve doesn’t yet support:

evolve-plugin-search

Teaser: Volatility Command Line Options

Speaking of Volatility plugins that aren’t supported in Evolve, I was able to dig into the Volatility core and determine where those options are stored in Object Oriented (OO) data structures. You will see a couple new URLs listed in the API doc that take advantage of this new found knowledge.

The first is a list of all the default options that Volatility has. You can see those by running ‘vol.py -h’ in the shell, or accessing the API here:

evolve-options

The second builds on the above URL to get more specific options that any of the plugins are allowed to add into the list to accept during processing. You can display the full collection of options with the plugin specifics listed at the bottom with ‘vol.py dumpregistry -h’ or you can get only the specific options that each plugin adds by accessing this API:

evolve-options-plugin

Some Background

I originally took on the project of making Evolve to learn Python. I wanted to build something that required research and learning, and something that would make me stretch. I could have written this project in any language and just made calls to vol.py to get things running. I’ve seen many of these projects pop up over the years and they work great. I decided to fully integrate with Volatility to better learn Python and have more power and control over how I hand off processing jobs. That decision has caused some headaches, so I try to share the solutions when I can.

The challenge in here is that Volatility uses a library for parsing command line options that is built into Python. This setup works great for the scenario that Volatility is typically run, at a command prompt, where the user has to supply all those parameter names and values up front. It doesn’t make it so easy to fetch a list of the various options any of the plugins might want to take advantage of because those options aren’t built into OO to just get.

The plugins written for Volatility interface with optparse to add in the recognition for the short and long parameter designations. The optparse object is a member of Volatility’s ConfObject class, but not really integrated.

To get to the list of default options is pretty straight forward. You have to build a ConfObject anyways when integrating with Volatility, and the default options all come along as it is built.

To get the options that any of the plugins add on top of the default, you have to utilize the ConfObject again, but as a parameter when initiating the plugin of choice. The result is a full list of all the options that are now available, including those earlier found default options. You have to do the work of differentiating the newly added from the defaults. To prepare for doing this, I created a second ConfObject and pulled the new list in.

The next challenge is the structure of the options being held in the optparse object isn’t really straight forward. The items are not provided in a list, so you can’t do much with them as they sit. Fortunately they are iterable, so that allows for Python to use in as a collection in a for loop. You can see the debug view getting to one of these options here:

evolve-options-debug

I chose not to deal with all of those properties since I don’t think the Volatility plugins have that much ability to manipulate, but I will doing some further testing on this. If there are more properties that are needed in Evolve, they are fairly simple to add into the JSON return at this point.

After grabbing the handful of properties into a dictionary, I stuff that dictionary into a tuple. You can read more about the differences since I won’t go into that here. The tuple made it easier to work with, and I don’t have the need to change that object.

With a tuple of options from two ConfObjects, I could now determine which of those options were added by the provided plugin. Now I had to repeat that process for every plugin available in Volatility, and I am very thankful for loops and automation.

Check it out on GitHub.
https://github.com/JamesHabben/evolve

I hope you find these new features helpful, and the upcoming features exciting. Please reach out if you have any questions or feature suggestions for Evolve.

James Habben
@JamesHabben

Windows Prefetch: Tech Details of New Research in Section A & B

I wrote previously with an overview about the research into Windows prefetch I have been working on for years. This post will be getting more into the technical details of what I know to help others take the baton and get us all a better understanding of these files and the windows prefetch system.

I will be using my fork of the Windows-Prefetch-Parser to display the outputs in parsing this data. Some of the trace files I use below are public, but I didn’t have certain characteristics in my generated sample files to show all the scenarios.

Section A Records

I will just start off with a table of properties for the section A records, referred to as the file metrics. The records are different sizes depending on the version. I have been working with the newer version (winVista+) and it has just a tad more info than the xp version.

Section A Version 17 format (4 byte records)

0 trace chain starting index id
4 total count of trace chains in section B
8 offset in section C to filename
12 number of characters in section C string
16 flags

Section A Version 23 format (4 byte records, except noted)

0 trace chain starting index id
4 total count of trace chains in section B
8 count of blocks that should be prefetched
12 offset in section C to filename
16 number of characters in section C string
20 flags
24 (6) $MFT record id
30 (2) $MFT record sequence update

As you can see between the tables, the records grew a bit starting with winVista to include a bit more data. The biggest difference is in the $MFT record references. Very handy to know the record number and the sequence update to be able to track down previous instances of files in $Log or $UsnJrnl records. The other added field is a count of blocks to be prefetched. There is a flag setting in the trace chain records that allows the program to specify if a block (or group) should be pulled fresh every time, somewhat like a web browser.

The flag values seem to be consistent between the two versions of files. This is an area that applies a general setting to all of the blocks (section B) loaded from the referenced file, but I have seen times where the blocks in section B were assigned a different flag value. Mostly, they line up. Here are the flag values

Flag values (integer bytes have been flipped from disk)
0x0200    X    blocks (section B) will be loaded into executable memory sections
0x0002    R    blocks (section B) will be loaded as resources, non-executable
0x0001    D    blocks should not be prefetched

You can see these properties and the associated filenames in the output below. You will notice that the $MFT has been marked as one that shouldn’t be prefetched, which makes a lot of sense to not have stale data there. The other thing is that there are a couple DLL files that are referenced with XR because they are being requested to provide both executable code and non-executable resources.

Section B Records

This section has records that are much smaller, but there is so much more going on. The most exciting part to me is the bitfields that show a record of usage over the last eight program runs. You have probably seen these bitfields printed next to the file resource list of the python output when running the tool, but that data is not associated with either the filename in section C or the file metrics records in section A. These bitfields are actually tracking each of the block clusters in section B, so the output is actually a calculated value combined from all associated section B records. I will get to that later. Let’s build that property offset table first. These records have stayed the same over all versions of prefetch so far.

Section B record format

0 (4) next trace record number (-1 if last block in chain)
4 (4) memory block offset
8 (1) Flags1
9 (1) Flags2
10 (1) usage bitfield
11 (1) prefetched bitfield

The records in this section typically point to clusters of 8 512 blocks that are loaded from the file on disk. Most of the time, you will find the block offset property walking up in values of 8. It isn’t a requirement though, so you will find intervals smaller than that as well.

Here is an example of these records walking by 8.

Here is an example of one record jumping in after 2.

Here is an example of a couple sequential records, jumping only by 1.

I broke the two flag fields up early on just to be able to determine what was going on with each of them. What I found out was that Flags2 is always a value of 1. I haven’t seen this change ever. Without a change, it is very difficult to determine the meaning of this value and field. I have kept it separate still because of the no change.

The Flags1 field is similar to the Flags field that is found in the section A records. It holds values for the same purposes (XRD), though the number values representing those properties aren’t necessarily the same. It also has a property that forces a block cluster to be prefetched as long as it has been used at least once in the last eight runs. I will get into more later about the patterns of prefetching that I have observed, but for now let’s build the table for the properties and their values.

0x02    X    blocks are loaded as executable
0x04    R    blocks are loaded as resources
0x08    F    blocks are forced to be prefetched
0x01    D    blocks will not be prefetched

Now I get to show my favorite part: the bitfields for usage and prefetch. They are each single byte values that hold eight slots in the form of bits. Every time the parent program executes, the bits are all shifted to the left. If this block cluster is used or fetched, the right most bit gets a 1; otherwise it remains 0. When a block cluster usage bitfield ends up with all 0, that block record is removed and the chain is resettled without it.

Imagine yourself sitting in front of a scrabble tile holder. It is has the capacity to hold only eight tiles, and it is currently filled with all 0 tiles. Each time the program runs and that block cluster is used, you put a 1 tile on from the right side. If the program runs and the block cluster is not used, then you place a 0 tile. Either way, you are going to push a tile off the left side because it doesn’t have enough room to hold that ninth tile. That tile is now gone and forgotten.

Prefetch Patterns

The patterns listed below occur in section B since this is where the two bitfields are housed. Remember that these are for block clusters and not for entire files. Here are some various scenarios around the patterns that I have seen. The assumption is neither the D or F property assigned unless specified. Also, none of these are guaranteed, just that I have observed them and noted the pattern at some point.

Block with the F (force prefetch) property assigned, after 1 use on 8th run:
10000000    11111111

Block with the D (don’t prefetch) property assigned, after a few uses:
01001011    00000000

Block that is generally used, but missed on one:
11011111    11111111

Block on first use:
00000001    00000000

Block on second run, single use:
00000010    00000001

Block on third run, single use:
00000100    00000011

Block on fourth run, single use:
00001000    00000110

Block used every other run:
01010101    00111111

Block used multiple times, then not:
01110000    00111111

Block used multiple times, but only one use showing:
10000000    11100000

More Work

I am excited to see what else can be learned about these files. My hope is that some of you take this data to test it and break it. You don’t have to be the best DFIR person out there to do that. All you need is that drive to learn.

James Habben
@JamesHabben

Windows Prefetch: Overview of New Research in Sections A & B


The data stored in Prefetch trace files (those with a .pf extension) is a topic discussed quite a bit in digital forensics and incident response, and for good reason. It provides a great record of the executables that have been used, and Windows is configured to store them by default for workstation systems. In this article, I am going to add just a little bit more to the type of information that we can glean from one of these trace files.

File Format Review

The file format of Prefetch trace files has changed a bit over the years and those changes have generally included more information for us to take advantage of in our analysis. In Windows 10 for example, we were thrown a curve ball in that the prefetch trace files are now being stored compressed, for the most part.

The image below shows just the top portion of the trace files. The header and file information sections have been the recipient of the most version changes over the years. The sections following are labeled with letters as well as names according to Joachim’s document on the prefetch trace file format. The document does state that the name of section B is only based on what is known to this point, so it might change in the future. I hope that image isn’t too offensive. Drawing graphics is not a specialty of mine.

New Information, More Work

The information that I am writing about here is the result of many drawn out years and noncontiguous time of research. I have spent way too much time in IDA trying to analyze kernel level code (probably should just bite the bullet and learn WinDbg) and even more time watching patterns emerge as I stare deeply into the trace file contents. It is not fully baked, so I am hoping that what I explain here can lead to others, smarter than me, to run with this even further. I think there is more exciting things to be discovered still. I have added code to my fork of the windows-prefetch-parser python module, which I forked a while back to add SQLite output, and I will get a pull request into the main project in short time. This code adds just a bit of extra information in the standard display output, but there is also a -v option to get a full dump of the record parsing. (warning, lots of data)

File Usage – When

The first and major thing that I have determined is that we can get additional information about the files used (section C) in that we can specify which of the last 8 program executions took advantage of each file. We have to combine data from all three sections (A, B, and C) in order to get this more complete picture, something that the windows prefetcher refers to as a scenario. This can also help to explain why files can show up in trace files and randomly disappear some time later. Take a look at this image for a second.

This trace file is for Programmer’s Notepad (pn.exe) and was executed on a Windows 8 virtual machine. I created several small, unique text files to have distinct records for each program execution. I used the command line to execute pn.exe while passing it the name of each of those text files. I piped the output into grep to minimize the display data for easier understanding here.

There are two groups of 8 digits, and these are a bitfield. The left group represents the program triggering a page fault (soft or hard) to request data from the file. The right group represents the prefetcher doing a proactive grab of the data from that file, as this is the whole point to have data ready for the soft fault and to prevent the much more costly hard fault. In typical binary representation, a zero is false and a one is true. Each time the program is executed, these fields  are bitshifted to the left. This makes the right side the most recent execution and each column working left is the scenario prior, going up to eight total.

If you focus on an imaginary single file being used by an imaginary program, the bitfield would look like this over eight runs.
00000001
00000010
00000100
00001000
00010000
00100000
01000000
10000000

What happens after eight runs? I am glad you asked. If the value of this bitfield ends up being all zero’s, the file is removed from section C, and all associated records are removed from sections A and B. Interestingly, the file is not removed from the layout.ini file that sits beside all these trace files; not immediately, from what I have been able to determine.

If the file gets used again before that 1 gets pushed out, then the sections referencing that file will remain in the trace file.
00000001
00000010
00000100
00001000
00010001
00100010
01000100
10001000
00010000
00100001
01000010
10000100
00001000
etc.

File Usage – How

The second part, and the one that needs more research, is how this file was used by the executing program. There are some flag fields in both section A and B that provide a few values that have stuck out to me. There are other values that I have observed in these flag fields as well, but I have not been able to make a full determination about their designation yet.

The flag field that I have focused on is housed in section A. The three values that I have found purpose behind seem to represent 1) if a file was used to import executable code, 2) if the file was used just to reference some data, perhaps strings or constants, and 3) if the file was requested to not be prefetched. You will mostly see DLL files with the executable flag, although there are some that are referenced as a resource. You will find most of the other files being used as a resource.

In the output of windowsprefetch, I have indicated these properties as follows:
X    Executable code
R    Resource data
D    Don’t Prefetch

See some examples of these properties in the output below from pn.exe.

More Tech to Follow

I am going to stop this post here because I wanted this to be more of a higher level overview about the ways we can use these properties. I will be writing another blog post that gets into a little more gory detail of the records for those that might be interested.

Please help the community in this by testing the tool and the data that I am presenting here. Samples are in the GitHub repo. This has all been my own research, and we need to validate my findings or correct my mistakes. Take a few minutes to explore some of your system’s prefetch files.

You can comment below, DM me on twitter, or email me first@last.net if you have feedback. Thanks for reading!

James Habben
@JamesHabben

CCM_RecentlyUsedApps Properties & Forensics

UPDATE 2017-04-03: Unicode strings are used when needed. See the update post.

You can uncover an artifact from the deepest and darkest depths of an operating system and build a tool to rip it apart for analysis, but if everybody stares at it with a confused look on their faces it won’t gain acceptance and no one will use this new thing you did. Something about forensics, Daubert, Frye, etc., not to mention plain reasoning.

With that said, this post is a followup to my previous post about the Python and EnScript carving tools that can be used to analyze data from the WMI repository database, and more specifically, the class CCM_RecentlyUsedApps that is contained within. That post was about the structure of the records, and how to locate and then parse the meaningful data into property lists. This post is about what these properties mean and how they can be used.

Header Data

The indexing of the WMI repository uses hashes to better store and locate the various namespaces and classes in the file. These hashes are placed at the beginning of each of these records. The way the hashes are calculated are discussed in the previous post.

There are two date properties that are part of the record header, in the Microsoft FileTime format that occupies 8 bytes each. Both of these dates are stored in UTC. With these dates being part of the record header, they will be found on records in all types of classes, not just those being used with the CCM_RecentlyUsedApps tracking.

Timestamp1 indicates the last date the system had some sort of checkin or assessment from the SCCM server. It will be the same for all actively allocated records. You will very likely find previous dates on some records when using the carving method since there are records that get deallocated but not overwritten. The systems that I have analyzed these artifacts from have all had roughly a week between the various dates. I suspect this is a configuration setting that an SCCM admin would be able to modify.

Timestamp2 seems to indicate when the system was last initiated to join SCCM. This will be the same for all records, even with the carving method. The only reason this date would change on some records, was if the system was removed from being managed by SCCM and then joined again. This date has always lined up well, in my research and investigations, with other artifacts that support an action of joining an SCCM management group, such as services being created or drivers installed.

Numeric Record Data

There are 3 numeric properties stored in the record data: Filesize, ProductLanguage, and LaunchCount. None of these are going to sound any alarms on their own, but they can help paint the picture when combined with the rest of the properties.

Filesize is a four byte field that tracks the bytes of the executable for the record. Depending on if the developer used a signed type or unsigned, four bytes has a max value of 4GiB (unsigned) or 2GiB (signed). If you have a bunch of Adobe products on your systems, you might run into these size limitations, but every other program should be just fine for now. This field is end capped by other properties/offsets on both sides, so it’s not a question of reverse engineering (guessing) as how big it is. It is four bytes.

ProductLanguage is a four byte field that holds an integer related to the language designed by the developer. This sounds like a good possibility for filtering, but I have found tons of legitimate programs that have 0 for this field. I regularly see both 0 and 1033 on the systems I have analyzed.

LaunchCount is a four byte field that holds an integer representing the number of times this executable has been run on this system. I have seen programs with five digit decimal numbers on some systems! This won’t be common because one of the string fields tracked is the version of the binary. New version number, completely new record. Unlike Windows Prefetch, you won’t find a ton of articles written by idiots telling the world to delete all data associated with CCM_RecentlyUsedApps. Give it a couple months.

String Record Data

I don’t want to list out every one of the string properties here since many of them are really quite self-explanatory. I want to touch on a few that would either be very helpful or have some caveats that go with them. If any one of these properties were to change value for a binary, there will be a whole new record created for the new data.

ExplorerFilename is the name of the binary as it is seen by the filesystem. If this name changes, there will be a new record as stated above.

OriginalFilename is one of many strings that come from the properties contained in the binary data, usually towards the end of the file. You might think that comparing this field to the ExplorerFilename would be a good way of filtering your data down to those suspicious binaries, and I would applaud you for the thought process of getting there (that is getting into the threat hunting mindset). The reality is that there are a ton of legitimate programs distributed through legitimate channels that were compiled into a different filename than how it was packaged up before sending to you. (Slack, I am looking at you) It is one method of trying to digest this data that can lead to good findings, but it isn’t going to do your job for you. Many of the native Windows binaries have a ‘.mui’ appended after the ‘.exe’ in this field, just to throw us all off a bit.

LastUsedTime is a date time value stored as a string. The format is yyyyMMddHHmmss.000000+000, and I have not seen any timezones applied on any of the systems I have analyzed. There is a caveat with this property. The time recorded is the last time the program was running. Effectively, it is the last time the program was shutdown. I have confirmed this many times by multiple sources. One source is the log file created from our automated collection script, and I am able to lineup this timestamp with the end of the tool every time.

FilePropertiesHash is a great property when it exists. I haven’t been able to determine why, but some systems have a value filled in while others don’t. It is consistent within an environment in that all systems from a given customer either have it or don’t have it. The hash is in SHA1, and it is a hash of the binary data.

SoftwarePropertiesHash is a hash of something, but it is not the binary data. Also, it isn’t always there, though it tends to show up if the ‘msi’ prefix fields have values. I have had many records that have the FilePropertiesHash, but the SoftwarePropertiesHash is empty.

FolderPath has been an accurate property telling where the binary existed when it was executed. If the binary is moved, this record will become stale as a new one is created with the new path.

LastUserName tracks what appears to be the user account that was used to execute. I would still like to validate this a bit further, however. Every record that I have identified as critical to a case has been backed up by other artifacts showing this username executed the file. It may be the last user to have authenticated on the system before this executable was run, but I have not run into that scenario in order to dis/prove. Please let me know if you find this means otherwise.

Analysis Considerations

A few of my thoughts about analyzing this data. Please share your own.

Blanks

Many of the properties come from the section of the executable that stores properties about the program: CompanyName, FileDescription, FileVersion, etc. You might think that malware authors are lazy and leave these fields empty because they serve no purpose, and you would be correct part of the time. Looking for blanks can be one method, but it is not a guarantee. A few points:

Don’t assume all malware authors are lazy
Some malware these fields filled with legitimate looking data – #opsec
Remember that many attackers use the ‘Live off the land’ method of using what exists on the system
Many legitimate programs will leave these fields empty

Some legitimate programs I have run across in my analysis of this CCM_RecentlyUsedApps data that have blank fields are pretty surprising. These programs have been in categories across the board. I thought about providing a list of these executable names, but some are a bit sensitive. Instead, here is a list of some categorically.

Python binaries
Anti-virus main and secondary tools
Point Of Sale main and updater programs
Tons of DFIR tools
Java
Google Chrome secondary tools
Driver installers

On the opposite side, I have seen some advanced malware use these properties very strategically. There was one that even properly used the FileVersion field. I found records from different systems and places that showed 3 incriminating versions that were active on the network.

Name or Path

I noted this above, but keep in mind if after running an executable at least once that even a single character changes for either the name or path, the previous record is alienated and a new one is created. With the assumption that no data and only the name or path changed, the FilePropertiesHash can be used to find identical binaries.

Large Scale Aggregated Data

I designed the EnScript to be run against any number of systems and output the results to a single file. This gives the investigator the ability to perform analysis against the data in aggregation. Importing this data into a relational database (MSSQL, MySQL, SQLite, etc) gives a huge advantage when analyzing this data at scale. Outliers can be quickly identified through a number of different techniques.

For example, a simple ‘group by’ query that counts the number of systems that each executable has been run on can really jump start the findings.
Select distinct ExplorerFilename, FolderPath, count(EvFilename) as SystemCount
From tablename
group by ExplorerFilename, FolderPath
order by SystemCount

Excel pivot tables can provide similar analysis, though not quite as flexible.

I hope this is able to help some of you track things down a bit faster. We as an industry can use any help we can get to reduce the time between detection and remediation.

James Habben
@JamesHabben

Secret Archives of Execution Evidence: CCM_RecentlyUsedApps

UPDATE 2017-04-03: Unicode strings are used when needed. See the update post.

I seem to be running into more and more systems that have Windows Prefetch disabled for one reason or another. It is especially frustrating for me as a consultant since I cannot make the changes necessary to enforce the creation of the trace files nor can I implement any kind of central logging. Without this digital forensic artifact, it becomes increasingly difficult to build out a timeline of events across all the systems involved in an incident response.

One of the evidence sources that has shown itself over and over comes from a connection with a Microsoft SCCM server. SCCM has the ability to collect inventory data from many sources, and tracking executables launching is one. This feature isn’t turned on by default to have the SCCM server collect this data; however, the logging occurs on the endpoints regardless of the settings that are configured on the server.

If you search for CCM_RecentlyUsedApps, you will find tons of articles about configuring SCCM to collect this data or how to perform queries to extract the collected data. If you have the ability to push this in your organization, I say do it! If you can’t, then read on so I can show you how to take advantage of this data anyways.

Data Source

The records holding the information behind CCM_RecentlyUsedApps are stored in the collection of files that make up the database behind WMI. The locations are consistent from Windows XP through Windows 10, and you will find them here:
c:\windows\system32\wbem\repository\
c:\windows\system32\wbem\repository\fs\

I have even seen some systems that have what appears to be an old version of the WMI database. It seems to roll like the Windows Registry controlset keys. When the rebuild process kicks off, a new version of the database is built and it does not carry the previous information with it. I have seen up to 003, but it would likely go further. The previous versions look like this:
c:\windows\system32\wbem\repository.001\
c:\windows\system32\wbem\repository.001\fs\

This specific artifact was a very critical piece in a previous case. It allowed us to narrow the time window of the compromise to be much more specific. Even a single day of exposure can make a big difference in the fines against the victim company during a PCI Forensic Investigation (PFI).

You will see a handful of files in these locations. They are all used to link all the various records together to properly parse these. The guys at FireEye did some work on reverse engineering this database and released a python script to extract all of the available classes and namespaces. You can find their tool here:
https://github.com/fireeye/flare-wmi/tree/master/python-cim

Using this script, you can extract this data using these parameters:
Namespace: root\ccm\SoftwareMeteringAgent
Class: CCM_RecentlyUsedApps

This script was very helpful to me in a number of previous cases, although I have to mention that it is a bit of a pain to get installed properly. The other trouble that I ran into with this script, by no fault of the FireEye team, is that it can only parse the namespaces from the database if the data is not ‘corrupted’. I have found that imaging a live system can cause ‘corruption’ almost half of the time. It is frustrating to know that there are Indicator Of Compromise (IOC) hits inside that data blob, but the data won’t allow for the parsing.

Different Approach

As I manually looked over those seemingly lost IOC hits, I started to recognize patterns surrounding the hits. The fields holding all the property data seemed to be in the same order for all of the records of a certain system that I was reviewing at the time. I then pulled up a few systems with different OS’s from previous cases and found the same structure. YES!! The perfect setup for carving. Time to reverse engineer the record format.

The index uses a hash value in tracking and sorting structures that I won’t bore you with here. I mention though, because this hash is the piece that we will use to find these records. WinXP uses MD5 and newer uses SHA256. The hash in these records is generated from the class name CCM_RecentlyUsedApps, only the text needs to be upper cased as CCM_RECENTLYUSEDAPPS, and then converted to Unicode C\x00C\x00M\x00_\x00R\x00… (and you get the point).
WinXP MD5:
6FA62F462BEF740F820D72D9250D743C
WinVista+ SHA256:
7C261551B264D35E30A7FA29C75283DAE04BBA71DBE8F5E553F7AD381B406DD8

These hashes are what start the records. They are stored in Unicode themselves, for some reason. 128 bytes for the SHA256 and 64 bytes for the MD5.

The next 16 bytes following the hash are two 8 byte FileTimes.

After that will be 2 bytes to tell you the size of the data portion of this record. I have not seen any records using more than 2 bytes and the max size of 2 bytes is either 65,535 unsigned or 32,767 signed. Either of those provide plenty of space for this data, so I wouldn’t expect it to expand for size purposes. The data portion of the record includes these 2 bytes.

You can see on the right in the screenshot above that the size of the data is 432. You can then see at the bottom that I have highlighted 432 bytes (Sel 432 [1B0h]). You can also see another ‘7C261…’ starting immediately after my selection, although don’t let this fool you into thinking that these records will always be contiguous.

From here, the data is broken into 2 sections. The first section consists of various 4 byte fields with some being offsets and others being property values. The second section contains all the string based property values separated by double 0x00 bytes.

There are 3 values we can extract from the number section that are helpful.
Filesize
Offsets: Vista 178d (128+16+34), XP 114d (64+16+34)

ProductLanguage
Offsets: Vista 194d (128+16+50), XP 130d (64+16+50)

LaunchCount
Offsets: Vista 202d (128+16+58), XP 138d (64+16+58)

The string section always starts with ‘CCM_RecentlyUsedApps’ and is followed by the double 0x00 separator. If there are 4 bytes of 0x00 following, then the next string field is null. If there are 6 bytes of 0x00, then the next 2 string fields are null. Follow the pattern?

The string properties are listed in the following order:
ClassName (always “CCM_RecentlyUsedApps”)
AdditionalProductCodes
CompanyName
ExplorerFilename
FileDescription
FilePropertiesHash
FileVersion
FolderPath
LastUsedTime
LastUsername
MsiDisplayName
MsiPublisher
MsiVersion
OriginalFilename
ProductCode
ProductName
ProductVersion
SoftwarePropertiesHash

There will only be a single 0x00 at the very end of the record. Wasn’t that easy?

New Python Tool

After I determined these structures, I was chatting with Willi Ballenthin since he was involved in the research of the database structure. He said something like “that tool sounds pretty neat” and then followed up saying “possibly similar to this” and pointed me to a blog post by David Pany at FireEye.
https://www.fireeye.com/blog/threat-research/2016/12/do_you_see_what_icc.html

Sure enough, David beat me to it with a python script to search for the classname hashes and parse the record structure. The good news is that we arrived at the same basic approach and record structures. Validation is always nice. His python script is on GitHub here:
https://github.com/davidpany/WMI_Forensics/blob/master/CCM_RUA_Finder.py

I have had some trouble running this python script against my systems, but I haven’t spent the time to determine the cause. The output is a CSV file, but I don’t have any screenshots to show because of the errors I ran into.

New EnScript Tool

I decided to write this approach in EnScript. My cases have involved upwards of 500 systems for analysis. Using a python based approach would force me to either extract all those files, or use a mounting or parsing solution to expose the files. By using EnScript in EnCase v7 or v8, I can run the EnScript over all system images with one pass. I was able to successfully do this in testing on a recent case with 73 systems in the same case. EnCase proved to be a powerful tool in this specific scenario.

The EnScript starts off with a GUI to give you the option of running against all files in the case or a smaller subset designated by a blue check or tag selection.

I found records existing in OBJECTS.DATA and INDEX.BTR files. Some seem to be in areas of the file that have been deallocated from the active records of the database. Additionally, I have found quite a large number of records in the PAGEFILE.SYS file as well. You will see a selection option in the GUI for these common filenames.

The output of this EnScript is a CSV file. It includes a few columns in addition to the properties that were parsed from the records: evidence filename to indicate the system source, item path to show which file it was found in, and file offset to manually validate the data later if needed.

I encourage you to use Excel’s data deduplication function since I ran into a number of bugs in EnCase trying to make this EnScript work. There are some hacky workarounds in the code currently. Dedupe on all columns except item path and file offset. This will remove dupes that are found in both pagefile.sys and objects.data files.

I suspect we might be able to pull some of these records from unallocated clusters, but I haven’t found any there yet. Please let me know if you do!

You can grab the latest version of the EnScript on GitHub:
https://github.com/JamesHabben/ccm-rua-enscript

See the followup post about the forensic meanings.

James Habben
@JamesHabben

Know Your Network

Do you know what is on your network?  Do you have a record of truth like DHCP logs for connected devices?  How do you monitor for unauthorized devices?  What happens if none of this information is currently available?

Nathan Crews @crewsnw1 and Tanner Payne @payneman at the Security Onion Conference 2016 presented on Simplifying Home Security with CHIVE that will definitely help those with Security Onion deployed answer these questions.  Well worth the watch: https://youtu.be/zBDAjNnRiQI

My objective is to create a Python script that helps with the identification of devices on the network using Nmap with limited configuration.  I want to be able to drop a virtual machine or Raspberry Pi onto a network segment that will perform the discovery scans every minute using a cron job.  Generating output that can be easily consumed by a SIEM for monitoring.
I use the netifaces package to determine the network address that was assigned to the device for the discovery scans.
I use the netaddr package to generate the network cidr format that the Nmap syntax uses for scanning subnet ranges.
The script will be executed from cron thus running as the root account, so important to provide absolute paths.  Nmap also needs permission to listen to network responses that is possible at this permission level too.
I take the multi-line native Nmap output and consolidate it down to single lines.  The derived fields are defined by equals (=) for the labels and pipes (|) to separate the values.  I parse out the scan start date, scanner IP address, identified device IP address, identified device MAC address and the vendor associated with the MAC address.
I ship the export.txt file to Loggly (https://www.loggly.com) for parsing and alerting as that allows me to focus on the analysis not the administration.

The full script can be found on GitHub:  https://gist.github.com/jblukach/c67c8695033ad276b4836bea58669958

John Lukach
@jblukach

GUIs are Hard – Python to the Rescue – Part 1

I consider myself an equal opportunity user of tools, but in the same respect I am also an equal opportunity critic of tools. There are both commercial and open source digital forensic and security tools that do a lot of things well, and a lot of things not so well. What makes for a good DFIR examiner is the ability to sort through the marketing fluff to learn what these tools can truly do and also figure out what they can do very well.

One of the things that I find limiting in many of the tools is the Graphical User Interface (GUI). We deal with a huge amount of data, and we sometimes analyze it in ways we couldn’t have predicted ourselves. GUI tools make a lot of tasks easy, but they can also make some of the simplest tasks seem impossible.

My recommendation? Every tool should offer output in a few formats: CSV, JSON, SQLite. Give me the ability to go primal!

Tool of the Day

I have had a number of cases lately that have started as ‘malware’ cases. Evil traffic tripped an alarm, and that means there must be malware on the disk. It shouldn’t be surprising to you, as a DFIR examiner, that this is not always the case. Sometimes there is a bonehead user browsing stupid websites.

Internet Evidence Finder (IEF) is the best tool I have available right now to parse and carve the broad variety of internet activity artifacts from the drive images. It does a pretty good job searching over the disk to find browser, web app, and website artifacts (though I don’t know exactly which versions are supported due to documentation, but I will digress in a different post).

Let me cover some of IEF’s basic storage structure that I have worked out first. The artifacts are stored in SQLite format in the folder that you designate, and it is named ‘IEFv6.db’. Every artifact type that is found creates at least one table in the DB. Because each artifact has different properties, each of the tables have a different schema. Some good things that the dev team at Magnet seem to have decided on do allow for some kind of consistency. If the column has URL data in it, then the column name has ‘URL’ in it. Similarly for dates, the column name will have ‘date’ in it.

IEF provides a search function that allows you to do some basic string searching, or you can get a bit more advanced by providing a RegEx (GREP) pattern for the search. When you kick off this search, IEF creates a new SQLite db file named ‘Search.db’ and it is stored in the same folder. You can only have one search completed at a time since kicking off a new search will cause IEF to overwrite any previous db that was created. The search db, from what i can tell anyways, seems to have an identical schema structure as the main db, only it has been filtered down on the number of records that it holds based on the keywords or patterns that you provided.

There is another feature called filter, and I will admit that I have only recently found this. It allows you to apply various criteria to the dataset, with the major one being a date range. There are other things you can filter one, but I haven’t needed to explore those just yet. When you kick off this process, you end up with yet another SQLite database filled with a reduced number of records based on the criteria and again it seems identical in schema as the main db. This one is named ‘filter.db’ and indicates that the dev team doesn’t have much creativity. 😉

Problem of the Month

The major issue I have with the tool is in the way it presents data to me. The interface has a great way of digging into forensic artifacts as they are categorized and divided by the artifact type. You can dig into the nitty gritty details of each browser artifact. For the cases that I have used IEF for lately, and I suspect many of you in your Incident Response cases as well, I really don’t actually care *which* browser was the source of the traffic. I just need to know if that URL was browsed by bonehead so I can get him fired and move on. Too harsh? 🙂

IEF doesn’t give you the ability to have a view where all of the URLs are consolidated. You have to click, click, click down through the many artifacts, and look through tons of duplicate URLs. The problem behind this is on the design of the artifact storage in multiple tables with different schemas in a relational database. A document based database, such as MongoDB, would have provided an easier search approach, but there are trade-offs that I don’t need to tangent on here. I will just say that there is no 100% clear winner.

To perform a search over multiple tables in a SQL based DB, you have to implement it in some kind of program code because a SQL query is almost impossible to construct. SQLite makes it even more difficult with its reduced list of native functions and it’s lack of ability to create any user-defined functions or stored procedures. It just wasn’t meant for that. IEF handles this task for the search and filter process in c# code, and creates those new DB files as a sort of cache mechanism.

Solution of the Year

Alright, I am sensationalizing my own work a bit too much, but it is easy to do when it makes your work so much easier. That is the case with the python script I am showing you here. It was born out of necessity and tweaked to meet my needs for different cases. This one has saved me a lot of time, and I want to share it with you.

This python script can take an input (-i) of any 3 of the IEF database files that I mentioned above since they share schema structures. The output (-o) is another SQLite database file (I know, like you need another one in addition) in the location of your choosing.

The search (-s) parameter allow you to provide a string to filter the records on based upon that string being present in one of the URL fields of the record being transferred. I added this one because the search function of IEF doesn’t allow me to direct the keyword at a URL field. I had results from my keywords that were hitting on several other metadata fields that I had no interest in.

The limit (-l) parameter was added because of a bug I found in IEF with some of the artifacts. I think it was mainly in the carved artifacts so I really can’t fault too much, but it was causing a size and time issue for me. The bug is that the URL field for a number of records was pushing over 3 million characters long. Let me remind you that each character in ASCII is a byte, and having 3 million of those creates a URL that is 3 megabytes in size. Keep in mind that URLs are allowed to be Unicode, so go ahead and x2 that. I found that most browsers start choking if you give them a URL over 2000 characters, so I decided to cutoff the URL field at 4000 by default to give just a little wiggle room. Magnet is aware of this and will hopefully solve the issue in an upcoming version.

This python script will open the IEF DB file and work its way through each of the tables to look for any columns that have ‘URL’ in the name. If one is found, it will grab the type of the artifact and the value of the URL to create a new record in the new DB file. Some of the records in the IEF artifacts have multiple URL fields, and this will take each one of them into the new file as a simple URL value. The source column is the name of the table (artifact type) and the name of the column of where that value came from.

This post has gotten rather long, so this will be the end of part 1. In part 2, I will go through the new DB structure to explain the SQL views that are created and then walk through some of the code in Python to see how things are done.

In the meantime, you can download the ief-find-url.py Python script and take a look for yourself. You will have to supply your own IEF.

James Habben
@JamesHabben