Critical Stack provides a free threat intelligence aggregation feed through their Intel Market for consumption by the Bro network security monitoring platform. This is a fantastic service that is provided for free! Special thanks to those who have contributed their feeds for everyone to take advantage of! Installation is beyond the scope of this post, as it is quite easy and decent documentation is available on their website. By default, the feed updates run roughly hourly into a tab-delimited file available on disk.
My goal was to make the IP address, domain, and hash values accessible through a web interface for consumption by other tools in your security stack. Additionally, I didn’t want to create another database structure, but rather read the values into memory for comparison each time the script restarts. I decided to use Twisted Python by Twisted Matrix Labs to create the web server. Twisted is an event-driven networking engine written in Python. The script provides a basic foundation without entering into the format debate between STIX and JSON. Kept it simple…
Twisted Python Installation
The following installation steps work on Ubuntu 14.04 as that is my preference.
apt-get install build-essential python-setuptools python-dev python-pip
pip install service_identity
bzip2 -d Twisted-15.5.0.tar.bz2
tar -xvf Twisted-15.5.0.tar
cd Twisted-15.5.0
python setup.py install
The service_identity pip package allows for the future use of SSL and SSH capabilities in Twisted.
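A quick way to confirm the build installed cleanly (the exact version string printed will depend on the tarball you grabbed):

python -c "import twisted; print(twisted.version)"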
The script has a few values you may want to adjust:
- The default installation file and path containing the Critical Stack Intel Feed artifacts.
- The field separator on each line that gets loaded into the Python list in memory.
- The output that gets displayed on the dynamically generated web page based on user input.
- The port that the web server runs on for the end user to access the web page.
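To show how those pieces fit together, here is a minimal sketch of that foundation. It is not the full TwistedIntel.py, and the feed path shown is only the assumed Critical Stack default, so adjust the values to match your install.

# Minimal sketch only; not the full TwistedIntel.py script.
from twisted.web.server import Site
from twisted.web.resource import Resource
from twisted.internet import reactor

FEED_FILE = "/opt/critical-stack/frameworks/intel/master-public.bro.dat"  # assumed default path
SEPARATOR = "\t"   # tab-delimited feed
PORT = 8080        # port the web server listens on

def load_indicators(path):
    # Read the feed into memory; the first column holds the indicator value.
    indicators = set()
    with open(path) as feed:
        for line in feed:
            if line.startswith("#") or not line.strip():
                continue
            indicators.add(line.split(SEPARATOR)[0].strip())
    return indicators

INDICATORS = load_indicators(FEED_FILE)

class IntelCheck(Resource):
    isLeaf = True

    def render_GET(self, request):
        value = request.path.lstrip("/")
        return "FOUND" if value in INDICATORS else "NOT FOUND"

reactor.listenTCP(PORT, Site(IntelCheck()))
reactor.run()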
Once the TwistedIntel.py script is running, browse to the web server with an IP address, domain, or hash value provided in the URL path. If the result is FOUND, that value is part of the Critical Stack Intel Feed.
Feel free to change the code to meet your needs; I really appreciate any contributions back to the DFIR community.
- TwistedIntel2.py displays the feed that an IP address, domain, or hash originated from.
- Upstart configuration file for running the Twisted Python script at startup.
- Crontab configuration that restarts the script hourly after Critical Stack Intel updates.
Last week, John Lukach put up a post about how to use some tools to find pieces of files left behind after being deleted, and even partially overwritten. I wanted to put together a short post to give a practical use of this technique for a recent case of mine.
The request came in as a result of an anomaly detected in traffic patterns. It looked like a user had uploaded a significant amount of data to a specific cloud storage website. My client identified the suspected machine, and got an image to preserve the data. Unfortunately, this anomaly wasn’t caught right away, so a couple of months passed before action was taken. They made a pass over the image with some forensic tools, but they weren’t able to identify anything relating to this detection, so they asked me for a second look.
I started off with a standard pass of the forensic tools. Not because I don’t trust their team, but because tool versions can have an effect on the artifacts that get extracted and I wanted to be sure of my findings. Besides, if you don’t charge for machine time, then this really doesn’t add much in the way of cost in the end. This process did two things for me. The first thing I got from it was that I could not find any artifacts related to the designated website. The second thing was that the history and cache looked very normal. In other words, it didn’t look like the user had cleared any history or cache in an effort to clean up after themselves.
Of course, I am now thinking that they could have done a targeted cleanup and deleted very specific items from their browser. I got some reassurance that this was not the case by using the deep scan options available in my forensic tools. These tools will scrounge through unallocated clusters, on command, in search of data patterns that are potential matches for deleted and lost internet artifacts. Again, I found none in this more extensive search.
Let’s put on our tin foil hats now, and really go for it. The user could have wiped and deleted individual files, then gone into the history and cache lists to remove the pointers to those files. This would take care of the files that are still currently allocated. But what about all the files that get cataloged in the cache and then quickly discarded because the server sent cache control headers instructing the browser to do so? Those files would have been lost to unallocated clusters at some point before the user thought about their cleanup actions. The only way to cover those tracks would be to use a tool that wipes unallocated clusters. If the user went to this extent, dare I say they deserve to get away with it?
My client had a secondary thought in the event that we were not able to find traces of user action involved with this detection. They wanted to see if the system was infected with some malware that would have generated this traffic. I did the standard AV scans with multiple vendors and didn’t find anything. I followed that up with a review of the various registry locations that allow for malware to persist and autostart.
Verizon has a great intel team, and I asked them for information on possible campaigns involving the designated website. They gave me some great leads and IOCs to search for, but they did not pan out in this case. What a great resource to have!
After reviewing all of this, I was not able to find any indications of a malware infection, much less one that was capable of performing the suspected actions.
Points to Disprove
Here is the point where block hashing can make a real difference. I will show you how I used block hashing to disprove (as much as the available data will allow) three different points.
- Lost Files – User deleted individual history and cache entries to clean up their tracks and the file system lost track of the related files
- Unallocated – User went psycho-nutjob-crazy and wiped unallocated clusters after deleting records
- Malware – User wasn’t involved and malware did the dirty deed
Disprove Lost Files
In order to prove the user did not selectively delete these entries from the browser history and cache, I need to find the related files on disk, most likely in unallocated clusters. The file system no longer has metadata about them, so the forensic tools will not discover them as deleted or orphaned files. I could carve files from unallocated clusters using the known headers of various file types, but this is a very broad approach. I want to identify specific files related to this website, not discover any and all pictures from unallocated space.
I start by using HTTrack to crawl the designated website. I want to pull as many files down as I can. These are all the files that a browser would see, and download, during normal interactions with the website. I got a collection of a few hundred files of various types: JPEG, GIF, SWF, HTML, etc.
The next step is to split and block hash these files. For this step, I used EnCase and the File Block Hash Map Analysis (FBHMA) EnScript written by Simon Key of EnCase Training. FBHMA can handle both sides of the block hashing technique and presents an awesome graphic for partially located files. I applied it against my collected files, and then applied those hashes against the rest of the disk. The result was zero matches.
This technique is not affected by the amount of fragmentation or the amount of overwriting, as long as there are some pieces left behind. The only way to beat this technique is to completely overwrite the entire set of blocks for all of the files in question, a very unlikely scenario in such a short time without deliberate action.
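Conceptually, the two sides of the block hashing technique look something like the following sketch: hash fixed-size blocks of the reference files, then hash every sector of the image and look for overlaps. FBHMA is an EnCase EnScript, so this Python is only an illustration, and the directory name, image name, and block size are placeholders.

# Conceptual sketch of both sides of block hashing; not FBHMA itself.
import hashlib
import os

BLOCK_SIZE = 512

def block_hashes(path):
    # Hash each fixed-size block of a reference file.
    hashes = set()
    with open(path, "rb") as f:
        while True:
            block = f.read(BLOCK_SIZE)
            if len(block) < BLOCK_SIZE:
                break
            hashes.add(hashlib.md5(block).hexdigest())
    return hashes

# Side one: hash every block of the files pulled down with HTTrack.
reference = set()
for root, dirs, files in os.walk("httrack_mirror"):
    for name in files:
        reference |= block_hashes(os.path.join(root, name))

# Side two: hash every sector of the image and record any overlap.
hits = []
with open("suspect.dd", "rb") as image:
    offset = 0
    while True:
        sector = image.read(BLOCK_SIZE)
        if len(sector) < BLOCK_SIZE:
            break
        if hashlib.md5(sector).hexdigest() in reference:
            hits.append(offset)
        offset += BLOCK_SIZE

print("%d matching sectors" % len(hits))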
Disprove Unallocated
This point takes a little more work to disprove, and it might be subject to your own interpretation. The idea here is that files have been lost to unallocated clusters without control. The user was not able to do a targeted wipe of the file contents before deletion, so the entire area of unallocated clusters would have to be wiped to ensure cleanliness.
The user could make a pass with a tool that overwrites every cluster with 0x00 data. This is enough to clear the data from forensic tools, but it can leave behind a pretty suspicious trail. If you used block hashing against this, it would quickly become apparent that the area was zeroed out. Bulk Extractor provides some entropy calculations that make quick work of this scenario.
The other option, and the one more likely to be used, is to have the wipe tool write random data to the clusters. Most casual and even ‘advanced’ computer users picture common files like ZIP and JPEG as a messy clump of random data thrown together in some magic way that draws pictures or spells words. In DFIR, we learn early on that seemingly random data is not actually random. There are recognizable structures in most files, and unless the file involves some type of compression, it is not truly random when measured in terms of entropy. My point in all of this is that a wiping pass that writes random data would cause every cluster of unallocated space to show high entropy values. This would raise eyebrows because it is not normal. Some blocks contain compressed or encrypted data and would show high entropy, but others are plain text, which registers rather low on the entropy scale.
So with a single pass of Bulk Extractor, having it calculate hashes and entropy, I was able to determine that 1) unallocated clusters were not zeroed out and 2) unallocated clusters were not completely and truly random. It, again, looked very normal.
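Bulk Extractor did the heavy lifting here, but the underlying check is simple. Here is a rough sketch of per-block Shannon entropy; the input file name, block size, and thresholds are only illustrative.

# Rough sketch of per-block Shannon entropy; Bulk Extractor does this far faster.
import math
from collections import Counter

BLOCK_SIZE = 512

def shannon_entropy(block):
    # Entropy in bits per byte: near 0 for zeroed blocks, near 8 for random or encrypted data.
    counts = Counter(block)
    total = float(len(block))
    return -sum((c / total) * math.log(c / total, 2) for c in counts.values())

with open("unallocated.bin", "rb") as f:
    offset = 0
    while True:
        block = f.read(BLOCK_SIZE)
        if not block:
            break
        e = shannon_entropy(block)
        if e < 0.1 or e > 7.5:  # suspiciously uniform or suspiciously random
            print("offset %d entropy %.2f" % (offset, e))
        offset += BLOCK_SIZE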
Disprove Malware
The last point to disprove was the existence of malware on the system. I had already established that there were no indications of allocated malware, but I could not ignore the possibility that some malware had been installed and later deleted or uninstalled for some reason. Again, without a full and controlled wiping of this data off the drive, it would leave artifacts in unallocated clusters for me to find with block hashing.
This step is a much larger undertaking because I am employing the full 20+ million samples that are generously shared by VirusShare.com and its users. I didn’t have to do all of the work, though, because John has done that for us all. He has a GitHub repo with all the block hashes from VirusShare. Just be sure to remove the NSRL block hashes, since malware authors can be quite lazy and copy code from legitimate executable files. It’s a beast to get up and running, but well worth the effort. Maintaining it is much easier after it’s built.
Now I have a huge data set of known malware, adware, and spyware sample files, plus various payload transports like DOC and PDF files. If there was anything bad on this system, my data set is very likely to find it. The result was no samples identifying more than a single block on the disk. You have to allow for a threshold of matches with a sample set of this size; a single matching block out of a possible 1,000 blocks for a 500 KB sample file is not interesting.
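The threshold reasoning is just arithmetic. This small sketch mirrors the numbers above; the sector size, sample size, and cutoff are illustrative.

# Sketch of the match threshold idea: a lone block hit in a large sample is noise.
SECTOR = 512
sample_size = 500 * 1024              # a ~500 KB sample
total_blocks = sample_size // SECTOR  # roughly 1,000 blocks
matched_blocks = 1

ratio = float(matched_blocks) / total_blocks
print("%.2f%% of the sample's blocks matched" % (ratio * 100))  # ~0.10%, not interesting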
Value of Blocks
This turned out to be a rather long post, but oh well. I hope this helps you to see the value in the various techniques of applying block level analysis in your cases. The incoming data in our cases is only getting larger, and we need to be smart about how we analyze it. Block hashing is just one way of letting our machines do the heavy work. Don’t be afraid to let your machine burn for a while.
You may be thinking to yourself that this is not a perfect process when it came to the malware point, and I may agree with you. We are advancing our tools, but we need to advance our understanding and interpretation of the results as well. For now, we handle that interpretation as investigators but the tools need to catch up. Harlan recently posted about this as a reflection on OSDFcon and it’s worth a read and consideration.
In DFIR practices, we use hash algorithms to identify and validate data of all types. The typical use is applying them against an entire file, and we get a value back that represents that file as a whole. We can then use those hash values to search for Indicators of Compromise (IOC) or even eliminate files that are known to be safe as indicated by collections such as National Software Reference Library (NSRL). In this post, however, I am going to apply these hashes in a different manner.
The complete file will be broken down into smaller chunks and hashed for identification. You will primarily work with two block sizes: clusters and sectors. A cluster block is tied to the file system managed by the operating system, while a sector block corresponds to the physical disk. For example, Microsoft Windows by default uses a cluster size of 4,096 bytes, which is made up of eight 512-byte sectors, a sector size common across many operating systems. Sectors are the smallest addressable area on the disk, providing the most accuracy for block hunting.
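To make that concrete, here is a small sketch contrasting a single whole-file hash with per-sector hashes. The file name is a placeholder; the point is that sector-sized blocks can still be recognized even when only fragments of the file remain on disk.

import hashlib

SECTOR = 512  # sector-sized blocks; eight of these make a default 4,096-byte cluster

with open("sample.bin", "rb") as f:
    data = f.read()

# One hash represents the whole file...
print("file  ", hashlib.md5(data).hexdigest())

# ...while each sector-sized block gets its own hash, so partial
# remnants of the file can still be matched on disk.
for offset in range(0, len(data) - SECTOR + 1, SECTOR):
    block = data[offset:offset + SECTOR]
    print("sector", offset, hashlib.md5(block).hexdigest())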
Here are the block hunting techniques I will demonstrate:
- locate sectors holding identifiable data
- determine if a file has previously existed
I will walk you through the command line process, and then provide links to a super nice GUI. As an extra bonus, I will tell you about some pre-built sector block databases.
Empty Image or Not
If you haven’t already, at some point you will receive an image that appears to be nothing but zeroes. Who wants to scroll through terabytes of unallocated space looking for data? A quick way to triage the image is to use bulk_extractor to identify known artifacts such as internet history, network packets, carved files, keywords and much more. What happens if the artifacts are fragmented or unrecognizable?
This is where sector hashing with bulk_extractor in conjunction with hashdb comes in handy to quickly find identifiable data. A lot of great features are being added on a regular basis, so make sure you are always using the most current versions found at: http://digitalcorpora.org/downloads/hashdb/experimental/
The following command will be used for both block hunting techniques.
bulk_extractor -e hashdb -o Out -S hashdb_mode=import -S hashdb_import_repository_name=Unknown -S hashdb_block_size=512 -S hashdb_import_sector_size=512 USB.dd
- bulk_extractor – executed application
- -e hashdb – enables usage of the hashdb application
- -o Out – user defined output folder created by bulk_extractor
- -S hashdb_mode=import – generates the hashdb database
- -S hashdb_import_repository_name=Unknown – user defined hashdb repository name
- -S hashdb_block_size=512 – size of block data to read
- -S hashdb_import_sector_size=512 – size of block hash to import
- USB.dd – disk image to process
Inside the Out folder declared by the -o option, you will find a generated hashdb.hdb database folder. Running the next command will export the collected hashes into dfxml format for review.
hashdb export hashdb.hdb out.dfxml
Identifying Non-Zero Sectors
The dfxml output will provide the offset in the image where each non-low-entropy sector block was identified. This is important to help limit false positives, where a low-value block could appear across multiple good and evil files. Entropy is a measurement of randomness; an example of a low-entropy block would be one containing all 0x00 or 0xFF data for the entire sector.
Here is what the dfxml file will contain for an identified block.
Use your favorite hex editor or forensic software to review the contents of the identified sectors for recognizable characteristics. We have now established that the drive image isn’t empty, and it didn’t require a large amount of manual effort. Just don’t tell my boss, and I won’t tell yours!
Deleted & Fragmented File Recovery
Occasionally, I receive a request to determine if a file has ever existed on a drive. This file could be intellectual property, a customer list, or a malicious executable. If the file is allocated, this can be done in short order. If the file doesn’t exist in the file system, it will be nearly impossible to find without a specialized technique. In order for this process to work, you must have a copy of the file that can be used to generate the sector hashdb database.
This command will generate a hashdb.hdb database of the BadFile.zip designated for recovery.
bulk_extractor -e hashdb -o BadFileOut -S hashdb_mode=import -S hashdb_import_repository_name=BadFile -S hashdb_block_size=512 -S hashdb_import_sector_size=512 BadFile.zip
This database will then be used to scan our drive of interest and run the comparisons. I am targeting a single file, but the command above can be applied to multiple files inside subfolders by using the -R option against a specific folder.
The technique will be able to identify blocks of a deleted file, as long as they haven’t been overwritten. It doesn’t even matter how fragmented the file was when it was allocated. In order to use the previously generated hashdb database to identify the file (or files) that we put into it, we need to switch the hashdb_mode from import to scan.
bulk_extractor -e hashdb -S hashdb_mode=scan -S hashdb_scan_path_or_socket=hashdb.hdb -S hashdb_block_size=512 -o USBOut USB.dd
Inside the USBOut output folder, there is a text file called identified_blocks.txt that records the matching hashes and image offset location. If the generated hashdb database contained multiple files, the count variable will tell you how many files contained a matching hash for each sector block hash.
Additional information can be obtained by using the expand_identified_blocks command option.
hashdb expand_identified_blocks hashdb.hdb identified_blocks.txt
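If you want to quickly summarize those matches outside of the hashdb tooling, a rough parse works. Note that the exact column layout of identified_blocks.txt varies between bulk_extractor and hashdb versions, so treat the field positions below as an assumption to verify against your own output.

# Rough summarizer for identified_blocks.txt.
# ASSUMPTION: each line starts with the image offset, then the block hash,
# then count/source information; verify against your hashdb version.
from collections import Counter

hash_counts = Counter()
offsets = []

with open("USBOut/identified_blocks.txt") as f:
    for line in f:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.split()
        offsets.append(fields[0])
        hash_counts[fields[1]] += 1

print("%d matching sectors, %d distinct block hashes" % (len(offsets), len(hash_counts)))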
Super Nice GUI
SectorScope is a Python 3 GUI interface for this same command line process that was presented at OSDFCon 2015 by Michael McCarrin and Bruce Allen. You definitely want to check it out: https://github.com/NPS-DEEP/NPS-SectorScope
Pre-Built Sector Block Databases
The last bit of this post covers some goodies that would take you a long time to build on your own. I know, because I built one of these sets for you. The other set is provided by the same great folks at NIST that give us the NSRL hash databases. They went the extra step to provide us with a block hash list of every file contained in the NSRL that we have been using for years.
Subtracting the NSRL sector hashes from your hashdb will remove known blocks.
VirusShare.com collections are also available for evil sector block hunting.
Big news happened at #OSDFcon this week. Volatility version 2.5 was dropped. There are quite a number of features that you can read about, but I wanted to take a few minutes to talk about one feature in particular. There have been a number of output options in the past versions of Volatility, but this release makes the different outputs so much easier to work with. The feature is called Unified Output.
This post is not intended to be a ‘How To’ of creating a Volatility plugin. Maybe another day. I just wanted to show the ease of using the unified output in these plugins. If you are feeling like taking on a challenge, take a look through the existing plugins and find out which of them do not yet use the unified output. Give yourself a task to jump into open source development and contribute to a project that has likely helped you to solve some of your cases!
Let me give you a quick rundown of a basic plugin for Volatility.
Skeleton in the Plugin
The framework does all the hard work of mapping out the address space, so the code in the plugin has an easier job of discovery and breakdown of the targeted artifacts. It is similar to writing a script to parse data from a PDF file versus having to write code into your script that reads the MBR, VBR, $MFT, etc.
To make a plugin, you have to follow a few structural rules. You can find more details in this document, but here is a quick rundown.
1. You need to create a class that inherits from the base plugin class. This gives your plugin structure that Volatility knows about. It molds it into a shape that fits into the framework.
2. You need a function called calculate. This is the main function that the framework is going to call. You can certainly create many more functions and name them however you wish, but Volatility is not going to call them since it won’t even know about them.
3. You need to generate output.
Number 3 above is where the big change is for version 2.5. In the past versions, you would have to build a function for each format of output.
For example, you would have a render_text function that outputs the results of your plugin as basic text to stdout (the console); this is the approach the iehistory.py plugin takes. The formatting of the data has to be handled by the code in the plugin.
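The actual iehistory code is more involved, but the shape of a pre-2.5 text renderer is roughly this; the columns here are illustrative, not the exact iehistory fields, and the method lives inside a plugin class that inherits from the framework's Command base class.

# Rough shape of a pre-unified-output renderer (a method of a plugin class).
def render_text(self, outfd, data):
    self.table_header(outfd, [("Process", "16"),
                              ("PID", "6"),
                              ("URL", "80")])
    for process, record in data:
        self.table_row(outfd,
                       process.ImageFileName,
                       process.UniqueProcessId,
                       record.Url)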
Output without the Work
With the new unified output, you only need two functions:
- You need to have a function called unified_output which defines the columns
- You need to have a function called generator which fills the rows with data (both are sketched below)
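Here is a minimal sketch of a plugin wired up this way. The plugin name and columns are mine for illustration, but the TreeGrid usage follows the 2.5 unified output API.

# Minimal sketch of a Volatility 2.5 plugin using unified output.
import volatility.plugins.common as common
import volatility.utils as utils
import volatility.win32.tasks as tasks
from volatility.renderers import TreeGrid

class ProcListDemo(common.AbstractWindowsCommand):
    """List process names and PIDs (unified output demo)."""

    def calculate(self):
        addr_space = utils.load_as(self._config)
        for task in tasks.pslist(addr_space):
            yield task

    def unified_output(self, data):
        # Define the columns once; the framework renders text, JSON, SQLite, etc.
        return TreeGrid([("Process", str), ("PID", int)],
                        self.generator(data))

    def generator(self, data):
        # Fill the rows; the first element of each tuple is the tree depth.
        for task in data:
            yield (0, [str(task.ImageFileName), int(task.UniqueProcessId)])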
Work the Frame
Experience the Difference
Allow me to pick on the guys that won 1st place in the recent 2015 Volatility plugin contest for a minute, especially since I got 2nd place behind them. Nope, I am not bitter… All in fun! They did some great research and made a great plugin.
When you run their plugin against a memory image, you will get the default output to stdout.
The reason for this is that they used the previous version's rendering. The nice part is that if they change the code to use unified output for JSON, it would also support SQLite, XLS, or any other rendering format provided by the framework. Thanks to the FireEye guys for being good sports!
Now, I will use one of the standard plugins to display a couple different formats. PSList gives us a basic list of all the processes running on the computer at the time the memory image was acquired.
Here is the standard text output.
Hear My Plea
Allow me a shameless plug for my own project now. Evolve is a web-based front end GUI that I created to interface with Volatility. In order for it to work with a plugin, the plugin has to support SQLite output. The easiest way of supporting SQLite is to use the new unified output feature. The best part is that unified output then works with all the other supported rendering formats as well.
If you have written, or are writing, a plugin for Volatility, make it loads better by using the unified format. We have to rely on automation with the amount of data that we get in our cases today, so let’s all do our part!
Thanks to the Volatility team for all of their hard work, past, present, and future. It is hard to support a framework like this without being a for-profit organization. We all appreciate it!
You have all run into this dreaded case, I’m sure. Maybe a few times. I am talking about that case where you are asked to prove or disprove a given action of a user, but the evidence just doesn’t give in. You poke, you prod, you turn it inside-out. Nothing you do seems to be getting you any closer to a conclusion. These are the cases that haunt me for a long time. I still have some rumbling around in my head from over 10 years ago.
Evidence of Logic
The absence of evidence is not evidence of absence
Going the Distance
Releasing the Burden
I haven’t spent much time looking at crash dumps, but I ran into one recently. I had a case that presented me with little in the way of sources for artifacts. It forced me to get creative, and I ended up finding a user-mode crash dump that gave me the extra information I needed. It wasn’t anything earth-shattering, but it could be helpful for some of you in the future.
Crash Dump Files
Forcing a Crash Dump
Analyzing the Dump File
The SWF header begins with a three-byte signature, followed by a version byte and the file length:
- Signature byte 1 – “F” indicates uncompressed, “C” indicates a zlib compressed SWF (SWF 6 and later only), “Z” indicates a LZMA compressed SWF (SWF 13 and later only)
- Signature byte 2 – always “W”
- Signature byte 3 – always “S”
- Byte 4 – single byte file version (for example, 0x06 for SWF 6)
- Bytes 5–8 – length of entire file in bytes
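Those few header fields are enough to spot and sanity-check candidate SWF files. Here is a minimal sketch that validates a header and reports the compression type, version, and declared length; the sample file name is a placeholder.

# Minimal sketch: validate an SWF header and report version and declared length.
import struct

def parse_swf_header(data):
    if len(data) < 8 or data[1:3] != b"WS":
        return None
    compression = {b"F": "uncompressed",
                   b"C": "zlib compressed",
                   b"Z": "LZMA compressed"}.get(data[0:1])
    if compression is None:
        return None
    version = data[3]                                 # use ord(data[3]) on Python 2
    file_length = struct.unpack("<I", data[4:8])[0]   # little-endian UI32
    return compression, version, file_length

with open("sample.swf", "rb") as f:
    print(parse_swf_header(f.read(8)))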
Carving SWF Files
Collecting the Sample
Analyzing the Sample
The Last Piece
Putting It All Together
What do you think? Enough to satisfy?
Here is the sample SWF I used for this walk through.