GUIs are Hard – Python to the Rescue – Part 1

I consider myself an equal opportunity user of tools, but by the same token I am also an equal opportunity critic of tools. There are both commercial and open source digital forensic and security tools that do a lot of things well, and a lot of things not so well. What makes for a good DFIR examiner is the ability to sort through the marketing fluff and figure out what these tools can truly do, and what they do very well.

One of the things that I find limiting in many of the tools is the Graphical User Interface (GUI). We deal with a huge amount of data, and we sometimes analyze it in ways we couldn’t have predicted ourselves. GUI tools make a lot of tasks easy, but they can also make some of the simplest tasks seem impossible.

My recommendation? Every tool should offer output in a few formats: CSV, JSON, SQLite. Give me the ability to go primal!

Tool of the Day

I have had a number of cases lately that have started as ‘malware’ cases. Evil traffic tripped an alarm, and that means there must be malware on the disk. It shouldn’t be surprising to you, as a DFIR examiner, that this is not always the case. Sometimes there is a bonehead user browsing stupid websites.

Internet Evidence Finder (IEF) is the best tool I have available right now to parse and carve the broad variety of internet activity artifacts from the drive images. It does a pretty good job searching over the disk to find browser, web app, and website artifacts (though I can’t tell exactly which versions are supported from the documentation, but that is a digression for a different post).

Let me first cover some of IEF’s basic storage structure that I have worked out. The artifacts are stored in SQLite format in the folder that you designate, in a file named ‘IEFv6.db’. Every artifact type that is found creates at least one table in the DB. Because each artifact has different properties, each of the tables has a different schema. The dev team at Magnet does seem to have settled on a few conventions that allow for some consistency: if a column holds URL data, then the column name has ‘URL’ in it, and similarly for dates, the column name will have ‘date’ in it.
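
If you want to poke at that structure yourself, a few lines of Python will list each artifact table and the URL columns inside it. This is just a quick sketch; it assumes the ‘IEFv6.db’ file name above and that you run it from the IEF case folder:

import sqlite3
# Open the IEF case database (created in the folder you designated)
conn = sqlite3.connect('IEFv6.db')
# Every artifact type gets at least one table, so walk them all
tables = [r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]
for table in tables:
    # PRAGMA table_info returns one row per column: (cid, name, type, ...)
    cols = [row[1] for row in conn.execute('PRAGMA table_info("{}")'.format(table))]
    url_cols = [c for c in cols if 'url' in c.lower()]
    if url_cols:
        print(table, '->', ', '.join(url_cols))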

IEF provides a search function that allows you to do some basic string searching, or you can get a bit more advanced by providing a RegEx (GREP) pattern for the search. When you kick off a search, IEF creates a new SQLite db file named ‘Search.db’ and stores it in the same folder. You can only have one completed search at a time, since kicking off a new search causes IEF to overwrite any previous search db. From what I can tell, the search db has an identical schema to the main db; it has simply been filtered down to the records that match the keywords or patterns you provided.

There is another feature called filter, and I will admit that I have only recently found this. It allows you to apply various criteria to the dataset, with the major one being a date range. There are other things you can filter on, but I haven’t needed to explore those just yet. When you kick off this process, you end up with yet another SQLite database holding a reduced number of records based on the criteria, and again it appears identical in schema to the main db. This one is named ‘filter.db’, which indicates that the dev team doesn’t have much creativity. 😉

Problem of the Month

The major issue I have with the tool is in the way it presents data to me. The interface has a great way of digging into forensic artifacts as they are categorized and divided by the artifact type. You can dig into the nitty gritty details of each browser artifact. For the cases that I have used IEF for lately, and I suspect many of you in your Incident Response cases as well, I really don’t actually care *which* browser was the source of the traffic. I just need to know if that URL was browsed by bonehead so I can get him fired and move on. Too harsh? 🙂

IEF doesn’t give you the ability to have a view where all of the URLs are consolidated. You have to click, click, click down through the many artifacts, and look through tons of duplicate URLs. The problem behind this lies in the design of the artifact storage: multiple tables with different schemas in a relational database. A document based database, such as MongoDB, would have provided an easier search approach, but there are trade-offs that I don’t need to tangent on here. I will just say that there is no 100% clear winner.

To perform a search over multiple tables in a SQL based DB, you have to implement it in some kind of program code because a single SQL query is almost impossible to construct. SQLite makes it even more difficult with its reduced list of native functions and its lack of ability to create user-defined functions or stored procedures. It just wasn’t meant for that. IEF handles this task for the search and filter process in C# code, and creates those new DB files as a sort of cache mechanism.

Solution of the Year

Alright, I am sensationalizing my own work a bit too much, but it is easy to do when it makes your work so much easier. That is the case with the python script I am showing you here. It was born out of necessity and tweaked to meet my needs for different cases. This one has saved me a lot of time, and I want to share it with you.

This Python script can take as input (-i) any of the 3 IEF database files that I mentioned above, since they share schema structures. The output (-o) is another SQLite database file (I know, like you need yet another one) in the location of your choosing.

The search (-s) parameter allows you to provide a string, and only records with that string present in one of their URL fields are transferred. I added this because the search function of IEF doesn’t let me point the keyword at a URL field; my keywords were hitting on several other metadata fields that I had no interest in.

The limit (-l) parameter was added because of a bug I found in IEF with some of the artifacts. I think it was mainly in the carved artifacts, so I really can’t fault the tool too much, but it was causing a size and time issue for me. The bug is that the URL field for a number of records was pushing over 3 million characters long. Let me remind you that each character in ASCII is a byte, and 3 million of those makes a URL that is 3 megabytes in size. Keep in mind that URLs are allowed to be Unicode, so go ahead and x2 that. I found that most browsers start choking if you give them a URL over 2,000 characters, so I decided to cut off the URL field at 4,000 by default to give just a little wiggle room. Magnet is aware of this and will hopefully solve the issue in an upcoming version.

This Python script will open the IEF DB file and work its way through each of the tables, looking for any columns that have ‘URL’ in the name. When one is found, it grabs the type of the artifact and the value of the URL to create a new record in the new DB file. Some of the records in the IEF artifacts have multiple URL fields, and the script takes each one of them into the new file as a simple URL value. The source column holds the name of the table (artifact type) and the name of the column that the value came from.
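
To give you a preview of how that works before part 2, here is a simplified sketch of the approach. It is not the actual script; the output table name and layout here are illustrative, but the -s filtering and -l truncation behave as described above:

import sqlite3
def consolidate(src_path, dst_path, search=None, limit=4000):
    # src_path can be IEFv6.db, Search.db, or filter.db -- they share schemas
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    dst.execute('CREATE TABLE IF NOT EXISTS urls (url TEXT, source TEXT)')
    tables = [r[0] for r in src.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]
    for table in tables:
        cols = [r[1] for r in src.execute('PRAGMA table_info("{}")'.format(table))]
        for col in [c for c in cols if 'url' in c.lower()]:
            for (url,) in src.execute('SELECT "{}" FROM "{}"'.format(col, table)):
                if not url:
                    continue
                if search and search.lower() not in url.lower():
                    continue  # -s: only keep records with the string in a URL field
                # -l: truncate monster URLs; source records the table and column
                dst.execute('INSERT INTO urls VALUES (?, ?)', (url[:limit], '{}.{}'.format(table, col)))
    dst.commit()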

This post has gotten rather long, so this will be the end of part 1. In part 2, I will go through the new DB structure to explain the SQL views that are created and then walk through some of the code in Python to see how things are done.

In the meantime, you can download the Python script and take a look for yourself. You will have to supply your own IEF.

James Habben

Reporting: Benefits of Peer Reviews

Now that you are writing reports to get a personal and professional benefit, let’s look at some other ways that you can get benefits from these time suckers. You need the help of others on this one, since you will be giving your reports to them in seeking a review. You need this help from outside of your little bubble to ensure that you are pushing yourself adequately.

You need a minimum of 2 reviews on the reports you write. The first review is a peer review, and the other is a manager review. You can throw additional reviews on top of these if you have the time and resources available, and that is icing on the cake.

Your employer benefits from reviews for these reasons:

  • Reduced risk and liability
  • Improved quality and accuracy
  • Thorough documentation

There are more personal benefits here too:

  • Being held to a higher standard
  • Gauge on your writing improvement
  • You get noticed

Let me explain more about these benefits in the following sections.

Personal Benefits

Because the main intention of this post is to show the personal benefits and improvements, I will start here.

Higher Standards

The phrase ‘You are your own worst critic’ gets used a lot, and I do agree with it for the most part. For those of us with a desire to learn and improve, we have that internal drive to be perfect. We want to be able to bust out a certain task and nail it 110% all of the time. When we don’t meet our high standards, we get disappointed in ourselves and note the flaws to do better next time.

Here is where I disagree with that statement just a bit. We can’t hold ourselves to a standard that we don’t understand or even have knowledge about. If you don’t know proper grammar, it is very difficult for you to expect better. Similarly in DFIR, if you don’t know a technique to find or parse an artifact, you don’t know that you are missing out on it.

Having a peer examiner review your report is a great way of getting a second pair of eyes on the techniques you used and the processes you performed. They can review all of your steps and ask you questions to cover any potential gaps. In doing this, you then learn how the other examiners think and approach these scenarios, and can take pieces of that into your own thinking process.

Gauging Your Improvement

Your first few rounds of peer review will likely be rough with a lot of suggestions from your peers. Don’t get discouraged, even if the peer is not being positive or kind about the improvements. Accept the challenge, and keep copies of these reviews. As time goes on, you should find yourself with fewer corrections and suggestions. You now have a metric to gauge your improvement.

Getting Noticed

This is one of the top benefits, in my opinion. Being on a team with more experienced examiners can be intimidating and frustrating when you are trying to prove your worth. This is especially hard if you are socially awkward or shy since you won’t have the personality to show off your skills.

Getting your reports reviewed by peers gives you the chance to covertly show off your skills. It’s not boasting. It’s not bragging. It’s asking for a check and suggestions on improvements. Your peers will review your cases and they will notice the effort and skill you apply, even if they don’t overtly acknowledge it. This will build the respect between examiners on the team.

Having your boss as a required part of the review process ensures that they see all the work you put in. All those professional benefits I wrote about in my previous post on reporting go to /dev/null if your boss doesn’t see your work output. If your boss doesn’t want to be a part of it, maybe it’s a sign that you should start shopping for a new boss.

Employer Benefits

You are part of a team, even if you are a solo examiner. You should have pride in your work, and pride in the work of your team. Being a part of the team means that you support other examiners in their personal goals, and you support the department and its business goals as well. Here are some reasons why your department will benefit as a whole from having a review process.

Reduced Risk and Liability

I want to hit the biggest one first. Business operations break down to assets and liabilities. Our biggest role in the eyes of our employers is to be an asset to reduce risk and liability. Employees in general introduce a lot of liability to a company and we do a lot to help in that area, but we also introduce some amount of risk ourselves in a different way.

We are trusted to be an unbiased authority when something has gone wrong, be it an internal HR issue or an attack on the infrastructure. Who are we really to be that authority? Have you personally examined every DLL in that Windows OS to know what is normal and what is bad? Not likely! We have tools (assets) that our employers invest in to reduce the risk of us missing that hidden malicious file. Have you browsed every website on the internet to determine which are malicious, inappropriate for work, a waste of time, or valid for business purposes? Again, not a chance. Our employers invest in proxy servers and filters (assets) from companies that specialize in exactly that to reduce the risk of us missing one of those URLs. Why shouldn’t your employer put a small investment in a process (asset) that brings another layer of protection against the risk of us potentially missing something because we haven’t experienced that specific scenario before?

Improved Accuracy and Quality

This is a no-brainer, really. It is embarrassing to show a report that is full of spelling, grammar, or factual errors. Your entire management chain will be judged when people outside of that chain read through your reports. The best conclusions and recommendations in the world can be thrown out like yesterday’s garbage if they are filled with easy-to-find errors. It happens though, because of the amount of time it takes to write these reports. You can become blind to some of those errors, and a fresh set of eyes can spot things much more quickly and easily. Having your report reviewed gives both you and your boss that extra assurance of the reduced risk of sending out errors.

Thorough Documentation

We have another one of those ‘reducing risk’ things on this one. Having your report reviewed doesn’t give you any extra documentation in itself, but it helps to ensure that the documentation given in the report is thorough.

You are typically writing the report for the investigation because you were leading it, or at least involved in some way. Because you were involved, you know the timeline of events and the various twists and turns that you inevitably had to take. It is easy to leave out what seems like pretty minor events in your own mind, because they don’t seem to make much difference in the story. With a report review, you will get someone else’s understanding of the timeline. Even better is someone who wasn’t involved in that case at all. They can identify any holes that were left by leaving out those minor events and help you to build a more comprehensive story. It can also help to identify unnecessary pieces of the timeline that only bring in complexity by giving too much detail.

Part of the Process

Report reviews need to be a standard part of your report writing process. They benefit both you and your employer in many ways. The only reason against having your reports reviewed is the extra time required by everyone involved in that process. The time is worth it, I promise you. Everyone will benefit and grow as a team.

If you have any additional thoughts on helping others sell the benefits of report reviews, feel free to leave them in the comments. Good luck!

James Habben

Report Rapport

Let me just state this right at the top. You need to be writing reports. I don’t care what type of investigation you are doing or what the findings are. You need to be writing reports.

There are plenty of reasons that your management will tell you about why you have to write a report. There are even more reasons for you to write these reports, for your own benefit. Here is a quick list of a few that I thought of, and I will discuss a bit about each in sections below.

  • Documenting your findings
  • Justification of your time
  • CYA
  • Detail the thoroughness of your work
  • Show history of specific user or group
  • Justification for shiny tools
  • Measure personal growth

Documenting Your Findings

Your boss will share the recommendation for this because it’s a pretty solid one. You need to document what you have found. As DFIR investigators, security specialists, infosec analysts, etc., we are more technical in nature than the average computer user. We know the innermost workings of these computers, and oftentimes how to exploit them in ways they weren’t designed for. We dig through systems on an intimate level, and with this knowledge we can incorrectly assume that others understand even the most basic of things.

Take the example of a Word document. A current generation Word document has an extension of ‘docx’ when saved to disk. So many things fly through my mind when I see those letters. I know, because of the ‘x’, that it is a current generation document. The current generation uses the PK zip file format. It contains metadata in the form of XML. It holds the document data as XML too. It can have attachments, and those are always placed in a specific directory. I know you can keep going too. How many of your executives know this?
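
If you have never cracked one open, a few lines of Python will show that structure directly. The file name here is hypothetical:

import zipfile
# A .docx is just a PK zip container full of XML parts
with zipfile.ZipFile('report.docx') as doc:  # hypothetical file name
    for name in doc.namelist():
        print(name)  # e.g. [Content_Types].xml, word/document.xml, docProps/core.xml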

The people making decisions to investigate incidents and pay your salary do not need to know these things, but they do need to understand them in the context of your investigation. Document the details like your job depends on it. Use pictures and screen shots if you have to, since that helps display data in a friendlier way to less technical people. Go to town with it and be proud of what you discovered. The next time you have a similar case, you will have this as a reference to help spur thoughts and ensure completeness.

Justification of your time

We are a bunch of professionals that get paid very well, and we work hard for it. How many times in the last month have you thought or said to yourself that you do not have enough time in the day to complete all the work that is being placed in your queue?

When you report on your work, you are providing documentation of your time. The pile of hard drives on your desk makes it seem to others that you can’t keep up. That could mean that they are asking too much of you, or it could mean that you aren’t capable enough. You don’t want to leave that kind of question in the minds of your management. Write the reports to show the time you are spending. Show them how much work is required for a ‘quick check into this ransomware email’ and that it isn’t actually just a quick check. If you do this right, you might just find yourself with a new partner to help ease that workload.


CYA

People like to place blame on others to make sure they are in the clear. Your reports should document the facts as they are laid out, and let them speak for themselves. You should include information about when data was requested and when it was collected. Document the state of the data and what was needed to make it usable, if that was required. Track information about your security devices and how they detected or didn’t detect pieces of the threat. You should be serving as a neutral party in the investigation to find the answers, not to place the blame.

Detail the thoroughness of your work

So many investigations are opened with a broad objective, and that is to find the malware. Depending on the system and other security devices, it could be as easy as running an AV scan on the disk. Most times, in my experience at least, this is going to come up clean since the malware didn’t get detected in the first place anyway.

You are an expert. Show it in your reports. Give those gritty details that you love to dig into, and not just the ones about what you found. The findings are important, but you should also document the things you did that resulted in no findings. You spend a lot of time on this work, and some people don’t understand what’s required beyond an AV scan.

Show history of specific user or group

If you are an investigator working for a company, you are guaranteed to find those users that always get infected. They are frustrating because they cause more work for you, and it is usually some little Potentially Unwanted Program (PUP) or ransomware. They are the type of person that falls for everything, and you have probably thought or said some things about them that don’t need to be repeated.

Document your investigations, and you will be able to show that Thurston Howell III has a pattern of clicking on things he shouldn’t. Don’t target these people though. Document everything. As a proactive measure, you could start building a report summarizing your reports. Similar to the industry reports about attack trends, you can show internal trends and patterns that indicate things like a training program being needed to keep users from clicking on those dang links. This can also support justification to restrict permissions for higher risk people and groups, and now you have data to back up the claim that they are high risk. There can be loads of data at your disposal, and how effectively you use it is limited only by your imagination.

Justification for shiny tools

Have you asked for a new security tool and been turned down because it costs too much? What if you could provide facts showing that it is actually costing more to NOT have this tool?

Your reports provide documentation of facts and time. You can use these to easily put together a cost analysis. Do the math on the number of investigations related to this tool and the hours involved in those investigations by everyone, not just you. You will have to put together a little extra to show how much time the new fanciness will save, but you should have done the hard part already by writing reports.
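
As a purely hypothetical example: say the tool would touch 15 investigations per quarter, each eating 10 hours across the team at a loaded rate of $100 per hour. That is $15,000 per quarter. If the tool cuts each of those investigations down to 4 hours, you save $9,000 per quarter, and a $20,000 price tag pays for itself in about seven months. Your reports are what make those numbers defensible instead of made up.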

Measure personal growth

This one is completely about you. We all grow as people, and we change the way we write and think. We do this because of our experiences, and our understanding that we can evolve to be better. Do you write like you did in 1st grade? Hope not! How about 12th grade? Unless you are a freshman in college, you have probably improved from there also.

When you write reports, you give yourself the ability to measure your growth. This can be very motivating, but it takes personal drive. If you have any reports from even just 6 months ago, go back and read them. You might even ask yourself who actually wrote that report, and I don’t think that’s a bad thing!

Final report

Reports can be a rather tedious part of our job, but if you embrace the personal benefits it can really become a fun part. Take pride in your investigation and display that in your reports. It will show. It works similar to smiling when you talk on the phone. People can tell the difference.

If you are writing reports today, good for you! Push yourself further and make it fun.

If you are not writing reports today, DO IT!

I am starting a mini series of posts on reporting. Future posts will be on structure and various sections of an investigative report. These are all my experiences and opinions, and I welcome your comments as well. Let’s all improve our reports together!

James Habben


Filenames are trivial to change. It is still important to know which ones are common during your investigation. You can’t remember every filename, as there are already twenty-four million plus in the NSRL data set alone. MatchMeta.Info is my way of automating these comparisons into the analysis process. Not all investigators have Internet access on their lab machines, so I wanted to share the steps to build your own internal site.
Server Specifications
Twisted Python Installation
I prefer using Ubuntu, but feel free to use whatever operating system you are most comfortable with. The installation process has become very simple!!
apt-get install python-dev python-pip
pip install service_identity twisted
Twisted Python Validation
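A quick way to validate the installation is to ask Python for the Twisted version:
python -c "import twisted; print(twisted.version)"
If that prints a version instead of an ImportError, Twisted is ready to go.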

NSRL Filenames
I download the NSRL data set directly from NIST, then parse out the filenames with a Python script that I have hosted on the GitHub project site.
Or feel free to download the already precompiled list of filenames that I have posted here. 
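The parsing script is on the GitHub project site. As a minimal sketch of the idea, assuming the standard NSRLFile.txt column layout with FileName in the fourth column, it boils down to this:
import csv
# Pull the unique filenames out of the NSRL file set
names = set()
with open('NSRLFile.txt', 'r') as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    for row in reader:
        names.add(row[3])
with open('nsrl251.txt', 'w') as out:
    for name in sorted(names):
        out.write(name + '\n')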
MatchMeta.Info Setup
First, create a folder that will contain the file from the GitHub site and the uncompressed nsrl251.txt file from the previous section. For example, a www folder can be created in the /opt directory for these files.
Second make the two files read only to limit permissions.
chmod 400 nsrl251.txt
Third make the two files owned by the webserver user and group.
chown www-data:www-data nsrl251.txt
Fourth make only the www folder capable of executing the Twisted Python script.
chmod 500 www
Fifth, make the www folder owned by the webserver user and group.
chown www-data:www-data www
MatchMeta.Info Service
Upstart on Ubuntu will allow the Twisted Python script to be run as a service by creating the /etc/init/mmi.conf file. Paste these commands into the newly created file. It’s critical to use exact absolute paths in the script and in mmi.conf, or the service will not start.
start on runlevel [2345]
stop on runlevel [016]

setuid www-data
setgid www-data

exec /usr/bin/python /opt/www/
MatchMeta.Info Port Forwarding
Port 80 is privileged and we don’t want to run the service as root so port forwarding can be used.  This will allow us to run the Python service as the www-data user by appending the following to the bottom of the /etc/ufw/before.rules file.
-A PREROUTING -p tcp --dport 80 -j REDIRECT --to-port 8080
Thanks to @awhitehatter  for the tip on their GitHub site.
Configure Firewall
Please set up the firewall rules to meet your environment’s requirements. Ports 80 and 8080 are currently set up to be used for the MatchMeta.Info service. Don’t forget SSH for system access.
ufw allow 80/tcp
ufw allow 8080/tcp
ufw allow ssh
ufw enable
MatchMeta.Info Validation
Finally, all set to start the MatchMeta.Info Service!!
start mmi
Browsing to these sites should return the word OK on the website.
Browsing to these sites should return the phrase NA on the website.       
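The actual script is available on the GitHub project site. A stripped-down Twisted sketch of the same idea, loading the filename list into memory and answering OK or NA, might look like this (paths and port taken from the steps above):
from twisted.web import resource, server
from twisted.internet import reactor
# Load the NSRL filename list into memory once at startup
with open('/opt/www/nsrl251.txt', 'r') as f:
    KNOWN = set(line.strip() for line in f)
class MatchMeta(resource.Resource):
    isLeaf = True
    def render_GET(self, request):
        # The last element of the URL path is the filename to check
        name = request.postpath[-1] if request.postpath else ''
        return 'OK' if name in KNOWN else 'NA'
reactor.listenTCP(8080, server.Site(MatchMeta()))
reactor.run()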
I plan to keep moving MatchMeta.Info features from the command line version into the web interface in the future. A morph has already been submitted to incorporate this analysis process into James Habben’s Evolve project, a web interface for Volatility.
John Lukach

Building Python Packages, By a Novice

I am excited to see that Evolve has been getting some use by more and more people. It has gained enough use to even get the attention of SANS. They want to include Evolve in their SIFT workstation build. This is by no means an endorsement by SANS, but it means a lot to an open source developer to know that their tools are being used and are helpful.

The requirement for Evolve making it into SIFT is that it needs to be installable from the Python Package Index (PyPI). This is a very reasonable requirement since it makes the maintenance of SIFT a much more manageable project. A project, may I add, that is also available for free and maintained in the free time of volunteers. Thanks!

I started reading about Python packages and distribution, and found that it is very capable and very flexible, almost too much. It has been a challenge for me to squeeze in the reading and testing between the demands of my full time job, but I finally stumbled my way through it to a final version. There are many tutorials available that explain the basics of Python packages, but I had a little different problem than what most Python projects face, I guess. I wanted to write this all down to share my experience for any of you that might benefit.

El Problemo

First of all, I am not a full time developer, nor am I a master of Python. I think this may be the root of my problem! I started Python, like many others, by assembling pieces of others’ scripts to make a solution to the problem I was facing at the time. Because of my background with so many other languages, it was a fairly short phase of learning the environment and intricacies of Python before I was able to start from scratch and StackOverflow my way through. I took on Evolve as a way to expand further, and to solve another problem. That is the mother of invention, after all. It has been a fun and rewarding experience, with many thanks to all of you supporting it!

Put me aside now, and let’s talk about the technical problem with packaging Evolve. A typical package available from PyPI is Python code. You get a little more exotic when the author includes some code written in C or C++ to make for a more efficient function. This code requires compiling, but PyPI can handle it, and does it well (except on Windows). Where Evolve presented a problem is in the HTML, CSS, JS, and images used in the web interface. These aren’t considered code by the Python packager, so it was a challenge for me to get them included.

Basic Building

I don’t want to rehash the basic build process since there are already many very well written tutorials out there to explain that. Instead, I will include a short list of some links that were helpful during my journey here.

Creating the setup.py file to start it all off
This tutorial was very helpful in building the basics of the setup.py file. It explains most of the properties well. The part that I found lacking was in the sections for including extra non-Python code in the package.

The trouble with including non-python files


This was a great help in truly understanding what I thought I understood after reading the Python docs. It also helped keep me sane and moving forward!

Including some other files
This helped me somewhat. I was able to get the folder of HTML files included, but it was a rather manual process. I knew I could fall back on this, but I figured that there must be a better way. We are doing programmer things, after all.

More on including other files
Another good tutorial on the packaging process, but it didn’t fully register with what I needed.

Yet another on the process of building

This is the article that finally made things fit together. In fairness to the others, I think it just took some time to sink in.

Highlighted Points

Here are some points I thought I would share. Some of these may be obvious to you already, but I had trouble getting a full grasp of the exact requirements. These are not listed in any particular order of importance.

sdist and bdist use different properties

As stated in one of the articles, there are different ways to package your project. You can distribute the source code with sdist, or you can build it into a binary with bdist. Each of these methods uses different properties from inside the setup.py file. Be aware of which method you are using to distribute, and which properties are associated with each.

__init__.py is a critical file, even if it’s blank

In building the Morph feature of Evolve, I found that I had to have an __init__.py file in the morphs directory for proper Python function. You can look at both of these files that are in the project (project root and the morphs folder), and you will see they are both essentially blank. There is function behind having them, and they can provide more function if you place code inside. I don’t need that function though, so they remain empty. In the process of making this package, I found that the __init__.py file is also needed for the build process to recognize that the folder holds Python code that needs to be included in the package.

I am using classes in Evolve, but only for the Morphs. It’s something I want to address in the future, but moving the main code into classes will require time in refactoring and testing that I just don’t have quite yet.


MANIFEST.in is not MANIFEST

Bonehead move on my part, but this was the final blocker to getting this venture to work. I read through many articles talking about modifying the MANIFEST.in file to search for other files to include. I made the bad assumption that MANIFEST was the file being specified. WRONG. The MANIFEST.in file is a template that is used during the build process. The MANIFEST file is written by the build process as a log of the files included in the distribution package. To make me look even dumber, the first line in the MANIFEST file says ‘# file GENERATED by distutils, do NOT edit’…
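
For reference, the general shape of a setup.py that pulls in non-Python files is below. The package and folder names are illustrative, not Evolve’s actual layout, so treat this as a sketch rather than the real file:

# setup.py -- minimal sketch; names and paths are illustrative only
from distutils.core import setup
setup(
    name='evolve',
    version='1.0',
    packages=['evolve', 'evolve.morphs'],  # each listed folder needs an __init__.py
    # Non-Python files are not collected automatically; list them per package.
    # A matching recursive-include line in MANIFEST.in covers the sdist side.
    package_data={'evolve': ['web/*.html', 'web/*.css', 'web/*.js', 'web/img/*']},
)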

Use distutils instead of setuptools

There are a few shortcomings in setuptools that I read about at the start of this project that were addressed in distutils. In typical Python and open source fashion, a problematic library was fixed, but moved to be named differently. Rants aside, just use distutils and things will be much smoother.

That’s It

Again, enough tutorials out there already to explain this process. They do a pretty good job, but I ran into troubles including the extra files in Evolve. I hope this helps one of you to not pull your hair out when trying to build your Python package. Share any other tips you have below in the comments!

James Habben

Block Building Checklist

It is important to understand how the artifacts that you use during an investigation are created. Thus I wanted to provide my block building checklist to help others recreate the process. I will walk through the commands used to prepare the blocks for distribution, and how to build the block libraries with the removal of a whitelist.
Block Preparation
I have used Windows, Linux and Mac OS X over the course of this project. I recommend using the operating system that you are most comfortable with for downloading and unpacking the torrents. The best performance will come from using solid state drives during the block building steps. The more available memory during whitelisting, the better. A lot less system resources are necessary when just doing hash searches and comparisons during block hunting.
We saw this command previously in the Block Hunting post, with a new option. The -x option disables parsers so that bulk_extractor only generates the block sector hashes, reducing the necessary generation time.
bulk_extractor -x accts -x aes -x base64 -x elf -x email -x exif -x find -x gps -x gzip -x hiberfile -x httplogs -x json -x kml -x msxml -x net -x pdf -x rar -x sqlite -x vcard -x windirs -x winlnk -x winpe -x winprefetch -x zip -e hashdb -o VxShare199_Out -S hashdb_mode=import -S hashdb_import_repository_name=VxShare199 -S hashdb_block_size=512 -S hashdb_import_sector_size=512 -R VirusShare_00199
The following steps help with the reduction of disk storage requirements and reporting cleanliness for the sector block hash database. It is also a similar process for migrating from hashdb version one to two. One improvement that I need to make is to use JSON instead of DFXML, which was released at OSDFCon 2015 by Bruce Allen.
We need to export the sector block hashes out of the database so that the suggested modifications can be made to the flat file output.
hashdb export VxShare199_Out/hashdb.hdb VxShare199.out
  • hashdb – executed application
  • export – export sector block hashes as a DFXML file
  • VxShare199_Out/ – relative folder path to the hashdb
  • hashdb.hdb – default hashdb name created by bulk_extractor
  • VxShare199.out – flat file output in DFXML format
Copy the first two lines of the VxShare199.out file into a new VxShare199.tmp flat file.
head -n 2 VxShare199.out > VxShare199.tmp
Start copying the contents of the VxShare199.out file at line twenty-two, appending them to the existing VxShare199.tmp file. The below image indicates what lines will be removed by this command. The line count may vary depending on the operating system or the version of bulk_extractor and hashdb installed.
tail -n +22 VxShare199.out >> VxShare199.tmp
The sed command will read the VxShare199.tmp file, then remove the path and beginning of the file name prior to writing into the new VxShare199.dfxml file. The highlighted text in the image below indicates what will be removed.
sed 's/VirusShare_00199\/VirusShare\_//g' VxShare199.tmp > VxShare199.dfxml
Create an empty hashdb with the sector size of 512 using the -p option. The default size is 4096 if no option is provided.
hashdb create -p 512 VxShare199
Import the processed VxShare199.dfxml file into the newly created VxShare199 hashdb database.
hashdb import VxShare199 VxShare199.dfxml
I compress and upload the hashdb database for distribution, saving these steps for everyone.
Building Block Libraries
The links to these previously generated hashdb databases can be found at the following link.
Create an empty hashdb called FileBlock.VxShare for the collection.
hashdb create -p 512 FileBlock.VxShare
Add the VxShare199 database to the FileBlock.VxShare database.  This step will need to be repeated for each database. Upkeep is easier when you keep the completely built FileBlock.VxShare database for ongoing additions of new sector hashes.
hashdb add VxShare199 FileBlock.VxShare
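Since the add step repeats for every database, it is easy to script. A small sketch, assuming the hashdb folders all sit in the current directory and are named VxShare*:
import glob
import os
import subprocess
# Add every VxShare hashdb folder into the FileBlock.VxShare collection
for db in sorted(d for d in glob.glob('VxShare*') if os.path.isdir(d)):
    subprocess.check_call(['hashdb', 'add', db, 'FileBlock.VxShare'])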
Download the sector hashes of the NSRL from the following link.
Create an empty hashdb called FileBlock.NSRL for the NSRL collection.
hashdb create -p 512 FileBlock.NSRL
The NSRL block hashes are stored in a tab delimited flat file format. The import_tab option is used to import each file; they are split by the first character of the hash value, 0-9 and A-F. I also keep a copy of the built FileBlock.NSRL for future updates too.
hashdb import_tab FileBlock.NSRL
Remove NSRL Blocks
Create an empty hashdb called FileBlock.Info for the removal of the whitelist.
hashdb create -p 512 FileBlock.Info
This command will remove the NSRL sector hashes from the collection creating the final FileBlock.Info database for block hunting.
hashdb subtract FileBlock.VxShare FileBlock.NSRL FileBlock.Info
The initial build is machine-time intensive, but once done, the maintenance is a walk in the park.
Happy Block Hunting!!
John Lukach

Critical Stack Intel Feed Consumption

Critical Stack provides a free threat intelligence aggregation feed through their Intel Market for consumption by the Bro network security monitoring platform. This is a fantastic service that is provided for free!! Special thanks to those who have contributed their feeds for all to take advantage of the benefits!! Installation is beyond the scope of this post, as it is super easy with decent documentation available on their website. The feed updates run roughly hourly by default into a tab delimited file available on disk.

My goal was to make the IP address, domain, and hash values accessible through a web interface for consumption by other tools in your security stack. Additionally, I didn’t want to create another database structure, but instead read the values into memory for comparison on script restarts. I decided to use Twisted Python by Twisted Matrix Labs to create the web server. Twisted is an event-driven networking engine written in Python. The script provides a basic foundation without entering into the format debate between STIX and JSON. Kept it simple…

Twisted Python Installation

The following installation steps work on Ubuntu 14.04 as that is my preference.

apt-get install build-essential python-setuptools python-dev python-pip

pip install service_identity


bzip2 -d Twisted-15.5.0.tar.bz2

tar -xvf Twisted-15.5.0.tar

cd Twisted-15.5.0/

python setup.py install

The PIP package installation allows for the future usage of SSL and SSH capabilities in Twisted.

Script

The default installation file and path containing the Critical Stack Intel Feed artifacts.




The field separator on each line that gets loaded into the Python list in memory.




The output that gets displayed on the dynamically generated web page based on user input.





The port that the web server runs on for the end-user to access the web page.

Usage

The script can be used after execution by browsing to the website with an IP address, domain or hash value provided in the path.  If the result returns FOUND that means it is part of the Critical Stack Intel Feed.
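
To make that concrete, here is a stripped-down sketch of the idea: load the feed into memory and answer lookups over HTTP. The feed path, port, and NOT FOUND response below are assumptions based on a default Critical Stack client install, so adjust them to match your environment and the actual script:

from twisted.web import resource, server
from twisted.internet import reactor
# Assumed default feed location for the Critical Stack client -- adjust if yours differs
FEED = '/opt/critical-stack/frameworks/intel/master-public.bro.dat'
def load_indicators(path):
    # The feed is tab delimited; the first column holds the IP, domain, or hash
    indicators = set()
    with open(path, 'r') as f:
        for line in f:
            if line.startswith('#') or not line.strip():
                continue
            indicators.add(line.split('\t')[0])
    return indicators
INDICATORS = load_indicators(FEED)
class Lookup(resource.Resource):
    isLeaf = True
    def render_GET(self, request):
        value = request.postpath[-1] if request.postpath else ''
        return 'FOUND' if value in INDICATORS else 'NOT FOUND'
reactor.listenTCP(8080, server.Site(Lookup()))
reactor.run()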


Feel free to change the code to meet your needs and really appreciate any contributions back to the DFIR community.

 Happy Coding!!
John Lukach

Updated 12/15/2015

  • The script now displays the feed that an IP address, domain, or hash originated from.
  • An Upstart configuration file for running the Twisted Python script at startup.
  • A crontab configuration that restarts the script hourly after Critical Stack Intel updates.