Faster API Calls

I have been building out APIs to automate digital forensics and incident response analysis, so artifacts are quickly ready for review during an investigation. I was using the ‘requests’ library for Python but wanted to speed up the process. I found that the ‘asyncio’ library gave me better performance when reading a line-delimited file, making a web request for each line, and writing the results to disk. It was super useful, so I wanted to share a code snippet in case it could help others!


FYI: I had to use ‘aiohttp.TCPConnector(ssl=False)’ on my MacBook, but not on my Ubuntu system, to get around a certificate verification error. Note that this turns off certificate checking for the session.

import aiofiles
import aiohttp
import asyncio
import async_timeout

inputfile = 'input.txt'
outputfile = 'output.txt'

key = '<key>'

async def fetch(session, url):
  # Give up on any single request after 10 seconds
  async with async_timeout.timeout(10):
    async with session.get(url,headers={'x-api-key':key}) as response:
      return await response.text()

async def main():
  # Create the connector inside the running event loop
  connection = aiohttp.TCPConnector(ssl=False)
  async with aiohttp.ClientSession(connector=connection) as session:
    async with aiofiles.open(inputfile,'r') as f:
      async with aiofiles.open(outputfile,'w') as w:
        async for line in f:
          # Drop the trailing newline before building the request URL
          url = ''+line.rstrip('\n')
          jsonoutput = await fetch(session,url)
          await w.write(jsonoutput+'\n')

asyncio.run(main())
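
The loop above waits for each response before it requests the next line. If the API can tolerate several requests in flight, one possible variation is to cap concurrency with a semaphore and collect the results with ‘asyncio.gather’. This is only a sketch: ‘fetch_all’, ‘fake_fetch’ and the limit of 2 are made-up names and values, and ‘fake_fetch’ stands in for the real web request so the example runs without a network.

```python
import asyncio

async def fetch_all(lines, fetch_one, limit=10):
    """Run fetch_one(line) for every line with at most `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(line):
        # Wait for a free slot before starting the request
        async with semaphore:
            return await fetch_one(line)

    # gather() returns the results in the same order as the input lines
    return await asyncio.gather(*(bounded(line) for line in lines))

async def fake_fetch(line):
    # Hypothetical stand-in for the real web request
    await asyncio.sleep(0.01)
    return line.upper()

results = asyncio.run(fetch_all(['a', 'b', 'c'], fake_fetch, limit=2))
print(results)
```

Because the results come back in input order, they can still be written to the output file line by line afterwards.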

Happy Coding!
John Lukach

MatchMeta.Info v2

I have been using full paths from gold builds since the Fall of 2014 to match meta information. Files and directories that are hidden in plain sight by character substitutions, misspellings, etc. are more visible to the investigator through this analysis. The project started as a Python script with lookups against a stand-alone SQLite database containing Microsoft Windows installs from TechNet. In the Spring of 2016, I switched to an API written in Twisted Python for filename comparisons from the NIST NSRL.

Recently, cloud computing has allowed me to reduce API infrastructure management and expand MatchMeta.Info to additional operating systems: Amazon Linux, CentOS, Debian, Red Hat Enterprise, SUSE, Ubuntu, and Windows.

The full path of a file or directory first needs to be normalized before generating a SHA256 hash. First I test if the full path is from a Unix or Windows operating system. If it is from a Microsoft system, the drive letter is forced to ‘C’ as default. Next, I test for home directories so the username can be standardized. A SHA256 hash is generated from the final full path and compared with an API request.

import hashlib
import requests

### USERS ###
# admin = DEB8 DEB9
# Administrator = WIN2K3 WIN2K8 WIN2K12 WIN2K16
# centos = CENTOS6 CENTOS7
# ec2-user = AMZN1 AMZN2 RHEL7 SUSE12 SUSE15
# ubuntu = UBUNTU14 UBUNTU16 UBUNTU18

### PATHS ###
# DIR  = C:\Users\Administrator
# DIR  = /home/ubuntu
# FILE = C:\Users\Administrator\NTUSER.DAT
# FILE = /home/ubuntu/.ssh/authorized_keys

unix = 'ubuntu'
path = r'D:\Users\John\NTUSER.DAT'

if path[:1] == '/': ### UNIX
    out = path.split('/')
    # Standardize the username in /home/<user>/...
    if len(out) > 2 and out[1] == 'home':
        out[2] = unix
        path = '/'.join(out)
elif path[1:2] == ':': ### WINDOWS
    # Force the drive letter to C
    path = 'C' + path[1:]
    out = path.split('\\')
    # Standardize the username in the profile directory
    if len(out) > 2 and out[1] in ('Users', 'Documents and Settings'):
        out[2] = 'Administrator'
        path = '\\'.join(out)

# Hash the normalized full path
hash_object = hashlib.sha256(path.encode())
hash_value = hash_object.hexdigest()

# Look up the upper-case hash through the API
r = requests.get(''+hash_value.upper(),
                 headers={'x-api-key': '<key>'})
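
To check the normalization against a few sample paths without calling the API, the same logic can be wrapped in a function. This is a sketch rather than the MatchMeta.Info implementation: the function names ‘normalize’ and ‘path_hash’ and the default usernames are assumptions for illustration.

```python
import hashlib

def normalize(path, unix_user='ubuntu', windows_user='Administrator'):
    """Return the full path with drive letter and username standardized."""
    if path.startswith('/'):  # Unix
        parts = path.split('/')
        if len(parts) > 2 and parts[1] == 'home':
            parts[2] = unix_user
        return '/'.join(parts)
    if path[1:2] == ':':  # Windows
        parts = ('C' + path[1:]).split('\\')
        if len(parts) > 2 and parts[1] in ('Users', 'Documents and Settings'):
            parts[2] = windows_user
        return '\\'.join(parts)
    return path

def path_hash(path):
    """Upper-case SHA256 of the normalized full path."""
    return hashlib.sha256(normalize(path).encode()).hexdigest().upper()

print(normalize(r'D:\Users\John\NTUSER.DAT'))
print(normalize('/home/john/.ssh/authorized_keys'))
print(path_hash(r'D:\Users\John\NTUSER.DAT'))
```

Two paths that differ only by drive letter or username normalize to the same string, so they produce the same hash.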


In the future, I hope to manage the API keys through the AWS Marketplace to allow others access to MatchMeta.Info.

Happy Coding!
John Lukach

AWS Pseudo Pipeline

I have been running my Forensic Artifact API on Ubuntu with an Nginx, Flask Python, and MariaDB stack. I wanted to get out of the infrastructure administration business by moving to the AWS Cloud. I decided to start with the migration of my SHA256 hash library. My goals were to improve availability, allow collaboration, and keep the costs down. I wound up having an expensive learning experience while importing the data into DynamoDB!

Forensic Artifact API Diagram

I decided to use the Amazon Web Services (AWS) Boto3 SDK for Python so an EC2 instance could read from an S3 bucket and insert into a DynamoDB table. I was able to read the line-delimited text file of SHA256 hashes as a stream, minimizing the amount of memory required on the EC2 instance for Python. A batch write into the DynamoDB table can contain a maximum of twenty-five items. I set the batch volume with ‘range’ in the for loop, which must match the minimum provisioned capacity for auto-scaling startup. Global tables used to replicate DynamoDB across regions also need to match ‘range’ until the first auto-scale completes.

import boto3

def import_hash(hashlist,hashtype,hashsrc,hashdesc):

    client = boto3.client('s3')
    resource = boto3.resource('s3')
    matchmeta = resource.Bucket('bucketname')
    obj = client.get_object(Bucket='bucketname', Key=hashlist)

    dynamodb = boto3.resource('dynamodb')
    table = dynamodb.Table('sha256')

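    while True:
        # Each pass through the batch writer handles at most twenty-five items
        with table.batch_writer() as batch:
            for i in range(25):
                # Read one line from the S3 stream and drop its last two bytes (the line ending)
                item = obj['Body']._raw_stream.readline()[:-2].decode('utf-8')
                if not item: break
        # Stop once the stream is exhausted
        if not item: break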
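
The grouping into sets of twenty-five can be illustrated without touching AWS. The sketch below is an assumption-laden stand-in, not the import script itself: ‘batches’ is a made-up helper and the sample hashes are generated placeholders. With Boto3, each chunk would be written inside one ‘with table.batch_writer() as batch:’ block using ‘batch.put_item(Item=...)’; the item attributes depend on the table schema and are not shown here.

```python
from itertools import islice

def batches(lines, size=25):
    """Yield lists of at most `size` stripped lines; blank lines are skipped."""
    cleaned = (line.strip() for line in lines)
    cleaned = (line for line in cleaned if line)
    while True:
        chunk = list(islice(cleaned, size))
        if not chunk:
            return
        yield chunk

# Placeholder data: sixty fake 64-character hex strings with CRLF endings
sample = ['%064X\r\n' % n for n in range(60)]
sizes = [len(chunk) for chunk in batches(sample)]
print(sizes)
```

Because the helper consumes the input lazily, it works the same way on a streamed file as on a list.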


DynamoDB auto-scaling has an issue: if reads and writes drop to ‘zero’, it will not scale back down to the minimum provisioned capacity. I needed a time-based CloudWatch event to execute a Lambda function that generates regular database activity.

import boto3

dynamodb = boto3.resource('dynamodb')

def lambda_handler(event, context):

    table = dynamodb.Table('sha256')


Happy Coding!

John Lukach

Building Ubuntu Packages

Bruce Allen with the Naval Postgraduate School released hashdb 3.0, adding some great improvements for block hashing. My block hunting is mainly done on virtualized Ubuntu, so I decided it was time to build a hashdb package. I figured I would document the steps, as they could be used for the SANS SIFT, REMnux, and many other great Ubuntu distributions too.

1) Ubuntu 64-bit Server 16.04.1 hashdb Package Requirements

sudo apt-get install git autoconf build-essential libtool swig devscripts dh-make python-dev zlib1g-dev libssl-dev libewf-dev libbz2-dev libtool-bin

2) Download hashdb from GitHub

git clone

3) Verify hashdb Version

cat hashdb/ | more





4) Rename hashdb Folder with Version Number

mv hashdb hashdb-3.0.0

5) Enter hashdb Folder

cd hashdb-3.0.0

6) Bootstrap GitHub Download


7) Configure hashdb Package


8) Make hashdb Package with a Valid Email Address for the Maintainer

dh_make -s -e <email> --packagename hashdb --createorig

9) Build hashdb Package

debuild -us -uc

10) Install hashdb

sudo dpkg -i hashdb_3.0.0-1_amd64.deb

John Lukach

Know Your Network

Do you know what is on your network?  Do you have a record of truth like DHCP logs for connected devices?  How do you monitor for unauthorized devices?  What happens if none of this information is currently available?

Nathan Crews (@crewsnw1) and Tanner Payne (@payneman) presented Simplifying Home Security with CHIVE at the Security Onion Conference 2016. It will definitely help those with Security Onion deployed answer these questions. Well worth the watch:

My objective is to create a Python script that helps identify devices on the network using Nmap with limited configuration. I want to be able to drop a virtual machine or Raspberry Pi onto a network segment that will perform the discovery scans every minute using a cron job, generating output that can be easily consumed by a SIEM for monitoring.
I use the netifaces package to determine the network address that was assigned to the device for the discovery scans.
I use the netaddr package to generate the network CIDR format that the Nmap syntax uses for scanning subnet ranges.
The script is executed from cron and therefore runs as the root account, so it is important to provide absolute paths. Nmap also needs permission to listen for network responses, which this permission level provides.
I take the multi-line native Nmap output and consolidate it down to single lines. The derived fields are defined by equals (=) for the labels and pipes (|) to separate the values. I parse out the scan start date, scanner IP address, identified device IP address, identified device MAC address and the vendor associated with the MAC address.
I ship the export.txt file to Loggly for parsing and alerting, as that allows me to focus on the analysis, not the administration.
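
The CIDR and consolidation steps can be sketched roughly as follows. This is not the script from GitHub: it swaps in the standard-library ipaddress module for netaddr so it runs without extra packages, the function names and field labels (date, scanner, ip, mac, vendor) are assumptions, and the sample Nmap output is hand-written to resemble a host discovery scan.

```python
import ipaddress
import re

def to_cidr(address, netmask):
    """Return the network in CIDR form, suitable as an Nmap target."""
    return str(ipaddress.ip_interface(address + '/' + netmask).network)

def flatten(nmap_output, scanner):
    """Consolidate multi-line Nmap output into one line per device with a MAC."""
    started = re.search(r'Starting Nmap .* at (.+)', nmap_output)
    date = started.group(1).strip() if started else ''
    records = []
    ip = ''
    for row in nmap_output.splitlines():
        report = re.match(r'Nmap scan report for .*?(\d+\.\d+\.\d+\.\d+)', row)
        mac = re.match(r'MAC Address: ([0-9A-Fa-f:]{17}) \((.*)\)', row)
        if report:
            # Remember the address until its MAC line arrives
            ip = report.group(1)
        elif mac:
            records.append('|'.join([
                'date=' + date,
                'scanner=' + scanner,
                'ip=' + ip,
                'mac=' + mac.group(1),
                'vendor=' + mac.group(2),
            ]))
    return records

# Hand-written sample resembling Nmap host discovery output
SAMPLE = """Starting Nmap 7.01 ( https://nmap.org ) at 2016-10-01 12:00 CDT
Nmap scan report for 192.168.1.1
Host is up (0.0010s latency).
MAC Address: AA:BB:CC:DD:EE:FF (Example Vendor)
Nmap scan report for 192.168.1.23
Host is up.
Nmap done: 256 IP addresses (2 hosts up) scanned in 2.50 seconds
"""

print(to_cidr('192.168.1.23', '255.255.255.0'))
for record in flatten(SAMPLE, '192.168.1.23'):
    print(record)
```

In this sketch the scanning host itself produces no record, because its entry in the sample has no MAC Address line.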

The full script can be found on GitHub:

John Lukach


Filenames are trivial to change. It is still important to know which ones are common during your investigation. You can’t remember every filename, as there are already more than twenty-four million in the NSRL data set alone. MatchMeta.Info is my way of automating these comparisons into the analysis process. Not all investigators have Internet access on their lab machines, so I wanted to share the steps to build your own internal site.
Server Specifications
Twisted Python Installation
I prefer using Ubuntu, but feel free to use whatever operating system you’re most comfortable with. The installation process has become very simple!!
apt-get install python-dev python-pip
pip install service_identity twisted
Twisted Python Validation

NSRL Filenames
I download the NSRL data set directly from NIST, then parse out the filenames with a Python script that I have hosted on the GitHub project site.
Or feel free to download the precompiled list of filenames that I have posted here.
MatchMeta.Info Setup
First create a folder that will contain the file from the GitHub site and the uncompressed nsrl251.txt file from the previous section. For example, a www folder can be created in the opt directory for these files.
Second make the two files read-only to limit permissions.
chmod 400 nsrl251.txt
Third make the two files owned by the webserver user and group.
chown www-data:www-data nsrl251.txt
Fourth make only the www folder capable of executing the Twisted Python script.
chmod 500 www
Fifth make the www folder owned by the webserver user and group.
chown www-data:www-data www
MatchMeta.Info Service
Upstart on Ubuntu will allow the Twisted Python script to be run as a service by creating the /etc/init/mmi.conf file. Paste these commands into the newly created file. It’s critical to use exact absolute paths in the Twisted Python script and the mmi.conf file, or the service will not start.
start on runlevel [2345]
stop on runlevel [016]

setuid www-data
setgid www-data

exec /usr/bin/python /opt/www/
MatchMeta.Info Port Forwarding
Port 80 is privileged and we don’t want to run the service as root so port forwarding can be used.  This will allow us to run the Python service as the www-data user by appending the following to the bottom of the /etc/ufw/before.rules file.
-A PREROUTING -p tcp --dport 80 -j REDIRECT --to-port 8080
Thanks to @awhitehatter  for the tip on their GitHub site.
Configure Firewall
Please set up the firewall rules to meet your environment’s requirements. Ports 80 and 8080 are currently set up to be used for the MatchMeta.Info service. Don’t forget SSH for system access.
ufw allow 80/tcp
ufw allow 8080/tcp
ufw allow ssh
ufw enable
MatchMeta.Info Validation
Finally, all set to start the MatchMeta.Info Service!!
start mmi
Browsing to these sites should return the word OK on the website.
Browsing to these sites should return the phrase NA on the website.       
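The OK and NA responses come down to a membership test against the NSRL filename list. As a rough sketch of that idea only, not the Twisted Python script from GitHub: the function names are made up, the three sample lines stand in for nsrl251.txt, and exact case-sensitive matching is an assumption.

```python
def load_filenames(lines):
    """Build an in-memory set from a line-delimited list of filenames."""
    return {line.strip() for line in lines if line.strip()}

def lookup(filename, known):
    """Return 'OK' for a known filename and 'NA' otherwise."""
    return 'OK' if filename in known else 'NA'

# Placeholder lines standing in for the NSRL filename list
known = load_filenames(['NTUSER.DAT\n', 'kernel32.dll\n', '\n'])
print(lookup('kernel32.dll', known))
print(lookup('kerne132.dll', known))
```

The second lookup shows the kind of character substitution that stands out once common filenames are filtered away.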
I plan to keep moving MatchMeta.Info features from the command line version into the web interface in the future. A morph for James Habben’s evolve project, a web interface for Volatility, has already been submitted to incorporate the analysis process.
John Lukach

Block Building Checklist

It is important to understand how artifacts are created that you use during an investigation. Thus I wanted to provide my block building checklist to help others recreate the process. I will walk through the commands used to prepare the blocks for distribution and how to build the block libraries with the removal of a whitelist.
Block Preparation
I have used Windows, Linux and Mac OS X over the course of this project. I recommend using the operating system that you’re most comfortable with for downloading and unpacking the torrents. The best performance will come from using solid state drives during the block building steps. The more memory available during whitelisting, the better. Far fewer system resources are necessary when just doing hash searches and comparisons during block hunting.
We saw this command previously in the Block Hunting post, now with a new option. The -x option disables parsers so that bulk_extractor only generates the block sector hashes, reducing the necessary generation time.
bulk_extractor -x accts -x aes -x base64 -x elf -x email -x exif -x find -x gps -x gzip -x hiberfile -x httplogs -x json -x kml -x msxml -x net -x pdf -x rar -x sqlite -x vcard -x windirs -x winlnk -x winpe -x winprefetch -x zip -e hashdb -o VxShare199_Out -S hashdb_mode=import -S hashdb_import_repository_name=VxShare199 -S hashdb_block_size=512 -S hashdb_import_sector_size=512 -R VirusShare_00199
The following steps help reduce disk storage requirements and keep reporting clean for the sector block hash database. It is also a similar process for migrating from hashdb version one to two. One improvement that I still need to make is to use the JSON format that Bruce Allen released at OSDFCon 2015 instead of DFXML.
We need to export the sector block hashes out of the database so that the suggested modifications can be made to the flat file output.
hashdb export VxShare199_Out/hashdb.hdb VxShare199.out
·      hashdb – executed application
·      export – export sector block hashes as a dfxml file
·      VxShare199_Out/ – relative folder path to the hashdb
·      hashdb.hdb – default hashdb name created by bulk_extractor
·      VxShare199.out – flat file output in dfxml format
Copy the first two lines of the VxShare199.out file into a new VxShare199.tmp flat file.
head -n 2 VxShare199.out > VxShare199.tmp
Append the contents of the VxShare199.out file, starting at line twenty-two, to the existing VxShare199.tmp file. Combined with the previous command, this drops lines three through twenty-one. The line count may vary depending on the operating system or the version of bulk_extractor and hashdb installed.
tail -n +22 VxShare199.out >> VxShare199.tmp
The sed command reads the VxShare199.tmp file, then removes the path and the beginning of the file name before writing to the new VxShare199.dfxml file. The text removed is the VirusShare_00199/VirusShare_ prefix.
sed 's/VirusShare_00199\/VirusShare_//g' VxShare199.tmp > VxShare199.dfxml
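For anyone who prefers to script the head, tail and sed steps together, the same trimming can be sketched in Python. This is an illustration under assumptions: ‘trim_export’ is a made-up name, the sample lines are invented rather than real DFXML, and the default line numbers simply mirror the commands above.

```python
def trim_export(lines, prefix, keep=2, resume=22):
    """Keep the first `keep` lines, resume at line `resume` (1-indexed),
    and strip `prefix` wherever it appears."""
    kept = lines[:keep] + lines[resume - 1:]
    return [line.replace(prefix, '') for line in kept]

# Invented sample: twenty-one header lines followed by one file entry
sample = ['header %d\n' % n for n in range(1, 22)]
sample.append('<filename>VirusShare_00199/VirusShare_abc123</filename>\n')

cleaned = trim_export(sample, 'VirusShare_00199/VirusShare_')
print(cleaned)
```

Check the line numbers against your own export before relying on the defaults, since the header length can vary between versions.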
Create an empty hashdb with the sector size of 512 using the -p option. The default size is 4096 if no option is provided.
hashdb create -p 512 VxShare199
Import the processed VxShare199.dfxml file into the newly created VxShare199 hashdb database.
hashdb import VxShare199 VxShare199.dfxml
I compress and upload the hashdb database for distribution, saving everyone else these steps.
Building Block Libraries
The previously generated hashdb databases can be found at the following link.
Create an empty hashdb called FileBlock.VxShare for the collection.
hashdb create -p 512 FileBlock.VxShare
Add the VxShare199 database to the FileBlock.VxShare database.  This step will need to be repeated for each database. Upkeep is easier when you keep the completely built FileBlock.VxShare database for ongoing additions of new sector hashes.
hashdb add VxShare199 FileBlock.VxShare
Download the sector hashes of the NSRL from the following link.
Create an empty hashdb called FileBlock.NSRL for the NSRL collection.
hashdb create -p 512 FileBlock.NSRL
The NSRL block hashes are stored in a tab-delimited flat file format. The import_tab option is used to import each file; the files are split by the first character of the hash value, 0-9 and A-F. I also keep a copy of the built FileBlock.NSRL for future updates.
hashdb import_tab FileBlock.NSRL
Remove NSRL Blocks
Create an empty hashdb called FileBlock.Info for the removal of the whitelist.
hashdb create -p 512 FileBlock.Info
This command will remove the NSRL sector hashes from the collection creating the final FileBlock.Info database for block hunting.
hashdb subtract FileBlock.VxShare FileBlock.NSRL FileBlock.Info
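Conceptually, the subtract step is a set difference: every sector hash that also appears in the whitelist is left out of the result. The sketch below only illustrates that idea with plain Python sets and made-up hash strings; a real hashdb database stores more than bare hashes, and ‘subtract’ here is a hypothetical helper, not the hashdb command.

```python
def subtract(collection, whitelist):
    """Return the hashes in `collection` that are not in `whitelist`."""
    return set(collection) - set(whitelist)

# Made-up short strings standing in for sector block hashes
vxshare = {'aa11', 'bb22', 'cc33'}
nsrl = {'bb22', 'dd44'}

info = subtract(vxshare, nsrl)
print(sorted(info))
```

Hashes that exist only in the whitelist have no effect on the result.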
The initial build is machine time intensive but once done the maintenance is a walk in the park.
Happy Block Hunting!!
John Lukach