AWS Pseudo Pipeline

I have been running my Forensic Artifact API on Ubuntu with an Nginx, Flask (Python), and MariaDB stack. I wanted to get out of the infrastructure administration business by moving to the AWS Cloud, so I decided to start with the migration of my SHA256 hash library. My goals were to improve availability, allow collaboration, and keep costs down. I wound up having an expensive learning experience while importing the data into DynamoDB!

[Diagram: Forensic Artifact API]

I decided to use the Amazon Web Services (AWS) Boto3 SDK for Python so I could read from an S3 bucket on an EC2 instance and insert into a DynamoDB table. Reading the line-delimited text file of SHA256 hashes as a stream minimized the amount of memory Python required on the EC2 instance. Batch writing items into a DynamoDB table is limited to a maximum of twenty-five items per batch. I set that batch size with the range() in the for loop, and it needs to match the minimum provisioned write capacity so auto-scaling can start up. Global tables used to replicate DynamoDB across regions also need capacity matching that range() value until the first auto-scale completes.

import boto3

def import_hash(hashlist, hashtype, hashsrc, hashdesc):

    # Stream the line-delimited hash list from S3 instead of loading it into memory.
    client = boto3.client('s3')
    obj = client.get_object(Bucket='bucketname', Key=hashlist)
    body = obj['Body']

    dynamodb = boto3.resource('dynamodb')
    table = dynamodb.Table('sha256')

    while True:
        # batch_writer() buffers put_item calls and flushes them as batch writes.
        with table.batch_writer() as batch:
            for i in range(25):  # DynamoDB batch writes max out at 25 items.
                # Read one line at a time from the underlying HTTP stream.
                item = body._raw_stream.readline().decode('utf-8').strip()
                if not item:
                    break
                batch.put_item(Item={'sha256': item.upper(),
                                     'type': hashtype,
                                     'source': hashsrc,
                                     'desc': hashdesc})
        if not item:
            break

import_hash('Folder/File.txt', 'Known', 'HashSets.com', 'Windows')
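
The minimum write capacity that auto-scaling starts from can be registered through the Application Auto Scaling API. Here is a minimal sketch of how that might look with Boto3; the table name, capacity values, and target utilization are assumptions rather than the exact settings I used.

import boto3

autoscaling = boto3.client('application-autoscaling')

# Register the table's write capacity as a scalable target.
# MinCapacity lines up with the batch size of 25 used by the import loop.
autoscaling.register_scalable_target(
    ServiceNamespace='dynamodb',
    ResourceId='table/sha256',
    ScalableDimension='dynamodb:table:WriteCapacityUnits',
    MinCapacity=25,
    MaxCapacity=1000)

# Target-tracking policy that keeps write utilization around 70 percent.
autoscaling.put_scaling_policy(
    PolicyName='sha256-write-scaling',
    ServiceNamespace='dynamodb',
    ResourceId='table/sha256',
    ScalableDimension='dynamodb:table:WriteCapacityUnits',
    PolicyType='TargetTrackingScaling',
    TargetTrackingScalingPolicyConfiguration={
        'TargetValue': 70.0,
        'PredefinedMetricSpecification': {
            'PredefinedMetricType': 'DynamoDBWriteCapacityUtilization'}})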

DynamoDB auto-scaling has an issue: if reads and writes drop to zero, it will not scale back down to the minimum provisioned capacity. I needed a time-based CloudWatch event that executes a Lambda function to generate regular database activity.

import boto3

dynamodb = boto3.resource('dynamodb')

def lambda_handler(event, context):

    table = dynamodb.Table('sha256')

    # Perform one read and one write so the table always shows activity.
    table.get_item(Key={'sha256': '0000000000000000000000000000000000000000000000000000000000000000',
                        'type': 'TEST'})
    table.put_item(Item={'sha256': 'FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF',
                         'type': 'TEST',
                         'source': 'JOHN',
                         'desc': 'PING'})

    return
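
The schedule itself can also be wired up with Boto3. The sketch below shows one way to do it; the rule name, function name, Lambda ARN, and five-minute rate are placeholders I made up for illustration.

import boto3

events = boto3.client('events')
awslambda = boto3.client('lambda')

# Hypothetical ARN of the keep-alive Lambda function above.
function_arn = 'arn:aws:lambda:us-east-1:123456789012:function:dynamodb-ping'

# Scheduled rule that fires every five minutes.
events.put_rule(
    Name='dynamodb-ping-schedule',
    ScheduleExpression='rate(5 minutes)',
    State='ENABLED')

# Point the rule at the Lambda function.
events.put_targets(
    Rule='dynamodb-ping-schedule',
    Targets=[{'Id': 'dynamodb-ping', 'Arn': function_arn}])

# Allow CloudWatch Events to invoke the function.
awslambda.add_permission(
    FunctionName='dynamodb-ping',
    StatementId='allow-events-invoke',
    Action='lambda:InvokeFunction',
    Principal='events.amazonaws.com')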

Happy Coding!

John Lukach
@jblukach

Processing Progress in Axiom

I recently got access to a license for Magnet's Axiom product and have been using it periodically to explore and learn its features. I have used Magnet's Internet Evidence Finder (IEF) for more years than I really want to admit, because it makes me sound really old. I have relied on IEF as a solid platform and tool to investigate internet- and app-related artifacts, and the Magnet team has done a great job over the years keeping up with artifact storage changes.

With all that flattery aside, I am writing this to share with others who don't have access to Axiom, and to commend Magnet on the progress feedback displayed during evidence processing. Some forensic suites try to give a simple progress bar, and some of those bars seem to move randomly. Some give time estimates; some estimates increase as time goes on, and some decrease as work is done. This is apparently a hard problem to tackle since there are so many variations, and many leave me feeling like the effort was applied in the wrong places.

For those who haven’t been exposed to Axiom yet, here is what the processing screen looks like:

[Screenshot: Axiom processing progress screen]

The part I love most about the Axiom screen is that it shows just the right amount of information to tell me the process is still in motion. I don't need an inaccurate time estimate, whether it's an over- or underestimation.

Keep up the good work guys!

James Habben
@JamesHabben