AWS Pseudo Pipeline

I have been running my Forensic Artifact API on Ubuntu with a Nginx, Flask Python, and MariaDB stack. I wanted to get out of the infrastructure administration business by moving to the AWS Cloud. I decided to start with the migration of my SHA256 hash library. My goals were to improve availability, allow collaboration and keep the costs down. I wound up having an expensive learning experience while importing the data into DynamoDB!

Forensic Artifact API Diagram

I decided to use the Amazon Web Services (AWS) Boto3 SDK for Python so I could read from an S3 bucket with an EC2 instance that inserts into a DynamoDB table. I was able to read the line-delimited text file of SHA256 hashes as a stream minimizing the amount of memory required on the EC2 instance for Python. Batch writing of items into the DynamoDB table can use a maximum set of twenty-five. I set the batch volume with ‘range’ in the for loop that must match the minimum provisioned capacity for auto-scaling startup. Global tables being used to replicate DynamoDB across regions needs to match ‘range’ until the first auto-scale completes.

import boto3

def import_hash(hashlist,hashtype,hashsrc,hashdesc):

    client = boto3.client('s3')
    resource = boto3.resource('s3')
    matchmeta = resource.Bucket('bucketname')
    obj = client.get_object(Bucket='bucketname', Key=hashlist)

    dynamodb = boto3.resource('dynamodb')
    table = dynamodb.Table('sha256')

    while True:
        with table.batch_writer() as batch:
            for i in range(25):
                item = obj['Body']._raw_stream.readline()[:-2].decode('utf-8')
                if not item: break 
                batch.put_item(Item={'sha256':item.upper(),'type':hashtype,'source':hashsrc,'desc':hashdesc})
        if not item: break

import_hash('Folder/File.txt','Known','HashSets.com','Windows')

DynamoDB has an issue if read/writes go to ‘zero’ that auto-scaling will not reduce down to the minimum provisioned capacity. I needed to use a time-based CloudWatch event to execute a Lambda function to generate regular database activity.

import boto3

dynamodb = boto3.resource('dynamodb')

def lambda_handler(event, context):

    table = dynamodb.Table('sha256')
    table.get_item(Key={'sha256':'0000000000000000000000000000000000000000000000000000000000000000','type':'TEST'})
    table.put_item(Item={'sha256':'FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF','type':'TEST','source':'JOHN','desc':'PING'})

    return

Happy Coding!

John Lukach
@jblukach

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s