Faster API Calls

I have been building out APIs to automate digital forensics and incident response analysis, so artifacts are quickly ready for review during an investigation. I was using the ‘requests’ library in Python but wanted to speed up the process. I found that the ‘asyncio’ library gave me better performance when reading a line-delimited file, making a web request, and writing the results to disk. It was super useful, so I wanted to share a code snippet in case it could help others!

asyncio

FYI: I had to use ‘aiohttp.TCPConnector(ssl=False)’ on my MacBook, but not on my Ubuntu system, to work around a certificate verification error.

import aiofiles
import aiohttp
import asyncio
import async_timeout

inputfile = 'input.txt'
outputfile = 'output.txt'

key = '<key>'
connection = aiohttp.TCPConnector(ssl=False)  # needed on my MacBook for a certificate error

async def fetch(session, url):
    # Give up on any request that takes longer than 10 seconds
    async with async_timeout.timeout(10):
        async with session.get(url, headers={'x-api-key': key}) as response:
            return await response.text()

async def main(loop):
    async with aiohttp.ClientSession(loop=loop, connector=connection) as session:
        async with aiofiles.open(inputfile, 'r') as f:
            async with aiofiles.open(outputfile, 'w') as w:
                async for line in f:
                    # Strip the trailing newline and build the lookup URL
                    url = 'https://sha256.tundralabs.net/' + str(line[:-1])
                    jsonoutput = await fetch(session, url)
                    await w.write(jsonoutput + '\n')

loop = asyncio.get_event_loop()
loop.run_until_complete(main(loop))

Happy Coding!
John Lukach

MatchMeta.Info v2

I have been using full paths from gold builds since the Fall of 2014 to match meta information. Files and directories that are hidden in plain sight by character substitutions, misspellings, and the like become more visible to the investigator through this analysis. The project started as a Python script doing lookups against a stand-alone SQLite database containing Microsoft Windows installs from TechNet. In the Spring of 2016, I switched to an API written in Twisted Python for filename comparisons against the NIST NSRL.

Recently, cloud computing has allowed me to reduce API infrastructure management and expand MatchMeta.Info to additional operating systems: Amazon Linux, CentOS, Debian, Red Hat Enterprise Linux, SUSE, Ubuntu, and Windows.

The full path of a file or directory needs to be normalized before generating a SHA256 hash. First, I test whether the full path is from a Unix or Windows operating system. If it is from a Microsoft system, the drive letter is forced to ‘C’ as the default. Next, I test for home directories so the username can be standardized. A SHA256 hash is generated from the final full path and compared through an API request.

import hashlib
import requests

### USERS ###
# admin = DEB8 DEB9
# Administrator = WIN2K3 WIN2K8 WIN2K12 WIN2K16
# centos = CENTOS6 CENTOS7
# ec2-user = AMZN1 AMZN2 RHEL7 SUSE12 SUSE15
# ubuntu = UBUNTU14 UBUNTU16 UBUNTU18

### PATHS ###
# DIR  = C:\Users\Administrator
# DIR  = /home/ubuntu
# FILE = C:\Users\Administrator\NTUSER.DAT
# FILE = /home/ubuntu/.ssh/authorized_keys

unix = 'ubuntu'
path = r'D:\Users\John\NTUSER.DAT'

if path[:1] == '/': ### UNIX
    out = path.split('/')
    if out[1] == 'home':
        out[2] = unix  # standardize the username
        path = '/'.join(out)
elif path[1] == ':': ### WINDOWS
    new = list(path)
    new[0] = 'C'  # force the drive letter to the 'C' default
    path = ''.join(new)
    out = path.split('\\')
    if out[1] == 'Users' or out[1] == 'Documents and Settings':
        out[2] = 'Administrator'  # standardize the username
        path = '\\'.join(out)

hash_object = hashlib.sha256(path.encode())
hash_value = hash_object.hexdigest()

r = requests.get('https://api.matchmeta.info/' + hash_value.upper(),
                 headers={'x-api-key': '<key>'})

print(r.text)

In the future, I hope to manage the API keys through the AWS Marketplace to allow others access to MatchMeta.Info.

Happy Coding!
John Lukach

AWS Pseudo Pipeline

I have been running my Forensic Artifact API on Ubuntu with an Nginx, Flask (Python), and MariaDB stack. I wanted to get out of the infrastructure administration business by moving to the AWS Cloud. I decided to start with the migration of my SHA256 hash library. My goals were to improve availability, allow collaboration, and keep costs down. I wound up having an expensive learning experience while importing the data into DynamoDB!

Forensic Artifact API Diagram

I decided to use the Amazon Web Services (AWS) Boto3 SDK for Python so I could read from an S3 bucket with an EC2 instance that inserts into a DynamoDB table. I was able to read the line-delimited text file of SHA256 hashes as a stream, minimizing the amount of memory required on the EC2 instance. Batch writes of items into a DynamoDB table are limited to a maximum of twenty-five per call. I set the batch size with ‘range’ in the for loop, and it must match the minimum provisioned capacity for auto-scaling startup. Global tables used to replicate DynamoDB across regions also need to match that ‘range’ value until the first auto-scale completes.

import boto3

def import_hash(hashlist, hashtype, hashsrc, hashdesc):

    # Stream the line-delimited hash list from S3 to keep memory use low
    client = boto3.client('s3')
    obj = client.get_object(Bucket='bucketname', Key=hashlist)

    dynamodb = boto3.resource('dynamodb')
    table = dynamodb.Table('sha256')

    while True:
        # DynamoDB batch writes are limited to twenty-five items per call
        with table.batch_writer() as batch:
            for i in range(25):
                # Read one hash, strip the trailing CRLF, and decode
                item = obj['Body']._raw_stream.readline()[:-2].decode('utf-8')
                if not item:
                    break
                batch.put_item(Item={'sha256': item.upper(),
                                     'type': hashtype,
                                     'source': hashsrc,
                                     'desc': hashdesc})
        if not item:
            break

import_hash('Folder/File.txt', 'Known', 'HashSets.com', 'Windows')

DynamoDB auto-scaling has an issue: if reads and writes drop to zero, it will not scale back down to the minimum provisioned capacity. I needed to use a time-based CloudWatch event to execute a Lambda function that generates regular database activity.

import boto3

dynamodb = boto3.resource('dynamodb')

def lambda_handler(event, context):

    table = dynamodb.Table('sha256')
    # One read and one write against placeholder items keeps the table active
    # so auto-scaling can settle back down to the minimum provisioned capacity
    table.get_item(Key={'sha256': '0000000000000000000000000000000000000000000000000000000000000000',
                        'type': 'TEST'})
    table.put_item(Item={'sha256': 'FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF',
                         'type': 'TEST',
                         'source': 'JOHN',
                         'desc': 'PING'})

    return

Happy Coding!

John Lukach
@jblukach

Parsing CFBundleURLSchemes from MacOS Apps

Several days ago, Objective-See shared details about an attack vector used by advanced attackers to target MacOS users. If you haven’t read about it, I encourage you to do that now since this post really won’t make a lot of sense otherwise. It is a very creative way to gain remote execution.

Quick Review

  1. Applications on MacOS are distributed as ‘.app’ files, which are really just folders that MacOS displays as files.
  2. Application .app folders have a prescribed internal architecture since MacOS parses many of the files for functionality.
  3. Plists are settings files that can store many formats of name-value data pairs (somewhat similar to the registry in the Windows world).
  4. All the points from the Objective-See blog about the attack chain.

Defense Approach

There are all kinds of ways to attempt to control this type of attack. One area that came to my mind was using a packet capture device to parse downloaded files for the ‘Info.plist’ file required for this attack. Not in this post though, maybe another post.

Forensic Approach

When analyzing computers for attacks, we rely on tools to do the monotonous work of pulling data from known locations. I found this attack interesting and decided to build one of these tools. It is standalone since I don’t know of any RegRipper-like tools for MacOS. Drop a comment if I am uninformed.

My approach is written in Python so it can be run on multiple OS platforms, and it requires a MacOS drive to be mounted or the files/folders to be copied to some drive. The script looks for ‘Info.plist’ files inside a ‘Contents’ folder inside another folder ending in ‘.app’. Essentially ‘*.app/Contents/Info.plist’, since there can be a whole lot more ‘Info.plist’ files spread all over the place.

Once the proper plist file is located, it looks for a ‘CFBundleURLTypes’ value to ensure the application is attempting to register a URL handler. Then it looks for a ‘CFBundleURLSchemes’ value to get the handler prefix. Applications can claim multiple URL handlers.
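If you want to experiment with that lookup outside the script, here is a minimal sketch using Python’s built-in plistlib; the mounted volume path is only a placeholder, and the real list-mac-app-urls.py may walk the filesystem differently.

import glob
import plistlib

# Placeholder mount point for the MacOS drive under review
mount = '/Volumes/evidence'

for plist_path in glob.glob(mount + '/Applications/*.app/Contents/Info.plist'):
    with open(plist_path, 'rb') as f:
        try:
            info = plistlib.load(f)  # handles both binary and XML plists
        except Exception:
            continue
    # Only applications registering a URL handler carry CFBundleURLTypes
    for url_type in info.get('CFBundleURLTypes', []):
        for scheme in url_type.get('CFBundleURLSchemes', []):
            print(plist_path, scheme)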

The default output is simple JSON data that is really more like CSV data, only hipper. Use pip to install pandas and give it a ‘-g’, and you will get a grouped list of handler prefixes with a count of how many applications are registering that prefix.
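The ‘-g’ option boils down to a simple group-and-count with pandas. Here is a rough sketch of the idea, assuming a hypothetical ‘scheme’ column; the real script may name its fields differently.

import pandas as pd

# Hypothetical records, one per URL handler found
records = [
    {'app': 'Example1.app', 'scheme': 'examplescheme'},
    {'app': 'Example2.app', 'scheme': 'examplescheme'},
    {'app': 'Example3.app', 'scheme': 'rarescheme'},
]

df = pd.DataFrame(records)
# Count how many applications register each handler prefix
print(df.groupby('scheme').size().sort_values(ascending=False))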

Enterprise Approach

I haven’t had a chance to test this yet, but theoretically this script would work as a sensor in Tanium to scan an enterprise at scale and identify all URL handlers that applications are attempting to register on endpoints. The benefit of scanning at enterprise scale is the ability to stack these URL handlers across multiple endpoints and identify the less frequent handlers that are more likely to be used for this type of attack.

Important Note

This Python script parses the application files themselves and does not query MacOS for the live handlers currently registered. The linked blog post gives the command to do that.

 

Find the script here: https://github.com/JamesHabben/HelpfulPython/blob/master/list-mac-app-urls.py

Let me know if you see any modifications or improvements to make this more helpful.

James Habben
@JamesHabben

Evolve Version 1.6

evolve-logo

New features in #EvolveTool!

API Doc

The new part of this feature is an HTML page that outlines all the various URLs that can be used to interact with Evolve. Most of them have been there in the background already, although a couple of the URLs are new. These URLs give you the ability to run Evolve in a sort of headless mode. You can use any scripting language that can GET or POST. The return data is in JSON format.
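As a hedged example of what that headless use might look like, the snippet below hits one of the documented URLs with a plain GET; the endpoint path here is only a placeholder, so check the API doc page for the real ones.

import requests

base = 'http://127.0.0.1:8080'        # wherever your Evolve instance is listening
endpoint = '/<url-from-the-api-doc>'  # placeholder, not a real Evolve URL

r = requests.get(base + endpoint)
print(r.json())                       # Evolve returns JSON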

evolve-api

Plugin Search

The plugin list for Volatility commands keeps growing thanks to the great support from the core dev team and all the gracious developers in the community. I’m sure the annual contest giving away money had no part in it either. Anyways, I figured it would be helpful to have the ability to run a quick search over the plugin list where you can type part of what you are looking for. It doesn’t support any fancy matching though, and just puts wildcards in front of and behind whatever you type. It searches while you type, and you can use [ESC] or click the X to the right to quickly clear the box.

Try typing ‘dump’ and you will get a list of those plugins that Evolve doesn’t yet support:

evolve-plugin-search

Teaser: Volatility Command Line Options

Speaking of Volatility plugins that aren’t supported in Evolve, I was able to dig into the Volatility core and determine where those options are stored in Object Oriented (OO) data structures. You will see a couple of new URLs listed in the API doc that take advantage of this newfound knowledge.

The first is a list of all the default options that Volatility has. You can see those by running ‘vol.py -h’ in the shell, or accessing the API here:

evolve-options

The second builds on the above URL to get the more specific options that individual plugins are allowed to add to the list during processing. You can display the full collection of options, with the plugin specifics listed at the bottom, by running ‘vol.py dumpregistry -h’, or you can get only the specific options that each plugin adds by accessing this API:

evolve-options-plugin

Some Background

I originally took on the project of making Evolve to learn Python. I wanted to build something that required research and learning, and something that would make me stretch. I could have written this project in any language and just made calls to vol.py to get things running. I’ve seen many of these projects pop up over the years and they work great. I decided to fully integrate with Volatility to better learn Python and have more power and control over how I hand off processing jobs. That decision has caused some headaches, so I try to share the solutions when I can.

The challenge here is that Volatility parses command line options with a library that is built into Python. This setup works great for the scenario in which Volatility is typically run, at a command prompt, where the user has to supply all those parameter names and values up front. It doesn’t make it easy to fetch a list of the various options any of the plugins might want to take advantage of, because those options aren’t exposed as objects you can simply read.

The plugins written for Volatility interface with optparse to register their short and long parameter designations. The optparse object is a member of Volatility’s ConfObject class, but it is not really integrated.

Getting the list of default options is pretty straightforward. You have to build a ConfObject anyway when integrating with Volatility, and the default options all come along as it is built.

To get the options that any of the plugins add on top of the defaults, you have to utilize the ConfObject again, but as a parameter when instantiating the plugin of choice. The result is a full list of all the options that are now available, including the default options found earlier. You have to do the work of differentiating the newly added options from the defaults. To prepare for that, I created a second ConfObject and pulled the new list in.

The next challenge is that the structure of the options held in the optparse object isn’t really straightforward. The items are not provided in a list, so you can’t do much with them as they sit. Fortunately, they are iterable, so Python can treat them as a collection in a for loop. You can see the debug view getting to one of these options here:

evolve-options-debug

I chose not to deal with all of those properties since I don’t think the Volatility plugins have much ability to manipulate them, but I will be doing some further testing on this. If there are more properties that are needed in Evolve, they are fairly simple to add into the JSON return at this point.

After grabbing the handful of properties into a dictionary, I stuff that dictionary into a tuple. You can read up on the differences between the two elsewhere, since I won’t go into that here. The tuple made it easier to work with, and I don’t have the need to change that object.

With a tuple of options from two ConfObjects, I could now determine which of those options were added by the provided plugin. Now I had to repeat that process for every plugin available in Volatility, and I am very thankful for loops and automation.
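For anyone who wants to poke at this themselves, here is a rough sketch of that two-ConfObject comparison, written from memory against Volatility 2.x internals; attribute names such as ‘optparser’ and the exact instantiation behavior are assumptions, so expect to adjust it for your version.

import volatility.conf as conf
import volatility.registry as registry
import volatility.commands as commands

registry.PluginImporter()

# Baseline config: only the default options Volatility registers on its own
base = conf.ConfObject()
registry.register_global_options(base, commands.Command)
defaults = set(opt.get_opt_string() for opt in base.optparser.option_list)

# Instantiating each plugin against a second config adds that plugin's options
plugins = registry.get_plugin_classes(commands.Command, lower=True)
for name, plugin_class in plugins.items():
    plugin_conf = conf.ConfObject()
    registry.register_global_options(plugin_conf, commands.Command)
    try:
        plugin_class(plugin_conf)
    except Exception:
        continue  # some plugins may need more context to instantiate
    added = [(opt.get_opt_string(), opt.help)
             for opt in plugin_conf.optparser.option_list
             if opt.get_opt_string() not in defaults]
    print(name, added)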

Check it out on GitHub.
https://github.com/JamesHabben/evolve

I hope you find these new features helpful, and the upcoming features exciting. Please reach out if you have any questions or feature suggestions for Evolve.

James Habben
@JamesHabben

Windows Prefetch: Tech Details of New Research in Section A & B

I wrote previously with an overview of the research into Windows prefetch that I have been working on for years. This post gets more into the technical details of what I know, to help others take the baton and get us all to a better understanding of these files and the Windows prefetch system.

I will be using my fork of the Windows-Prefetch-Parser to display the output from parsing this data. Some of the trace files I use below are public, because my own generated sample files didn’t have certain characteristics needed to show all the scenarios.

Section A Records

I will just start off with a table of properties for the section A records, referred to as the file metrics. The records are different sizes depending on the version. I have been working with the newer version (WinVista+), and it has just a tad more info than the XP version.

Section A Version 17 format (offsets in bytes; each field is 4 bytes)

0 trace chain starting index id
4 total count of trace chains in section B
8 offset in section C to filename
12 number of characters in section C string
16 flags

Section A Version 23 format (offsets in bytes; each field is 4 bytes unless noted in parentheses)

0 trace chain starting index id
4 total count of trace chains in section B
8 count of blocks that should be prefetched
12 offset in section C to filename
16 number of characters in section C string
20 flags
24 (6) $MFT record id
30 (2) $MFT record sequence update

As you can see between the tables, the records grew a bit starting with WinVista to include a bit more data. The biggest difference is the $MFT record references. It is very handy to know the record number and the sequence update to be able to track down previous instances of files in $LogFile or $UsnJrnl records. The other added field is a count of blocks to be prefetched. There is a flag setting in the trace chain records that allows the program to specify whether a block (or group) should be pulled fresh every time, somewhat like a web browser.
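As a rough illustration of those offsets, here is a minimal sketch that unpacks one 32-byte version 23 file metrics record; it follows the table above and this research, so treat it as something to validate rather than a finished parser.

import struct

def parse_metrics_v23(record):
    # record is one 32-byte section A (file metrics) entry
    (chain_index, chain_count, prefetch_blocks,
     name_offset, name_chars, flags) = struct.unpack_from('<6I', record, 0)
    mft_entry = int.from_bytes(record[24:30], 'little')     # 6-byte $MFT record id
    mft_sequence = int.from_bytes(record[30:32], 'little')  # 2-byte sequence update
    return {
        'chain_index': chain_index,        # starting index into section B
        'chain_count': chain_count,        # number of trace chain records
        'prefetch_blocks': prefetch_blocks,
        'name_offset': name_offset,        # offset into section C
        'name_chars': name_chars,
        'flags': flags,
        'mft_entry': mft_entry,
        'mft_sequence': mft_sequence,
    }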

The flag values seem to be consistent between the two versions of files. This is an area that applies a general setting to all of the blocks (section B) loaded from the referenced file, but I have seen times where the blocks in section B were assigned a different flag value. Mostly, they line up. Here are the flag values:

Flag values (integer bytes have been flipped from disk)
0x0200    X    blocks (section B) will be loaded into executable memory sections
0x0002    R    blocks (section B) will be loaded as resources, non-executable
0x0001    D    blocks should not be prefetched

You can see these properties and the associated filenames in the output below. You will notice that the $MFT has been marked as one that shouldn’t be prefetched, which makes a lot of sense so there is no stale data there. The other thing is that there are a couple of DLL files that are referenced with XR because they are being requested to provide both executable code and non-executable resources.
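Decoding that flags field into those letters is simple; here is a small helper based on the values in the table above, where combinations such as XR fall out naturally.

def decode_metrics_flags(flags):
    letters = ''
    if flags & 0x0200:
        letters += 'X'  # loaded into executable memory sections
    if flags & 0x0002:
        letters += 'R'  # loaded as non-executable resources
    if flags & 0x0001:
        letters += 'D'  # should not be prefetched
    return letters

print(decode_metrics_flags(0x0202))  # 'XR', e.g. a DLL providing code and resources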

Section B Records

This section has records that are much smaller, but there is so much more going on. The most exciting part to me is the bitfields that show a record of usage over the last eight program runs. You have probably seen these bitfields printed next to the file resource list in the Python output when running the tool, but that data is not associated with either the filename in section C or the file metrics records in section A. These bitfields are actually tracking each of the block clusters in section B, so the output is actually a calculated value combined from all associated section B records. I will get to that later. Let’s build that property offset table first. These records have stayed the same over all versions of prefetch so far.

Section B record format (offsets in bytes; field sizes in parentheses)

0 (4) next trace record number (-1 if last block in chain)
4 (4) memory block offset
8 (1) Flags1
9 (1) Flags2
10 (1) usage bitfield
11 (1) prefetched bitfield

The records in this section typically point to clusters of eight 512-byte blocks that are loaded from the file on disk. Most of the time, you will find the block offset property walking up in values of 8. It isn’t a requirement though, so you will find smaller intervals as well.
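Here is a minimal sketch of unpacking one of these 12-byte trace chain records using the offsets above; again, it reflects this research and should be validated against your own samples.

import struct

def parse_trace_record(buf, offset):
    next_index, block_offset = struct.unpack_from('<iI', buf, offset)
    flags1, flags2, used, prefetched = struct.unpack_from('<4B', buf, offset + 8)
    return {
        'next_index': next_index,          # -1 marks the last record in the chain
        'block_offset': block_offset,      # usually climbs in steps of 8
        'flags1': flags1,
        'flags2': flags2,                  # observed to always be 1
        'used': format(used, '08b'),       # usage over the last eight runs
        'prefetched': format(prefetched, '08b'),
    }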

Here is an example of these records walking by 8.

Here is an example of one record jumping in after 2.

Here is an example of a couple sequential records, jumping only by 1.

I broke the two flag fields up early on just to be able to determine what was going on with each of them. What I found out was that Flags2 is always a value of 1. I haven’t ever seen this change. Without a change, it is very difficult to determine the meaning of this value and field. I have still kept it separate because of that lack of change.

The Flags1 field is similar to the Flags field found in the section A records. It holds values for the same purposes (XRD), though the number values representing those properties aren’t necessarily the same. It also has a property that forces a block cluster to be prefetched as long as it has been used at least once in the last eight runs. I will get more into the patterns of prefetching that I have observed later, but for now let’s build the table of properties and their values:

0x02    X    blocks are loaded as executable
0x04    R    blocks are loaded as resources
0x08    F    blocks are forced to be prefetched
0x01    D    blocks will not be prefetched

Now I get to show my favorite part: the bitfields for usage and prefetch. They are each single-byte values that hold eight slots in the form of bits. Every time the parent program executes, the bits are all shifted to the left. If this block cluster is used or fetched, the rightmost bit gets a 1; otherwise it remains 0. When a block cluster’s usage bitfield ends up all zeros, that block record is removed and the chain is resettled without it.

Imagine yourself sitting in front of a scrabble tile holder. It has the capacity to hold only eight tiles, and it is currently filled with all 0 tiles. Each time the program runs and that block cluster is used, you push a 1 tile on from the right side. If the program runs and the block cluster is not used, then you push on a 0 tile. Either way, you are going to push a tile off the left side because there isn’t enough room to hold that ninth tile. That tile is now gone and forgotten.
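A toy illustration of that behavior is below; it just mirrors the tile analogy and is not pulled from any actual prefetcher code.

def update_bitfield(bitfield, used_this_run):
    bitfield = (bitfield << 1) & 0xFF  # oldest run falls off the left side
    if used_this_run:
        bitfield |= 0x01               # newest run is recorded on the right
    return bitfield

history = 0
for used in [True, True, False, True, False, False, False, False]:
    history = update_bitfield(history, used)
    print(format(history, '08b'))
# A usage bitfield that reaches 00000000 means the record is removed from the chain.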

Prefetch Patterns

The patterns listed below occur in section B since this is where the two bitfields are housed. Remember that these are for block clusters and not for entire files. Here are some various scenarios around the patterns that I have seen, with the usage bitfield on the left and the prefetch bitfield on the right. The assumption is that neither the D nor F property is assigned unless specified. Also, none of these are guaranteed; they are just patterns I have observed and noted at some point.

Block with the F (force prefetch) property assigned, after 1 use on 8th run:
10000000    11111111

Block with the D (don’t prefetch) property assigned, after a few uses:
01001011    00000000

Block that is generally used, but missed on one:
11011111    11111111

Block on first use:
00000001    00000000

Block on second run, single use:
00000010    00000001

Block on third run, single use:
00000100    00000011

Block on fourth run, single use:
00001000    00000110

Block used every other run:
01010101    00111111

Block used multiple times, then not:
01110000    00111111

Block used multiple times, but only one use showing:
10000000    11100000

More Work

I am excited to see what else can be learned about these files. My hope is that some of you take this data to test it and break it. You don’t have to be the best DFIR person out there to do that. All you need is that drive to learn.

James Habben
@JamesHabben

Windows Prefetch: Overview of New Research in Sections A & B


The data stored in Prefetch trace files (those with a .pf extension) is a topic discussed quite a bit in digital forensics and incident response, and for good reason. It provides a great record of the executables that have been used, and Windows is configured to store them by default for workstation systems. In this article, I am going to add just a little bit more to the type of information that we can glean from one of these trace files.

File Format Review

The file format of Prefetch trace files has changed a bit over the years, and those changes have generally included more information for us to take advantage of in our analysis. In Windows 10, for example, we were thrown a curveball in that the prefetch trace files are now stored compressed, for the most part.

The image below shows just the top portion of the trace files. The header and file information sections have received the most version changes over the years. The sections following are labeled with letters as well as names according to Joachim Metz’s document on the prefetch trace file format. The document does state that the name of section B is only based on what is known to this point, so it might change in the future. I hope that image isn’t too offensive. Drawing graphics is not a specialty of mine.

New Information, More Work

The information that I am writing about here is the result of many drawn-out years of noncontiguous research. I have spent way too much time in IDA trying to analyze kernel-level code (I probably should just bite the bullet and learn WinDbg) and even more time watching patterns emerge as I stare deeply into the trace file contents. It is not fully baked, so I am hoping that what I explain here can lead others, smarter than me, to run with this even further. I think there are more exciting things to be discovered still. I have added code to my fork of the windows-prefetch-parser Python module, which I forked a while back to add SQLite output, and I will get a pull request into the main project shortly. This code adds just a bit of extra information to the standard display output, but there is also a -v option to get a full dump of the record parsing (warning: lots of data).

File Usage – When

The first and major thing that I have determined is that we can get additional information about the files used (section C): we can specify which of the last eight program executions took advantage of each file. We have to combine data from all three sections (A, B, and C) in order to get this more complete picture, something that the Windows prefetcher refers to as a scenario. This can also help to explain why files can show up in trace files and randomly disappear some time later. Take a look at this image for a second.

This trace file is for Programmer’s Notepad (pn.exe) and was executed on a Windows 8 virtual machine. I created several small, unique text files to have distinct records for each program execution. I used the command line to execute pn.exe while passing it the name of each of those text files. I piped the output into grep to minimize the display data for easier understanding here.

There are two groups of 8 digits, and each is a bitfield. The left group represents the program triggering a page fault (soft or hard) to request data from the file. The right group represents the prefetcher doing a proactive grab of the data from that file, since this is the whole point: to have data ready for the soft fault and to prevent the much more costly hard fault. In typical binary representation, a zero is false and a one is true. Each time the program is executed, these fields are bitshifted to the left. This makes the right side the most recent execution, and each column working left is the scenario prior, going up to eight total.

If you focus on an imaginary single file being used by an imaginary program, the bitfield would look like this over eight runs.
00000001
00000010
00000100
00001000
00010000
00100000
01000000
10000000

What happens after eight runs? I am glad you asked. If the value of this bitfield ends up being all zeros, the file is removed from section C, and all associated records are removed from sections A and B. Interestingly, the file is not removed from the layout.ini file that sits beside all these trace files; at least not immediately, from what I have been able to determine.

If the file gets used again before that 1 gets pushed out, then the sections referencing that file will remain in the trace file.
00000001
00000010
00000100
00001000
00010001
00100010
01000100
10001000
00010000
00100001
01000010
10000100
00001000
etc.

File Usage – How

The second part, and the one that needs more research, is how each file was used by the executing program. There are flag fields in both sections A and B that provide a few values that have stuck out to me. There are other values that I have observed in these flag fields as well, but I have not been able to make a full determination about their designation yet.

The flag field that I have focused on is housed in section A. The three values that I have found purpose behind seem to represent 1) if a file was used to import executable code, 2) if the file was used just to reference some data, perhaps strings or constants, and 3) if the file was requested to not be prefetched. You will mostly see DLL files with the executable flag, although there are some that are referenced as a resource. You will find most of the other files being used as a resource.

In the output of windowsprefetch, I have indicated these properties as follows:
X    Executable code
R    Resource data
D    Don’t Prefetch

See some examples of these properties in the output below from pn.exe.

More Tech to Follow

I am going to stop this post here because I wanted this to be more of a high-level overview of the ways we can use these properties. I will be writing another blog post that gets into the gory details of the records for those who might be interested.

Please help the community in this by testing the tool and the data that I am presenting here. Samples are in the GitHub repo. This has all been my own research, and we need to validate my findings or correct my mistakes. Take a few minutes to explore some of your system’s prefetch files.

You can comment below, DM me on twitter, or email me first@last.net if you have feedback. Thanks for reading!

James Habben
@JamesHabben