Malicious USB Devices

I put together some slides over a year ago after working several cases involving suspicious USB devices. The slides cover some studies that threw USB devices on the ground, and a couple scenarios from the Verizon Data Breach Digest (shameless promo). There is a lot of significance in these links, and presenting these slides has shown me that this threat is very widely underappreciated.

We, as InfoSec professionals and even more specifically as DFIR specialists, are not immune to an attack conducted through a USB device. Some of us hold the more traditional stance that the forensic workstation stays disconnected from any network, but that stance has become increasingly difficult to maintain in recent years. Source evidence drives keep getting larger, and this is forcing examiners to take advantage of network storage solutions. We are also seeing many tools rely increasingly on the network for collection and analysis through integrations with other products.

I worked with some others on my team to develop a forensic methodology that has proven tried and true. It gives us the best chance of preserving any data that may exist on the USB device, and it protects our forensic infrastructure from any potential attack. I am putting this methodology in this blog post in an effort to get it out to a wider audience than the folks that have sat through one of my talks. You can see a video from BSidesSLC if you would like, or feel free to contact me and I would be very happy to head out somewhere to give the talk live.

The High Level Methodology

  • Collect image
  • Collect volatile data
  • Analyze file contents
  • Analyze volatile data
  • Collect firmware

The Methodology Steps

  1. Collect Image
    1. Physical machine
    2. Linux forensic boot cd
    3. Hardware USB write-blocker
    4. dd, dcfldd, linen, etc
  2. Collect Volatile Data
    1. Physical machine
    2. Windows OS – Small HDD, forensic wipe (0x00)
    3. Software USB write-blocker
    4. Collect before images: HDD & RAM
    5. Prep volatile collection tools & scripts
    6. Start PowerShell diff-pnp-devices.ps1*
    7. Insert USB, wait for a minute
    8. Finish diff-pnp-devices.ps1
    9. Finish volatile tools & scripts
    10. Collect image: HDD & RAM
  3. Analyze File Contents
    1. Automated AV scans
    2. IOC Searches
    3. File format specific tools
  4. Analyze Volatile Data
    1. Compare disk images
    2. Compare RAM images
    3. Review new devices from diff-pnp-devices.ps1
    4. Look for evil
  5. Collect Firmware
    1. Only needed if the device exposes an additional device
    2. Identify controller chip – ChipEasy
    3. Acquire correct tool to dump firmware
    4. Reverse engineer firmware
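For step 4.1 (compare disk images), here is a minimal sketch of a block-by-block comparison of the before and after images. It assumes both images are raw (dd-style) and the same size; the function name and the 4096-byte block size are my own choices for illustration, not part of any tool named above. Starting from a small drive wiped with 0x00 keeps the list of changed blocks short.

```python
import io

def changed_blocks(before, after, block_size=4096):
    """Yield the index of every block that differs between two image streams."""
    index = 0
    while True:
        a = before.read(block_size)
        b = after.read(block_size)
        if not a and not b:
            break  # both streams are exhausted
        if a != b:
            yield index
        index += 1

# Stand-in images built in memory; real use would open two image files in "rb" mode.
before_bytes = bytes(4096 * 4)
after_bytes = bytearray(before_bytes)
after_bytes[4096 * 2 + 10] = 0xFF  # simulate a write that landed in block 2

diff = list(changed_blocks(io.BytesIO(before_bytes), io.BytesIO(bytes(after_bytes))))
print("changed block indexes:", diff)
```

The changed block indexes give you a short list of disk regions to map back to files and review by hand.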

The PowerShell Tool

The script mentioned in this methodology is on my GitHub. It does a very simple thing.

https://gist.github.com/JamesHabben/

  1. Get a list of PnP devices
  2. Wait for user to continue after inserting USB
  3. Get another list of PnP devices
  4. Compare the two lists and print the differences
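The four steps above boil down to a set difference. This is not the PowerShell script itself, just a sketch of the comparison logic in Python, and the device instance IDs are made up for illustration.

```python
def diff_devices(before, after):
    """Return (added, removed) device IDs between two snapshots."""
    before_set, after_set = set(before), set(after)
    added = sorted(after_set - before_set)
    removed = sorted(before_set - after_set)
    return added, removed

# Hypothetical instance IDs; a real snapshot would come from the OS device list.
before = [r"USB\ROOT_HUB30\4&1", r"HID\VID_046D&PID_C077\6&2"]
after = before + [
    r"USB\VID_0781&PID_5567\AA01",
    r"HID\VID_0781&PID_5567&MI_01\7&3",
]

added, removed = diff_devices(before, after)
for device in added:
    print("NEW:", device)
```

A storage stick that also shows up as a new HID (keyboard) entry, like the second made-up ID here, is exactly the kind of thing to look for in the diff.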

Takeaway

These USB devices can do a lot of damage, and I continue to see a lot of very surprised faces during my talks. It is a real threat (Stuxnet!) and it needs to be accounted for in your incident response.

Here are the slides with a bit more information, and please reach out if you have questions.

James Habben
@JamesHabben

Compile Time Analysis of NotPetya

I had a thought the other day about some of the information on the NotPetya / M.E.Doc (Medoc) initial infection vector that was released last week. I wondered if the attackers had full and complete access to the Medoc network and even the source code. Did they have the ability to inject the malicious code into the source repository, and then just sit back while it got included in the build that was released to customers? Or did they have more restrictive access to, say, the FTP server holding the update files?

Here are a couple references for those that didn’t keep up with the fast-moving news on this:

ESET wrote about the code that was injected into the ZvitPublishedObjects.DLL file

Cisco Talos wrote about their investigation on behalf of Medoc with access to internal servers

I wanted to see if I had some data to support the level of access the attackers had, even though I don’t have access to the internal systems. It is very possible that Talos has already made this determination and just not made the statement public.

Executable (exe) and Dynamic Link Library (DLL) files have a timestamp in the PE header known as the compile time. This is often used to identify characteristics of attackers, as many times they forget to cover their tracks with this field.
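For those who have not parsed it by hand, here is a small Python sketch of where that value lives. The DOS header holds a 4-byte pointer at offset 0x3C to the "PE\0\0" signature, and the TimeDateStamp is the 4-byte little-endian value 8 bytes past that signature. The helper names are mine, and the sample header is synthetic rather than taken from the Medoc files.

```python
import struct
from datetime import datetime, timezone

def pe_compile_time(data):
    """Return the PE TimeDateStamp as a UTC datetime, or None if not a PE file."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return None
    # e_lfanew at offset 0x3C points to the PE signature
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if len(data) < pe_offset + 12 or data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        return None
    # COFF header after the signature: Machine (2), NumberOfSections (2), TimeDateStamp (4)
    (stamp,) = struct.unpack_from("<I", data, pe_offset + 8)
    return datetime.fromtimestamp(stamp, tz=timezone.utc)

def build_minimal_header(stamp, pe_offset=0x80):
    """Build a synthetic header with just enough fields to exercise the parser."""
    buf = bytearray(pe_offset + 24)
    buf[0:2] = b"MZ"
    struct.pack_into("<I", buf, 0x3C, pe_offset)
    buf[pe_offset:pe_offset + 4] = b"PE\x00\x00"
    struct.pack_into("<I", buf, pe_offset + 8, stamp)
    return bytes(buf)

stamp = int(datetime(2017, 6, 21, 14, 58, 42, tzinfo=timezone.utc).timestamp())
sample = build_minimal_header(stamp)
print(pe_compile_time(sample).isoformat())
```

Keep in mind the field is only as trustworthy as whoever built the file, since it can be edited after the fact.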

In my inspection, it appears to me that the attackers had full and complete access inside the network with the ability to inject their malicious code into the source repository (whatever Medoc uses).

Single File

The first thing I did was use PEStudio to analyze the affected DLL file. The compile time in this view appeared to be within a normal time period based on other information that has been provided surrounding this attack.

The timestamp is shown in PT since my system has that set. The UTC time is 2017-06-21 14:58:42. I checked a couple other PE files and found it to be within the same time range.

Multiple Files

There are hundreds of files in the Medoc program folder and a large percentage of them are EXE or DLL. It makes no sense to check things one by one when we have the power of automation. Check out AdamB’s post on clustering analysis using compile times.

I took a similar approach and wrote up a quick EnScript to identify PE files and then parse the 4-byte compile time value and dump it out. I used this approach because EnCase allows me to analyze and extract data without having to mount or copy files out. Also, I have been writing EnScripts for a couple years. 😉

The result tells me that the attackers got the code injected at the source. Look in the following screen shot and you won’t find any times that are overlapping. It also appears to have a reasonable amount of time between files based on the sizes.
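My EnScript is not reproduced here, but the review step is easy to sketch in Python: sort the extracted compile times and look at the gap between each file and the one before it. The file names and timestamps below are hypothetical placeholders, not values from the Medoc folder.

```python
from datetime import datetime, timezone

def compile_time_gaps(stamps):
    """Sort (name, epoch_seconds) pairs; return (name, utc_time, gap_from_previous)."""
    ordered = sorted(stamps, key=lambda item: item[1])
    rows = []
    previous = None
    for name, stamp in ordered:
        gap = None if previous is None else stamp - previous
        rows.append((name, datetime.fromtimestamp(stamp, tz=timezone.utc), gap))
        previous = stamp
    return rows

# Hypothetical build: three files compiled within a couple of minutes of each other.
base = int(datetime(2017, 6, 21, 14, 0, 0, tzinfo=timezone.utc).timestamp())
stamps = [("alpha.dll", base), ("beta.dll", base + 95), ("gamma.exe", base + 40)]

rows = compile_time_gaps(stamps)
for name, when, gap in rows:
    print(f"{when:%Y-%m-%d %H:%M:%S}  gap={gap}  {name}")
```

A file that was compiled elsewhere and dropped into the package later would tend to stand out with a gap of hours or days instead of seconds.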

Unknown

What I can’t answer with my own analysis based on the data available is whether the attackers stole the source code and compiled it on their own. According to Talos, they had access to change the NGINX configurations to have the internal server proxy connections out to a server under their control. Did the attackers inject the code into the internal repository? Did the attackers steal the source code and compile it outside of the Medoc network?

Hope you found this to be a nice little tangent from your daily tasks. I did!

James Habben
@JamesHabben

Fileless Application Whitelist Bypass and Powershell Obfuscation

Organizations are making the move to better security with application whitelisting, and it shows on the offensive side of the computer security industry. Frameworks such as Metasploit, PowerSploit, BeEF and Empire are making it very easy to build and deploy obfuscated payloads in all sorts of ways. It has become so easy that I am frequently seeing attackers use these techniques on systems that do not employ the added security measures.

There are plenty of solutions to mitigate these types of attacks; however, I find they are not always configured properly. Take a read through @subTee’s Twitter feed and GitHub for many of the more creative ways he has shared. The attackers have raised the bar with the use of these techniques. If defenders aren’t deploying appropriate defenses, shame on them.

It Works

I wanted to share with you a few things from a recent engagement. The attacker had installed the backdoor almost a year before detection. They got in through a phishing attack, as in most cases. The detection? A kind and friendly letter from a law enforcement agency that had taken control of the command and control (C2) and was observing traffic to identify victims. The beaconing was surprisingly frequent for as careful as the attacker was in some other areas.

Can you confidently say that your endpoints are safe from these types of attacks? You don’t have to deploy prevention or detection tools for every part of the kill-chain, but you would be best served to have at least one. Or not, YOLO.

Persistence

In order for any malware to be effective, it has to run. I know, it is a revolutionary statement. It is a concept that is missed by some and it is a very critical piece. There are a finite number of places that provide malware the ability to get started after a system has been rebooted. Keep in mind that the user login process is a perfectly acceptable trigger mechanism as well, and there are a finite number of places related there too.

Just like the various creative and new application whitelist bypass techniques, there are creative and new persistence mechanisms found periodically. Adam has posted quite a few of them on his blog. The good news is that the majority of attacks don’t get that creative because they don’t have to.

The run mechanism in this system was HKCU\Software\Microsoft\Windows\CurrentVersion\Run

You can see that the attacker has chosen to use cmd to start mshta. The code following that command is JavaScript that, when run, creates an ActiveX object that loads more code from a registry path. So many layers!
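If you have already exported the Run key values from a batch of systems, a quick triage pass for this pattern might look like the Python sketch below. The token list is illustrative rather than exhaustive, and the value names and commands are hypothetical stand-ins, not the attacker's actual entries.

```python
# Illustrative tokens only; tune the list for your own environment.
SUSPICIOUS_TOKENS = ("mshta", "javascript:", "powershell", "activexobject", "regread")

def flag_run_values(run_values):
    """Return {value_name: [matched tokens]} for commands that contain any token."""
    flagged = {}
    for name, command in run_values.items():
        lowered = command.lower()
        hits = [token for token in SUSPICIOUS_TOKENS if token in lowered]
        if hits:
            flagged[name] = hits
    return flagged

# Hypothetical exported values from HKCU\Software\Microsoft\Windows\CurrentVersion\Run
run_values = {
    "OneDrive": r'"C:\Users\user\AppData\Local\Microsoft\OneDrive\OneDrive.exe" /background',
    "Updater": 'cmd.exe /c start mshta "javascript:/* loader elided */"',
}

flagged = flag_run_values(run_values)
for name, hits in flagged.items():
    print(name, "->", ", ".join(hits))
```

A keyword hit is only a lead. Plenty of legitimate software uses PowerShell in a run value, so each hit still needs eyes on it.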

Obfuscation

The run mechanism loads in code that has been obfuscated by the attacker. It starts off by creating another ActiveX object and then uses powershell.exe to interpret the code that follows. The obfuscation is enough to prevent keyword searches from hitting on some of the known API functions involved with these attacks, but it is not a difficult one to break. All you need is a base64 decoder. I recommend that you use a local application, since you never know what kind of thing will show up, and an online JavaScript-based decoder is susceptible to getting attacked, whether intended by the attacker or not.

The path referenced in the run value and pictured below is HKCU\Software\Licenses. I have blurred some code and value names in an abundance of caution for potential unique identifiers.

Decoding

My preferred tool for decoding this is 010 Editor. It is not free, but it is worth its license cost for so many things.

First thing to do is copy the text inside those quote marks. Don’t include the quotes since that will throw off your base64 decoding.

Now you just create a new document in 010 and use edit > paste from > paste from base64.

Magically you have some evil looking PowerShell code.
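If you don't have 010 Editor handy, the same local decode takes a few lines of Python. This sketch assumes the copied text is plain base64. The UTF-16LE check is my own heuristic, since PowerShell-encoded commands are often stored that way, and the sample string is a harmless placeholder.

```python
import base64

def decode_payload(b64_text):
    """Decode base64 text locally and return it as a readable string."""
    compact = "".join(b64_text.split())  # drop line wraps and stray whitespace
    raw = base64.b64decode(compact, validate=True)
    # Heuristic: ASCII text stored as UTF-16LE has a NUL in every odd position.
    if len(raw) >= 2 and len(raw) % 2 == 0 and all(b == 0 for b in raw[1::2]):
        return raw.decode("utf-16-le")
    return raw.decode("utf-8", errors="replace")

# Harmless placeholder standing in for the text copied out of the registry value.
sample = base64.b64encode("Write-Output 'hi'".encode("utf-16-le")).decode("ascii")
print(decode_payload(sample))
```

Nothing here executes the decoded text. It is only converted to a string so you can read it.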

Take a look at this PowerShell code from @mattifestation and you will hopefully notice that it follows the same flow. It looks like someone simplified the code from the blog post by removing the comments and shortening the names of the variables. Otherwise it is identical.

Payload

Line 2 of the PowerShell code loads the registry data from a different value in the same path. Line 14 then copies the binary data, about 15 KB of it, from the variable into the memory space of the process that was created. Line 15 then kicks it off, and the binary code takes over.

The binary is a shell code that decompresses a DLL image with aPLib and writes it into the same process space. The resulting DLL has not been identified by any public resources, so I can’t share it with you here. It is very similar to Powersniff and Veil, for those interested in the deeper analysis.

Raise Your Bar

Defenders, the bar has been raised by the attackers. Make sure that you are following suit, or better yet, raising it even higher.

James Habben
@JamesHabben

Blocks in Practical Use

Last week, John Lukach put up a post about how to use some tools to find pieces of files left behind after being deleted, and even partially overwritten. I wanted to put together a short post to give a practical use of this technique for a recent case of mine.

The Case

The request came in as a result of an anomaly detected in traffic patterns. It looked like a user had uploaded a significant amount of data to a specific cloud storage website. My client identified the suspected machine, and got an image to preserve the data. Unfortunately, this anomaly wasn’t caught right away, so a couple months passed before action was taken. They took a pass on the image with some forensic tools, but they weren’t able to identify anything relating to this detection, so they asked me for a second look.

My Approach

I started off with a standard pass of the forensic tools. Not because I don’t trust their team, but because tool versions can have an effect on the artifacts that get extracted and I wanted to be sure of my findings. Besides, if you don’t charge for machine time, then this really doesn’t add much in the way of cost in the end. This process did two things for me. The first thing I got from it was that I could not find any artifacts related to the designated website. The second thing was that the history and cache looked very normal. In other words, it didn’t look like the user had cleared any history or cache in an effort to clean up after themselves.

Of course, I am now thinking that they could have done a targeted cleanup and deleted very specific items from their browser. I get some reassurance that this was not the case by usage of some of the deep scan options available in the forensic tools that I used. These tools will scrounge through unallocated clusters, on command, in search of data patterns that are potential matches for deleted and lost internet artifacts. I found none, again, in this more extensive search.

Let’s put on our tin foil hats now, and really go for it. The user could have wiped and deleted individual files, then gone into the history and cache lists to remove the pointers to those files. This would take care of the files that are still currently allocated. What about all those files that get cataloged in the cache, but quickly discarded because the server instructed this by sending some cache control headers? These files would have been lost to unallocated clusters at some point before the user thought about their cleanup actions. The only way to cover those tracks would be to use a tool that can wipe unallocated clusters. If the user went to this extent, dare I say they deserve to get away with it?

What If?

My client had a secondary thought in the event that we were not able to find traces of user action involved with this detection. They wanted to see if the system was infected with some malware that would have generated this traffic. I did the standard AV scans with multiple vendors and didn’t find anything. I followed that up with a review of the various registry locations that allow for malware to persist and autostart.

Verizon has a great intel team, and I asked them for information on possible campaigns involving the designated website. They gave me some great leads and IOCs to search for, but they did not pan out in this case. What a great resource to have!

After reviewing all of this, I was not able to find any indications of a malware infection, much less one that was capable of performing the suspected actions.

Points to Disprove

Here is the point where block hashing can make a real difference. I will show you how I used block hashing to disprove (as much as the available data will allow) three different points.

  1. Lost Files – User deleted individual history and cache entries to clean up their tracks and the file system lost track of the related files
  2. Unallocated – User went psycho-nutjob-crazy and wiped unallocated clusters after deleting records
  3. Malware – User wasn’t involved and malware did the dirty deed

Disprove Lost Files

In order to prove the user did not delete these selective entries from the browser history and cache, I need to find the files on disk, most likely in unallocated clusters. The file system no longer has metadata about them, so the forensic tools will not discover them as deleted or orphaned files. I could carve files from unallocated clusters using the known headers of various file types, but this is a very broad approach. I want to identify specific files related to this website, not discover any and all pictures from unallocated.

I start by using HTTrack to crawl the designated website. I want to pull as many files down as I can. These are all the files that a browser would see, and download, during normal interactions with the website. I got a collection of a few hundred files of various types: JPEG, GIF, SWF, HTML, etc.

The next step is to split and block hash these files. On this step, I used EnCase and the File Block Hash Map Analysis (FBHMA) EnScript written by Simon Key of EnCase Training. FBHMA has the ability to do both sides of the block hashing technique and presents an awesome graphic for partially located files. I applied it against my collected files, and then applied those hashes against the rest of the disk. The result was zero matches.
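To show the idea without EnCase, here is a minimal Python sketch of both sides of the technique: hash every block of the known files, then hash every block of the image and look for hits. This is not how FBHMA is implemented. The 512-byte block size, SHA-256, the function names, and the sample data are all my own assumptions for illustration.

```python
import hashlib
import random

BLOCK_SIZE = 512  # assumption: sector-sized blocks

def iter_blocks(data, block_size=BLOCK_SIZE):
    """Yield (offset, block) for every full block; a short tail is ignored."""
    for offset in range(0, len(data) - block_size + 1, block_size):
        yield offset, data[offset:offset + block_size]

def build_hash_set(files, block_size=BLOCK_SIZE):
    """Map block hash -> source file name for a dict of {name: bytes}."""
    known = {}
    for name, content in files.items():
        for _, block in iter_blocks(content, block_size):
            if len(set(block)) > 1:  # skip single-value filler blocks
                known[hashlib.sha256(block).hexdigest()] = name
    return known

def scan_image(image, known, block_size=BLOCK_SIZE):
    """Return {image_offset: source_name} for image blocks that match a known hash."""
    hits = {}
    for offset, block in iter_blocks(image, block_size):
        digest = hashlib.sha256(block).hexdigest()
        if digest in known:
            hits[offset] = known[digest]
    return hits

# Stand-in data: one crawled file, and an image where only its second block survives.
rng = random.Random(0)
crawled = bytes(rng.getrandbits(8) for _ in range(2048))
known = build_hash_set({"logo.gif": crawled})
image = bytes(1024) + crawled[512:1024] + bytes(512)

hits = scan_image(image, known)
print(hits)
```

In the example, three of the four blocks have been overwritten and the one survivor is still enough for a hit. The sketch only checks block-aligned offsets, which is why the block size needs to line up with the sector or cluster size of the disk being examined.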

This technique is not affected by the amount of fragmentation or the amount of overwriting, as long as there are some pieces left behind. The only way to beat this technique is to completely overwrite the entire set of blocks for all of the files in question, which is a very unlikely scenario in such a short time without a deliberate action.

Disprove Unallocated

This point takes a little more work to disprove, and it might be subject to your own interpretation. The idea here is that files have been lost to unallocated clusters without control. The user was not able to do a targeted wipe of the file contents before deletion, so the entire area of unallocated clusters would have to be wiped to assure cleanliness.

The user could make a pass with a tool that overwrites every cluster with 0x00 data. This is enough to clear the data from forensic tools, but it can leave behind a pretty suspicious trail. If you used block hashing against this, it would quickly become apparent that the area was zeroed out. Bulk Extractor provides some entropy calculations that make quick work of this scenario.

The other option, and more likely to be used, is to have the wipe tool write random data to the cluster. Most casual and even ‘advanced’ computer users picture common files like zip and JPEG to be a messy clump of random data thrown together in some magic way that draws pictures or spells words. In DFIR, we learn early on that seemingly random data is not actually random. There are recognizable structures in most files, and unless the file involves some type of compression, it is not truly random when measured in terms of entropy. My point in all of this is that a wiping pass that writes random data would cause every cluster of unallocated to show high entropy values. This would raise eyebrows because this is not normal behavior. Some blocks contain compression or encryption and would show high entropy, but others are plain text which registers rather low on the entropy scale.
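For anyone who hasn't worked with entropy values, the measure is Shannon entropy over the byte values in a block, on a scale from 0 to 8 bits per byte. Bulk Extractor does this for you. The Python sketch below only illustrates the math, using made-up sample blocks.

```python
import math
from collections import Counter

def shannon_entropy(block):
    """Return the Shannon entropy of a bytes object in bits per byte (0.0 to 8.0)."""
    if not block:
        return 0.0
    total = len(block)
    counts = Counter(block)
    return sum((c / total) * math.log2(total / c) for c in counts.values())

zeroed = bytes(512)                              # what a 0x00 wipe leaves behind
uniform = bytes(range(256)) * 2                  # every byte value equally often
text = b"GET /index.html HTTP/1.1\r\n" * 20      # plain text, few distinct bytes

for label, block in (("zeroed", zeroed), ("uniform", uniform), ("text", text)):
    print(f"{label:8s} {shannon_entropy(block):.2f}")
```

The zeroed block scores 0 and the evenly distributed block scores 8, with plain text landing in between. A normal unallocated area shows a mix of values like these, while a random-data wipe pushes every block toward the top of the scale.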

So with a single pass of Bulk Extractor, having it calculate hashes and entropy, I was able to determine that 1) unallocated clusters were not zeroed out and 2) unallocated clusters were not completely and truly random. It, again, looked very normal.

Disprove Malware

The last point to disprove was the existence of malware on the system. I already established that there were no indications of allocated malware, but I can’t ignore the possibility that some malware was installed and later got deleted or uninstalled for some reason. Again, without a full and controlled wiping of this data off the drive, it would leave artifacts in unallocated clusters for me to find with block hashing.

This step is a much larger undertaking because I am employing the full 20+ million samples that are generously shared by VirusShare.com and its users. I didn’t have to do all of the work, though, because John has done that for us all. He has a GitHub repo with all the block hashes from VirusShare. Just be sure to remove the NSRL block hashes, since those malware authors can be quite lazy in copying code from other executable files. It’s a beast to get up and running, but well worth the effort. Maintaining it is much easier after it’s built.

Now I have a huge data set of known malware, adware, and spyware sample files, and various payload transports like DOC and PDF files. If there was anything bad on this system, my data set is very likely to find it. The result was that no sample matched more than a single block on the disk. You have to allow for a threshold of matches with this size of sample data. A single matching block out of a possible 1,000 blocks for a 500 KB sample file is not interesting.
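That threshold is simple arithmetic: a 500 KB sample split into 512-byte blocks is 1,000 blocks, so one hit is a 0.1% match. Here is a small Python sketch of that filter. The cutoff values, function names, and sample names are arbitrary choices for illustration, not numbers from my case.

```python
import math

def match_ratio(matched_blocks, sample_size, block_size=512):
    """Fraction of a sample's blocks that were found on disk."""
    total_blocks = max(1, math.ceil(sample_size / block_size))
    return matched_blocks / total_blocks

def interesting_samples(hits, sizes, min_blocks=2, min_ratio=0.05):
    """Keep samples that matched at least min_blocks blocks and min_ratio of the file."""
    keep = {}
    for name, matched in hits.items():
        ratio = match_ratio(matched, sizes[name])
        if matched >= min_blocks and ratio >= min_ratio:
            keep[name] = ratio
    return keep

# Hypothetical results: block hits per sample, and each sample's size in bytes.
hits = {"sample_a": 1, "sample_b": 300}
sizes = {"sample_a": 500 * 1024, "sample_b": 500 * 1024}

print(match_ratio(1, 500 * 1024))        # one block of 1,000
print(interesting_samples(hits, sizes))  # only sample_b clears the bar
```

Where to set the cutoffs is a judgment call that depends on the size of your hash set and how much shared code you expect between samples.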

Value of Blocks

This turned out to be a rather long post, but oh well. I hope this helps you to see the value in the various techniques of applying block level analysis in your cases. The incoming data in our cases is only getting larger, and we need to be smart about how we analyze it. Block hashing is just one way of letting our machines do the heavy work. Don’t be afraid to let your machine burn for a while.

You may be thinking to yourself that this is not a perfect process when it comes to the malware point, and I may agree with you. We are advancing our tools, but we need to advance our understanding and interpretation of the results as well. For now, we handle that interpretation as investigators, but the tools need to catch up. Harlan recently posted about this as a reflection on OSDFcon and it’s worth a read and consideration.

Happy Hunting!
James Habben
@JamesHabben