URLhaus File Grab — Lauren Proehl

For October, I’m back at it again, leveraging public submission sites for malware samples. This month I am using results from URLhaus, which describes itself as ‘a project from abuse.ch with the goal of sharing malicious URLs that are being used for malware distribution.’

https://github.com/triw0lf/urlhaus_scripts/blob/master/urlhaus_badfiles.sh

URLhaus offers a wealth of research and analysis features for free. The most exciting part? As long as you don’t want to submit URLs, you don’t need an API key! Once again, there are amazing free resources with a relatively low barrier to entry for API use and research. URLhaus also provides fantastic documentation that I recommend reviewing beyond what this script uses.

This script uses the database dumps that URLhaus publishes every five minutes. They come in CSV format, and you can preview the data on the URLhaus site before running anything. When you are testing, make sure you don’t pull this CSV more than once every five minutes!
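A minimal sketch of that five-minute courtesy rule, assuming you cache the dump locally and decide whether to re-fetch it based on the cached file’s modification time. The cache path and the ‘should_refresh’ helper are hypothetical and not part of the original script; the download itself is left out.

```python
import os
import time

# Assumption: URLhaus regenerates the dump about every five minutes,
# so a cached copy younger than this is not worth re-fetching.
REFRESH_INTERVAL_SECONDS = 5 * 60


def should_refresh(cache_path, now=None, interval=REFRESH_INTERVAL_SECONDS):
    """Return True if the cached CSV is missing or at least `interval` seconds old."""
    if now is None:
        now = time.time()
    try:
        age = now - os.path.getmtime(cache_path)
    except FileNotFoundError:
        # No cached copy yet, so a first download is fine.
        return True
    return age >= interval
```

A cron job could call ‘should_refresh("urlhaus_online.csv")’ (a hypothetical file name) and skip the fetch whenever it returns False.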

Once the script grabs all online submissions from the URLhaus Online Database Dump, it splits the CSV on its delimiters and keeps only entries from the current date. It also keeps only submissions whose URLs end in a file rather than a directory, which reduces the number of false-positive index files you will receive. The matching results are sorted and de-duplicated, and the date and raw submission URL for each one are written to a new holding file.

The script then reads the holding file line by line in a while loop and attempts to download each file with wget. The download portion has been tuned to look like a standard endpoint instead of an obvious research server: specific wget headers make the request appear to come from Google Chrome on Windows 10 on an English-speaking host. If you run into malware that claims to be online but can’t be downloaded, I recommend experimenting with other common header variations, which I included in the script.
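The filtering and header steps above can be sketched in Python as follows. This is not the original shell script; it assumes comment lines in the dump start with ‘#’ and that the date added, URL, and status sit in the second, third, and fourth columns. The file-versus-directory heuristic, the helper names, and the exact Chrome user-agent string are illustrative assumptions, and ‘wget_command’ only builds an argument list rather than running anything.

```python
import csv
import io
from datetime import date
from urllib.parse import urlparse

# Assumed column positions in the URLhaus CSV dump: id, dateadded, url, url_status, ...
DATE_COL, URL_COL, STATUS_COL = 1, 2, 3

# Hypothetical headers approximating Chrome on Windows 10 with an English locale;
# the strings used by the original script may differ.
CHROME_WIN10_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36"
    ),
    "Accept-Language": "en-US,en;q=0.9",
}


def points_to_file(url):
    """Heuristic: keep URLs whose path ends in a named file, not a directory."""
    path = urlparse(url).path
    return bool(path) and not path.endswith("/")


def todays_file_urls(csv_text, today=None):
    """Return sorted, de-duplicated (date, url) pairs added today that look like files."""
    if today is None:
        today = date.today()
    lines = (line for line in io.StringIO(csv_text) if not line.startswith("#"))
    seen = set()
    results = []
    for row in csv.reader(lines):
        if len(row) <= STATUS_COL:
            continue  # skip blank or malformed lines
        added, url, status = row[DATE_COL], row[URL_COL], row[STATUS_COL]
        if not added.startswith(today.isoformat()) or status != "online":
            continue
        if points_to_file(url) and url not in seen:
            seen.add(url)
            results.append((added.split(" ")[0], url))
    return sorted(results)


def wget_command(url):
    """Build (but do not run) a wget argument list carrying the browser-like headers."""
    cmd = ["wget", "--user-agent=" + CHROME_WIN10_HEADERS["User-Agent"]]
    for name, value in CHROME_WIN10_HEADERS.items():
        if name != "User-Agent":
            cmd.append(f"--header={name}: {value}")
    cmd.append(url)
    return cmd
```

Each returned pair corresponds to one line of the holding file. Keeping the header values in one dictionary makes it easy to swap in the other header variations mentioned above without touching the download loop.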

For resources on changing your header values, I recommend the following:

Before using this script, make sure you have the following set up:

A research server for collection
Some sort of malware analysis lab or automated malware ingestion framework
(NOT REQUIRED) An alias for the script

A sneak peek into what running the script looks like:

Double-checking the alias I set to make background execution and cron jobs cleaner and easier.

Once you have an alias set (or not, I’m not your mom), the script is very easy to execute. You will get the job ID back upon execution.

Malicious files start coming in super quickly, keeping their original download file names.

Lots of Windows-focused malware, which is to be expected, but it is always interesting to see what is flying around the internet. To check files that try to be sneaky by using only a file extension as the name (looking at you, ‘.i’), you can also run ‘file .*’.
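If ‘file’ isn’t available on your collection box, a rough Python stand-in for dot-named samples could look like this. It checks only a handful of magic-byte prefixes chosen for illustration (an assumption, far less thorough than ‘file’), and it only reads the first few bytes of each sample, never executing anything.

```python
import os

# Illustrative magic-byte prefixes; not exhaustive and not from the original script.
MAGIC = {
    b"MZ": "Windows PE executable",
    b"\x7fELF": "ELF executable",
    b"%PDF": "PDF document",
    b"PK\x03\x04": "ZIP archive (also Office Open XML)",
    b"\xd0\xcf\x11\xe0": "OLE2 compound file (legacy Office)",
}


def identify(path):
    """Guess a file type from its leading bytes, similar in spirit to `file`."""
    with open(path, "rb") as fh:
        head = fh.read(8)
    for prefix, label in MAGIC.items():
        if head.startswith(prefix):
            return label
    return "unknown"


def hidden_samples(directory):
    """Map each dot-named regular file in `directory` to its guessed type."""
    return {
        name: identify(os.path.join(directory, name))
        for name in sorted(os.listdir(directory))
        if name.startswith(".") and os.path.isfile(os.path.join(directory, name))
    }
```

Unlike ‘file .*’, ‘os.listdir’ never returns ‘.’ or ‘..’, and the regular-file check skips hidden directories.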

You can find me at my Contact page or on Twitter - @jotunvillur. I’m open for questions, feedback, or general chitchat about security!