Malicious PDF Triage

Today was the first time I was able to analyze malicious PDFs. I previously knew nothing about how to treat these potential infections, but learned the tactics through research.

Malicious PDFs usually spread through spam emails and depend on unsuspecting users opening the attachment. When opened, the PDF will generally execute malicious code that exploits a vulnerability in an outdated version of Adobe Reader or Java to open a backdoor into the system. From there, the infection calls home every so often, waiting for instructions from an attacker.

Handling these files is therefore a bit different from playing around with a malicious executable or DLL. Standard static analysis tools (like PE Explorer and PEiD) do not support PDF files, and uploading the file to VirusTotal showed no results either. Everything I previously knew about static analysis did not apply in this case. Fortunately, there is a series of free tools out there that help identify what kind of PDF you’re looking at. Note: the tools by Didier Stevens mentioned below are available on his website.

I started off with a tool called PDFiD by Didier Stevens. It is really helpful for seeing which keywords appear inside a PDF and how often. It’s a Python script, so it has to be run from the command line (and Python needs to be installed). In a command prompt, navigate to the directory where the script is stored and run “pdfid.py filename.pdf” to get the output (on a Mac, where Python is pre-installed, the command would be “python pdfid.py filename.pdf”).

The output will look similar to the screenshot above: the object name on the left and the number of times it was found on the right. The names to pay attention to are boxed in red. The /Page count tells you how many pages the PDF has. *Most malicious PDFs are only one page in length.* The /JS and /JavaScript counts tell you whether there’s JavaScript embedded. *In this case it’s 0, but any instance of JavaScript is a red flag and requires further investigation.* The /AA and /OpenAction entries are equally important, because either one allows the JavaScript within the PDF to execute without user interaction.
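To make the idea behind that output concrete, here is a rough Python sketch of the same kind of count. This is not PDFiD’s actual code; the function names (count_pdf_names, looks_suspicious, scan_file) are made up for illustration, and the “script plus automatic trigger” rule is just my reading of the red flags described above. It only looks at the raw bytes, so it will miss names that are obfuscated or tucked inside compressed streams.

```python
import re

# Names this sketch tallies: the ones discussed above.
KEYWORDS = [b"/Page", b"/JS", b"/JavaScript", b"/AA", b"/OpenAction"]


def count_pdf_names(data: bytes) -> dict:
    """Count raw occurrences of each PDF name in the file's bytes."""
    counts = {}
    for name in KEYWORDS:
        # The lookahead keeps /Page from also matching /Pages, and keeps any
        # short name from matching a longer name that starts the same way.
        pattern = re.escape(name) + rb"(?![A-Za-z0-9])"
        counts[name.decode("ascii")] = len(re.findall(pattern, data))
    return counts


def looks_suspicious(counts: dict) -> bool:
    """Assumed heuristic: embedded script together with an automatic trigger."""
    has_script = counts["/JS"] > 0 or counts["/JavaScript"] > 0
    has_trigger = counts["/AA"] > 0 or counts["/OpenAction"] > 0
    return has_script and has_trigger


def scan_file(path: str) -> dict:
    """Read a file from disk and return its keyword counts."""
    with open(path, "rb") as handle:
        return count_pdf_names(handle.read())
```

Calling scan_file("filename.pdf") returns a dictionary of counts that can be passed to looks_suspicious. For real triage I would still use PDFiD itself; this is only meant to show what the numbers represent.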

Now, let’s say there was an instance of JavaScript embedded in the PDF. How do we pull it out? Fortunately, Didier Stevens has another Python tool called PDF-Parser, which pulls apart the objects that make up a PDF file and displays them. I would suggest saving the output to a text file for easier viewing: running “pdf-parser.py filename.pdf > filename.txt” will do this for you. The results can be a bit overwhelming, but if you know what you’re looking for, a simple Control+F in the text file for /JavaScript makes life easier. You will then be able to determine what the JavaScript does and how that relates to the opening of the PDF file.
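If Control+F gets tedious, the same search can be scripted. The sketch below assumes nothing about the layout of the saved output beyond it being plain text; find_keyword_lines is a name I made up, and the context parameter just controls how many neighbouring lines come back with each hit.

```python
def find_keyword_lines(text: str, keyword: str = "/JavaScript", context: int = 2):
    """Return (line_number, snippet) pairs for every line containing keyword."""
    lines = text.splitlines()
    hits = []
    for index, line in enumerate(lines):
        if keyword in line:
            # Include a few lines on either side so the hit can be read in place.
            start = max(0, index - context)
            end = min(len(lines), index + context + 1)
            hits.append((index + 1, "\n".join(lines[start:end])))
    return hits


def search_saved_output(path: str, keyword: str = "/JavaScript"):
    """Search a saved text file such as filename.txt for the keyword."""
    with open(path, "r", errors="replace") as handle:
        return find_keyword_lines(handle.read(), keyword)
```

Running search_saved_output("filename.txt") gives the line numbers to jump to in the text file, which is handy when the output runs to thousands of lines.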

A tool I found later down the road, PDFScope, combines the functionality of pdfid and pdf-parser in one GUI that separates the results into tabs. It’s pretty nifty and makes it easy to quickly get everything you want to know in one place.
Another interesting tool is the PDF Toolkit (pdftk). Running it with the dump_data operation pulls all the metadata for the file.


The timestamps are represented as yyyymmddhhmmss (so in this case, the file was created on 7/24/2012 at 00:28:27 [12:28:27am]). The PDFID values are MD5 hashes of information within the metadata, used to help identify the file. Also, in this case the Title differs from the filename, which is something to keep in mind.
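Reading those timestamps by eye gets old fast, so here is a small Python sketch that converts one into a proper date. The function name parse_pdf_timestamp is my own. I am assuming the value may carry a leading “D:” and a trailing time zone offset; the sketch strips the first and ignores the second, so the result is the local time as written in the file.

```python
from datetime import datetime


def parse_pdf_timestamp(raw: str) -> datetime:
    """Convert a yyyymmddhhmmss timestamp into a datetime object."""
    value = raw.strip()
    # Drop the optional "D:" prefix if it is present.
    if value.startswith("D:"):
        value = value[2:]
    # Only the first 14 digits are parsed; any time zone suffix is ignored.
    return datetime.strptime(value[:14], "%Y%m%d%H%M%S")


created = parse_pdf_timestamp("20120724002827")
print(created.isoformat())  # 2012-07-24T00:28:27
```

A timestamp shorter than 14 digits raises a ValueError here, which seems like the right behaviour for a triage script: better to notice an odd value than to guess.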

In the case I dealt with today, there was no fun JavaScript or any other red flag of a malicious document. The only off-putting qualities were that it was one page in length and that it was discovered in a series of spam emails. In the end, this was just a phishing email. It basically said that so-and-so’s email was pulled in an international drawing, that they ONLY won $2,000,000, and that to claim the prize you needed to send all this information (including copies of your passport and license) to a guy in Belgium (he clearly stated he’s in Belgium, but the email actually came from the Netherlands).

Even though the file was “boring”, it was still a great day learning about all this!