As many Facebook users discovered this week, when a partial outage revealed the hidden image tags attached to users’ pictures, images can carry a lot of data that’s normally invisible to the human eye. The kind of metadata associated with Facebook and Instagram pictures, though, is nothing compared to the sophisticated methods threat actors use to craft images that can deliver malicious code or exfiltrate user data. Over the past few years, there has been a noticeable increase in in-the-wild malware campaigns using the art of steganography and steganography-like tricks to embed hidden messages in pictures and other “carrier” files. In this post, we take a look at what steganography is and how threat actors are using it.
Steganography is a technique that can hide code in plain sight, such as within an image file. Often shortened to stego, it is the practice of concealing messages or information within other, non-secret text inside a “carrier” message. For malicious actors, this means they can compromise devices just by hosting an image on a website or sending an image via email.
Neither the hidden data nor the carrier file has to be an image, but because digital images are just streams of bytes like any other file, they make a particularly effective medium for concealing secret text and other data. When people open a picture on a device, few ever have reason to look beyond the visual presentation to what lies hidden inside the .bmp or other image file.
Steganography is a form of obfuscation that is quite different from cryptography, which is the practice of writing coded or encrypted messages. Cryptographic messages are obviously hiding something: they typically look like gibberish and require specialist methods to decode.
Steganographic messages, on the other hand, look like ordinary messages but artfully conceal something unexpected. A simple example using a familiar technique illustrates the basic idea behind steganography:
The secret message, “HelLo, worlD”, is not encoded; the viewer only has to know to look at the message in a certain way to reveal it, and we didn’t have to add any extra data to the carrier in order to transmit it. Although image steganography is far more technical to implement, it’s basically the same idea at a lower level.
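A classic low-tech instance of the same idea is the acrostic, in which the first letter of each line spells out a hidden word. The sketch below uses a made-up carrier note (hypothetical, written only for this illustration) to show that recovering the message needs no decryption at all, only knowing where to look.

```python
def reveal_acrostic(text: str) -> str:
    """Return the first character of every non-blank line, concatenated."""
    return "".join(line.strip()[0] for line in text.splitlines() if line.strip())

# Hypothetical carrier: an innocuous note whose line initials spell a word.
carrier = """Having a great time on the trip.
Every morning we walk along the beach.
Lunch is usually at the little cafe.
Later we read or swim for a while.
Off to the hills tomorrow, weather permitting."""

print(reveal_acrostic(carrier))  # HELLO
```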
In this trivial example, it is the human brain that decodes the message concealed in the plain text. Computer programs, however, read bytes rather than natural language, and that makes it possible to conceal messages in plain sight that are easy for computers to parse yet almost impossible for humans to detect without assistance.
In fact, given the nature of image file formats, it’s possible to conceal not just text strings but entire files in .jpg and other image formats. Depending on the technique used, this can also be done without inflating the overall file size of the original image.
To understand how image steganography works, let’s take a look at some basic ways you can hide text in an image file.
One simple method is to append a string to the end of the file. Doing so does not prevent the image from being displayed normally, nor does it change the image’s visual appearance. Here, we simply append “hello world” to the end of the file. The output from hexdump shows us the extra bytes added.
The plain text string can easily be dumped out or read by a program. In this case, we’ll just use the xxd utility to reverse the hexadecimal and print it out in plain text.
echo 68 65 6c 6c 6f 20 77 6f 72 6c 64 0a | xxd -r -p
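The same append-and-recover round trip can be sketched in a few lines of Python. This is a minimal sketch under two assumptions. First, a short stand-in byte string replaces a real JPEG; only its FF D8 start-of-image and FF D9 end-of-image markers are genuine. Second, everything after the last end-of-image marker is treated as the hidden data.

```python
# Stand-in for a real JPEG: start-of-image marker, placeholder bytes,
# end-of-image marker. Not a decodable picture.
image = b"\xff\xd8" + b"\x00" * 16 + b"\xff\xd9"

secret = b"hello world\n"
stego = image + secret  # roughly what `echo "hello world" >> image.jpg` does

# A JPEG decoder stops at the end-of-image marker, so the trailing bytes are
# ignored when the picture is shown. A program that knows where to look can
# slice them back out (assumes the appended data contains no FF D9 itself).
trailer = stego[stego.rfind(b"\xff\xd9") + 2:]

print(trailer.hex(" "))  # 68 65 6c 6c 6f 20 77 6f 72 6c 64 0a
print(trailer.decode(), end="")  # hello world
```

The hex string printed by the first call is exactly the input fed to `xxd -r -p` above.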
The same idea can be used to attach a complete file to an image using the RAR archive format. An image viewer reads only the data it needs to display the picture and ignores the archive appended after it, but a malicious actor or program can easily extract the hidden file.
In this example, the file new.jpg displays a picture when opened in an image viewer application, but when we inspect it with the WinRAR archiving utility, we can see that the .jpg file contains a secret 28-byte text file.
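Python’s standard library cannot write RAR archives, so the sketch below substitutes the ZIP format to demonstrate the same image-plus-archive trick. The image bytes are again a stand-in, and the file name and contents are hypothetical. The trick works because the ZIP end-of-central-directory record sits at the end of the file, so a ZIP reader can locate the archive from the back and skip the image data in front of it.

```python
import io
import zipfile

# Stand-in image bytes: JPEG start/end markers around placeholder data.
image = b"\xff\xd8" + b"\x00" * 16 + b"\xff\xd9"

# Build a one-file ZIP archive in memory (hypothetical name and contents).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("secret.txt", "hypothetical hidden payload")

# Append the archive to the image, analogous to the RAR example in the text.
combined = image + buf.getvalue()

# An image decoder stops at FF D9; zipfile finds the archive from the end and
# tolerates the extra bytes prepended to it.
with zipfile.ZipFile(io.BytesIO(combined)) as zf:
    names = zf.namelist()
    hidden = zf.read("secret.txt").decode()

print(names)   # ['secret.txt']
print(hidden)  # hypothetical hidden payload
```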
Simple techniques like these may be useful for exfiltrating user data, but they have drawbacks. First, they inflate the file size; second, they change the file’s hash. They are also fairly easy for security software to detect because of their unexpected format.
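The following minimal sketch illustrates these drawbacks using the same kind of stand-in image bytes as above. Appending data makes the file larger and changes its SHA-256 digest. A naive check for bytes after the final JPEG end-of-image marker also flags the modified file. The `has_trailing_data` helper is hypothetical and much cruder than anything a real security product would rely on.

```python
import hashlib

def has_trailing_data(jpeg: bytes) -> bool:
    """Naive heuristic: are there any bytes after the last FF D9 (end-of-image) marker?"""
    end = jpeg.rfind(b"\xff\xd9")
    return end != -1 and end + 2 < len(jpeg)

image = b"\xff\xd8" + b"\x00" * 16 + b"\xff\xd9"  # stand-in image bytes
stego = image + b"hello world\n"                  # data appended to the end

print(len(image), len(stego))  # 20 32
print(hashlib.sha256(image).hexdigest() == hashlib.sha256(stego).hexdigest())  # False
print(has_trailing_data(image), has_trailing_data(stego))  # False True
```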
A better approach is to work at the binary level and manipulate the least significant bits (LSB) of individual pixels. Each pixel in a color image can be represented by 3 bytes, one each for its red, green and blue (RGB) values. Suppose we have three bytes representing one particular color, in this case orange:
R: 1111 1111
G: 0111 1111
B: 0000 0000
The least significant bits (the last four, reading left to right) have little impact on the color’s visual appearance. We can change them to anything we like and the pixel will still look pretty much the same. So, let’s take a completely different color, turquoise, say:
R: 1001 1001
G: 1100 1100
B: 1100 1100
Then we replace the last four bits of each orange byte with the first four bits of the corresponding turquoise byte, to produce this composite RGB:
R: 1111 1001
G: 0111 1100
B: 0000 1100
There’s no discernible impact on the appearance of the color this generates.
But if we construct a program to read and extract these last four bits separately, we have effectively hidden the code for turquoise (or at least the four most significant bits of each of its channels, enough to reproduce a close approximation) inside the code for orange. That is two pixels for the price of one, since there’s no increase in the file size. We can transmit our hidden message without increasing the bandwidth of the original message and without manipulating the file format, so simple detection methods that rely on file scanning have nothing to find. Indeed, the code is completely obfuscated until the attacker reassembles it.
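The nibble arithmetic can be verified with a few lines of Python. The byte values come from the bit patterns in the text. Orange’s blue byte is assumed to be 00000000, which makes orange RGB (255, 127, 0). Extraction recovers only the top four bits of each turquoise channel, so the recovered color is a close approximation rather than an exact copy.

```python
orange = (0b11111111, 0b01111111, 0b00000000)     # (255, 127, 0); blue byte assumed
turquoise = (0b10011001, 0b11001100, 0b11001100)  # (153, 204, 204)

# Embed: keep the high nibble of each orange channel and replace its low
# nibble with the high nibble of the matching turquoise channel.
composite = tuple((o & 0xF0) | (t >> 4) for o, t in zip(orange, turquoise))

# Extract: shift the low nibble back up; the turquoise low bits are lost (zero).
recovered = tuple((c & 0x0F) << 4 for c in composite)

print([format(c, "08b") for c in composite])  # ['11111001', '01111100', '00001100']
print(composite)  # (249, 124, 12)
print(recovered)  # (144, 192, 192)
```

Overwriting only the low nibble changes each channel by at most 15 out of 255. That limit is why the composite stays visually close to the original orange.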
In short, an attacker can use the last four bits of encoded RGB data to write other data without significantly degrading the visual presentation of the image or inflating the file size. Another program can then read off the hidden data and use it to reconstruct a malicious file or to exfiltrate user data.
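The sketch below extends the idea from a single color to a whole message. It hides arbitrary bytes in the low four bits of a flat list of channel values, using two channels per hidden byte, and then reads them back. The function names are hypothetical. The extractor is simply given the message length; a real scheme would also need to store the length or a terminator. Writing the modified pixels back to disk assumes a lossless format such as PNG or BMP, because lossy JPEG compression would scramble the low bits.

```python
def embed_nibbles(channels: list[int], payload: bytes) -> list[int]:
    """Hide payload in the low 4 bits of channel values (two channels per byte)."""
    if 2 * len(payload) > len(channels):
        raise ValueError("carrier has too few channels for this payload")
    out = list(channels)
    for i, byte in enumerate(payload):
        out[2 * i] = (out[2 * i] & 0xF0) | (byte >> 4)            # high nibble
        out[2 * i + 1] = (out[2 * i + 1] & 0xF0) | (byte & 0x0F)  # low nibble
    return out

def extract_nibbles(channels: list[int], length: int) -> bytes:
    """Reassemble `length` bytes from the low 4 bits of pairs of channel values."""
    return bytes(
        ((channels[2 * i] & 0x0F) << 4) | (channels[2 * i + 1] & 0x0F)
        for i in range(length)
    )

# Eight orange pixels (24 channel values) as a toy carrier.
cover = [255, 127, 0] * 8
message = b"hi there"
stego = embed_nibbles(cover, message)

print(extract_nibbles(stego, len(message)))  # b'hi there'
print(len(stego) == len(cover))              # True: no extra space used
print(max(abs(a - b) for a, b in zip(cover, stego)) <= 15)  # True
```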
LSB manipulation is only one of many steganographic techniques; images and other kinds of files can be manipulated to hide secret code in a number of other ways. Attackers have even used steganography in network protocols, so-called ‘network steganography’, to carry concealed messages. In all cases, the principle remains the same: hide in plain sight by piggybacking an invisible message on a visible carrier. Malware families that have used these techniques in the wild include:
Cerber – embeds malicious code in image files
DNSChanger – uses PNG LSBs to hide the malware’s AES encryption key
Stegano – PNG-formatted banner ads containing malicious code
Stegoloadr (aka ‘Lurk’) – this malware uses both steganography and cryptography to conceal an encrypted URL to deliver later stage payloads
Sundown – white PNG files are used to conceal exploit code or exfiltrate user data
SyncCrypt – ransomware that hides part of its core code in image files
TeslaCrypt – HTML comment tags in an HTTP 404 error page contain C2 server commands
Vawtrak (aka ‘Neverquest’) – hides a URL in the LSBs of favicons in order to download a malicious payload
Zbot – appends hidden data to the end of a JPEG file
ZeroT – Chinese malware that uses steganography to hide malware in an image of Britney Spears
Hiding malicious code in images and other carriers is just one of the many techniques threat actors use in their attempts to bypass AV security suites. Regardless of the technique used, malware authors always have the same aims: to persist on the endpoint, traverse the network, and collect and exfiltrate user data. In pursuing these objectives, their malware leaves footprints that behavioral AI solutions can detect.
Hiding a file, picture, message or even a video within another file can be an effective way for malware authors either to obscure their own payload or to exfiltrate user data. Given the popularity of image sharing on social media sites and the prevalence of image-based advertisements, we expect the recent trend of using steganography in malware to continue. Because it is so difficult for end users to spot a maliciously crafted image file, it’s vital that enterprises use behavioral AI software to detect the execution of malicious code, whether it originates from an image or another file, or is fileless malware. If you are not already protected by SentinelOne’s autonomous endpoint solution, contact us for a free demo today to see how it works.