PDF Malware IOC Extraction

In this post, we’re going to look at how to handle infected PDF files and extract malicious indicators from them without endangering yourself or your PC. Security operations centers (SOCs) deal with suspicious PDF and Office documents nearly all day: users report phishing, and our job as security analysts is to figure out whether those files are malicious. We have to do that safely, quickly, and accurately.

Consider a scenario in which we have received a PDF file by email or some other channel. We don’t know whether the PDF is infected, and we cannot use most of our SOC tools, so we have to check manually whether the PDF contains a phishing attack.

The first thing we need for malware analysis is a virtual machine. Never handle malware files on your own computer or on any computer you care about.

The easiest way to keep malicious files in a safe space is to put them in a virtual machine and then isolate that virtual machine from the rest of the network and from our own computer.

The virtual machine discussed in this blog is REMnux, a Linux distribution that comes ready-made with reverse-engineering and malware analysis tools.

I highly recommend downloading it first. On the REMnux home page, click on the distro and download the REMnux VM.

Clicking the download option takes you to the next page, where you can download the OVA file from a primary or mirror source.

I am using VMware Workstation, so I downloaded the OVA file from the General category. If you use Oracle VirtualBox, download the OVA file from the VirtualBox category instead.

VMware Workstation can import the OVA file directly.
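If you want to double-check the download before importing it, the sketch below compares the OVA's SHA-256 digest with a published value. It assumes the download page lists a SHA-256 hash for the file; the helper name verify_ova and the file name remnux.ova are placeholders, not part of REMnux.

```shell
#!/bin/sh
# Hypothetical helper: check a downloaded OVA against the SHA-256 hash
# published on the download page before importing it.
# Usage: verify_ova FILE EXPECTED_SHA256
verify_ova() {
  # sha256sum prints "<hash>  <file>"; keep only the hash field
  actual=$(sha256sum "$1" | cut -d ' ' -f 1)
  # Published hashes are sometimes upper-case; compare in lower case
  expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
  if [ "$actual" = "$expected" ]; then
    echo "OK: $1"
  else
    echo "MISMATCH: $1 has SHA-256 $actual" >&2
    return 1
  fi
}

# Example (file name and hash are placeholders):
#   verify_ova remnux.ova <hash-from-download-page>
# VirtualBox users can also import from the command line:
#   VBoxManage import remnux.ova
```

A mismatch means the file is corrupted or not the one published, so download it again rather than importing it.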

Once the import is finished, we can start the virtual machine: a Linux system with many preloaded malware analysis tools.

In this blog, we will cover only the basics of dealing with phishing in PDF and Office documents.

First, we need an infected file to work with. We are going to download one from any.run, an online malware sandbox; other online sandboxes are available too. To download a sample from the sandbox, we need an account on the site. This applies to almost all sandboxes, and a free account is usually enough.

On the website, go to Services and select Public tasks, on the left side of the page.

That page lists many files. We are looking only for malicious PDF files, so use the filter option to search for PDF files with the verdict Malicious. I downloaded a random PDF file named VR-009.pdf.

In the REMnux terminal, we are going to examine this file without opening it. The file will be in the Downloads folder.

The file is in ZIP format, so first we need to unzip it. Sample files like this are usually password-protected so that we don’t accidentally open them and infect our system.

The password for the file is given when the sample is downloaded; for most samples, it is "infected".
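Both checks can be done from the terminal. This is a minimal sketch, assuming the download is a standard ZIP archive protected with the password "infected"; the helper names sample_type and extract_sample, and the file names in the usage comment, are hypothetical. It identifies a file by its first bytes (ZIP archives start with "PK", PDFs with "%PDF-") rather than trusting the extension; the file command gives the same answer in more detail.

```shell
#!/bin/sh
# Hypothetical helper: identify a sample by its magic bytes instead of
# trusting its extension. Nothing is opened in a viewer.
sample_type() {
  if [ "$(head -c 2 "$1")" = "PK" ]; then
    echo zip            # ZIP archives start with "PK"
  elif [ "$(head -c 5 "$1")" = "%PDF-" ]; then
    echo pdf            # PDF files start with "%PDF-"
  else
    echo unknown
  fi
}

# Hypothetical helper: extract a password-protected sample archive.
# Assumes the usual sandbox password "infected"; check the download page.
extract_sample() {
  dest=${2:-extracted}
  mkdir -p "$dest"
  # -P passes the password, -o overwrites existing files without prompting
  unzip -o -P infected "$1" -d "$dest"
}

# Example (file names are placeholders):
#   [ "$(sample_type VR-009.zip)" = zip ] && extract_sample VR-009.zip samples
```

Info-ZIP's unzip supports only traditional ZIP encryption; if an archive uses AES encryption, 7z x -pinfected ARCHIVE is an alternative.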

Now that the PDF file is extracted, we need to check it for threats without opening it. There is an easy way to get a rough idea of what might be inside without opening the file.
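The extracted file's hash is itself an IOC: it can be shared with colleagues or searched for in sandboxes. So it is worth recording before going further. This is a minimal sketch; the helper name record_hash and the log file name iocs.txt are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: append "<sha256>  <filename>" for each sample to a log
# so the hash can be shared as an IOC or searched in a sandbox later.
# Usage: record_hash LOGFILE FILE...
record_hash() {
  log=$1
  shift
  for f in "$@"; do
    sha256sum "$f" >> "$log"
  done
  # Show the lines that were just recorded (one per file)
  tail -n "$#" "$log"
}

# Example (paths are placeholders):
#   record_hash iocs.txt VR-009.pdf
```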

In most malicious PDF files, the threat is some kind of link: when you open the PDF, it directs you to a web link, which may lead to a malicious file. Our job is to extract that link without opening the PDF, because opening it could trigger an exploit.

The easiest way is to use the strings command, which is available on practically every Linux system. Run strings with the name of the PDF, and pipe the output into less so you can page through it:

strings VR-009.pdf | less

strings prints every sequence of printable ASCII characters in the file that is at least four characters long (its default minimum).
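Paging through everything strings prints can be slow. A quicker first pass is to count the PDF names that most often signal active content, such as /JavaScript, /OpenAction, /Launch, /URI, and /EmbeddedFile. This sketch is only a rough approximation of what Didier Stevens' pdfid.py (included in REMnux) reports, not a replacement for it. The helper name pdf_keywords is hypothetical, and names hidden inside compressed streams will not show up.

```shell
#!/bin/sh
# Hypothetical helper: rough keyword triage of a PDF, loosely modeled on
# what pdfid.py reports. It splits the raw bytes into PDF name tokens
# (a "/" followed by letters, digits or "#") and counts exact matches.
# Limitations: names inside compressed streams are invisible, and
# hex-escaped names such as /J#61vaScript are not decoded.
pdf_keywords() {
  # -a: treat binary as text; -o: print one token per line
  tokens=$(grep -aoE '/[A-Za-z0-9#]+' "$1" || true)
  for kw in /JavaScript /JS /OpenAction /AA /Launch /URI /EmbeddedFile /AcroForm; do
    # -x: whole-line match, -F: literal string, -c: count matching lines
    n=$(printf '%s\n' "$tokens" | grep -cxF -- "$kw" || true)
    printf '%-14s %s\n' "$kw" "$n"
  done
}

# Example:
#   pdf_keywords VR-009.pdf
```

Non-zero counts for /JavaScript, /OpenAction, or /Launch are a reason to look more closely; a /URI count simply confirms that the file contains links.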

Because a URL is printable text, strings makes it easy to spot. So the easiest way to look for that URL, without a sandbox or anything like it, is to search the output for the word http using the grep command:

strings VR-009.pdf | grep http
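grep http prints whole lines, which in a PDF can be long and noisy. As a sketch, a slightly tighter variant prints only the URLs themselves (http or https) and removes duplicates. Each match stops at characters that typically end a URL inside a PDF object, such as ")", ">", or whitespace. The helper name extract_urls is hypothetical. Like plain strings, it cannot see URLs stored inside compressed streams.

```shell
#!/bin/sh
# Hypothetical helper: print each distinct http/https URL found in a file.
# grep -a reads the binary PDF as text, so no separate strings step is needed.
# The bracket expression ends a match at whitespace, quotes, square brackets,
# angle brackets or parentheses, which commonly delimit URLs in PDF objects.
extract_urls() {
  grep -aoE "https?://[^][[:space:]\"'<>()]+" "$1" | sort -u
}

# Example:
#   extract_urls VR-009.pdf
```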

The result shows the URL found in the PDF. This may be a phishing attack in which the attacker wants the user to follow the link in the file.
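When the URL goes into a ticket, report, or chat as an IOC, it is common practice to "defang" it so nobody clicks it by accident. This is a minimal sketch, assuming the widespread hxxp and [.] convention. The helper name defang is hypothetical, and your team's reporting format may differ.

```shell
#!/bin/sh
# Hypothetical helper: defang URLs (one per line on stdin) so they cannot be
# clicked by accident when pasted into tickets or reports.
#   http://evil.example/a -> hxxp://evil[.]example/a
defang() {
  # "http" at the start becomes "hxxp" (this also turns https into hxxps);
  # every "." becomes "[.]"
  sed -e 's/^http/hxxp/' -e 's/\./[.]/g'
}

# Example (feeds URLs found with grep into the helper):
#   strings VR-009.pdf | grep -o 'http[^ ]*' | defang
```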

Now, let’s check how the malicious PDF behaves in the sandbox. If we click the PDF file on the any.run website, it shows the full report.

When opened, the PDF follows the link to a web page, which continues to a YouTube channel and asks the user to change some settings.

So, working as SOC analysts, if we get a PDF file like this for verification and don’t have a sandbox or other tools, this is the easiest manual method to extract the link without exposing ourselves to risk.