
Analysis of an Evasive Backdoor


Authored by: Roman Vasilenko, Kyle Creyts

The Initial Infection Vector - Nuclear Pack

A number of other posts have covered the Nuclear Pack Exploit Kit. While this EK (exploit kit) may not be as popular as kits such as g01pack or BlackHole, and may not contain nearly as many exploits as CoolEK or Phoenix, it still sees use.

Among other things, Nuclear Pack is known for using layers of redirection, first checking for user activity before delivering malicious code.

 

Nuclear Pack is known to be used by a number of different criminals for distribution of a variety of malware families, mostly crimeware; in this post, we detail an unusual incident in which a customer was hit by the Nuclear Pack exploit kit.

Here you can see the detection in the Previct UI:

 

This is how the event detail looks:

This Java payload successfully exploited the victim, immediately downloading the malicious executable load2.exe (78cfa36112cd0797690327a9a07d5890) as seen here:

A Curious Executable, A Detailed Analysis 

When we first looked at the initial dropper, for a moment, we thought that it was a false positive. The sample didn’t look like malware at all. A quick check over at VirusTotal didn’t help us to make a decision; according to the VT result, the sample also looked like a probable false positive.

 

The Dropper

The initial dropper is a NullSoft Installer (78cfa36112cd0797690327a9a07d5890), which drops 6 files to the temp directory:

As you can see, the dropped executable has an icon resembling a “Settings” or “Updates” symbol, and one of the dropped DLLs has a description of “Google Chrome Patches”.

Then it starts the executable file, wobiqacaxa.exe.

The Second Stage

The second stage executable does some math computations like:

imports functions from other DLLs:

and calls them in the main function:

It also calculates the remaining battery life, and that’s the extent of the capability present in the executable. There is nothing suspicious so far.

Let’s look at the DLLs. First of all, they are very small, and each has one export function. The exports don’t look very suspicious, and after examining a few of them, one is tempted to say “it is something, but not malware; this feels like wasted time”, and you might consider stopping here... Then you remind yourself of the utter oddness of the behavior you have examined in this “installer” so far, and continue the analysis.

Have a look:

womajejunuc.dll exports Nalexavo

tapevacanop.dll exports Miqudigob

horikipusac.dll exports humolu

None of this seems even a little bit malicious!   

But something nags at you, and you don’t want to stop digging. Why?

The Juxtaposition

Because the code doesn’t make any sense. It doesn’t make sense for a legitimate binary to:

  1. solve a math function such as “y = 10x - cos (2x)”
  2. get the local currency format, and apply it to the string “-3.80” and then do nothing with it
  3. calculate the remaining battery life
  4. enumerate the list of drives

All of these are just common routines which a legitimate program might make use of, but probably not together, and probably not in this order... In this case, this is fake functionality which seems to have only one aim: to confuse an automated detection system, an anti-virus, or even an analyst who attempts to manually analyze the sample.

All doubts are gone when we examine the file we identified earlier as “binary data.”

It definitely bears a strong resemblance to the pattern of data encrypted by a stream cipher with a short key.
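Why does such data stand out? A short repeating XOR key leaves a visible period in the ciphertext, especially over the long runs of identical bytes common in binaries. The sketch below is our own illustration (not the sample’s actual cipher): it guesses the key length by scoring how often bytes repeat at each candidate stride.

```python
def likely_key_length(data, max_len=16):
    """Guess the repeating-key length of XOR-encrypted data.

    At the correct period, bytes one key-length apart were XORed with
    the same key byte, so equal plaintext bytes (very common in sparse
    or padded binaries) stay equal in the ciphertext.
    """
    best_len, best_score = 1, -1.0
    for klen in range(1, max_len + 1):
        matches = sum(1 for i in range(len(data) - klen) if data[i] == data[i + klen])
        score = matches / (len(data) - klen)
        if score > best_score:
            best_len, best_score = klen, score
    return best_len

# Demo: "encrypt" sparse binary-like data with a short 4-byte XOR key.
plaintext = bytes(1024) + b"some strings here" + bytes(512)
key = b"\x13\x37\xc0\xde"  # made-up key for illustration
ciphertext = bytes(b ^ key[i % len(key)] for i, b in enumerate(plaintext))
print(likely_key_length(ciphertext))  # 4 -- the short key shows through
```

In real data, a result like this (a strong low period) is exactly the “short key” pattern visible in the dropped binary file.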

 

The Unpack

So, besides the aforementioned fake logic, this malware:

  1. calculates a decryption key (provided by function humolu from horikipusac.dll)
  2. reads the binary file into memory (provided by function Nalexavo from womajejunuc.dll)
  3. decrypts the data in memory (provided by function Miqudigob from tapevacanop.dll)
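The division of labor can be sketched as three Python stand-ins. The names mirror the exports, but the key value, the cipher (plain XOR here), and the file layout are illustrative assumptions, not the sample’s real ones.

```python
import os
import tempfile

def humolu():
    """horikipusac.dll: calculate the decryption key (placeholder value)."""
    return b"\x13\x37\xc0\xde"

def nalexavo(path):
    """womajejunuc.dll: read the encrypted binary file into memory."""
    with open(path, "rb") as f:
        return f.read()

def miqudigob(key, blob):
    """tapevacanop.dll: decrypt the blob in memory (XOR as a stand-in)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(blob))

# Demo: drop an "encrypted" file, then run the three stages in order.
payload = b"MZ\x90\x00...pretend PE image..."
enc = miqudigob(humolu(), payload)          # XOR is its own inverse
fd, path = tempfile.mkstemp(suffix=".vat")
with os.fdopen(fd, "wb") as f:
    f.write(enc)
decrypted = miqudigob(humolu(), nalexavo(path))
os.remove(path)
print(decrypted[:2])  # b'MZ'
```

Splitting these trivial steps across three DLLs is what makes each piece look harmless on its own.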

And now onto the last DLL:

Qotowokami.dll exports the function Siwusonivin, which implements the fairly standard pattern of injecting decrypted data into a new process:

  1. GetCommandLine and PathGetArgs to get the path of the original installer. This works because the installer starts the dropped executable with itself (the installer) as a command-line parameter
  2. CreateProcessA to create a new process of the installer, but with suspended status.
  3. ZwUnmapViewOfSection and VirtualAllocEx to free original process memory and to allocate memory for decrypted data inside the created process.
  4. WriteProcessMemory to copy decrypted data to the process space.
  5. GetThreadContext + SetThreadContext + ResumeThread to change the entry point and resume the thread in the process.

But the function gets all of its API functions dynamically by calling LoadLibrary and GetProcAddress, and Qotowokami.dll itself doesn’t contain any of the functions’ names. They are located in decrypted memory. Thus, it is impossible to see the real payload of this DLL without the decrypted data.

Sandbox And AV Evasion

The malware is written in such a way that if the components were analyzed separately, or if any of the components were lost, it would be very difficult to identify what the sample is supposed to do, and whether it is malicious or not.

Since most antivirus products do not really analyze binary files (of unknown format) except by checking hashes of the whole file, it is becoming trendy among malware authors to keep malicious payloads encrypted in non-executable binary files, and to extract the encrypted contents using a separate file, a loader/decrypter which doesn’t look suspicious. We already wrote about a similar technique in a previous blog post: http://www.lastline.com/an-analysis-of-plugx

Our analysis platform detected an evasion attempt:  the sample tries to check if it is working in a sandbox environment by reading the registry key

HKLM\SYSTEM\CURRENTCONTROLSET\SERVICES\DISK\ENUM

and comparing the value data with the substrings:

  1. “vbox” - (would appear in a VirtualBox VM)
  2. “qemu” - (would appear in a QEMU VM)
  3. “vmwa” - (would appear in a VMware VM)
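The check itself reduces to a case-insensitive substring scan of the device string. A Python sketch of the logic (the function name is ours; on a real Windows host the value would be read with winreg):

```python
def vm_vendor_from_disk_enum(enum_value):
    """Return the hypervisor hinted at by the Disk\\Enum device string,
    or None if none of the sandbox markers is present -- the same
    substring test the sample performs."""
    value = enum_value.lower()
    for marker, vendor in (("vbox", "VirtualBox"),
                           ("qemu", "QEMU"),
                           ("vmwa", "VMware")):
        if marker in value:
            return vendor
    return None

print(vm_vendor_from_disk_enum(r"IDE\DiskVBOX_HARDDISK___1.0"))    # VirtualBox
print(vm_vendor_from_disk_enum(r"IDE\DiskWDC_WD5000AAKX___19.0"))  # None
```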

The decrypted data is not an extracted payload yet, but rather  another wrapper which first checks if it is running in a sandbox environment:

Such sandbox evasion attempts are becoming increasingly common.

After that, the wrapper resolves the API functions.

It should be noted that the wrapper looks up the addresses of DLLs and API functions by hashes. Malware authors definitely do not want the malware to be detected simply by searching for suspicious strings, such as suspicious API function names!

The wrapper finally decrypts the malicious payload, then builds an import table for it in an interesting way:

  1. For each API function it checks the first instruction.
  2. If the instruction is jump (0xEB, 0xE9) the wrapper gets the target address and repeats the check recursively.
  3. When the first non-jump instruction is found, the wrapper copies the instruction to allocated memory and adds a jump after it to the second instruction of the real API function. With this technique, the malware evades hooks of security components (such as HIPS).
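The jump-following step can be sketched in Python over a raw code buffer. Offsets stand in for virtual addresses, and only the two opcodes named above (short jmp 0xEB and near jmp 0xE9) are handled, as a simplified model of what the wrapper does:

```python
import struct

def resolve_past_jumps(mem, addr, max_hops=8):
    """Follow x86 jmp rel8 (0xEB) and jmp rel32 (0xE9) instructions at
    `addr` until a non-jump opcode is reached. `mem` is a bytes image
    mapped at offset 0 (a simplification of real process memory)."""
    for _ in range(max_hops):
        op = mem[addr]
        if op == 0xEB:                      # jmp rel8: 2-byte instruction
            rel = struct.unpack("<b", mem[addr + 1:addr + 2])[0]
            addr = addr + 2 + rel
        elif op == 0xE9:                    # jmp rel32: 5-byte instruction
            rel = struct.unpack("<i", mem[addr + 1:addr + 5])[0]
            addr = addr + 5 + rel
        else:                               # first real instruction
            return addr
    raise ValueError("too many chained jumps")

# A "hooked" API stub: jmp +3 over a trampoline, then the real code.
img = bytes([0xEB, 0x03, 0x90, 0x90, 0x90, 0x8B, 0xFF])  # jmp; nops; mov edi,edi
print(resolve_past_jumps(img, 0))  # 5 -- lands on the mov at offset 5
```

The wrapper then copies the instruction found at that address and jumps back past it, so a hook placed on the API entry point never executes.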

Below is an example of the malware’s “imports”.

Note that it uses jumps (the usual imports construction is call [x] / jmp addr):

But instead of the addresses of API functions, the jumps lead to specially generated code (described above).

Unwrapped At Last 

The malicious payload is position-independent code, and after reconstruction, it looks like this:

It is a typical “backdoor” or “bind shell” which listens to a port and executes received commands. And for all the work put into evasion and anti-AV, the connection is not encrypted and is not protected by any authentication methods.
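The core loop of such a bind shell is tiny. Below is a benign sketch of its shape: `handler` is a harmless stand-in for the command execution the real sample performs, and the listener is pinned to localhost. Note the property called out above: nothing on the wire is encrypted or authenticated.

```python
import socket
import threading

def tiny_bind_listener(handler, host="127.0.0.1", port=0):
    """Bind, accept one client, and answer newline-terminated commands
    with handler(cmd). Like the sample, there is no encryption and no
    authentication on the channel."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, port))
    srv.listen(1)

    def serve():
        conn, _ = srv.accept()
        with conn:
            buf = b""
            while True:
                chunk = conn.recv(1024)
                if not chunk:
                    break
                buf += chunk
                while b"\n" in buf:
                    line, buf = buf.split(b"\n", 1)
                    conn.sendall(handler(line.decode()).encode() + b"\n")
        srv.close()

    threading.Thread(target=serve, daemon=True).start()
    return srv.getsockname()[1]  # the port the OS picked

# Demo "operator" session against an upper-casing stand-in handler.
port = tiny_bind_listener(lambda cmd: cmd.upper())
cli = socket.create_connection(("127.0.0.1", port))
reader = cli.makefile()
cli.sendall(b"whoami\n")
resp = reader.readline().strip()
print(resp)  # WHOAMI
cli.close()
```

An attacker on the same network segment can connect to such a port directly, which is consistent with the lateral-movement theory below.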

The Conclusion 

A brief graphical summary of the incident:

Name of dropped file | apparent filetype     | md5                              | role
-------------------- | --------------------- | -------------------------------- | ----
Qotowokami.dll       | PE32 executable (DLL) | 33a3d73982ff030150429f9e50ff9a00 | injector of the decrypted routines
horikipusac.dll      | PE32 executable (DLL) | 2551a6ed18384e6dececb5fc45bc839f | decryption key calculator
jepuculoguh.vat      | binary data           | 3c641ccae35380feebe9d2d53ece8da9 | encrypted binary data
tapevacanop.dll      | PE32 executable (DLL) | 2bb51cf2091d124d15a28b19f9fd5326 | decryptor of encrypted memory
wobiqacaxa.exe       | PE32 executable       | ddb6d2b7a78e71522893c349ddee5195 | second stage executable
womajejunuc.dll      | PE32 executable (DLL) | 5b1ff7476bf1787e4df9c9b74b05ba16 | reader of encrypted file

 

Has all of this effort been to protect a simple TCP bind shell from detection?

Why would an attacker deploy this with an exploit kit hosted on the Internet, outside of the victim’s network, where an attacker cannot be sure they will be able to call in to the host they have compromised? The victim’s IP was in RFC1918 space, and would not be reachable by an attacker over the Internet.

One possible explanation could be that the attacker is coming from inside the house (network)! An attacker inside of the network would probably be less concerned about encrypting their connections to other local machines than they would about traffic they were moving through network edges. 

This appears to be a lateral movement technique used by an attacker who has already established a beachhead inside the victim’s network, and wants to get access to another host on the same network. Using information gathered from the first machine, they will craft a believable message (a spear-phish), complete with a compelling reason for the intended victim to visit an included URL, taking the victim to an exploit pack, such as the one described at the beginning of this article.


An Analysis of PlugX


Authored by: Roman Vasilenko, Kyle Creyts

Introduction

There are a number of articles recently written about a Remote Access Trojan called PlugX or Korplug (with older variants known as Sogu, Thoper, TVT, or Destory RAT) which has recently seen increasing use in targeted attacks.

This article is our contribution to the publicly available knowledge about PlugX:

INSIGHTS ON PLUGX DEVELOPMENT

Other blogs have offered some insight into the development of this project, and we would like to extend their findings.

THE DEMO MESSAGE

The malware has a special demo message inside:

This message could be shown in two procedures:

  1. An installation procedure - when the malware copies its files and registers them in the registry (for autorun or as a service)
  2. A procedure for creating a new pipe - it creates a named pipe to communicate with other instances of the malware that work in the same system. The malware uses this functionality to start the processes of its instances as different users.

By default (in a default constructor) a variable ‘is_demo_version’ is set as 1, but it could easily be changed by loading new settings (from a file or from the Internet).

Probably, the author intended to use this option to protect the malware from theft during demonstrations. Also, it is obvious that after unpacking the malware, it is very easy to change this option. One can infer that the people for whom this malware was demonstrated probably weren’t malware developers, as this appears to be a very simple protection.

THE LOGGING

This malicious program is a rather complex software project. We analyzed several samples, and some of them (probably older ones) have a logging function:

plugx_log(source_name, line_number, message_id)

We analyzed the parameters of this function and determined that the source code of this malware project consists of at least 35 different cpp files, most seeming to have more than 200 lines of code.
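The estimate comes straight from the logging parameters: for each source file, the largest line number passed to plugx_log() is a lower bound on that file’s length. A sketch of the aggregation (the call sites shown are made up, not the real extracted values):

```python
from collections import defaultdict

def estimate_source_sizes(log_calls):
    """Given (source_name, line_number) pairs pulled from plugx_log()
    call sites, estimate each .cpp file's minimum length as the largest
    line number referenced in it."""
    sizes = defaultdict(int)
    for source, line in log_calls:
        sizes[source] = max(sizes[source], line)
    return dict(sizes)

# A few hypothetical call sites for illustration:
calls = [("XBuffer.cpp", 101), ("XBuffer.cpp", 252),
         ("XRTL.cpp", 1444), ("XBase64.cpp", 36)]
print(estimate_source_sizes(calls))
```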

Format: file_name estimated_number_of_lines

XBuffer.cpp 252

XPlgLoader.cpp 1087

XPlugService.cpp 505

XPlug.cpp 391

XPlugShell.cpp 603

XHide.cpp 416

XPlugTelnet.cpp 623

XPlugRegedit.cpp 719

XSetting.cpp 645

XPlugDisk.cpp 966

XPlugOption.cpp 229

XPlugPortMap.cpp 237

XJoin.cpp 820

XInstallUAC.cpp 181

XSo.cpp 174

XSoUdp.cpp 206

XDList.cpp 101

XException.cpp 39

XInstall.cpp 451

XSessionImpersonate.cpp 432

dllmain.cpp 56

XOnline.cpp 1184

XThreadManager.cpp 122

XBase64.cpp 36

XPlugSQL.cpp 480

XPlugKeyLogger.cpp 702

XPlugScreen.cpp 1376

XBoot.cpp 733

XRTL.cpp 1444

XSoTcpHttp.cpp 1061

XPlugNetstat.cpp 492

XPacket.cpp 333

XSoPipe.cpp 240

XPlugNethood.cpp 213

XSoTcp.cpp 502

XPlugProcess.cpp 546

The estimated total number of lines of code is ~19,000.

Impressive.

DESIGN OF PLUGX

This is a very well-designed, well-written software project; it has modular plugins which, rather than having their own routines for tasks such as external communication, use functionality provided by PlugX internal APIs. This design choice allows plugins or APIs to be updated independently and in a backward-compatible way, without interrupting the execution of the malware or requiring it to be reinstalled. It can also be run in thread-safe and non-thread-safe environments. Care was taken to protect this malware from being easily identified by antivirus software or forensic analysts; at no time is any malicious code on disk in decrypted and decompressed form.

In summary, the project appears to have been developed by a skilled programmer or team of programmers, in an iterative fashion, starting with a clear set of features, which have been expanded and updated over time.

THE PLUGX PLUGINS

This malware comes with 13 default plugins. One of the parameters of plugin initialization appears to be a date. It could be the creation date of the plugin, or the date of last modification. These dates range from 2012/01/17 to 2012/03/25, with some dates being the same.

Plugin  | Date-like parameter | Functionality supplied
------- | ------------------- | ----------------------
Disk    | 20120325 | create, read, delete files; change env strings; create/write new files from C&C to disk; run .exe/tools
Process | 20120204 | create, kill, enum processes
Service | 20120117 | create, change, enum, start, delete services
RegEdit | 20120315 | create, change, enum, delete registry keys
Netstat | 20120215 | collect some network usage statistics
Nethood | 20120213 | enumerate computers and shared resources in the local network
Option  | 20120128 | reboot, logoff, shutdown the system
PortMap | 20120325 | (the analysis of this component is still in progress)
Screen  | 20120220 | take screenshots
Shell   | 20120305 | create a new cmd.exe process; communicate with it via named pipes; relay input from/output to the C&C connection
Telnet  | 20120225 | create a new cmd.exe process with the /Q option, turning off echo; communicate with it via sockets; relay input from/output to the C&C connection
SQL     | 20120323 | connect and make queries to SQL databases
Keylog  | 20120324 | keylogger (writes to file NvSmart.hlp)

PROCESS OF INFECTION

We observe an infection process very similar to that described in other posts:

  1. The rarsfx archive
    • drops three files into the temp directory:
      • hkcmd.exe - a benign file with a valid digital signature.
      • hccutils.dll - an auxiliary dll, which has fake exports which are required by hkcmd.exe.
      • hccutils.dll.res - not a PE file, but base-independent code, which consists of a decryptor and an encrypted malicious image. It also contains encrypted settings.
    • then starts hkcmd.exe
  2. During the hkcmd.exe loading process, the Windows loader looks for “hccutils.dll” in the current directory, finds it, and loads the DLL which was dropped by the rarsfx archive. This is sometimes known as a dll-load-order hijack, where a local DLL supplants a system-supplied library.
  3. hkcmd.exe imports three functions from hccutils.dll:
    • FindResources
    • LoadSTRING
    • LoadSTRINGFromHKCU
  4. hccutils.dll (in the dllmain procedure) patches the entry point of the hkcmd.exe image in memory (it hasn’t executed yet). Also, the dllmain has a date check. If the current date is before 2012/01/01, the malware just terminates, as seen in the reconstructed code below:

  5. After this code is executed, instead of the original entry point, hkcmd.exe jumps to the loadShellCode() function, which loads base-independent code from the file hccutils.dll.res:

  6. hccutils.dll.res contains an encrypted, compressed image containing the core PlugX payload and its further encrypted settings. In some samples, this image may be wrapped in an additional encryption layer (possibly as AV evasion). After unpacking this extra layer, or if the decryptor doesn’t have an additional layer, it decrypts the image and decompresses it.
  7. It then copies this image, section by section, to new memory and erases the PE header, replacing it with its own header format. (Probably to evade AV software which might attempt to find unknown PEs in memory.)
  8. Next, it begins to establish persistence:
    • it reads the encrypted settings
    • copies all files in the temporary directory to “%ALLUSERSPROFILE%\<dir_name>”, where dir_name is a path specified in the settings.
    • restarts itself with parameters that cause it to install itself in one of several ways, depending on the environment in which it is running.

OPERATION

After the malware has established persistence on a system (copied its files and registered itself as a service, or added an autorun entry in the registry), it tries to establish a network connection with the C&C.

It can communicate with a server using TCP, UDP, or HTTP protocols. It sends broadcast UDP packets to devices on the same subnet as the victim, and listens for a broadcast response, in an attempt to establish connections with other bots in the same local network.

As soon as the connection is established, the C&C is able to:

  1. Get machine info - obtain information about a processor and a system
  2. Start plugin manager - depending on the request’s option, ask the bot to initialize plugins, send information about plugins, or create a remote shell.
  3. Uninstall - delete all bot’s files and registry entries
  4. Get plugins info - obtain information about all plugins
  5. Send settings - send new settings to the bot
  6. Get settings - get current settings from the bot

If the plugin manager (OlProcManager) is started, the C&C is then able to communicate with the chosen plugins.

Below is a generic example of how the C&C interacts with plugins:

UPDATING

The authors took many steps to protect against antivirus software and forensic analysts; at no time during the updating process is any malicious code ever on disk in decrypted and decompressed form. Whenever PlugX attempts to update itself, create another instance of itself, or inject code into a process, it does so by first injecting a block of location-independent code that is used to decrypt and unpack the payload, which is then injected and used to create the new instance, or update an existing plugin.

New plugins are added in 2 stages:

1. deliver the plugin files to an infected host

2. attempt to open files with the name “$x.plg”, where $x is a number from 0 to 127; if a file successfully opens, it is loaded into memory, decrypted, decompressed, and initialized as a new plugin
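Stage 2 amounts to probing a fixed namespace of file names. A sketch of that loop (`load_plugin` is a stand-in for the real read/decrypt/decompress/initialize step):

```python
import os
import tempfile

def load_new_plugins(plugin_dir, load_plugin):
    """Probe for files named "0.plg" through "127.plg"; each one that
    exists is handed to load_plugin, which in the real bot decrypts,
    decompresses, and initializes it."""
    loaded = []
    for x in range(128):
        path = os.path.join(plugin_dir, f"{x}.plg")
        if os.path.isfile(path):
            load_plugin(path)
            loaded.append(path)
    return loaded

# Demo: drop two fake plugin files and enumerate them.
d = tempfile.mkdtemp()
for name in ("0.plg", "5.plg"):
    open(os.path.join(d, name), "wb").close()
found = load_new_plugins(d, lambda p: None)
print(len(found))  # 2
```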

The malware also has download-and-execute functionality through the Disk plugin, allowing it to simply run a new copy of itself, or download and run any tools an operator desires.

ENCRYPTION OF hccutils.dll.res

When it is first dropped, the base-independent code is encrypted and compressed. There may also be an additional encryption layer which wraps the decryptor.

If so, it looks like:

After it unpacks itself (or if the decryptor doesn’t have an additional layer), it looks like:

It then decrypts the image using this algorithm:

It then uses RtlDecompressBuffer to decompress the image.

This scheme ({compress + encrypt} / {decrypt + decompress}) is also used for:

  1. sending/receiving data to/from C&C (through the network)
  2. sending/receiving data to/from other bots (through named pipes)
  3. extracting compressed and encrypted dlls from within the image
  4. extracting other compressed and encrypted files from disk
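A round-trip sketch of the scheme, with zlib standing in for RtlCompressBuffer/RtlDecompressBuffer (LZNT1) and a simple XOR keystream standing in for the sample’s actual cipher; both stand-ins and the key value are our assumptions for illustration:

```python
import zlib

KEY = b"\x13\x37\xc0\xde"  # placeholder; real keys come from the settings

def xor_stream(data, key=KEY):
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def pack_blob(plaintext):
    """compress + encrypt: the shape used for C&C traffic, pipe traffic,
    and embedded dlls/files."""
    return xor_stream(zlib.compress(plaintext))

def unpack_blob(blob):
    """decrypt + decompress: the inverse pipeline."""
    return zlib.decompress(xor_stream(blob))

msg = b"new settings from the C&C" * 10
blob = pack_blob(msg)
print(unpack_blob(blob) == msg)  # True
```

Because the payload only ever exists compressed and encrypted at rest, hash-based and string-based scanning of the files on disk sees nothing recognizable.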

INTERESTING MECHANISMS IN PLUGX - UAC EVASION

The PlugX malware has a UAC (User Account Control) evasion mechanism. It checks if UAC is enabled, and restarts itself through these steps:

  1. It has a compressed and encrypted dll inside:

  2. It unpacks the dll and writes it to a temporary file. The compression and encryption mechanisms are discussed in the encryption section above.
  3. It creates a new process in a suspended state:

“C:\Windows\System32\msiexec.exe UAC”

  4. Then it allocates memory in the process and injects the base-independent code there:

This code strongly resembles the code found here:

http://sinowal.com/UAC.cpp

INTERESTING MECHANISMS IN PLUGX - CREATE NEW PROCESS

Besides the usual mechanisms of creating new processes like:

  1. CreateProcess
  2. CreateService
  3. Injecting memory to another process

PlugX also has an interesting method for creating a process through COM interfaces:

When the above code is executed:

service.exe creates a process (if it does not already exist)

"C:\WINDOWS\System32\svchost.exe -k DcomLaunch"

svchost.exe creates a process (if it does not already exist)

“C:\WINDOWS\system32\wbem\wmiprvse.exe -Embedding"

wmiprvse.exe creates a process with a command line specified in a call:

pcreate_copy->Put(L“CommandLine”, 0, &cmd, 0);

C&C INFRASTRUCTURE + ATTACK TIMELINE

Other blogs have done a great job of documenting the C&C of several attackers using older variants of PlugX.

However, there does not appear to be any infrastructure intersection with the attacks we witnessed. The attackers using this C&C infrastructure also focus their efforts in Southeast Asia, primarily targeting technology manufacturers, developers, or organizations that deal with them.

Let’s look at a few of the domains these attackers used in recent campaigns:

ns4.msftncsl.com

ns5.msftncsl.com

ns1.attoo1s.com

msftncsl.com is similar to msftncsi.com

attoo1s.com is similar to at-tools.com
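This resemblance can be quantified. A quick sketch using a plain edit-distance ratio (difflib here; real triage might also normalize confusable characters such as 1/l):

```python
import difflib

def lookalike_score(candidate, legit):
    """Similarity between a suspect domain and the legitimate domain it
    imitates; 1.0 means identical."""
    return difflib.SequenceMatcher(None, candidate, legit).ratio()

for suspect, legit in (("msftncsl.com", "msftncsi.com"),
                       ("attoo1s.com", "at-tools.com")):
    print(suspect, "vs", legit, "->", round(lookalike_score(suspect, legit), 2))
```

Scores this close to 1.0 against a well-known domain are a useful hunting signal on passive DNS data.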

These attackers appear to use subdomains which strongly resemble legitimate domains for C&C. Using some passive DNS data, we examined this infrastructure.

  1. the second level domains often have only a CNAME record, pointing to the legitimate domain they resemble:

msftncsl.com CNAME msftncsi.com

  2. or resolve to localhost:

attoo1s.com A 127.0.0.1

After discovery, the attackers respond by also changing the A record used in the attack to point to 127.0.0.1, or 224.0.0.225. This pattern is also visible in other domains and subdomains used by the same attackers in other attacks.

Using passive DNS data, we can see that ns4.msftncsl.com resolved to 211.48.96.142 by 2012-11-21. It was also seen resolving to 210.116.103.68 shortly thereafter. After 2012-12-21, it was seen resolving to 244.0.0.225, most likely a typo of the 224.0.0.225 (the upper end of IP space reserved for multicast traffic) which this infrastructure typically changes A records to point at after discovery. In this time, we observed this domain and IP being used for C&C traffic for two RATs frequently used in targeted attacks (9002 RAT, PlugX).

In the following, we establish a timeline with infrastructure information combined with information extracted from the binaries discussed above.

2009-02-20: hkcmd.exe compiled (0D58E5F4E82539DE38BA7F9B4A8DDA12)

2010-11-22: domain msftncsl.com registered with ENOM, INC:

Contact: msftncsl@hotmail.com
        
Domain name: msftncsl.com

Registrant Contact:
  
  xu wenqiang ()
  
  Fax: 
  anyuan road no 170,putuoqu, shanghai,China
  shanghai, CN 200085
  CN

Administrative Contact:
  
  xu wenqiang (6g8wkx@gmail.com)
  +86.013965128080
  Fax: +86.02167326460
  anyuan road no 170,putuoqu, shanghai,China
  shanghai, CN 200085
  CN

Technical Contact:
  
  xu wenqiang (6g8wkx@gmail.com)
  +86.013965128080
  Fax: +86.02167326460
  anyuan road no 170,putuoqu, shanghai,China
  shanghai, CN 200085
  CN

2010-11-24: first observed lookup for msftncsl.com (CNAME msftncsi.com)

2011-07-01: first observed lookup for update.msftncsl.com A 127.0.0.1

2012-06-09: rarsfx.exe compiled (D5F69A21BCC84E34B0DF9D36EA5891D5)

2012-09-13: hkcmd.exe added to rarsfx archive

2012-09-28: hccutils.dll compiled (55C15EFA6369957C69E7C6643BC86EF2)

2012-11-17: unpacked hccutils.dll.res compiled (93B86D6DFD36CAE603A5EFBE95FA9289)

2012-11-21: hccutils.dll.res added to rarsfx archive

2012-11-22: ns4.msftncsl.com first points to 211.48.96.142

2012-11-24: sample found in the wild

2012-11-30: ns4.msftncsl.com then points to 210.116.103.68

2012-12-04: ns4.msftncsl.com then points to 211.48.96.142

2012-12-21: ns4.msftncsl.com then points to 244.0.0.225

So what can this timeline tell us about the attacker?

It is clear that this is a pattern of behavior, not just a unique attack.

First, consider the fact that the rarsfx archive was created 5-6 months before this attack; next, examine the insertion times of the different artifacts within it: each is different, and not just by a few minutes, but by days. This attacker likely used the same rarsfx archive with other payloads before this attack.

It appears that the planning, building, testing, and finally, deployment of these attacks was iterative, planned, and practiced. This attacker has likely engaged in other attacks which follow the same (or at least a similar) pattern. It is clear that different payloads have been swapped in and out of the rarsfx. For example, consider:

https://www.virustotal.com/file/d5f69a21bcc84e34b0df9d36ea5891d5/analysis/

https://www.virustotal.com/file/c48cdf2ce519307358ead3512e31f264/analysis/         

Note that the compile times for the archives, hkcmd.exe, and hccutils.dll (you can verify on the additional information tab of the VT analyses) are the same. Note that the insertion time of hkcmd.exe is the same, and that the others differ.

The two hccutils.dll files were both built at the same time, but are replaced for new attacks. In one sample, hccutils.dll was inserted shortly after building, and in the other, it was swapped just in time for use. One can imagine an attacker building a number of these hccutils.dll files at the same time, for different attacks, and swapping a new one in for each to enhance AV evasion. It seems to work, given the VT reports for each sample.

Also note that the hccutils.dll.res is swapped in shortly before use as well; given the model of PlugX, this makes sense. An attacker who uses this same dll-hijacking mechanism in multiple attacks would likely use the same hijacking dll with several different PlugX-payload resources before changing out the dll for a new one with a slightly different packing.

rarsfx.exe (Sat Jun 09 17:19:49 2012) d5f69a21bcc84e34b0df9d36ea5891d5

  1. hkcmd.exe (Fri Feb 20 23:31:55 2009) 0d58e5f4e82539de38ba7f9b4a8dda12
  2. hccutils.dll (Fri Sep 28 07:13:51 2012) 55c15efa6369957c69e7c6643bc86ef2
  3. hccutils.dll.res 3aa819b9089cd906d6434e446bea75ba
  4. hccutils.dll.res.unp (Wed Oct 17 12:34:30 2012) 93b86d6dfd36cae603a5efbe95fa9289

rarsfx.exe (Sat Jun 09 17:19:49 2012) c48cdf2ce519307358ead3512e31f264

  1. hkcmd.exe (Fri Feb 20 23:31:55 2009) 0d58e5f4e82539de38ba7f9b4a8dda12
  2. hccutils.dll (Fri Sep 28 07:13:51 2012) 58c11dd3a9f257869bc362c7a5bc85f1
  3. hccutils.dll.res 824ee49166f2cfb45c573434fb588dde

DECODING THE C&C COMMUNICATION OF PLUGX

The communication done by PlugX is encrypted in two stages: to perform the first stage, the malware uses this routine:

# Python implementation:

from struct import pack, unpack

def decrypt(key, src, size):
    # Four 32-bit key registers, all seeded with the same 4-byte key.
    key0 = key1 = key2 = key3 = key
    dst = b''
    for i in range(size):
        # Each register evolves with its own shift and constant; all
        # arithmetic is truncated to 32 bits.
        key0 = (key0 + (((key0 >> 3)&0xFFFFFFFF) - 0x11111111)&0xFFFFFFFF)&0xFFFFFFFF
        key1 = (key1 + (((key1 >> 5)&0xFFFFFFFF) - 0x22222222)&0xFFFFFFFF)&0xFFFFFFFF
        key2 = (key2 + (0x44444444 - ((key2 << 9)&0xFFFFFFFF))&0xFFFFFFFF)&0xFFFFFFFF
        key3 = (key3 + (0x33333333 - ((key3 << 7)&0xFFFFFFFF))&0xFFFFFFFF)&0xFFFFFFFF
        # One keystream byte from the low bytes of the four registers.
        new_key = ((key2&0xFF) + (key3&0xFF) + (key1&0xFF) + (key0&0xFF))&0xFF
        # XOR it against the next ciphertext byte.
        dst += pack("<B", unpack("<B", src[i:i+1])[0] ^ new_key)
    return dst

using the first four bytes from a given TCP data as the key:

After the first round of decryption (see below), there is a header which includes:

  1. a flag to indicate which plugin the traffic is for
  2. the size of the compressed data to follow
  3. the size of the decompressed data
  4. the status of the operation

Then you may decrypt the body, using the same key as for the header:

And finally, you decompress the body, using the parameters from the header:

Since the keys are in every packet, and we have managed to decrypt and decompress the payload, all that is left is to identify the flags for routing communication to each component of the malware. We have already identified several flags:

    GET_MACHINE_INFO_FLAG = 0x1 #returns machine name and identifier

    START_PLUGIN_MGR_FLAG = 0x3 #select and enable plugins

    INSTALL_NEW_COPY_FLAG = 0x5 #install itself again

    SEND_NEW_SETTINGS_FLAG = 0x6 #send bot new settings

    SAVE_SETTINGS_TO_FILE_FLAG = 0x7 #save current settings to file

    SEND_PLUGINS_INFO_FLAG = 0x8 #send C&C info about plugins
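Putting the pieces together, here is a sketch of a full packet decode. The keystream is a condensed (behavior-identical) version of the routine above; the 16-byte header layout of four little-endian uint32s, the restart of the keystream for the body, and zlib as the decompressor are all our assumptions for illustration, not confirmed details of the protocol:

```python
import zlib
from struct import pack, unpack

def stream_decrypt(key, src):
    """Condensed form of the decrypt() routine above (same keystream)."""
    k0 = k1 = k2 = k3 = key
    out = bytearray()
    for b in src:
        k0 = (k0 + (k0 >> 3) - 0x11111111) & 0xFFFFFFFF
        k1 = (k1 + (k1 >> 5) - 0x22222222) & 0xFFFFFFFF
        k2 = (k2 - (k2 << 9) + 0x44444444) & 0xFFFFFFFF
        k3 = (k3 - (k3 << 7) + 0x33333333) & 0xFFFFFFFF
        out.append(b ^ ((k0 + k1 + k2 + k3) & 0xFF))
    return bytes(out)

GET_MACHINE_INFO_FLAG = 0x1

def decode_packet(data):
    """Assumed layout: 4-byte key | encrypted header (flag, compressed
    size, decompressed size, status as uint32s) | encrypted body."""
    key = unpack("<I", data[:4])[0]
    flag, csize, dsize, status = unpack("<4I", stream_decrypt(key, data[4:20]))
    body = zlib.decompress(stream_decrypt(key, data[20:20 + csize]))
    assert len(body) == dsize
    return flag, status, body

# Round-trip demo (the XOR keystream encrypts and decrypts alike).
body = b"machine info goes here"
comp = zlib.compress(body)
key = 0xDEADBEEF
hdr = pack("<4I", GET_MACHINE_INFO_FLAG, len(comp), len(body), 0)
packet = pack("<I", key) + stream_decrypt(key, hdr) + stream_decrypt(key, comp)
flag, status, out = decode_packet(packet)
print(flag == GET_MACHINE_INFO_FLAG, out == body)  # True True
```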

For the time being, the tool to completely decode C&C traffic is still in development, but we have prepared a sample script to decrypt and decompress payloads, and identify some flags.

Automatically Detecting Evasive Malware


Malware has always been in continuous evolution: Throughout the years we have seen simple viruses become polymorphic, autonomous self-replicating code connecting to a master host and becoming a botnet, and JavaScript being used to launch increasingly sophisticated attacks against browsers. This last attack vector has become increasingly popular, as drive-by-download exploits have become commoditized, and are routinely used to compromise hundreds of thousands of computers.

One of the major challenges in detecting malicious JavaScript is the dynamic nature of the language itself.  Data in JavaScript can be turned into code by calling the eval() function on a string. This string can be heavily obfuscated in order to prevent signature-based systems from detecting an exploit. Therefore, the only way to reliably detect the attack is to execute the JavaScript code and observe its behavior.

This is achieved using sandboxing technologies, often called honeyclients.  These tools load the web page to be analyzed, execute the associated JavaScript code, and observe the actions performed, looking for evidence of an attack in progress. These are effective tools in detecting web-based malware, but they are not perfect, and cyber-criminals are catching up fast. The bad guys took note of how these systems detect web-based attacks and are using sophisticated techniques to evade detection. The goal of these evasion attacks is to devise an exploit that works reliably when launched against a real victim but fails to expose its nefarious intent when executed in a sandbox or honeyclient. 

These highly-evasive attacks are often "evolutionary" with respect to initial exploits. This means that the evasive attacks are variations of attacks that were once successful and then started losing effectiveness because the honeyclients were detecting them. Therefore, the exploit writer starts to "tweak" the exploit, adding evasive features until the JavaScript is, once again, undetected by existing solutions. So what can be said about these "evolved" attacks?

Quite a bit, according to recent research in this field which, for the first time, provided techniques for the automated detection of evasive web-based malware. This research was published in 2013 in the Proceedings of the USENIX Security Symposium, one of the top venues for the dissemination of highly innovative scientific results. The research work is titled "Revolver: An Automated Approach to the Detection of Evasive Web-based Malware" and was authored by our group, composed of researchers from the University of California, Santa Barbara and Lastline, Inc.

High-level overview of the Revolver system

A high-level overview of Revolver is presented in the Figure above. Revolver analyzes JavaScript code that has been executed by a honeyclient (that is, after it has been de-obfuscated) and extracts an abstract representation of the code structure (in tech-speak this is called an Abstract Syntax Tree, or AST). The ASTs (i.e., the code fragments) are marked as benign or malicious according to the current capability of the system (which is called an oracle). This means that some malicious code might be marked as benign even if it is in fact malicious, because it successfully evaded the system.

The various ASTs that are collected are then clustered, that is, they are grouped together according to their functionality. This first step substantially reduces the number of items to be analyzed. In the second step, the ASTs within a cluster are compared to each other. If two code fragments are similar but have a different classification (i.e., one is malicious and one is benign), then the difference between the two code fragments is computed and analyzed. If the code that has been added to a fragment caused an evolution from being considered malicious to being considered benign, then this is a case of evasive behavior, and the evasion technique is automatically identified. The evasive code fragment can then be brought to the attention of a human analyst, so that the evasion can be mitigated. If, instead, the code that was added caused a benign code fragment to be considered malicious, this might represent an injection attack, in which cyber-criminals embed malicious functionality in popular benign JavaScript components such as jQuery, in order to confuse existing filters.
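The core comparison can be sketched as follows (a rough Python analogue: the `ast` module stands in for a JavaScript parser, and both code fragments are hypothetical). Each fragment is reduced to the sequence of its AST node types, and the sequences are compared; high similarity between a malicious-labeled fragment and a benign-labeled one is exactly the signal Revolver looks for.

```python
import ast
import difflib

def node_types(source):
    """Flatten a fragment's AST into the sequence of its node type names."""
    return [type(node).__name__ for node in ast.walk(ast.parse(source))]

def similarity(a, b):
    """Ratio in [0, 1] between two fragments' AST node-type sequences."""
    return difflib.SequenceMatcher(None, node_types(a), node_types(b)).ratio()

original = "payload = unescape(data); run(payload)"

# Same logic with an environment check bolted on -- the kind of added
# code that is surfaced when an evasive variant diverges from its ancestor.
evasive = "if not debugger_present(): payload = unescape(data); run(payload)"

print(round(similarity(original, evasive), 2))
```

The small textual difference between two highly similar fragments (here, the added environment check) is then what would be brought to the analyst's attention.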

In either case, the Revolver system is able to leverage machine learning to identify cases in which malware evolution created variants that are no longer detected, or to identify injections into benign components. This is a first step towards a new set of techniques that will focus on detecting evasive activity, in addition to openly malicious activity. It is a necessary new step in the fight against sophisticated malware, which is becoming more aware of sandboxes and other analysis systems.

The details of this research effort are available in the technical paper:
http://www.lastline.com/papers/revolver.pdf

The system is available to malware analysts. Please contact revolver@lastline.com for further information.

The authors of the paper are:
Alexander Kapravelos, PhD Student at UCSB
Yan Shoshitaishvili, PhD Student at UCSB
Marco Cova, Head of Lastline Europe and Professor at University of Birmingham 
Christopher Kruegel, Chief Scientist at Lastline and Professor at UCSB
Giovanni Vigna, CTO at Lastline and Professor at UCSB

For further information about this research work, please contact me at vigna@lastline.com.

Using High-Resolution Dynamic Analysis for BHO Trigger Detection


Looking at how malware analysis engines evolved over the last decade, the trend is quite obvious: Dynamic analysis systems are replacing purely static ones or at least combine elements from both approaches. While the advantages of dynamic analysis are convincing - resilience against code obfuscation or encryption - attackers have various techniques at hand to complicate dynamic analysis and possibly evade these systems.

Lastline’s answer to these attacks is what we call High-Resolution Dynamic Analysis. In a series of upcoming blog posts, we want to look at how our high-resolution sandbox tackles evasive code present in most of today’s Advanced Persistent Threats (APTs). These attacks range from environment-fingerprinting, to sandbox evasion, to behavior triggers.

High Resolution Dynamic Analysis

Lastline Analyst - our analysis product that allows forensic and audit teams to analyze zero-day exploits, advanced persistent threats, and malware, and that also serves as the analysis backend for Lastline Enterprise - has one key discriminating feature that distinguishes it from traditional analysis sandboxes: the analysis engine is built on top of a CPU emulator, which allows the engine to see every instruction executed by a malware program under analysis.

Traditional dynamic analysis engines only monitor interactions between an analysis subject and the operating system (through native system- or API-function calls) or communication between processes running inside the analysis environment. This is a practical approach for monitoring the behavior of the analysis subject, but suffers from crucial blind spots that an attacker can use to either detect or even entirely evade an analysis system.

Lastline’s engine, available in both Lastline Analyst and Lastline Enterprise, closes these blind spots by combining traditional API call monitoring with CPU emulation to find malicious behavior that does not rely on explicit interactions with the operating system. In addition, Lastline is able to deal with the sophisticated evasion attempts that are common in today’s APTs.

Analysis of Evasive Malware

Lastline contains a reporting feature that provides security analysts and network administrators with a fast overview of an analyzed malware program’s behavior. It summarizes the analysis without requiring the reader to go through dozens of pages of analysis logs or to understand various OS internals such as the Microsoft Windows Registry or Android Service Intents.

In a follow-up blog post, we will show more details about the Activity Summary, for now just consider the following example:

Lastline Evasive Malware Risk Assessment
Analysis report overview showing score, classification, and activity summary

Without having to dig deeply into the report details, the Activity Summary gives the user a fast classification into high-level classes, such as benign or malicious, as well as an overview of the behavior exhibited during the analysis. Additionally, it highlights the key points that experienced researchers should focus on when drilling further into the detailed analysis results.

For the rest of this blog-post, and as an outlook into a series of follow-up posts, we want to focus on a subset of the events in the above summary concerning evasive malware:


Evasive Malware Event Listing
Analysis report activity summary

These entries highlight a combination of two behaviors found in many of today’s advanced threats: Evasive behavior, i.e., revealing certain actions only in specific circumstances (e.g., the presence or absence of certain programs, such as analysis tools or AV products), as well as stealing a user’s confidential data (such as banking or gaming credentials).

Trigger-Based Dynamic Analysis

Combining environment checks with specific behavior is a major concern for traditional analysis sandboxes based on API-logging alone, as explained earlier. To make things more concrete, let us look at an example:

One common technique used by today’s malware to steal a user’s confidential data is to inject code into the user’s web-browser (such as Internet Explorer, Firefox, or Google’s Chrome Browser). This attack, usually called Man-In-The-Browser, allows the malware to get notifications about the user’s browsing activity without having to worry about secure connections through SSL or other security features (such as 2-factor authentication used by many web services). All the attacker has to do is manipulate the website’s Document Object Model (DOM) and register callbacks on interesting events, such as submissions of forms or loading of specific pages within a web service. By registering event listeners inside the DOM, the browser will notify the attacker’s code of data retrieved from or sent to specific URLs or web services.

An example of this kind of attack is the set of Browser Helper Object (BHO) plugins that the popular botnet Alureon injects into Microsoft’s Internet Explorer: These BHOs monitor the URLs that a user’s browser visits and inject a number of JavaScript functions into the DOM as soon as an interesting URL is found. In the case of Alureon, interesting URLs are a set of (mostly Korean) banking and gaming websites, targeting individuals and companies alike.
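The trigger logic can be sketched as follows (a hypothetical Python rendition: the two target URLs come from the samples discussed below, but the function names and injected snippet are illustrative). The stealer component stays inert until the visited URL matches its hard-coded list.

```python
# Hypothetical sketch of a BHO-style URL trigger: the injection only
# happens when the visited URL matches a hard-coded target list.
TARGET_URLS = ["banking.nonghyup.com", "www.nexon.com/login/login.aspx"]

INJECTED_JS = "document.forms[0].addEventListener('submit', stealCreds);"

def on_navigate(url):
    """Called on every page load; returns JS to inject, or None.

    A sandbox that only logs API calls never sees this comparison;
    instruction-level visibility is what reveals the trigger URLs.
    """
    for target in TARGET_URLS:
        if target in url:
            return INJECTED_JS
    return None

print(on_navigate("http://example.com"))            # no trigger -> None
print(on_navigate("http://banking.nonghyup.com/"))  # trigger hit
```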

Code extracted from Alureon BHO for injecting event listeners

Code extracted from Alureon BHO for injecting event listeners (converted to pseudo-code)

Traditional sandboxes are mostly unable to detect this specific behavior, as their API monitoring techniques cannot see the URLs that trigger the injection events in the DOM. Lastline's CPU emulation technique, on the other hand, can see how the malware’s browser plugin checks for URLs triggering certain behavior and, as a consequence, can direct the browser to specific URLs to reveal additional behavior that would otherwise not be extracted.

When analyzing a set of BHOs dropped by recent Alureon samples [1], the analysis engine is able to detect and extract the trigger component (checking for the URLs that trigger a certain behavior) as well as the stealing component that injects the event listener functions into the DOM. This happens fully automatically during analysis and is summarized in the analysis report overview and report details.

At the time of analysis, these URLs included online-banking and gaming sites alike, such as:

  • bank.cu.co.kr

  • banking.nonghyup.com

  • banking.nonghyup.com/common/pbarpagecontroller.jsp?surl=/bank/ar/pbari0101.jsp

  • epostbank.go.kr

  • ibk.co.kr

  • lcs.mezzo.hangame.com

  • shinhan.com

  • wooribank.com

  • www.nexon.com/login/login.aspx

Additionally, the engine extracts the JavaScript code that is injected when these URLs are visited.

JavaScript event listeners injected into the DOM

The listeners that Alureon registers hook DOM events to extract user data (such as usernames and passwords), temporarily storing them in a cookie named IEPROXY. Later, this data is extracted from the cookie and sent to the attacker. Additionally, the BHO hides specific DOM elements.

Clearly, without Lastline’s high-resolution analysis engine, a sandbox would neither detect the triggers (visiting a URL in this case) nor be able to extract the JavaScript code, as it is only injected when one of the triggers hits.

I invite you to register for Lastline Analyst and gain the ability to analyze up to 25 files per day for free. There is no hardware installation needed to try it out.

Register for Lastline Analyst

You can also request a 30-day evaluation of Lastline Enterprise and gain the ability to analyze files from your network.

Register for Lastline Enterprise



[1] Links to VT concerning a few samples:

Analyzing Environment-Aware Malware


A look at a Zeus trojan variant called Citadel that evades traditional sandboxes

Fighting traditional sandboxes (or dynamic analysis systems in general) typically comes in the form of detecting the analysis environment or evading analysis through means of behavior triggers as mentioned in a previous blog post: Using High-Resolution Dynamic Analysis for BHO Trigger Detection. Some variants of the notorious Zeus trojan family use a different approach to hinder analysis: Host fingerprinting.

In this evasion technique, the malware sample computes a unique fingerprint of the host when it infects the system and embeds this fingerprint inside the malware binary. Whenever the malware starts execution, a new fingerprint of the host running the program is computed and compared to the original host’s fingerprint. This enables the malware sample to detect if it is running in an unexpected environment, and to take a different set of actions in this case.
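A minimal sketch of this check (illustrative only; real families such as Citadel derive their fingerprints differently, as described below, and the paths and names here are hypothetical):

```python
import hashlib

def host_fingerprint(install_path, machine_name):
    """Derive a host-specific value (illustrative; real implementations
    combine other, often encrypted, installation artifacts)."""
    return hashlib.md5((install_path + machine_name).encode()).hexdigest()

# Computed once at infection time and embedded in the malware binary.
embedded = host_fingerprint(r"C:\Users\victim\AppData\xqz\a.exe", "VICTIM-PC")

def should_run(install_path, machine_name):
    # Recomputed at every start-up; any mismatch means a new environment.
    return host_fingerprint(install_path, machine_name) == embedded

print(should_run(r"C:\Users\victim\AppData\xqz\a.exe", "VICTIM-PC"))  # True
print(should_run(r"C:\analysis\sample.exe", "SANDBOX-01"))            # False
```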

Zeus trojan host fingerprint check

As a consequence, whenever security vendors try to analyze a copy of a suspicious program in an (even slightly) different environment than the one where the sample was captured, the malware program detects the new environment and does not reveal any malicious behavior, defeating correct classification. In a way, this technique resembles Digital Rights Management (DRM), which prohibits the use of illegal copies of licensed content.

Others have identified this as a very difficult problem, but clearly, modern dynamic analysis systems, such as Lastline, must be able to handle environment-aware or -sensitive malware programs. Using High-Resolution Dynamic Analysis, our CPU emulation engine is able to defeat this kind of attack, as we explain in the rest of this blog post as well as in our whitepaper on multi-path execution.

Defeating Host Fingerprinting

The Zeus variant we are looking at in this blog-post is called Citadel [1]. The way Citadel does host fingerprinting is by using a binary key and a randomly generated application directory and program name during the initial infection phase. More precisely, the code derives an installation fingerprint by encrypting the application’s name and path. This encrypted data is then embedded inside one of the resource sections of the malware program for later use.

Whenever the malware is launched (for example on reboot of the infected host), the code generates a new host fingerprint and compares it to the decrypted value stored in its resources.

Citadel host fingerprint check function

If the two fingerprints are different, it means the program is executed on a new host (such as an analysis environment). As a result, the program can behave differently — for example terminate immediately or hide any malicious behavior.

Zeus trojan diverging control flows based on host fingerprint
Diverging control-flows based on host fingerprint

The high-resolution CPU emulation engine of Lastline has full visibility into the instructions executed by Citadel and can detect the fingerprint calculation. As a consequence, the engine can find the environment-aware code regions and redirect the control flow to execute the malicious behavior of the Trojan.
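Conceptually, this multi-path idea reduces to a toy example (hypothetical names and behaviors; the real engine works on emulated x86 instructions, not Python):

```python
def citadel_like(fingerprint_matches):
    # Hypothetical stand-in for the diverging control flow shown above.
    if fingerprint_matches:
        return "install hooks, contact C&C server"
    return "exit silently"

# A multi-path engine explores both outcomes of the environment check,
# instead of only the one that the current host happens to trigger.
behaviors = {citadel_like(outcome) for outcome in (True, False)}
print(sorted(behaviors))
```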

Analysis report showing environment-sensitive malware (host-infection environment check)

 Say My Name!

A related technique of environment-aware malware is to make the exhibited behavior depend on the execution context, such as the process in which the malware code is running. A common trick used by trojans is to register one of the malware components via the AppInit_DLLs mechanism. These DLLs are loaded into every process running on an infected host, including Internet browsers and Anti-Virus software.

When the Trojan code is started in the context of another program, it can use the process name (such as the executable name or command-line) to determine its execution environment. Typically, when running as part of AV software, malware disables the security tool and terminates the Anti-Virus service. On the other hand, when executed as part of a browser, the attacker can steal confidential data of the user as described in a previous blog-post.
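The context check described above can be sketched as follows (process names and behavior strings are illustrative, not taken from a specific sample):

```python
# Illustrative sketch of context-aware behavior: an injected DLL
# choosing its actions based on the name of its host process.
BROWSERS = {"iexplore.exe", "firefox.exe", "chrome.exe"}
AV_SERVICES = {"avguard.exe", "mcshield.exe"}

def pick_behavior(process_name):
    name = process_name.lower()
    if name in BROWSERS:
        return "hook DOM, steal credentials"
    if name in AV_SERVICES:
        return "terminate security service"
    return "stay dormant"

print(pick_behavior("iexplore.exe"))
print(pick_behavior("explorer.exe"))
```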

Similar to the Citadel example above, Lastline is able to determine that the execution context is used for making a decision that affects the control flow of the program. More concretely, the CPU emulation engine can determine which execution context would yield the most interesting behavior and enforce execution of the corresponding behavior as seen in the example report generated by our system below [2]:

Zeus trojan context aware malware targetting browsers
Analysis report showing context-aware malware targeting browsers

Ironically, we can even use the fact that malware code is environment-sensitive against the attacker, as this sort of behavior is very uncommon in benign applications… turning the bad guys’ tricks against them.

[1] Link to VT for a recent Citadel sample:

[2] Link to VT for sample

Lastline employees involved in top-tier Android research


Researchers from UCSB and University of Bonn recently published a paper on the risks incurred by dynamically loaded external code in Android apps. The accompanying blog post on the iSecLab blog gives an overview of the work.

The first author of the paper is now a security researcher at Lastline, and professors Kruegel and Vigna are co-founders of the company. The insights gained during the research effort will contribute to improving Lastline's Android analysis technology. We therefore cross-post the iSecLab article here:

Execute this! Looking at code-loading techniques in Android

Recently, several research efforts related to the security of the Android mobile platform showed how often Android applications are affected by severe security vulnerabilities. Last summer, we decided to investigate how benign and malicious Android apps use a feature that the security community identified as problematic a long time ago: remote-code-loading functionality.

Our findings were surprising: in fact, we found a number of very popular Android apps (each with more than one million users) that were affected by remote code execution vulnerabilities! Our paper on this work appeared at NDSS’14, and it was presented this week in San Diego. While we suggest checking out the full paper, here we provide an overview of our findings.

Android apps are allowed to load additional code dynamically from (mostly) arbitrary sources. There is no permission check or anything similar. Once an app is installed on your device, it can even download code from the Internet (provided that it has Internet permission). We figured that sounded a little risky, so we decided to have a deeper look.

Intuitively, there are two problems with dynamic code loading:

  1. Malicious apps can download dangerous code onto your device. The harmless bike tracking app that you just installed might start downloading code that sends expensive premium SMS. Unfortunately, the providers of Android markets (such as Google Play or the Amazon app store) run malware scanners centrally at the store, and hence those scanners have no chance of ever seeing the code that an app downloads at runtime if it doesn’t want them to see!
  2. Benign apps might make use of such a “feature” for legitimate reasons (e.g., a game loading additional levels at runtime), but what if the developers didn’t think about security? Or, as it happens more often than one would think, what if the developers failed to properly implement the security checks? Well, if an app doesn’t verify the code it loads, then attackers might find ways to inject malicious code, and hence a single carelessly designed app could open your device to the evil guys out there. For example, this very problem has just been discovered in the CyanogenMod updater.

We set out for a thorough analysis of both aspects of the problem. In a first step, we identified different ways for apps to load external code. A popular and well-known technique is DexClassLoader, a Java class loader that allows you to load Java code from files (which you might just have downloaded). Another way is to create a so-called app context for other apps, which allows you to run their code. Please refer to the paper for a full list of techniques.

The one thing that all the code-loading techniques have in common is that the Android system does not check the code you load. There is no verification that the code has been approved by anyone or originates from a trustworthy developer. iOS, in contrast, strictly enforces that any code run on the device be signed by a certificate chain that ultimately leads to Apple.

After systematically studying how a given Android app can load code, we went on to investigate the two scenarios that we mentioned above. Remember? (1) Malicious apps evading detection and (2) vulnerabilities in benign apps.

It turns out that malicious apps can use dynamic code-loading techniques to evade the centralized malware analysis systems that are common in the Android world (such as the Google Bouncer, used by the Google Play store). Their centralized nature, i.e., the fact that apps are usually analyzed in the app store rather than on the users’ devices, makes the protection systems unable to cope with dynamically loaded code. As an experiment, we published an app on Google Play that safely downloads and executes (innocuous) code from a web server we control. Monitoring requests to the web server, we discovered that, during the verification process performed by the Google Bouncer before accepting our app, the remote code was not even accessed.

In this blog post, however, we will focus on the second aspect: the risk of vulnerabilities in benign apps.

We developed a static analysis tool (based on backward slicing) that is able to detect unsafe usage of code-loading techniques in Android apps with high accuracy. (It is important to stress that an app using such loading techniques is by no means always malicious. Have a look at the paper for a full list of the different legitimate motivations we discovered.)

In an attempt to assess the percentage of vulnerable apps, we ran our detection tool on a set of 1,632 apps from Google Play, each with more than one million installations. Disquietingly, we found that 9.25% of those apps are vulnerable to code injection. The situation among the top 50 free apps is even worse: in fact, we found that 16% are vulnerable!

Among the vulnerabilities that we found was a possible code injection attack against an advertisement framework called AppLovin. We notified the developers of this framework in July 2013, and they acknowledged the large security impact and suggested a fix within hours. We agreed with them to not publicly disclose the vulnerability until the fixed version of the framework was incorporated into a substantial amount of the vulnerable apps. This same vulnerability was then publicly disclosed in November 2013 by MWR. We’d like to stress how fast the AppLovin developers reacted to our discovery and encourage other companies to follow their example.

Finally, we designed a protection scheme that will protect Android users from vulnerable benign apps as well as malicious apps in the context of dynamic code loading. You can find the full details in the paper – here, we’ll just give a brief overview.

The basic idea behind our protection mechanism is code signing: Every piece of code that an app wants to execute must have been signed off by someone you trust. Now, we don’t want to depend on a single company for this signing business. Therefore, our system allows the user to choose trustworthy judges. Anyone is free to start analyzing Android code and providing signatures (actually, we use something that is similar to signatures, but not quite the same), and the users decide whom to trust.

We enforce our code checks by modifying Dalvik, the Java environment in Android. (Dalvik will be replaced by a new system called ART soon, but that’s not too much of a problem for the protection scheme.) Unfortunately, this means that you have to install a new version of Android in order to deploy the protection system. We are aware that this makes its wide adoption considerably harder, but we believe that the compromise is worthwhile because of the strong security guarantees that our system can provide: we can’t protect every existing Android device, but we offer greatly improved security for future devices. It is our hope that the concepts of our protection system will find their way into a future “official” version of Android.
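The core idea of the scheme, refusing to execute loaded code unless a trusted judge has vouched for it, can be sketched in Python (a hypothetical stand-in: the real system works inside Dalvik, and a real deployment would use public-key signatures rather than the shared-key HMAC used here for brevity):

```python
import hashlib
import hmac

# Hypothetical sketch: a shared-key HMAC stands in for a judge's signature.
JUDGE_KEY = b"trusted-judge-secret"

def sign(code):
    return hmac.new(JUDGE_KEY, code, hashlib.sha256).hexdigest()

def load_code(code, signature):
    """Execute downloaded code only if a trusted judge signed it."""
    if not hmac.compare_digest(sign(code), signature):
        raise PermissionError("unsigned or tampered code refused")
    namespace = {}
    exec(code.decode(), namespace)
    return namespace

payload = b"level = 'bonus_stage'"
ns = load_code(payload, sign(payload))
print(ns["level"])

try:  # tampered code is rejected instead of executed
    load_code(b"send_premium_sms()", sign(payload))
except PermissionError as err:
    print(err)
```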

If this short outline of our work has attracted your interest, check out our full NDSS paper to learn the details: which code-loading techniques we found, why benign apps use them, how we circumvented the Google Bouncer and how our protection system works under the hood, to name just a few things!

How To Build An Effective Sandbox


Automated malware analysis systems (or sandboxes) are one of the latest weapons in the arsenal of security vendors. Such systems execute an unknown malware program in an instrumented environment and monitor its execution. While such systems have been used as part of the manual analysis process for a while, they are increasingly used as the core of automated detection processes. The advantage of the approach is clear: It is possible to identify previously unseen (zero-day) malware, as the observed activity in the sandbox is used as the basis for detection.

For a high level overview of this topic, please read Next-Generation Sandbox Offers Comprehensive Detection of Advanced Malware.

Goals of a dynamic analysis system (sandbox)

A good sandbox has to achieve three goals: Visibility, resistance to detection, and scalability.

First, a sandbox has to see as much as possible of the execution of a program. Otherwise, it might miss relevant activity and cannot make solid deductions about the presence or absence of malicious behaviors. Second, a sandbox has to perform monitoring in a  fashion that makes it difficult to detect. Otherwise, it is easy for malware to identify the presence of the sandbox and, in response, alter its behavior to evade detection. The third goal captures the desire to run many samples through a sandbox, in a way that the execution of one sample does not interfere with the execution of subsequent malware programs. Also, scalability means that it must be possible to analyze many samples in an automated fashion.

What information should a sandbox collect?

In this post, we discuss different ways in which a sandbox can monitor the execution of malware that runs in user mode (either as a regular user or administrator). This leaves out malicious code that tampers with the kernel, such as rootkits. We leave those for a future post. Also, the vast majority of malware runs as regular user mode processes, and even rootkits typically leverage user mode components to install kernel drivers or to modify the operating system code.

When monitoring the behavior of a user mode process, almost all sandboxes look at the system call interface or the Windows API. System calls are functions that the operating system exposes to user mode processes so that they can interact with their environment and get stuff done, such as reading from files, sending packets over the network, and reading a registry entry on Windows. Monitoring system calls (and Windows API function calls) makes sense, but it is only one piece of the puzzle. The problem is that a sandbox that monitors only such invocations is blind to everything that happens in between these calls. That is, a sandbox might see that a malware program reads from a file, but it cannot determine how the malware actually processes the data that it has just read. A lot of interesting information can be gathered from looking deeper into the execution of a program. Thus, some sandboxes go one step further than just hooking function calls (such as system calls or Windows API functions), and also monitor the instructions that a program executes between these invocations.
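The blind spot is easy to picture with a toy Python sketch (all names hypothetical): a hook-based monitor records the two "API calls", but the decoding step between them never touches the hooked interface and therefore leaves no trace in the log.

```python
import functools

call_log = []

def hooked(api):
    """Record every invocation of a hooked 'API' function."""
    @functools.wraps(api)
    def wrapper(*args, **kwargs):
        call_log.append((api.__name__, args))
        return api(*args, **kwargs)
    return wrapper

@hooked
def read_file(path):            # stand-in for e.g. a file-read API
    return bytes([0x17, 0x2A, 0x3C])

@hooked
def send_packet(data):          # stand-in for e.g. a network-send API
    return len(data)

data = read_file("C:/config.bin")
decoded = bytes(b ^ 0x5A for b in data)  # the processing between the
send_packet(decoded)                     # calls never appears in the log

print([name for name, _ in call_log])
```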

Emulation versus virtualization

Now that we know what information we want to collect, the next question is how we can build a sandbox that can collect this data in a way that makes it difficult for malware to detect. The two main options are virtualization and emulation.

An emulator is a software program that simulates the functionality of another program or a piece of hardware. Since an emulator implements functionality in software, it provides great flexibility. For example, consider an emulator that simulates the system hardware (such as the CPU and physical memory). When you run a guest program P on top of this emulated hardware, the system can collect very detailed information about the execution of P. The guest program might even be written for a different CPU architecture than the actual CPU that the emulator runs on. This makes it possible, for example, to run an Android program, written for ARM, on top of an emulator that runs on an x86 host. The drawback of emulation is that the software layer incurs a performance penalty. The potential performance impact has to be carefully addressed to make the analysis system scalable.

With virtualization, the guest program P actually runs on the underlying hardware. The virtualization software (the hypervisor) only controls and mediates the accesses of different programs (or different virtual machines) to the underlying hardware. In this fashion, the different virtual machines are independent and isolated from each other. However, when a program in a virtual machine is executing, it is occupying the actual physical resources, and as a result, the hypervisor (and the malware analysis system) cannot run simultaneously. This makes detailed data collection challenging. Moreover, it is hard to entirely hide the hypervisor from the prying eyes of malware programs. The advantage is that programs in virtual machines can run at essentially native speed.

Leveraging emulation and virtualization for malware analysis

As mentioned previously, the task of an emulator is to provide a simulated (runtime) environment in which a malware program can execute. There are two main options for this environment. First, one can emulate the operating system (this is called OS emulation). Intuitively, this makes sense. A program runs in user mode and needs to make system calls to interact with its environment. So, why not simply emulate these system calls? While the malware is running, one can get a close look at its activity (one can see every instruction). When the malware tries to make a system call, this information can be easily recorded. At this point, the emulator simply pretends that the system call was successfully executed and returns the proper result to the program.

This sounds simple enough in theory, but it is not quite as easy in practice. One problem is that the (native) system call interface in Windows is not documented, and Microsoft reserves the right to change it at will. Thus, an emulator would typically target the Windows API, a higher-level set of library functions on top of the native system calls. Unfortunately, there are tens of thousands of these Windows API functions. Moreover, the Windows OS is a huge piece of software, and emulating it faithfully requires an emulator with a complexity comparable to that of Windows itself! Since faithful emulation is not practical, emulators typically focus on a popular subset of functionality that works "reasonably well" for most programs. Of course, malware authors know about this. They can simply invoke less frequently used functions and check whether the system behaves as expected (that is, like a real Windows OS). OS emulators invariably fail to behave as expected, and such sandboxes are quite easy for malware to detect and evade. Security vendors that leverage OS emulation are actually well aware of these limitations. They typically include OS emulation only as one part of their solution, complemented by other detection techniques.

As the second option for an emulator, one can simulate the hardware (in particular, CPU and physical memory). This is called (whole) system emulation. System emulation has several advantages. First, one can install and run an actual operating system on top of the emulator. Thus, the malware is executed inside a real OS, making the analysis environment much more difficult to detect for malware. The second advantage is that the interface offered by a processor is (much) simpler than the interface provided by Windows. Yes, there are hundreds of instructions, but they are very well documented, and they essentially never change. After all, Intel, AMD and ARM want an operating system (or application) developer to know exactly what to expect when she targets their platform. Finally, and most importantly, a system emulator has great visibility. A sandbox based on system emulation sees every instruction that a malware program executes on top of the emulated processor, and it can monitor every single access to emulated memory.
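A toy fetch-decode-execute loop illustrates the visibility argument (a hypothetical three-opcode instruction set; a real system emulator of course models x86 instructions and physical memory):

```python
# Toy illustration of why system emulation has full visibility: the
# emulator owns the execution loop, so no instruction runs unobserved.
PROGRAM = [("mov", "a", 5), ("add", "a", 3), ("mov", "b", 2), ("mul", "a", "b")]

def emulate(program):
    regs, trace = {"a": 0, "b": 0}, []
    for op, dst, src in program:
        val = regs[src] if src in regs else src  # register or immediate
        if op == "mov":
            regs[dst] = val
        elif op == "add":
            regs[dst] += val
        elif op == "mul":
            regs[dst] *= val
        trace.append((op, dst, src))  # every instruction is recorded
    return regs, trace

regs, trace = emulate(PROGRAM)
print(regs["a"], len(trace))  # 16 4
```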

Virtualization platforms provide significantly fewer options for collecting detailed information. The easiest way is to record the system calls that programs perform. This can be done in two different ways. First, one could instrument the guest operating system. This has the obvious drawback that a malware program might be able to detect the modified OS environment. Alternatively, one can perform system call monitoring in the hypervisor. System calls are privileged operations. Thus, when a program in a guest VM performs such an operation, the hypervisor is notified. At this point, control passes back to the sandbox, which can then gather the desired data. The big challenge is that it is very hard to efficiently record the individual instructions that a guest process executes without being detected. After all, the sandbox relinquishes control to this process between system calls. This is a fundamental limitation of any sandbox that uses virtualization technology.

How is our sandbox built?

We think that more visibility is better, especially when facing malware that is increasingly aware of virtual machines and sandbox analysis. We have seen malware that tries to detect the presence of VMware for many years. Even if one builds a custom sandbox based on virtualization technology, the fundamental visibility limitations remain. Of course, when a malware program checks for specific files or processes that a well-known hypervisor like VMware introduces, these checks will fail, and the custom sandbox will be successful in seeing malicious activity. However, virtualization, by definition, means that malicious code runs directly on the underlying hardware. And while the malicious code is running, the sandbox is paused. It is only woken up at specific points, such as system calls. This is a problem, and a major reason why we decided to implement our sandbox as a system emulator.

If system emulation is such a great idea, why doesn't everybody use it? The reason is that one needs to overcome two technical challenges to make a system emulator work in practice. One challenge is called the semantic gap; the other is performance. The semantic gap refers to the problem that a system emulator sees instructions executed on the CPU, as well as the physical memory that the guest OS uses. However, it is not immediately clear how to connect CPU instructions and bytes in memory to objects that make sense in the context of the guest OS. After all, we want to know about the files that a process creates, or the Windows registry entries that it reads. To bridge the semantic gap, one needs to gain a deep understanding of the inner workings of the guest operating system. This understanding allows us to map the detailed, low-level view of our system to the high-level information about files, processes, and network traffic that is shown in our report.

The second challenge concerns performance. Isn't emulation terribly slow? The answer is yes, if implemented in a naive way. If we emulated every instruction in software, the system would indeed not scale very well. However, we have done many clever things to speed up emulation, to a level where it is (almost) as fast as native execution. For example, one does not need to emulate all code. A lot of code can be trusted, such as Windows itself. Well, we can trust the kernel most of the time - of course, it can be compromised by rootkits. Only the malicious program (and the code that this program interacts with) needs to be analyzed in detail. Also, one can perform dynamic translation. With dynamic translation, every instruction is examined in software once, and then translated into a much more efficient form that can be run directly.
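The payoff of dynamic translation is that the expensive per-instruction work happens only once per code block, no matter how often that block runs. The following toy sketch illustrates the idea with an invented mini instruction set and class name; it is a pedagogical model, not how a real system emulator is implemented.

```python
# Toy illustration of dynamic translation: each basic block of "guest code"
# is translated once into a directly runnable (here: Python) function,
# cached by block address, and reused on subsequent executions.

class TranslatingEmulator:
    def __init__(self):
        self.cache = {}          # block address -> translated function
        self.translations = 0    # how many times we paid the translation cost

    def translate(self, block):
        # Pay the (expensive) per-instruction examination cost once per block.
        self.translations += 1
        ops = []
        for op, arg in block:
            if op == "add":
                ops.append(lambda acc, n=arg: acc + n)
            elif op == "mul":
                ops.append(lambda acc, n=arg: acc * n)

        def run(acc):
            # The "translated" form: plain function calls, no per-instruction decoding.
            for fn in ops:
                acc = fn(acc)
            return acc
        return run

    def execute(self, addr, block, acc):
        if addr not in self.cache:   # translate only on the first visit
            self.cache[addr] = self.translate(block)
        return self.cache[addr](acc)

emu = TranslatingEmulator()
block = [("add", 3), ("mul", 2)]
for _ in range(1000):                # hot loop: translated once, executed 1000 times
    result = emu.execute(0x1000, block, 1)
print(result, emu.translations)      # 8 1
```

The same caching principle is what keeps a real emulator's hot loops close to native speed: after the first pass, execution never touches the slow decoder again.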

Summary

A sandbox offers the promise of zero day detection capabilities. As a result, most security vendors offer some kind of sandbox as part of their solutions. However, not all sandboxes are alike, and the challenge is not to build a sandbox, but rather to build a good one. Most sandboxes leverage virtualization and rely on system calls for their detection. This is not enough, since these tools fundamentally miss a significant amount of potentially relevant behaviors. Instead, we believe that a sandbox must be an analysis platform that sees all instructions that a malware program executes, thus being able to see and react to attempts by malware authors to fingerprint and detect the runtime environment. As far as we know, Lastline is the only vendor that uses a sandbox based on system emulation, combining the visibility of an emulator with the resistance to detection (and evasion) that one gets from running the malware inside the real operating system.

A Pipeline for Scalable Analysis Capability


An area where we spend quite some effort here at Lastline is scaling up our malware analysis capabilities, that is, our ability to analyze (potentially) malicious artifacts, such as binaries, documents, and web pages. This is a very important area that affects not only our internal/backend operations, but also the data that our users see on their network (and the quality of this data).

We all know the challenges: we want to achieve great accuracy and performance, while keeping cost down. Accuracy is concerned with having good detection rates (correctly classifying the artifacts we analyze). Performance is critical both in terms of throughput (the number of artifacts we inspect overall) and latency (the time one has to wait to know whether a specific artifact is malicious or benign). The third factor is basic economics: analyzing any given sample costs some money, for example, to cover the CPU, disk, and network usage required to perform an analysis, and the research costs associated with developing, improving, and maintaining a given analysis system.

Finding a solution to this problem requires one to explore a big search space (lots of different options at all corners), or, more likely, multiple search spaces, as one may try to individually optimize the detection capabilities for binaries, documents, mobile apps, web pages, etc., all of which may require or benefit from specialized techniques or analysis engines.

As we were going through this sort of exercise for all the malware domains we care about, we found a process that allows us to effectively tackle these challenges and to scale up our analysis capabilities to levels we are comfortable with. In short, we break up the full analysis process, from the collection of samples to the evaluation of analysis results, into a series of steps and we develop components that work on each of these steps. The result is a processing pipeline, which we apply, with small variations, to each artifact type we deal with. In the rest of the post, we will give some more details about the pipeline we use specifically for processing web pages to detect drive-by-download attacks.
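At its core, the pipeline chains a cheap pre-filter in front of the expensive in-depth analysis, so that most benign artifacts never reach a sandbox. The sketch below shows that control flow only; all stage logic, file names, and verdict rules are invented stand-ins.

```python
# Minimal sketch of the processing pipeline: collected artifacts pass through
# a cheap filter, and only the survivors reach the expensive in-depth analysis.

def collect():
    # Stand-in for crawlers and industry feeds producing candidate artifacts.
    return ["benign.html", "suspicious.html", "known-bad.html"]

def quick_filter(artifact):
    # Cheap static pre-filter: discard artifacts that look clearly benign.
    # (A real filter would inspect content, not the name.)
    return "benign" not in artifact

def in_depth_analysis(artifact):
    # Stand-in for the sandbox stage; returns a verdict record.
    return {"artifact": artifact, "malicious": "bad" in artifact}

def run_pipeline():
    # Filter first, analyze second: the expensive stage sees a fraction
    # of the collected volume.
    return [in_depth_analysis(a) for a in collect() if quick_filter(a)]

for verdict in run_pipeline():
    print(verdict)
```

The design choice worth noting is the ordering: every stage only pays for artifacts the previous stage could not decide, which is what makes the throughput/cost trade-off workable.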

Analysis Processing Pipeline

Artifact Collection

The first step in the analysis process is to collect artifacts to analyze. There exist many sources of artifacts: for example, one can collect and inspect the artifacts observed at customer locations (provided appropriate permissions are in place) or use industry-wide feeds, both commercial and open source.

Of course, we also like to obtain artifacts on our own, to ensure that we have good coverage and an up-to-date view of current threats. In the domain of malicious web pages, the standard way of doing this is via crawling: one picks some reasonable seeds to get some initial web pages and then follows the links extracted from visited web pages. This works, but has the disadvantage of visiting a lot of benign web pages and only a few malicious ones. While this is expected (luckily, there are far more benign pages out there than malicious ones), we would of course like to increase the number of malicious web pages we discover.

We have improved on this situation by coming up with better ways to seed the crawling process. In particular, we have a number of "gadgets," methods to search for web pages that are "similar" to pages that we found in the past to be malicious: the assumption here is that these pages are also more likely to be malicious than those that we find by random crawling. For all the nitty-gritty details, see this academic paper.

Filtering

If the artifact collection works as expected, we end up with far more artifacts than we can reasonably analyze in depth (e.g., using a sandbox). This is not as bad as it sounds: the vast majority of artifacts that are collected will actually be found to be benign and can be safely discarded. In fact, we want to discard as many of the benign samples as early as possible in the processing pipeline: there's no point in spending resources to analyze them.

The challenge here is, of course, identifying benign samples (e.g., benign web pages) quickly, without doing a full analysis. To address this, we introduced a number of filters that statically inspect web pages and determine, using various techniques (e.g., lightweight machine learning techniques), whether they are likely to be benign: these are discarded without further ado. Filters are designed to provide a response in a few milliseconds, as opposed to the several tens of seconds that a regular analysis would take. It would be great if filters had perfect detection, but that is rarely the case: in general, we strive to keep their false positives down (note that, in this phase, they result in no actual alert, only extra work for our in-depth analyzers) and to avoid false negatives as much as possible (these would lead to actual missed detections).
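The asymmetry between false positives and false negatives shapes how such a filter is tuned: it should only discard a page when nothing suspicious fires at all. The sketch below is a deliberately simplified, hand-written feature filter (the feature list and decision rule are invented; our production filters use trained models, not hard-coded patterns).

```python
import re

# Hypothetical lightweight pre-filter for web pages: check a few cheap
# static features often associated with drive-by downloads, and only
# discard a page when no feature fires at all.

SUSPICIOUS_FEATURES = [
    re.compile(r"eval\s*\("),                    # dynamic code evaluation
    re.compile(r"unescape\s*\("),                # classic payload decoding
    re.compile(r"<iframe[^>]*hidden", re.I),     # hidden iframes
    re.compile(r"(?:%u[0-9a-f]{4}){8,}", re.I),  # long escaped, shellcode-like runs
]

def likely_benign(page_html):
    # Conservative by design: a single matching feature is enough to keep
    # the page for in-depth analysis. We prefer extra sandbox work over
    # a missed detection.
    return not any(f.search(page_html) for f in SUSPICIOUS_FEATURES)

assert likely_benign("<html><p>Hello world</p></html>")
assert not likely_benign("<script>eval(unescape('%u9090'))</script>")
```

A regex scan like this answers in microseconds, which is what makes it viable to run on every collected page before committing sandbox time.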

Similarly, filters can be applied to detect early on malicious artifacts that are similar to artifacts that were analyzed in the past and found to be malicious. These can also be discarded (or, more likely, their analysis can be given lower priority), since the results of their analysis can be pre-determined with high confidence. In these cases, filters typically rely on clustering techniques: incoming artifacts that cluster sufficiently "close" to known malware samples (e.g., polymorphic variations of an existing sample) are de-prioritized.

If you are interested in the details, we have published papers describing our web page filtering system and our malware clustering techniques.

In-Depth Analysis

The in-depth analysis stage is where we perform the full analysis of all artifacts that have reached this point of the pipeline. This is where we visit URLs, execute binaries, open documents, and inspect the resulting behavior of the system. If you are interested in the details of one of our analysis systems, we have discussed the design of our binary analysis environment here.

Result Evaluation

The last step in the pipeline consists of evaluating the results produced by the previous steps and, in particular, assessing the presence of false positives and false negatives (e.g., resulting from evasion attempts). This step is important both for quality control purposes and to provide feedback to improve the tools and techniques used in the earlier phases of the pipeline.

False positives are typically easier to detect: they result in all sorts of noise, e.g., spikes in the numbers of detections. False negatives are more challenging: there is nothing obviously wrong to see. To identify them, we use a number of techniques. For example, we selectively re-run the analysis of certain artifacts on systems that use different detection techniques (e.g., on an emulation-based sandbox and on a physical machine), with the idea that analysis errors or evasion techniques that work on one system will not trigger on the other as well. We can also actively check for evasion attempts: for example, in Revolver we introduced a technique to automatically identify evasions in web pages by finding pairs of pages that are similar but have been classified differently (one malicious and the other benign). The different classification outcome is in some cases attributable to successful evasion techniques, which, once identified, we can handle and bypass by fixing the appropriate component in the pipeline.

Is there a role in this pipeline for manual analysis? There is, and we are lucky to have a terrific team of reversers and malware analysts: they get to review the hard cases (evasive malware that uses techniques that we cannot isolate automatically) and to propose the fixes needed to allow us to handle such cases automatically.

Looking for protection against advanced malware?

Lastline Enterprise provides premier malware protection for web, e-mail, content, and mobile applications, deployed anywhere on your physical or virtual network. Learn more here:

Start Your Trial


Analyzing a banking Trojan


In our effort to detect threats to the users of Android devices, we analyze a lot of malicious apps. This post exemplifies the analysis of such malware, more specifically a banking Trojan that we came across recently. It pretends to generate one-time authentication codes for online banking, but its real purpose is to steal the users' banking credentials and to intercept incoming SMS (possibly containing Transaction Numbers). Also, it tries to evade analysis by checking its runtime environment.

We have seen different versions of the app, but this post is based on samples with SHA1 hashes e370ab3f1fbecfc77bdc238591d85882923ed37e and 698a1c5574fbe8ea1103619d81fdd4e8afa85bd5.

The user's perspective

Let's start with observing how the app under analysis presents itself to the user. It disguises itself as an app provided by the user's bank and pretends to help with the generation of one-time secrets for online banking. We have seen versions that target the Swiss ZKB and the Austrian Erste Bank. Note, however, that it will be easy for the malware's author to adapt the Trojan to other banks.

The app's login screen.

On startup, the app displays the impersonated bank's logo and asks for the user's password. After users enter their data, they are presented with a "security code", which is supposed to be used with their online banking account.

The app imitates the behavior of legitimate applications for two-factor authentication. A benign app would generate authentication codes as a second proof of identity for the user when banking online, in addition to the usual user name and password. By imitating this behavior known to the users, the Trojan attempts to gain their trust.

The fake security code presented to the user.

Going deeper

Now let's take a look at what happens behind the scenes. Since the app does not use any code obfuscation, we can decompile it, producing Java-like code that is more easily readable than the Dalvik bytecode that we would normally be looking at. Thus, for this blog post, we employ the freely available tools dex2jar and jd-gui to create a pretty readable Java representation of the app's code.

Our decompilation using jd-gui.

The interesting functionality is located in packages named com.gmail.*. We will not go into too much detail on the code itself but rather sum up our findings.

Stealing information

Looking at the app from the user's perspective, we have seen that it asks for credentials. As expected, whatever the user enters is sent out to the malware's author. Also, the app monitors incoming SMS and forwards them to the attacker (potentially to capture Transaction Numbers for online banking). Along with the stolen data, the Trojan reports various details of the victim's phone, such as the version of the operating system, the phone number, carrier and country code.

Let's focus a little more on how the stolen data is transferred from the victims' phones to the attacker. Apparently, the author of this Trojan has gained unauthorized access to a number of legitimate web servers that now act as intermediaries between infected phones and the attacker's server. The infected phones only know the URLs of the compromised web servers, so that the destination of the exfiltrated data remains hidden even when the Trojan app is analyzed. Tracing the data further would require access to one of the intermediaries. This is a common, very basic strategy for attackers to hide and stay flexible.

We have informed the administrators of the compromised web servers that we learned about when analyzing our samples. Unfortunately, none of them has provided us with additional information about the attack so far.

Configuration

The remote addresses that the Trojan contacts to upload stolen information are configurable. The app comes with a basic initial configuration, and it regularly tries to retrieve updated versions (potentially tailored to the victim's phone) from the Command-and-Control servers. The configuration files are encrypted using the Blowfish algorithm.

This is the (anonymized) initial configuration of one of our samples:

<config>
    <data rid="25"
        shnum10="" shtext10="" shnum5="" shtext5="" shnum3="" shtext3="" shnum1="" shtext1=""
        del_dev="0" 
        url_main="http://[removed]/gallery/3.php;http://[removed]/images/3.php"
        url_data="http://[removed]/gallery/1.php;http://[removed]/images/1.php"
        url_sms="http://[removed]/gallery/2.php;http://[removed]/images/2.php"
        url_log="http://[removed]/gallery/4.php;http://[removed]/images/4.php"
        download_domain="[removed]"
        ready_to_bind="0" />
</config>

The various url_* parameters specify the targets of the Trojan's exfiltration and update activity.
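Once the Blowfish layer is removed, the configuration is plain XML and trivial to process. The sketch below parses a structure like the one shown above with Python's standard library; the placeholder hosts stand in for the removed ones, and the semicolon splitting reflects the fallback-URL lists visible in the sample config.

```python
import xml.etree.ElementTree as ET

# Sketch: extracting the exfiltration targets from a decrypted configuration
# shaped like the one shown above. Each url_* attribute holds a
# semicolon-separated list of fallback URLs.

CONFIG = """
<config>
    <data rid="25"
        url_main="http://host-a.example/gallery/3.php;http://host-b.example/images/3.php"
        url_sms="http://host-a.example/gallery/2.php;http://host-b.example/images/2.php"
        ready_to_bind="0" />
</config>
"""

def extract_exfil_urls(xml_text):
    data = ET.fromstring(xml_text).find("data")
    # Keep only the url_* attributes, splitting each into its URL list.
    return {
        name: value.split(";")
        for name, value in data.attrib.items()
        if name.startswith("url_")
    }

urls = extract_exfil_urls(CONFIG)
print(urls["url_sms"][0])  # http://host-a.example/gallery/2.php
```

Extracting these lists automatically is useful beyond analysis: the resulting URLs can feed blocklists and help notify the owners of the compromised intermediary servers.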

Runtime checks

The malware conducts some simple checks to thwart dynamic analysis. We found a function isEmulator that tries to detect if the app is executed in the standard Android emulator, a common technique for basic analysis. Essentially, it compares some properties of the potentially emulated phone with the known default values used in the emulator.

The function isEmulator.

 

Furthermore, there is a location check in one of our samples: The app obtains the SIM's country code and asserts that it is either Austria, Switzerland or Russia. While the first two countries align well with the area of operation of the targeted banks, the last one feels somewhat out of place. It may or may not hint at a Russian developer - always bear in mind that the criminals who develop such malware might deliberately place such details to set us on the wrong track.
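The property-comparison logic behind such an isEmulator check is simple. The sketch below expresses it in Python rather than the sample's Java: the property names mirror android.os.Build fields and the values are the Android emulator's well-known defaults, but the exact properties this particular Trojan compares are those in the decompiled code shown in the figure, not necessarily these.

```python
# Illustrative isEmulator-style check: compare device properties against
# the Android emulator's well-known default values.

EMULATOR_DEFAULTS = {
    "MODEL": "sdk",              # android.os.Build.MODEL on the stock emulator
    "PRODUCT": "sdk",
    "DEVICE": "generic",
    "LINE1NUMBER": "15555215554" # default phone number of the first emulator
}

def is_emulator(properties):
    # A single matching default is treated as evidence of an emulated phone.
    return any(properties.get(k) == v for k, v in EMULATOR_DEFAULTS.items())

real_phone = {"MODEL": "GT-I9300", "PRODUCT": "m0xx", "DEVICE": "m0"}
emulator = {"MODEL": "sdk", "PRODUCT": "sdk", "DEVICE": "generic"}

assert not is_emulator(real_phone)
assert is_emulator(emulator)
```

This also shows why the check is easy to defeat from the analyst's side: a sandbox that reports realistic device properties passes it without any change to the malware's execution.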

SMS commands

As a last feature, we discuss the Trojan's ability to receive commands via SMS. We identified the commands START, STOP and DEL. The first two start and stop an SMS forwarding service, respectively: When the service is active, it forwards every incoming SMS to a number specified in the START command.

The last command, DEL, triggers removal of the Trojan app. Android apps need the user's approval to uninstall themselves or other apps, so the malware applies a trick to get the user's consent: It displays a message announcing the availability of an update; the user is told that a new version is installed after removing the current one. When they press the "update" button, the app initiates the removal process, to which the user will likely consent after having been informed about the alleged update. The removal mechanism allows the malware to cover its traces after having stolen all the information that the attacker was interested in.

Lastline analysis report

Below, you see a screenshot of how our analysis engine presents the results of its automated analysis. The details are very similar to what we have discussed in this post: The system detects the encrypted configuration file, and it finds that the app sends out data, in SMS as well as encrypted over the Internet. All in all, the app's threat potential is scored at 100 out of 100, yielding the final verdict "High Risk - Malicious behavior detected."

Results of the Lastline Analyst.

 

Conclusion

We have looked at the functionality of a typical Android banking Trojan, both from the user's perspective and from the view of an analyst. Our analysis usually centers around the technical side of things and, in particular, the question of how to detect such threats automatically. However, we also believe that it is important to raise awareness with users and to caution them to maintain a healthy measure of skepticism when it comes to giving apps access to their phones and even entering data as important as banking credentials.


Antivirus Isn't Dead, It Just Can't Keep Up


Much has been said in recent weeks about the state of antivirus technology. To add facts to the debate, Lastline Labs malware researchers studied hundreds of thousands of pieces of malware they detected for 365 days from May 2013 to May 2014, testing new malware against the 47 vendors featured in VirusTotal to determine which caught the malware samples, and how quickly.

The focus of this test is to determine how fast the antivirus scanners catch up with new malware.

Note that the configuration of the various antivirus scanners used by VirusTotal is not necessarily optimal, and it is always possible that a better detection rate could be achieved by relying on external signals or using more “aggressive” configurations.

On any given day, according to Lastline Labs' analysis, much of the newly detected malware went undetected by as many as half of the antivirus vendors. Even after two months, one third of the antivirus scanners failed to detect many of the malware samples. By averaging the daily detection rates, we are able to plot the pace at which the antivirus scanners catch up with the malware. The least-detected malware - that is, the malware in the 1-percentile "least likely to be detected" category - went undetected by the majority of antivirus scanners for months, and in some cases was never detected at all.

Some other interesting findings of this Lastline Labs research:

  • On Day 0, only 51% of antivirus scanners detected new malware samples
  • When none of the antivirus scanners detected a malware sample on the first day, it took an average of two days for at least one antivirus scanner to detect it
  • After two weeks, there was a notable bump in detection rates (up to 61%), indicating a common lag time for antivirus vendors
  • Over the course of 365 days, no single antivirus scanner had a perfect day - a day in which it caught every new malware sample
  • After a year, there are samples that 10% of the scanners still do not detect
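The averaged statistics above can be reproduced from raw scan results: one matrix of detected/missed verdicts per day, with one row per sample and one column per scanner. The toy data below is invented; only the computation is the point.

```python
# Sketch of computing a daily detection rate from raw scan results:
# rows are malware samples, columns are scanners, one matrix per day.

def daily_detection_rate(day_matrix):
    # Fraction of (sample, scanner) pairs where the scanner detected the sample.
    detections = sum(sum(row) for row in day_matrix)
    total = sum(len(row) for row in day_matrix)
    return detections / total

# Invented "day 0" data: 3 samples x 4 scanners (1 = detected, 0 = missed).
day0 = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
]
print(daily_detection_rate(day0))  # 0.5
```

Averaging this value across all samples seen on each day, for each day since first detection, yields curves like the ones plotted in the chart below; the 1-percentile line instead tracks only the least-detected samples.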

Chart: antivirus scanner detection rates from Day 0 to Day 365.

Top 1% of malware evolved against antivirus patterns

As you can see in the grey lines in the chart above, there is a steady growth curve in the detection rates of the average malware from Day 0 to Day 365. This pattern is mostly mirrored by the trajectory of the 1-percentile malware (percentiles based on least detected), which is likely more sophisticated or unique. The 1% of malware that most effectively evaded detection in this dataset likely represents the kind of advanced malware created and exploited by cyber-criminals who persistently and directly target and infiltrate organizations, as opposed to more opportunistic malware distributors.

Antivirus alone is not enough

For us, this preliminary dataset leaves us with as many questions as answers. This analysis does not single out any antivirus vendor, and provides only insights based on VirusTotal data (with the caveats expressed at the beginning). We think that “traditional” AV technology is not dead, but needs to be complemented with other approaches (e.g., based on dynamic analysis of samples, network anomaly detection) that provide additional signals for detection.

In future analyses, we will be looking for patterns in the least-detected malware that may indicate common trends or behaviors that could help all network security - including antivirus scanners - improve malware detection effectiveness and speed. This data definitely points to the conclusion that antivirus alone is not enough.

More research required

We plan to test further and compare the effectiveness of traditional sandboxing with next-generation sandboxing. Our hypothesis is that the least detectable malware is designed to both evade detection and fingerprint the analysis environment. From what we have seen so far, no commercially available signature-based security system appears to be able to get ahead of advanced malware on its own.


Detecting Keyloggers on Dynamic Analysis Systems


Authored by: Kevin Hamacher, Dario Filho, Clemens Kolbitsch

One notorious functionality present in many variants of today’s advanced malware is the ability to steal sensitive user information. Taking control of a targeted machine, an adversary has basically unlimited abilities to secretly monitor the actions performed by an unsuspecting victim who uses the infected machine. The type of data stored on a typical machine, and to which the attacker has access, ranges from user account credentials (such as usernames and passwords), to financial data (such as credit card numbers or transaction secrets), and even personal data (such as social security numbers).

Very often, malware is specialized in capturing and identifying different types of information typed by the victim, allowing the collection of very specific information in which the attacker is interested. Typical examples are information entered in a login form for a specific URL, or values that resemble social security numbers (SSN) or credit card numbers.

Automatic identification of this type of attack using traditional dynamic analysis (inside a sandbox) is tricky, as most activity requires a user to trigger the attack. Such a trigger could be using username and password to authenticate to a service, or entering user-sensitive information into an application or website.

In this post, we describe some of the ways attackers manage to collect sensitive information on an infected machine and how Lastline's high-resolution dynamic analysis engine is able to trigger and detect this type of malware.

How Keyloggers Capture Data

When looking at keyloggers, we can typically distinguish three basic types: User-mode keyloggers, kernel-mode keyloggers and hardware-based keyloggers. In this post, we will focus on the first two types - software-based keyloggers. Hardware-based keyloggers work by intercepting data sent from external devices (such as keyboard or mouse) to the computer hardware, and are thus outside the reach of most remote attackers.

Overview of keylogger methods

In user-mode keyloggers, a very common approach used to steal information typed by the user is the Windows API SetWindowsHook. This API can be used to intercept events from the system, such as keyboard and mouse activity. When the to-be-intercepted action is triggered, a function of the attacker’s choosing is executed. Another user-mode method for capturing keystrokes that we found in many malware variants consists of continuously checking the system’s keyboard state using the GetAsyncKeyState or GetKeyState API functions. Unlike the first method, which notifies the attacker at every keyboard event, here the attacker needs to actively monitor which keys are pressed.
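The polling variant boils down to a tight loop that detects key-state transitions between two reads. The sketch below abstracts the Windows call behind an injectable read_key_state function so the logic can be shown (and run) without a Windows machine; real malware would call user32.GetAsyncKeyState via the Win32 API instead of the fake source used here.

```python
# Sketch of the GetAsyncKeyState-style polling technique. The actual Windows
# API call is replaced by an injectable read_key_state(vk) -> bool function.

def poll_keys(read_key_state, virtual_keys, prev_state):
    # Report keys that transitioned from "up" to "down" since the last poll;
    # a keylogger runs this continuously and appends hits to its log.
    pressed = []
    for vk in virtual_keys:
        down = read_key_state(vk)
        if down and not prev_state.get(vk, False):
            pressed.append(vk)
        prev_state[vk] = down
    return pressed

# Fake key-state source standing in for GetAsyncKeyState: 'H' (0x48) is held down.
fake_keyboard = {ord("H"): True}
read = lambda vk: fake_keyboard.get(vk, False)

state = {}
print(poll_keys(read, range(0x41, 0x5B), state))  # [72] -> 'H' newly pressed
print(poll_keys(read, range(0x41, 0x5B), state))  # []   -> no new transition
```

Tracking the previous state is what distinguishes "key is currently down" from "key was just pressed"; without it, a held key would be logged on every loop iteration.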

Kernel-mode keyloggers are more powerful than their user-mode counterparts, as they work with higher privileges, but they are inherently more complex to implement. This type of keylogger uses filter drivers to intercept keystrokes received from the keyboard, or modifies internal Windows kernel structures in order to capture input data. The complexity and mostly undocumented nature of kernel code can lead to malfunction of the system if a sample is executed on an unsupported system, making user-mode keyloggers the more prominent approach.

Analyzing Keylogger Software

As mentioned earlier, when executing malware with keylogging abilities on a traditional analysis system, this functionality typically remains unnoticed. This is because, unless a user triggers specific actions, the malware will not be able to capture anything during the analysis.

To reveal this type of behavior during the automated analysis of a sample, the system needs to mimic user interactions such as keyboard typing and mouse movement. Interestingly, this is far from simple since some malware families only capture data when the key is pressed within a specific application or window, such as Internet Explorer or Firefox, for example.

Further, the analysis framework must be able to identify the type of application and/or information that is relevant for the malware under analysis. For instance, if a keylogger only targets information on financial transactions (such as in a breach of Point-of-Sale terminals late last year), injecting other user information would not reveal any interesting behavior. If the sandbox fails to identify the data in which the attacker is interested, the keylogger remains inactive, disguising itself from the analysis system.

High-Resolution Analysis of Keyloggers

The Lastline high-resolution dynamic analysis engine has the ability to simulate specific user interactions during the analysis of a sample. Additionally, the system identifies the type of data that the malware sample is searching for and can simulate user behavior accordingly. For example, the system instruments a virtual user to use fake credit card information to do financial transactions, post user credentials to various services, or use email addresses in a way that might attract the attacker’s attention.

Once the keylogger starts collecting the injected data, the system tracks how the malware under analysis uses the stolen information, which gives vital information on the attacker’s goals. For example, the system tracks if the stolen data is written to the file-system or is sent to the attacker using a command-and-control channel.


Chewbacca analysis report

Chasing Chewbacca with Lastline

Chewbacca [2] is a Trojan used in a Point-of-Sale malware operation to log keyboard data on certain systems [3]. Besides capturing keyboard activity, the malware can also scan the memory of other processes for credit card numbers using regular expressions. Clearly, all of this happens without any signs of infection visible to the user.

Windows desktop showing no signs of the keylogger

When executed, the malware creates a copy of itself named spoolsv.exe under the Windows Startup folder. After that, it will start capturing all keyboard events from the system, logging them to a file named system.log.


Chewbacca data collection hook installation

The above figure shows the code used by Chewbacca to install the hook procedure on the system after its execution. We can see that the malware uses the Windows API SetWindowsHookEx to install the hook, with hook type WH_KEYBOARD_LL, which is used to capture low-level keyboard input. Also, to capture all keystrokes from the system, the malware sets parameter dwThreadId=0, which will install the hook on all existing threads running on the same desktop.

During its execution, whenever a keypress event happens, the callback KEYLOG_$$_HOOKPROC$LONGINT$LONGINT$LONGINT$$LONGINT will record the keys pressed along with the window title on which the keypress was recorded. Additionally, the software records when the window focus changes, logging the captured information to system.log.

The Lastline analysis engine gives access to files generated and network traffic captured during the analysis

Captured data shown in the analysis report

When analyzing Chewbacca inside Lastline's high-resolution analysis engine, the system identifies the keylogging functionality, clearly marking it in the report overview. Additionally, the report contains detailed examples of data the malware sample extracted, as can be seen in the figure above.

Text content of the data captured by the keylogger into file C:\Users\...\Temp\system.log


Raw file content of the data captured by the keylogger into file C:\Users\...\Temp\system.log

As one can see, the report even shows examples of the sensitive data that was targeted by the attacker, which, in this case, includes credit card information and user passwords.

Conclusion

Keylogging functionality in advanced malware is a severe threat to user data, but traditional analysis systems are often blind to this vector of attack. Lastline’s high-resolution analysis engine tackles this threat in two ways: First, the analysis engine identifies that a keylogger was installed on the victim machine and identifies the type of information the attacker is targeting. Second, the system uses the collected information to instrument a virtual user to inject specific data to trigger the keylogger’s functionality. The injected data is then tracked throughout the analysis system to monitor how the malware processes (and potentially leaks) the stolen information.

 

REFERENCES:

[1] http://msdn.microsoft.com/en-us/library/windows/desktop/ms644990%28v=vs.85%29.aspx

[2] https://www.virustotal.com/en/file/31d4e1b2e67706fda51633b450b280554c0c4eb595b3a0606ef4ab8421a04dc9/analysis/

[3] https://blogs.rsa.com/rsa-uncovers-new-pos-malware-operation-stealing-payment-card-personal-information/

Analyzing an “Ultra-Advanced APT Tool” Using High-Resolution Dynamic Analysis



Earlier this week, Sam Bowne (@sambowne) posted a nice example of how to write a simple keylogger in a few lines of Python. He used this code to evaluate a few sandboxes, including Lastline. The full code can be found on Sam’s blog, but the essential lines can be seen in the following snippet of code:

Python keylogger used to test APT solutions

As one can see, the code uses the Windows API for registering callbacks to invoke on key-press events, as we describe in our recent blog-post on the internals of keylogging malware. The recorded key-press events are collected and sent to the attacker (in this case, they are uploaded to pastebin) as soon as the return key (or enter key with key-code 13) is pressed (see the highlighted code regions).
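The buffering logic is simple enough to sketch in a few lines of plain Python. This is a platform-independent simulation of the pattern only; the `exfiltrate` stub is hypothetical and stands in for the pastebin upload in the real tool:

```python
captured = []   # keystrokes collected since the last upload
uploads = []    # stands in for the pastebin POSTs performed by the real tool

def exfiltrate(data):
    """Hypothetical stand-in for uploading captured data to pastebin."""
    uploads.append(data)

def on_key_press(key_code):
    """Collect printable keys; flush the buffer when return (code 13) arrives."""
    if key_code == 13:
        if captured:
            exfiltrate("".join(captured))
            captured.clear()
    else:
        captured.append(chr(key_code))

# Simulate a victim typing an address and a password, each ended with return.
for c in "user@example.com\rhunter2\r":
    on_key_press(ord(c))
# uploads now holds ["user@example.com", "hunter2"]
```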

High Resolution Dynamic Analysis

As we described in the previous blog-post, the Lastline high-resolution dynamic analysis engine recognizes when programs under analysis attempt to steal sensitive information of the user using keylogging activity.

Below, one can see the report of an analysis run in our system. The analysis sandbox identifies that the program is hooking the key-press events and instruments a virtual user to start behaving in a way that could trigger interesting behavior in the malware program.

Advanced Keylogger Malware Analysis Report
Analysis report overview - Keylogging behavior detected

Keylogging Advanced Malware Analysis Overview
Analysis subject overview highlighting interesting keyboard activity in Analysis subject 2

Since this sample is not looking for any specific data (such as credit card information or account credentials), our system uses data that seems generally attractive to an attacker as can be seen in the screenshot below.

Keylogger Malware Analysis Report Details
Report showing captured keyboard data

Network traffic showing upload of the stolen data by keylogging malware
Network traffic showing upload of the stolen data

“Evil APT Tool”

An attentive reader might have noticed that the code shown above differs slightly from the code posted by @sambowne originally. The main difference is that we decided to make the “evil APT tool” (as Sam calls it) slightly more realistic and applicable to real-world attacks:

The original program assumes users will follow instructions (and start the tool using the return key) and confirm inputs using the same return key. However, most of today’s interfaces (such as web-forms in browsers, or GUI applications) do not make much use of this key any longer, but instead use “submit” buttons that can be triggered using the mouse or many other ways through the keyboard. Therefore, our modified program simply uploads data whenever enough has been captured, as shown in the highlighted code regions.
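This modified flush strategy is easy to sketch in plain Python: instead of waiting for a return key, the buffer is uploaded as soon as enough data has accumulated. The threshold value and the upload list are hypothetical illustrations:

```python
FLUSH_THRESHOLD = 16  # hypothetical: upload once this many keys are buffered

buffered = []
sent_batches = []  # stands in for the uploads performed by the real tool

def on_key(key_char):
    """Buffer every key; upload whenever enough data has been captured."""
    buffered.append(key_char)
    if len(buffered) >= FLUSH_THRESHOLD:
        sent_batches.append("".join(buffered))
        buffered.clear()

# A 48-character input produces three 16-character uploads, with no
# return key ever pressed.
for ch in "this text is long enough to trigger two uploads!":
    on_key(ch)
```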

At this point, we should address that Sam originally posted that the Lastline system fails to identify this sort of attack (which inspired us to look into the sample and write this short post). The main difference between the two outcomes is the modified return-key behavior mentioned above, and the fact that our variant does not require user confirmation to start the actual payload. Sam’s proof-of-concept code illustrates nicely how to write the keylogger in Python, but a real-world attack would not make these assumptions. We acknowledge, however, that we should identify this sort of activity and extend our virtual user to follow the instructions accordingly.

Conclusion

Sam did a great job showing how easy it has become for attackers to quietly steal an unsuspecting victim's personal data. Traditional sandboxes fail at analyzing this type of threat, even though many of today’s advanced malware families contain this sort of behavior. The Lastline high-resolution malware analysis engine, on the other hand, correctly identifies and detects the behavior as shown in the example analysis report.

An Analysis of PlugX Using Process Dumps from High-Resolution Malware Analysis


Targeted attacks and so-called APTs (advanced persistent threats) come in many forms and colors. Very often, in-house malware analysis teams want to go beyond the detection information offered by traditional analysis systems (which often only indicates whether a program looks malicious or not). The Lastline High-Resolution analysis engine exposes many details describing the malware behavior, such as file-system modifications, changes to the Windows registry, and interesting network communication, and it even highlights sophisticated evasion attempts completely automatically.

But sometimes, an analyst wants to go even beyond that and get a deeper look into the program binary. This can be useful for research purposes, finding a more effective remediation process, or just because some people need to know it all.

Sadly, performing in-depth static analysis of APT malware is far from easy; even the most powerful tools, such as the IDA Pro disassembler or the WinDbg debugger, are sometimes not enough. Malware uses polymorphic packers, so-called “protectors”, and other sophisticated tricks to prevent analysts from getting their hands on, or understanding, the payload functionality.

The good news is that modern dynamic malware analysis systems are very often immune to many of these obfuscation tricks that hinder static analysis in tools like the IDA Pro disassembler. To bridge the gap between dynamic and static analysis, Lastline now provides an efficient and universal unpacker as an integral part of the engine performing the advanced dynamic analysis. This gives an analyst the ability to look into a malware sample at various stages during the dynamic analysis, eliminating barriers to static analysis.

So, do you know how APTs are attacking your company in detail? Do you want to know all possible functionality of the malware used in a targeted attack against you? In a series of blog posts, we will show you how easily you can load a fully-unpacked snapshot of a malware sample taken by the Lastline analysis engine.

LLama versus PlugX

One component of the Lastline analysis engine is a full-system emulator that we internally refer to as LLama (short for Lastline Advanced Malware Analysis - we really like words starting with Ls ;) ). In addition to exposing a sample’s behavior to the Lastline Analyst, LLama also acts as a universal unpacker by running the sample inside a guest operating system.

LLama fights many different forms of evasion attempts present in advanced malware, as we described in various previous blog posts. Therefore, it is much more powerful than (and goes far beyond) launching a program inside a virtual machine with an attached debugger.

In this blog post, we will demonstrate the system in action using as an example a recent variant of PlugX (described in a previous blog post).

Malware Family: PlugX
md5: 220f376a58123329617249e87bb7e6bb
VT link: https://www.virustotal.com/en/file/6d579c3ab1a31719120da90e7b7aa639df65d45b9af666addd0ab0e573a6e9e1/analysis/
Full analysis result link: https://user.lastline.com/malscape#/task/f7b5c2293e574d069e0a48bcd7691b16 (accessible to Lastline customers only, sign-up now)

Due to the internal structure of this PlugX sample, static analysis has become quite complex. To give a short overview of the infection process, see the chain of events below - for the full details, refer to the PlugX post (Section Process of Infection):

  1. The rarsfx archive

  • first drops three files into the %TEMP% directory:

    • EmpPrx.exe - a benign file with a valid digital signature.

    • EmPrxRes.dll - an auxiliary DLL imported by EmpPrx.exe; it contains (fake) exports identical to those of the legitimate DLL that EmpPrx.exe is expecting.

    • EmPrxRes.dll.dat - not a PE file, but a file containing position-independent code, consisting of a decryptor and an encrypted malicious image. It also contains encrypted settings.

  • then starts EmpPrx.exe

  • During the EmpPrx.exe loading process, the Windows loader looks for “EmPrxRes.dll” in the current directory, finds it, and loads the DLL which was dropped by the rarsfx archive. This technique is known as DLL load-order hijacking, where a local DLL imitates - and is loaded instead of - a legitimate library.

  • EmPrxRes.dll (in the DllMain function) patches the entry point of the EmpPrx.exe image in memory (which has not started execution at this point).

  • The original entry point of EmpPrx.exe is replaced with a jump instruction that transfers the execution to a function loading the position-independent code from the file EmPrxRes.dll.dat

To do an in-depth static analysis of these components and events, one needs to have all three components in memory and perform a step-by-step analysis inside a debugger. Further, since this malware uses position-independent code, simply dumping memory does not expose any import tables. As a result, recognizing API functions correctly becomes very difficult.

The Lastline analysis engine already provides the analyst with an overview of behavior exhibited by malware. Additionally, the reports contain in-depth results for each interesting behavior observed (but omitted in this post).


    Behavior overview of PlugX malware variant

    As described earlier, in addition to the exposed behavior, the LLama engine exposes multiple process dumps for further analysis by an analyst.


    Snapshots taken during dynamic analysis to ease static analysis

    Each analysis subject has a few process dumps (or snapshots) taken at different stages of the analysis. A snapshot is taken whenever the LLama engine considers an observed functionality (or the memory content) to be interesting.

For example, one of the snapshots above was triggered after observing a call to a critical API function from an allocated, untrusted memory region, which typically means code was first unpacked and then executed.

    Bridging the Gap Between Static and Dynamic Malware Analysis

Each exported process dump is a full PE image, and each section represents a loaded code module or a memory block allocated by the program. This allows the exported dumps to be opened by a wide range of analysis tools. What is even more interesting is the fact that the dumps contain only memory regions considered interesting by our engine. This means that an analyst does not need to analyze several megabytes of unrelated process memory and can focus on relevant code and data regions right from the start: the average size of a process dump usually doesn’t exceed a couple hundred kilobytes.
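Because each dump is a well-formed PE image, any PE-aware tool can open it. As a rough illustration of how little is needed, the following stdlib-only sketch (standard PE header parsing, not Lastline's tooling) lists the sections of such an image:

```python
import struct

def list_pe_sections(data: bytes):
    """Return (name, virtual_address, virtual_size) for each PE section."""
    if data[:2] != b"MZ":
        raise ValueError("not a PE image")
    # e_lfanew at DOS-header offset 0x3C points to the PE signature.
    e_lfanew = struct.unpack_from("<I", data, 0x3C)[0]
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    num_sections = struct.unpack_from("<H", data, e_lfanew + 6)[0]
    opt_header_size = struct.unpack_from("<H", data, e_lfanew + 20)[0]
    section_table = e_lfanew + 24 + opt_header_size
    sections = []
    for i in range(num_sections):
        off = section_table + i * 40  # each IMAGE_SECTION_HEADER is 40 bytes
        name = data[off:off + 8].rstrip(b"\x00").decode("ascii", "replace")
        virt_size, virt_addr = struct.unpack_from("<II", data, off + 8)
        sections.append((name, virt_addr, virt_size))
    return sections
```

Running this over a LLama process dump would list one section per code module or allocated memory block included in the snapshot.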

    Without doubt, one of the most popular tools for analyzing PEs is IDA Pro (usually in combination with a decompiler plugin). To even further simplify analysis in IDA, we provide a Python script to post-process a PE snapshot, reconstructing the program imports to ease analysis. Additionally, this processing step adds bookmarks to several points of interest, highlighting interesting code execution / entry points. This script is integrated in the Lastline Analyst help and report web-interface (it can be found by clicking on the question mark next to “Process Dumps” report section).


    Direct process snapshot integration in analysis report


    Process snapshot integration in Lastline Analyst documentation

    After loading a process dump and running the Python script, IDA Pro displays two new tabs highlighting additional analysis metadata, for example the reconstructed API import table (even for packed malware).


    Reconstructed PE import table

In addition to the standard PE image import tables, LLama also reconstructs other custom tables containing virtual addresses, often used by packed malware variants. These links are fully interactive and allow navigating to the highlighted functions or code regions.


    Reconstructed custom import table

    In combination with the Hex-Rays decompiler, the process memory dumps enable users to analyze the source code of the unpacked program at different stages of execution. Clearly, this vastly simplifies the in-depth analysis of the malicious program and provides a powerful tool to get a fast understanding of a malware’s functionality.

    Dissecting PlugX

    For demonstration purposes, let’s open and analyze a process snapshot of analysis subject 2 (triggered by an API call as described earlier). Above, one can already see the fully reconstructed PE import tables, as well as custom call tables used by the malware.

    Now, let’s look at the list of code regions the LLama engine considered to be of interest:


    IDA table showing “points of interest”, such as code execution after unpacking

    As one can see, LLama highlights a total of four different code regions (and gives a short description of each). The first two regions are return addresses from system calls that were considered “interesting” and we can immediately jump to these code regions:

    • 0x003df9bb return address after a call to CreateDirectoryW


    • 0x003d3124 return address after a call to CreateFileW


These two functions called by the code above use the NtCreateFile system call to create a directory and file, respectively. By analyzing the call stack, the LLama engine finds return addresses pointing to untrusted code regions. In our case, they point to position-independent code, which makes them even more relevant for analysis. These locations can be used as starting points for payload analysis.

    The remaining two points of interest highlighted in IDA Pro are the original entry points to the PE images described in the infection chain earlier.

    • 0x00434066 is the entry point to EmpPrx.exe, and

    • 0x10001160 is the entry point to EmPrxRes.dll

    To find out what the malware does in these entry points, we decompile the code region, which reveals two very interesting behaviors:


    First, the program uses a time-trigger to hide parts of its functionality based on the local system time (it only executes this code after 2014/03/03). If this check is passed, the function patches the memory located at an address relative to the image base - more concretely, it patches data at location base(EmpPrx.exe) + 0xc178.
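The time-trigger logic, reimplemented as a hypothetical Python sketch (the malware performs this check in native code; the date is the one recovered from the decompiled sample):

```python
import datetime

ACTIVATION_DATE = datetime.date(2014, 3, 3)  # hardcoded date from the sample

def payload_enabled(today=None):
    """The hidden functionality only runs after the hardcoded date."""
    if today is None:
        today = datetime.date.today()
    return today > ACTIVATION_DATE
```

In a sandbox whose clock is set before the activation date, such a sample looks completely benign, which is why automatically detecting these triggers matters.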

    Since our process dump contains all memory buffers allocated by untrusted code, this patched code is also available in IDA:


    As one can see, the process dump includes all stages of the infection chain. Additionally, every interesting component of the infection is detected and highlighted by the analysis engine, allowing straight-forward, in-depth analysis of this malware sample.

    Conclusion

    Code packing and other code obfuscation techniques make the static analysis of practically all modern malware variants very difficult. A great way to bypass these tricks is running malware in a dynamic malware analysis environment or debugger. Sadly, many malware variants detect these analysis environments and refuse to execute (correctly).

    The Lastline high-resolution analysis engine combines the best of both worlds by bypassing dynamic evasion attempts to enforce execution inside the LLama full-system analysis engine. At the same time, it exposes unpacked process dumps that can be used with a wide range of analysis tools.

For analyzing these dumps in IDA Pro, we provide means to load a process dump as if no code obfuscation technique was present: we reconstruct import tables, strip uninteresting code regions, and expose all allocated memory for analysis. This allows analysts to get an in-depth look into sophisticated APT malware in basically no time.

    Dissecting Payload Injection Using LLama Process Snapshots


    In our last blog-post on process snapshotting, we showed how process snapshots (or “dumps”) allow bridging the gap between dynamic and static analysis. In this post, we want to continue along this line and describe a related problem security analysts face: Analyzing code injections in analysis tools such as IDA Pro.

    Injected code is particularly tedious to analyze when working on traditional program dumps. The reasons for this are manifold: The injected code is not part of the process image and thus hard to locate, parts of the memory used by the payload might no longer be resident in memory at the time of the snapshot, recognizing addresses of API functions is painful and error-prone, just to name a few.

    Not when using LLama process dumps!

As we will show in this post, the high-resolution dynamic analysis engine automatically finds all code regions related to injected code. It keeps track of memory blocks allocated by the untrusted code running in the target process, and includes them in the process dump for later analysis. Lastly, unrelated code areas (for example, unmodified, trusted code of the process that the malware injected into) are removed from the process dump, making it small and straightforward to analyze.

    LLama versus Shiz

    Just like our previous post, we show how our process snapshots guide the analysis by looking at a concrete malware family:

    Malware Family: Trojan.Win32.Shiz
    md5: 09c98a1ecd554e2f8cec98b314335e3d
    VT link: https://www.virustotal.com/en/file/377f6d302fbe45823b3e8c88319a31b8128ab23c74c691b59d33f13fbf98a225/analysis/
    Full analysis result link: https://user.lastline.com/malscape#/task/f48cefc3253c40e6abd6e3da86aed508
    (accessible to Lastline customers only, sign-up now)

    Let’s start with an overview of the behavior extracted during dynamic analysis:


    High-resolution analysis report overview

    What is particularly interesting is the code injection into the Explorer.exe process as shown in the analysis report details:


    The analyzed program injects code into Explorer.exe process

    From the context of Explorer.exe, the injected code then propagates itself to a number of other processes running on the system as can be seen in the analysis subject overview:


    Propagating payload from Explorer.exe into all running processes

    Clearly, the injected data is very interesting for further analysis, since it could contain a lot of valuable information such as C&C endpoints or URLs, malware configuration files, password lists used for brute forcing, and much more. So let’s have a look at the process dumps provided by the dynamic analysis run:


    Process Dumps generated during analysis of Trojan.Win32.Shiz

    Dissecting Shiz

Let’s continue our analysis by looking into one of the process dumps for the second analysis subject - the Explorer.exe process with injected code. More precisely, let’s dive into the snapshot taken on invocation of an API function from an untrusted memory region.

Upon opening and processing the snapshot (see our last blog post for details), IDA displays the list of interesting code locations exposed by the processing script:


    IDA showing code locations of interest

    Now let’s look into these points of interest a bit more closely:

    • 0x03cc1360 is the originally injected, position-independent code. What’s interesting about this code is that it uses hashed names (highlighted in orange) of API functions to call system APIs


    • 0x03cc2000 points to a PE image embedded in the target process. Although the malware did not load this memory as a library, it can still be used by the injected code, which makes it very interesting for analysis and is thus exported.

    • 0x03e921c0 is the entry point to a PE image loaded by the position-independent shellcode. Unlike in the previous entry, the injected code uses the Windows API for loading the code.

      If we jump to the definition of function sub_3e92500, we find the logic for injecting code into the remaining processes running on the system (as described above).

      A quick analysis using the Hex-Rays decompiler (often integrated in IDA Pro) immediately reveals very interesting behavior, such as the processes targeted (including the Chrome, Opera, and Internet Explorer browsers):

      quick analysis using the hex rays decompiler reveals interesting behaviors

    • 0x03e8671b and 0x03e8f3a5 are return addresses from system call invocations showing C&C-related functionality of the malware sample. The sample uses a domain-generation algorithm (DGA) to contact the attacker's server and then posts data via HTTP



      Network traffic generated by the injected code

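DGAs like the one observed here derive a fresh list of rendezvous domains from a shared seed, typically the current date. The following is a generic illustration of the concept only, not Shiz's actual algorithm:

```python
import hashlib

def generate_domains(seed: str, count: int = 5, tld: str = ".com"):
    """Generic DGA sketch: derive `count` pseudo-random domains from a seed."""
    domains = []
    state = seed.encode()
    for _ in range(count):
        state = hashlib.md5(state).digest()
        # Map the first 12 digest bytes to lowercase letters.
        name = "".join(chr(ord("a") + b % 26) for b in state[:12])
        domains.append(name + tld)
    return domains

# Both the malware and the attacker compute the same list for a given day,
# so the attacker only needs to register one of the generated domains.
todays_domains = generate_domains("2014-05-01")
```

Because the list is deterministic for a given seed, defenders who recover the algorithm can pre-compute and sinkhole future domains, which is one reason process dumps exposing such code are valuable.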

    Conclusion

This blog post shows how to use process snapshots generated during dynamic analysis to get detailed information about the internals of malware making use of code injection, in a couple of easy-to-follow steps. Even without any debugging, it is possible to do in-depth analysis of even complex malware variants.

    Exploit Analysis via Process Snapshotting


    In this third post in our blog series on process snapshotting (see previous posts on PlugX and Shiz’ code injection), we will show how to dissect exploit payloads using the LLama full-process snapshot functionality.

Document exploits are particularly tedious to analyze using traditional analysis tools, as the vast majority of the code (and/or data) located in the exploited process’ memory is benign (that is, unrelated to the actual exploit). Lastline’s high-resolution malware analysis engine is able to track all data generated as part of opening/rendering a document, and, in turn, limits the process snapshots exported for analysis to those parts relevant to the exploit (and subsequent shellcode and payload).

    LLama versus CVE-2012-0158

    Like in previous posts, we will walk through the concrete analysis steps by looking at a real-world exploit:

    Malware Family: Exploit.Win32.CVE-2012-0158
    md5: 0ad65802ba0dc1ae0fa871907f83b729
    VT link: https://www.virustotal.com/en/file/faced273353acea599d96e46c47d352beded0cc753867f02aa5f32744b692c1d/analysis/
    Full analysis result link: https://user.lastline.com/malscape#/task/9c7452b11aa64204ad9e30cd7d1688fc (accessible to Lastline customers only, sign-up now)

The analysis engine is able to identify the execution of malicious code at runtime using various mechanisms. This allows an analyst, for example, to look at stack code execution after exploiting a vulnerability, as well as follow the entire infection process after the exploit code has been executed. Additionally, the analysis engine keeps track of all stages of the exploit when the malicious payload executes in multiple phases, as one can see in the example below.

    To start, let’s first look at the behavior observed during the dynamic analysis run, as seen in the behavior overview:


    Behavior summary of exploit and dropped file

    Analyzing Multi-Stage Document Exploits

    The behavior summary already gives us a good idea about what the exploit, and subsequently dropped malware sample, did. But for this blog post, we want to focus on the how rather than the what, so let’s look at the stages of the exploit from the perspective of the process snapshots (or “dumps”) generated inside the vulnerable program (in this case Microsoft Word 2003/2007, labeled Analysis Subject 1 below):


    Exported process snapshots

    As one can see, three process snapshots have been generated for the MS Office program, each representing one stage of the exploit. So, let’s download all three process snapshots for the primary analysis subject and analyze them in IDA Pro!

    The first snapshot was taken when the analysis engine detected code execution in stack memory, which already hints at the first stage of the exploit (executed right after triggering the vulnerability).

    Opening the downloaded snapshot in IDA Pro, the overview points us right to the position-independent shellcode executed in the first stage:


    NOP sled and first stage of the exploit

As one can see, the shellcode uses function name hashing to obfuscate the real semantics of the code. When using this trick, malicious code finds API functions indirectly by iterating through the exports of loaded modules and hashing the functions’ names. The idea is to avoid the static strings used in many exploits, thereby evading static analysis techniques.
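A widely used variant of this trick is the ROR-13 name hash. The sketch below shows the common pattern; the exact rotation count and byte handling vary between shellcode families, so the constants here are illustrative rather than those of this specific sample:

```python
def ror13_hash(name: str) -> int:
    """Hash an API name by rotating a 32-bit accumulator right 13 bits per byte."""
    h = 0
    for byte in name.encode("ascii"):
        h = ((h >> 13) | (h << (32 - 13))) & 0xFFFFFFFF  # 32-bit rotate right
        h = (h + byte) & 0xFFFFFFFF
    return h

# The shellcode embeds only hash values; at runtime it walks an export table
# and compares the hash of each export name until one matches.
```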

    Clearly, this trick does not work on a dynamic analysis system, as the executed code still has to jump to the API functions of interest, revealing the true behavior of the shellcode. Below, we highlight the individual functions used by the code:


    NOP sled and first stage of the exploit (after de-obfuscation)

So, with very little work identifying hashed function names, we can see the first stage of the exploit opening the original document to read, decrypt, and finally execute the second stage in memory.

    This same information is also shown in the process snapshot overview: The second snapshot was taken after observing the execution of untrusted code in memory - exactly as described above. After opening this snapshot in IDA Pro, we see the second stage of the exploit:

    Execution of untrusted code in memory

    First, the code obtains the base address of kernel32.dll from the PEB


    which it then uses to read a list of API function addresses


    used for executing API functions later on.

    After obtaining a list of function names, the code hashes each function name, and builds a mapping table between hashed function name and function address. This way, it can later execute individual functions by looking them up via the hashed function ID in the shellcode.
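The mapping step described above amounts to building a small dictionary. A hedged sketch of the idea follows, using a trivial stand-in hash function so the example is self-contained (the real shellcode uses its own hash, and native tables rather than Python dictionaries):

```python
def name_hash(name: str) -> int:
    """Stand-in for the shellcode's real name-hash function."""
    h = 0
    for byte in name.encode("ascii"):
        h = (h * 31 + byte) & 0xFFFFFFFF
    return h

def build_hash_table(exports):
    """Map hashed export name -> address, as the second stage does."""
    return {name_hash(name): addr for name, addr in exports.items()}

def resolve(table, hashed_id):
    """Later, the shellcode calls an API by looking up its hashed ID."""
    return table[hashed_id]

# Hypothetical export list: name -> address, as read from kernel32's exports.
exports = {"CreateFileW": 0x7C810800, "WriteFile": 0x7C810D97}
table = build_hash_table(exports)
assert resolve(table, name_hash("WriteFile")) == 0x7C810D97
```

This indirection is why the hash-to-address table visible in the third snapshot is such a useful artifact: once recovered, every hashed call site can be labeled with the real API name.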


    Function name hash to address mapping built by second stage

    The invocation of API functionality can be seen in the third process snapshot, labeled Observed API function invocation from untrusted memory region. When the analysis engine observes a call to an API function, it triggers another snapshot, allowing us to inspect the second exploit stage after the function hash table has been populated.

    After opening this last snapshot in IDA Pro, the overview immediately highlights function tables (as mentioned in our previous blog post):


    Function tables highlighted in process snapshot

Since the snapshot was taken after the malware sample had already completed the hash-to-address mapping, this table contains API function addresses. Because the shellcode references this table for all API function invocations, and to simplify further analysis, we create a struct that contains this table of API function addresses as members.


    Extracted function table struct

    With this in place, it is easy to recognize the remaining behavior:


    As one can see from that code above, the shellcode has the ability to drop executable files (EXE as well as DLLs), and invoke them subsequently:


    In the analysis run that we have looked at for this post, the shellcode restarts MS Word with a dropped fake document to avoid raising suspicion with the user under attack. The rest of the analysis report contains details on the payload dropped by the exploit. Just like any other analysis report, it also contains full process snapshots for the processes spawned after successful exploitation.

    Summary

    Full process snapshots generated by Lastline’s high-resolution analysis engine provide a very easy way to observe all stages of document exploits. Each snapshot contains only information relevant to the exploit, executed shellcode, or subsequently-executed payload.

    In combination with the full analysis report summarizing interesting behavior as well as the full activity of the dropped malware, the process snapshots provide an analyst with the perfect tool to analyze the full spectrum of malware behavior, from high-level API calls to low-level shellcode, without requiring a lot of manual work.


A Look at Advanced Targeted Attacks Through the Lens of a Human-Rights NGO, World Uyghur Congress


In my capacity as an academic researcher at Northeastern University, I collaborated with computer scientists Stevens Le Blond, Adina Uritesc and Cédric Gilbert at the Max Planck Institute for Software Systems as well as Zheng Leong Chua and Prateek Saxena at the National University of Singapore to study cyber-attacks against the human-rights Non-Governmental Organization (NGO) representing the Uyghur ethnic minority group living in China and in exile: World Uyghur Congress (WUC). Our findings illuminate a series of apparently targeted, sophisticated cyber-attacks deployed against WUC and affiliated organizations and individuals -- with a combination of social engineering and exploits through email (similar to spear phishing) -- over a period of four years.

    Two volunteers at WUC provided more than 1,000 suspicious emails sent to more than 700 different email addresses from 2009-2013, including WUC leaders as well as journalists (including at AFP, CNN International, Los Angeles Times, New York Times and Reporters Without Borders), politicians (including in the Socialist Party of the Netherlands and the Chinese Democratic Party), academics (including at Penn State University, Howard University, Syracuse University, George Washington University and the Xinjiang Arts Institute China) and employees of other NGOs (including Amnesty International and Save Tibet - International Campaign for Tibet). We analyzed those emails, including any embedded URLs, attachments or files, to determine if and how often they contained social engineering techniques, attack vectors, exploits and malware.

    We found that the language and subject matter of malicious emails were intricately tailored to appear familiar, normal or friendly, with the sender impersonating someone else to lure the recipient into opening an attachment or URL: all hallmarks of social engineering. The majority of the messages sent to WUC and others were in the Uyghur language, and about a quarter were in English. Emails were sent from compromised accounts inside the WUC organization or from email addresses that were a character or two off from the known email address to trick the eyes of the recipients.

    The majority of these first-stage malware attacks were executed through attached documents (rather than .zip or .exe files) using recent but disclosed vulnerabilities that tend to evade common defenses. Interestingly, in November 2010 there was a marked shift from Adobe to MS Office documents, coinciding with the addition of sandboxing technology to Adobe Reader and the public disclosure of a stack buffer overflow MS Office vulnerability. Also, the malicious documents sent to WUC contained several different families or classifications of malware. More than 25% of this malware can be linked to entities that have been reported to engage in targeted attacks against political and industrial organizations, and Tibetan NGOs.

    We tested existing AV software for effectiveness in detecting the attacks in the WUC emails shared with us. No single tool detected all of the attacks, and some attacks evaded detection by all of the antivirus scanners. Yet we found the attacks in the malicious documents to be quite similar to those used in other recent targeted attacks, rather than attacks using zero-day vulnerabilities. Keep in mind, we were scanning these samples months or years after they had been deployed against WUC. Even so, standard anti-virus (AV) detection software was insufficient in detecting these targeted attacks despite their similarity to known threats, because it relies on static signatures rather than malicious behavior profiling. Lastline Labs also recently reported on the inadequacy of traditional AV scanners in detecting advanced malware, using different samples not specifically targeted against an NGO like those sent to WUC.

     

    Engin_Blog_Image

     

    This chart shows AV scanners’ 2014 detection rates of malware sent to WUC from 2009-2013.

     

    Our complete findings from A Look at Targeted Attacks Through the Lense of an NGO will be presented at the USENIX Security Conference on August 21, and the full paper is available here.

     

    Rogue Online Pharmacies Use Fake Security Seals and Content Obfuscation to Deceive Humans and Programs


    New research being presented tomorrow at RAID 2014 demonstrates that just two signals can automatically and effectively detect hundreds of malicious pages within 150,000 real-world samples with relatively high precision and accuracy: 1) content obfuscation and 2) fake certification seals. The UCSB research paper by Jacopo Corbetta, Luca Invernizzi, Christopher Kruegel and myself entitled “Eyes of a Human, Eyes of a Program: Leveraging Different Views of the Web for Analysis and Detection” dissects these two common techniques used by malicious websites -- particularly rogue online pharmacies -- to mislead web visitors and evade security scanners.

    Perhaps one of the more scientifically and sociologically interesting elements of this research is the fact that computer programs and human eyes see the online world very differently. At a basic level, programs see code and parse text that represents actions to be performed while humans see the online world visually, usually by interacting with a browser. So the complex, textual JavaScript that is interpreted by the browser becomes an eye-catching web site with images and text.

    Malicious web developers exploit these discrepancies between what programs and humans see to elude automated detection while masquerading as legitimate web sites for their criminal or unethical purposes. For example, there are many malicious websites disguised as legitimate online pharmacies that are in fact peddling counterfeit goods, selling illegal or controlled substances, stealing personal information and/or distributing malware. In fact, Lastline’s director of research Christian Kreibich co-authored a fascinating paper in 2012 that looks inside the economics of pharmaceutical affiliate programs and uncovers botnets, malware, bullet-proof hosting and more.

    To test our hypotheses, we built a “maliciousness detector” using just these two signals:

    1. Content obfuscation: this technique is used by web authors to hide web content from scanning programs, which might recognize patterns that are associated with malicious intent. Some forms of content obfuscation, such as obfuscated email and web addresses, are common on benign websites, so we ignored those.

    2. Certification seals: these are small images bearing the brand of a certification provider of some sort -- including security vendors, payment systems providers, government administrations, NGOs and professional associations. When used without permission, these seals serve to deceive humans into believing the malicious site owner is certified by a reputable organization and therefore trustworthy. When fake, seals generally do not redirect to the actual certification program.
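    The two signals described above combine into a very small classifier. The sketch below is purely illustrative: the seal names, the obfuscation patterns, and the `verified_link` flag (whether a seal actually links back to a real certification page) are hypothetical stand-ins for the signals used in the paper.

```python
import re

# Hypothetical certification-provider names; a real detector would
# compare page images against a catalog of known seal logos.
KNOWN_SEAL_NAMES = re.compile(r"(verisign|mcafee|norton|bbb|trust)", re.I)

def obfuscation_score(html):
    """Crude proxy for content obfuscation: long runs of hex escapes,
    or calls that turn strings into executable code."""
    signals = 0
    if re.search(r"(\\x[0-9a-f]{2}){20,}", html, re.I):
        signals += 1
    if re.search(r"\b(eval|unescape|String\.fromCharCode)\s*\(", html):
        signals += 1
    return signals

def has_suspect_seal(html, verified_link=False):
    # A seal image that does not link back to the certifier is a red flag.
    return bool(KNOWN_SEAL_NAMES.search(html)) and not verified_link

def looks_malicious(html, verified_link=False):
    # Both signals must fire: obfuscated content plus an unverifiable seal.
    return obfuscation_score(html) > 0 and has_suspect_seal(html, verified_link)
```

Real pages need far more robust parsing, but the signals combine in exactly this way: obfuscated content plus a seal that cannot be verified.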

    Example Rogue Pharmacy Icons

    Six example counterfeit seals found on rogue online pharmacy websites

    Ultimately, we’ve determined that content obfuscation and the use of fake seals are both very strong signals for malicious intent. Of the 149,700 pages studied, we found that benign pages rarely exhibit these behaviors. We also uncovered hundreds of malicious pages that traditional malware detectors would have missed, including 400 rogue pharmacy websites displaying fake seals like those above.

    While this is by no means a comprehensive way to detect all malicious web pages, we believe this research can contribute to the ever-growing toolshed of cyber-security defenses against Internet fraud. And all of us can learn from this to treat certification seals on otherwise unknown webpages with a healthy dose of suspicion.

    The Malicious 1% of Ads Served


    Last week at IMC Vancouver 2014, cyber-security researcher Apostolis Zarras of Ruhr-University Bochum presented a research paper entitled “The Dark Alleys of Madison Avenue, Understanding Malicious Advertisements” that he co-authored along with other researchers including my fellow Lastline co-founder Christopher Kruegel and myself. For this paper, we performed the first large-scale study of ad networks that serve malicious ads or “malvertising,” investigating the safety of 600,000 ads on 40,000 websites.

    Our research revealed the widespread and presumably uninvited distribution of malware through online ad networks, dubbed “malvertising.” To detect malicious behavior in ads we used a combination of blacklists and Wepawet, a honeyclient developed at UCSB that uses an emulated browser to capture the execution of JavaScript to identify signs of maliciousness such as drive-by-download attacks. (Side note: Wepawet celebrates its 6th birthday this Friday, November 14.)

    The basic idea behind the experiment was to use a real browser to crawl both very popular and not-so-popular web sites, analyzing the ads that were served. If clicking on an ad would lead the browser to a suspicious web site (that is, a host that is deemed malicious by 5 or more public blacklists, or a landing page that is suspicious according to Wepawet) then we would mark the advertisement as a “malvertisement.”
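    The labeling rule can be summarized in a short Python sketch. The blacklist data here is hypothetical, and `wepawet_suspicious` stands in for the honeyclient's verdict on the landing page:

```python
def blacklist_hits(host, blacklists):
    """Count how many public blacklists flag a landing host.
    `blacklists` is a list of sets of known-bad hosts (illustrative data)."""
    return sum(host in bl for bl in blacklists)

def is_malvertisement(host, blacklists, wepawet_suspicious=False):
    # The study's rule: a host deemed malicious by 5 or more public
    # blacklists, or a landing page flagged by the honeyclient.
    return blacklist_hits(host, blacklists) >= 5 or wepawet_suspicious
```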

    During this experiment we looked at which services (ad networks, ad brokers, ad providers) delivered the ad that was eventually displayed on the page.

    The malicious 1% of ads served

    Ultimately, we measured that on average 1% of served ads led to suspicious pages. When multiplied by the millions of ads served every day, that is a sizeable number. Interestingly, entertainment and news websites hosted more malvertising than adult websites. This widespread proliferation of malvertising through unsecured or undersecured ad networks on mainstream websites is a serious threat to both Internet users and the Internet economy.

    Malvertising can be prevented in modern browsers by using the sandbox attribute of iframes in HTML5, which can protect those who click on ads from link hijacking (the most common vector for malvertising in our study). Unfortunately, not one website we looked at used this attribute to protect its users.
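    Checking crawled pages for this protection is straightforward. A minimal sketch (using a crude regex rather than a real HTML parser, so it is illustrative only) might look like:

```python
import re

# Match any opening iframe tag in the page source.
IFRAME = re.compile(r"<iframe\b[^>]*>", re.I)

def unsandboxed_ad_frames(html):
    """Return iframe tags that lack the HTML5 sandbox attribute;
    such frames allow embedded ad content to hijack the top-level page.
    A frame like <iframe sandbox="allow-scripts" ...> would pass."""
    return [tag for tag in IFRAME.findall(html)
            if not re.search(r"\bsandbox\b", tag, re.I)]
```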

    As stated in the paper presented in Vancouver last week, “one of the greatest and most prevalent cyber-threats facing marketers, advertising and creatives is malware.” When you consider how pervasive malvertising is based on these findings, it could be one of the greatest threats to the Internet as we know it. Thankfully, there are clear steps that can -- and should -- be taken today to stamp out malvertising.

     

    Editor’s note: Some ad networks expressed concerns about the validity of ranking them by the percentage of benign ads in our dataset, which was included in a previous version of this blog post. We have removed that section while we investigate those concerns.

    Need Security Breach Detection?

    Lastline’s Breach Detection Platform enables security operations to rapidly detect, block and respond to active breaches caused by APTs and evasive malware. Learn more here:

    Learn More

    Not so fast my friend - Using Inverted Timing Attacks to Bypass Dynamic Analysis


    We're very happy that a lot of you are enjoying our research. If you'd like to discuss this topic with us, please tweet @LastlineLabs or comment on HackerNews and we'll join you!

    Authored by: Arunpreet Singh and Clemens Kolbitsch

    Dynamic malware analysis - or sandboxing - has become a central piece of every major security solution... and so has the presence of evasive code in malicious software. Practically all variants of current threats include some sort of sandbox-detection logic.

    One very simple form of evasive code is to delay execution of any suspicious functionality for a certain amount of time - the basic idea is to leverage the fact that dynamic analysis systems monitor execution for a limited amount of time, and in the absence of malicious behavior classify a program as benign. On a victim machine, on the other hand, delaying behavior for a few minutes does not have a real impact, allowing the attacker to easily achieve different behavior in the analysis environment and on a real target machine.

    The easiest, and definitely most prevalent method of stalling behavior is to make a program “sleep” for a certain amount of time. Since this is such a common behavior, most analysis sandboxes are able to detect this kind of evasion, and in most cases, simply “skip” the sleep. While this sounds like a simple solution, it can have a wide range of unintended effects as we will see in this blog post.

    The Power of Procrastination

    In our whitepaper Automated Detection and Mitigation of Execution-Stalling Malicious Code we describe the basic principle behind stalling code used against sandboxes:

    Stalling code is typically executed before any malicious behavior. The attacker’s aim is to delay the execution of the malicious activity long enough so that an automated dynamic analysis system fails to extract the interesting malicious behavior.

    Code stalling can be achieved in a number of ways: Waiting for a specific action of the user, wasting CPU cycles computing useless data, or simply delaying execution using a call to the Sleep() function.

    According to MSDN

    VOID WINAPI Sleep(  _In_  DWORD dwMilliseconds);
    Suspends the execution of the current thread until the time-out interval elapses.

    a call to Sleep() will delay the execution of the current thread by the time passed as argument. Most sandboxes monitor the system- or API-calls of a program under analysis and will therefore see this evasion attempt. As a result, the sandbox is able to detect, and in most cases even react to it, either by patching the delay argument passed to the operating system, by replacing the called function with a custom implementation, or simply by returning immediately to the calling code (skipping the sleep altogether).

    Detecting Sleep Patching

    Recently, we came across an interesting malware family that turns this anti-evasion trick used by sandboxes against them, detecting the presence of the analysis environment (one could call it an anti-evasion-evasion trick…).

    This malware detects sleep-patching using the rdtsc instruction in combination with Sleep() to check acceleration of execution, as one can see in the following code extract:

    Detecting Sleep Patching Anti Evasive Malware Sandbox

    In summary, this code:

    • executes rdtsc, which reads the CPU’s timestamp counter, and stores the timestamp in a temporary value,
    • invokes Sleep() to delay execution,
    • re-executes rdtsc, and
    • compares the two timestamps.
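    The same check is easy to reproduce. The sketch below uses Python's `time.perf_counter_ns()` as a stand-in for the `rdtsc` instruction and `time.sleep()` for the Win32 `Sleep()` call; the 50% tolerance is an arbitrary choice for illustration.

```python
import time

def sleep_was_patched(seconds=2.0, tolerance=0.5):
    """If a sandbox skips or shortens the sleep without adjusting the
    timestamps it returns, far less time appears to elapse between the
    two counter reads than was requested."""
    t0 = time.perf_counter_ns()           # first "rdtsc"
    time.sleep(seconds)                   # the call a naive sandbox skips
    t1 = time.perf_counter_ns()           # second "rdtsc"
    elapsed = (t1 - t0) / 1e9
    return elapsed < seconds * tolerance  # suspiciously fast: patched
```

On an unmodified host the full delay elapses and the check returns False; in a sandbox that skips the sleep but leaves the clock alone, it returns True.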

    Sleep Patching Using High-Resolution Dynamic Analysis

    Different from traditional sandboxes, Lastline’s high-resolution analysis engine monitors more than just the interaction of programs with the operating system (or API functions). Our engine sees - and can thus influence - every instruction that is executed by the malicious program, not just API function invocations. Thus, since we can also manipulate the values returned by the rdtsc instruction, we can maintain a consistent execution state even when patching a sleep, for example by fast-forwarding the timestamps returned by the CPU to the program each time a sleep is skipped or accelerated.

    As a result, the program can no longer distinguish whether a sleep was truly executed in full, or whether the analysis system simply forwarded the time inside the sandbox.
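    The idea can be illustrated with a toy model. The 2 GHz cycle rate is an assumption, and a real sandbox does this at the instruction-emulation level rather than in guest code:

```python
class VirtualClock:
    """Toy model of consistent sleep-skipping: every skipped sleep
    fast-forwards the value later returned for rdtsc, so the guest
    cannot tell the delay never really happened."""
    CYCLES_PER_MS = 2_000_000  # assumed 2 GHz guest CPU

    def __init__(self):
        self.tsc = 0

    def rdtsc(self):
        return self.tsc

    def sleep(self, ms):
        # Skip the real delay, but advance the timestamp counter by
        # the cycles the sleep would have consumed.
        self.tsc += ms * self.CYCLES_PER_MS
```

A timing check like the one the malware performs now sees exactly the elapsed cycles it expects, even though no real time passed.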

    Side-Effects of Sleep Patching: User Emulation

    We found other interesting side-effects introduced by sleep patching that might not be directly related to deliberate sandbox detection, as can be seen in the following piece of code:

    Sleep Patching Side Effect User Emulation Evasive Malware

    Here, the malware sample checks for user-activity by repeatedly checking the cursor position (in 30 second intervals).

    Most sandboxes have some mechanism to trigger (or simulate) user activity. Typically this means repeatedly changing the cursor position, opening new windows, clicking on dialog boxes, and so on.

    In the code above, the malware sample uses the Sleep() method not for delaying malicious activity, but merely as a simple way of checking that some user activity -- mouse movement, in this case -- was observed within a certain time period. Clearly, if a sandbox naively accelerates this code by patching the sleeps, the behavior that was expected to happen while the malware sample is dormant will not happen, and as a consequence, the presence of the analysis environment will be detected, evading analysis.

    Once again, a naive approach to execution-stalling will allow an attacker to identify the presence of the sandbox - or, as in this case, the absence of a real user - evading analysis.
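    In Python-flavored pseudocode, the gate looks roughly like this; `get_cursor_pos` stands in for the GetCursorPos API, and the 30-second interval mirrors the sample:

```python
import time

def wait_for_user(get_cursor_pos, checks=3, interval=30):
    """Sketch of the malware's user-presence gate: poll the cursor
    position in fixed intervals and only proceed once it has moved.
    `get_cursor_pos` is a stand-in for the GetCursorPos API."""
    last = get_cursor_pos()
    for _ in range(checks):
        time.sleep(interval)  # skipping this sleep also skips the
                              # window in which a user could move
        pos = get_cursor_pos()
        if pos != last:
            return True       # movement observed: looks like a real user
        last = pos
    return False              # no movement: likely a sandbox
```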

    Side-Effects of Sleep Patching: Race Conditions

    Another interesting problem related to sleep-patching is race conditions: a non-trivial class of programming errors in which multi-threaded code must execute in a specific order to work correctly.

    One (ugly, as many programmers would agree) way of avoiding race conditions is to delay code depending on completion of another task by the amount of time this task typically needs.

    In the presence of sleep-patching, however, this approach is bound to fail, as the sandbox influences the amount of time that is slept. One such example can be seen in the code below, extracted from another malware family:

    Sleep Patching Side Effect Race Conditions Evasive Malware

    In this code, the malware decrypts and executes code from a dropped file, cleaning up after the program has executed (by deleting the file). Between invoking and deleting the program, the malware sample uses - as one might have guessed - a sleep to make sure the program is started before it is deleted. Once again, by patching the sleep incorrectly, the sandbox breaks this logic, causing the malware to delete the payload before it is ever executed.
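    The fragile ordering can be reproduced with a harmless stand-in: here a dropped Python script plays the role of the decrypted executable, and the delay mirrors the sample's timing assumption.

```python
import os
import subprocess
import sys
import tempfile
import time

def run_and_clean(payload_source, delay=2.0):
    """Drop a payload to disk, start it, wait, then delete it.
    The sleep is the only thing guaranteeing the child has loaded the
    file before it is unlinked - exactly the assumption a
    sleep-patching sandbox silently breaks."""
    fd, path = tempfile.mkstemp(suffix=".py")
    with os.fdopen(fd, "w") as f:
        f.write(payload_source)
    proc = subprocess.Popen([sys.executable, path])
    time.sleep(delay)   # "make sure the program is started"
    os.unlink(path)     # clean up the dropped file
    return proc.wait()
```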

    A more complex example can be seen below:

    sleep-patching-detection-infinite-sleep-evasive-malware

    Here, malware reads encrypted code from a file on disk and executes it in the context of the current process using a separate thread. Once the payload has been started, the main thread goes into an infinite sleep (but this could equally be a long sleep), before executing ExitProcess (which terminates the execution of all threads in the process).

    If this sleep is patched to be shorter than the execution of the malicious payload, the process is terminated before completing its activity, unintentionally stopping the process before it can completely reveal its malicious behavior.
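    A toy model in Python makes the failure mode concrete; returning from `run_sample` plays the role of ExitProcess abandoning the daemon payload thread:

```python
import threading
import time

def run_sample(main_sleep, payload_time=0.5):
    """Start a payload thread, sleep in the main thread, then 'exit'.
    If the sandbox patches the main sleep to be shorter than the
    payload's runtime, the payload never completes."""
    done = []
    t = threading.Thread(
        target=lambda: (time.sleep(payload_time), done.append(True)),
        daemon=True)
    t.start()
    time.sleep(main_sleep)  # "infinite" in the real sample
    return bool(done)       # did the payload finish before "ExitProcess"?
```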

    Summary

    Timing attacks are common to most malware families today. While some of these timing attacks are easy to detect, naive approaches to overcoming these evasion attempts often cause more harm than they do good, opening gates to evasion attacks based on anti-evasion systems.

    Using high-resolution dynamic analysis and leveraging its insight into each instruction that is executed by the malicious program, the Lastline sandbox is able to foil these attacks and reveal the malicious behavior.

    Ninety-Five Percent of Carbanak Malware Exhibits Stealthy or Evasive Behaviors


    We’ve talked a lot about the increasing sophistication of malware and the serious threats it poses. But it’s rare to be able to analyze malware that is evasive or stealthy and has already been deployed in the wild to carry out cybercrime without being detected by in-place security systems for months.

    From the Security Analyst Summit in Cancun this week, Kaspersky Labs published a report detailing a bank heist purported to rake in as much as $1 billion since late 2013 from banks around the world using a series of sophisticated attacks that may still be underway.

    According to Kaspersky, Carbanak malware was deployed to infiltrate banks, take over ATMs, adjust balances and transfer funds via remote access. To better understand how the malware used in these attacks evaded detection by traditional security technology for so long, we took a closer look at all of the 74 Carbanak malware samples available to us through VirusTotal (nearly 70% of those listed in the report).

    Using the Lastline Breach Detection Platform, we detected a high level of malicious behavior, with 96% of samples receiving a security impact score of 96 or more out of a possible 100. Our analysis automatically determined that all of them (100%) were “malicious,” regardless of whether a signature existed for them. (Our system considers anything with an impact score above 70 clearly malicious.)

    The suspicious behaviors we detected in these Carbanak samples revealed some interesting commonalities:

    • 93% exhibited ten or more malicious or suspicious behaviors
    • 92% had a packer loading an embedded PE image, indicating potential unpacking
    • 95% hid network activity through code injection
    • 95% displayed stealth behavior, including creating .exe files that were hidden and/or masquerading as system files
    • 95% autostarted by registering a new service at startup
    • 97% altered memory by replacing the image of another process, indicating either detection evasion or privilege escalation
    • Nearly one in five (17%) demonstrated evasive behavior -- such as trying to detect a virtual sandbox, sleeping, or forbidding debugging -- which is a relatively high percentage of evasion compared to the average malware sample set

    You can see here a breakdown of the categories of malicious behavior and their prevalence across the 74 analyzed Carbanak samples:

    Number of Samples    Fraction of Samples    Suspicious/Malicious Behavior Category
    13                   17.57%                 Evasion
    60                   81.08%                 Execution
    69                   93.24%                 Packer
    70                   94.59%                 Network
    70                   94.59%                 Autostart
    70                   94.59%                 Stealth
    70                   94.59%                 File
    72                   97.30%                 Memory

    To learn more about these malicious behavior categories, please go here.

    One of the more interesting and alarming components of the Carbanak family of malware is that it is likely still actively involved in robbing banks around the world of millions of dollars.

    The clearly sophisticated and varied family of malware has a broad arsenal of stealthy and evasive maneuvers tailor-made to bypass the security systems the targeted banks have in place.

    Because these malware samples are environmentally aware, with stealthy and evasive behaviors, detecting them automatically requires a stealth sandbox whose analysis environment appears to be a victim’s system. Only then will banks be protected against these evolving threats.

    Carbanak malware sample analysis:

    Example Carbanak Malware Sample Analysis

    Tripwire, a Lastline integration partner, has added to the conversation on their State of Security Blog. You can read their post and other security news and industry commentary here.
