DUBNIUM (which shares indicators with what Kaspersky researchers have called DarkHotel) is one of the activity groups that has been very active in recent years, and has many distinctive features.
We located multiple variants of multiple-stage droppers and payloads in the last few months, and although they are not really packed or obfuscated in a conventional way, they use their own methods and tactics of obfuscation and distraction.
In this blog, we will focus on analysis of the first-stage payload of the malware.
The code is convoluted in many ways, which makes reverse-engineering the malware a complex task. The complexity comes from statically linking with unrelated code (so that the malicious logic can hide in a big, benign code dump) and from excessive use of an in-house encoding scheme. The bootstrap logic is also hidden in plain sight, such that it might be easy to miss.
Every sub-routine in the malicious code runs a “memory cleaner routine” when its logic ends. As a result, a memory snapshot of the process will not disclose many more details than the static binary itself.
The malware is also very sneaky and sensitive to dynamic analysis. When it detects the existence of analysis toolsets, the executable file bails out from further execution. Even the presence of binary instrumentation tools like PIN or DynamoRIO prevents the malware from running. This effectively defeats many automation systems that rely on at least one of the toolsets the malware checks for. Having to avoid these toolsets during analysis makes the overall investigation even more complex.
With this blog series, we want to discuss some of the simple techniques and tactics we’ve used to break down the features of DUBNIUM.
We acquired multiple versions of DUBNIUM droppers through our daily operations. They are evolving slowly, but basically their features have not changed over the last few months.
In this blog, we’ll be using sample SHA1: dc3ab3f6af87405d889b6af2557c835d7b7ed588 in our examples and analysis.
Hiding in plain sight
The malware used in a DUBNIUM attack is committed to disguising itself as a Secure Shell (SSH) tool. In this instance, it is attempting to look like a certificate generation tool. The file descriptions and other properties of the malware look convincingly legitimate at first glance.
When it is run, the program actually dumps out dummy certificate files into the file system and, again, this can be very convincing to an analyst who is initially researching the file.
The binary is indeed statically linked with the OpenSSL library, such that it really does look like an SSH tool. The problem with reverse engineering this sample starts from the fact that it has more than 2,000 functions, and most of them are statically linked OpenSSL code without symbols.
The following is an example of one of these functions – note it even has string references to the source code file name.
It can be extremely time-consuming just going through the dump of functions that have no meaning at all in the code – and this is only one of the more simplistic tactics this malware is using.
We can solve this problem using binary similarity calculation. This technique has been around for years for various purposes, and it can be used to detect code that steals copyrighted code from other software.
The technique can also be used to find patched code snippets in software and to find code that was vulnerable to attack. In this instance, we can use the same technique to clean up unnecessary code snippets from our advanced persistent threat (APT) analysis and make a reverse engineer’s life easier.
Many different algorithms exist for binary similarity calculation, but we are going to use one of the simplest approaches here. The algorithm first collects the op-code string of each instruction in the function (Figure 5). It then concatenates these strings and runs a hash algorithm over the result. We used the SHA1 hash in this case.
Figure 6 shows the Python-style pseudo-code that calculates the hash for a function. Sometimes, the immediate constant operand is a valuable piece of information that can be used to distinguish similar but different functions, so the pseudo-code also includes that value in the hash string. It relies on our own utility function RetrieveFunctionInstructions, which returns a list of op-code and operand values from a designated function.
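import hashlib
import idaapi

def CalculateFunctionHash(self, func_ea):
    # Concatenate every op-code (plus selected immediates), then hash the string.
    # Assumption: the helper also supplies drefs, the instruction's data references.
    hash_string = ''
    for (op, operands, drefs) in self.RetrieveFunctionInstructions(func_ea):
        hash_string += op
        # Immediates are included only for instructions with no data references.
        if len(drefs) == 0:
            for operand in operands:
                if operand.Type == idaapi.o_imm:
                    hash_string += '%x' % operand.Value

    m = hashlib.sha1()
    m.update(hash_string.encode('utf-8'))
    return m.hexdigest()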
Figure 6: Pseudo-code for CalculateFunctionHash
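The hashing idea can also be tried outside of a disassembler. The following is a minimal sketch, not our actual tooling: it assumes the instructions have already been extracted into a list of (mnemonic, immediates) pairs, and the function name function_hash and the sample instruction lists are hypothetical.

```python
import hashlib

def function_hash(instructions):
    """Hash a function given as (mnemonic, immediates) pairs.

    Assumed input format: mnemonic is a string, immediates is a list of ints.
    """
    parts = []
    for mnemonic, immediates in instructions:
        parts.append(mnemonic)
        # Append each immediate constant as lowercase hex, as in Figure 6.
        parts.extend('%x' % imm for imm in immediates)
    return hashlib.sha1(''.join(parts).encode('utf-8')).hexdigest()

# Hypothetical instruction lists: func_b differs from func_a by one constant.
func_a = [('push', []), ('mov', []), ('sub', [0x10]), ('call', []), ('ret', [])]
func_b = [('push', []), ('mov', []), ('sub', [0x20]), ('call', []), ('ret', [])]
func_a_copy = list(func_a)

same = function_hash(func_a) == function_hash(func_a_copy)
different = function_hash(func_a) != function_hash(func_b)
```

Identical instruction sequences produce identical hashes, while the two functions that differ only in an immediate constant are told apart. This is why the constant is worth including in the hash string.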
With these hash values calculated for the DUBNIUM binary, we can compare them with the hash values from the original OpenSSL library. We identified from the compiler-generated meta-data that the version the sample is linked to is openssl-1.0.1l-i386-win. After gathering the same hashes from the OpenSSL library, we could import symbols for the matched functions. In this way, we removed most of the functions from our analysis scope.
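The matching step can be sketched as a simple dictionary lookup. This is an illustration under stated assumptions rather than the script we used: the function name match_functions, the addresses, and the short hash strings are placeholders, and skipping hashes shared by several library functions is a choice made for this sketch.

```python
def match_functions(sample_hashes, library_hashes):
    """Map sample function addresses to library symbol names by hash.

    sample_hashes:  {function_address: hash}
    library_hashes: {symbol_name: hash}
    Only hashes that identify exactly one library symbol are used.
    """
    by_hash = {}
    for name, h in library_hashes.items():
        by_hash.setdefault(h, []).append(name)

    matches = {}
    for ea, h in sample_hashes.items():
        names = by_hash.get(h, [])
        if len(names) == 1:  # skip ambiguous hashes (assumption of this sketch)
            matches[ea] = names[0]
    return matches

# Placeholder data: real values would be 40-character SHA1 digests.
library = {'SHA1_Init': 'aaa', 'SHA1_Update': 'bbb', 'stub_1': 'ccc', 'stub_2': 'ccc'}
sample = {0x401000: 'aaa', 0x401200: 'ccc', 0x402000: 'ddd'}

matches = match_functions(sample, library)
# Functions left without a symbol stay in the analysis scope.
remaining = sorted(ea for ea in sample if ea not in matches)
```

Functions that receive a symbol can be set aside as library code, and the remaining addresses are the candidates for manual analysis.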