June 8, 2005
Again, welcome on behalf of the eEye Research team to another issue of VICE, a free technical newsletter featuring content from the team representing the foundation of eEye as a company and culture.
With this third issue, we offer the second part of Derek Soeder's Assembly Forensics article which walks the reader through disassembly of four worms in an effort to present their techniques: CodeRed, CodeRed II, Sapphire, and Witty.
As always, we have reviewed the questions from the pool of responses, and our Research Team has provided expert comments and answers. We also received an interesting email from a subscriber who wrote about his recent research regarding Spam Process Analysis. His efforts fit perfectly with the theme of the newsletter and have been included as an exclusive passage in the Etcetera section.
Enjoy!
eEye Team Vice
In This Issue
Vulnerability Exposed!
Ask Research
- What is the difference between trojans and viruses?
- Is LINUX really secure as compared to MS-WINDOWS? What seem to be the prospects of LINUX becoming a commercial success? And if the source code is open to all, doesn't it become all the more easy to exploit it?
Etcetera
Assembly Payload Forensics: Part II
The source of an exploit payload, many of which are written in assembly, can often be tied to other samples based on some subtle identifying characteristics. Last month, we examined how the machine code bytes generated by an assembler can bear small traces of its origin that are all but invisible to the author. In the second article of this two-part series, we will examine some slight but significant choices the author can make in coding a payload that may help to correlate the work with who specifically wrote it.
We will present some real-world examples of assembly-based malcode for illustrative purposes, although we stop short of suggesting authorship. As was suggested in the first article, a working knowledge of Intel x86 assembly language is recommended.
Stylistic Features of Assembly Programming
More than one criminal has been collared thanks in part to another individual who recognized his writing style in a letter or manifesto sent to the press. In this respect, the connotations of the phrase "writing" malicious code are particularly appropriate, as there really is a discernible style to writing code as well, with tell-tale characteristics that can be used to imply a common author between code samples. Although resources on identifying stylistic traits in writing would no doubt be applicable here as well, we lack the space and the expertise to discuss them, and will instead focus on some specific nuances of assembler code.
CodeRed: Visual C-Generated Assembly
The body of the CodeRed worm is Visual C code compiled in debug mode, with some apparent assembly-level edits in order to make it execute properly in its target environment. Learning to recognize compiled Visual C code when disassembled is recommended, but as it's too large a topic to thoroughly cover here, we'll just hit the highlights. The following figures illustrate the two most obvious artifacts of debug-mode compilation.
Figure 1. Signs that the CodeRed worm was at least based on machine code generated by Visual C with the debug configuration selected: (a) local variable initialization to {0xCC} as part of the function preamble, and (b) the "value of ESP was not properly saved across a function call" debugging code that wraps function calls. The latter has been partially removed, because the __chkesp C run-time function necessarily does not exist, although its use of ESI is still evident.
The entire disassembly is available on the eEye research web site at:
http://www.eeye.com/html/research/advisories/AL20010717.html
The simple fact that these traces are present is noteworthy in and of itself, although studying the disassembly of generated machine code is forensically less interesting. The particular technique used to disable the __chkesp debugging (Figure 1b), however, is likely unique. Besides leaving the "mov esi, esp" and "cmp esi, esp" instructions that make up part of the stack pointer check, what would be the third instruction - a relative CALL to a statically-linked __chkesp C run-time function - was "NOP-ed out" in a peculiar way. No doubt this was done to evade generic "NOP detection" in network IDSes, but the choice of "INC EBX / DEC EBX" pairs is definitely a stylistic trait based on an arbitrary decision by the worm author.
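To make the byte arithmetic concrete, here is a minimal Python sketch of the replacement shown in Figure 1(b). It assumes the standard one-byte encodings NOP = 90h, INC EBX = 43h, and DEC EBX = 4Bh, and a five-byte relative CALL (E8h plus a 32-bit displacement). The helper name net_ebx_change is ours, purely for illustration.

```python
# Sketch only: models the filler bytes, not the worm's actual code layout.
CALL_REL32_LEN = 5            # E8 + rel32

NOP, INC_EBX, DEC_EBX = 0x90, 0x43, 0x4B

# NOP followed by two INC EBX / DEC EBX pairs, as listed in Figure 1(b).
patch = bytes([NOP, INC_EBX, DEC_EBX, INC_EBX, DEC_EBX])


def net_ebx_change(code: bytes) -> int:
    """Return the net change to EBX after executing the filler bytes."""
    delta = 0
    for b in code:
        if b == INC_EBX:
            delta += 1
        elif b == DEC_EBX:
            delta -= 1
    return delta


print(len(patch) == CALL_REL32_LEN, net_ebx_change(patch))
```

The filler occupies exactly the space of the removed CALL and leaves EBX unchanged, while containing only a single 90h byte for a naive NOP-sled detector to notice.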
Although there are some brief segments of pure hand-written assembly in CodeRed, including some that perform peculiar techniques like self-modification within a custom exception handler, in general they're too brief to examine here. But at the risk of digressing just a bit, there is a little more to be learned from CodeRed. In particular, it's interesting to note that the pseudorandom number generator used in CodeRed is apparently unique. A PRNG, even if it's a knock-off of a common PRNG in wide use, still represents a choice of very distinct values on the part of the author. For CodeRed (v1), the formula is essentially IP = ((thread_count * 0x50F0668D) * 0xCF3383) + 0x76BFE53.
Although it's far beyond this article to try to examine the mathematical properties of this pseudorandom IP generator, the use of such an uncommon PRNG and its relative success or failure at producing an effective distribution of targets can also offer hints about its author's level of mathematical sophistication.
The lessons of CodeRed on our topic are mostly valuable by contrast with what we're hoping to see. To really kick off our foray into assembly coding style, we'll now analyze the CodeRed II worm (no relation), a somewhat crude piece of malcode literature, but useful as a case study.
CodeRed II: Distinctive Hand-Written (?) Assembly
We were in for a bit of a surprise when disassembling CodeRed II for this article. The code looks nothing like any hand-written or programmatically generated assembly we’ve seen, but there are a number of traits that tend to suggest that it was written in assembly or at least heavily adapted from a disassembly of compiled higher-level language code. As such, it presents a good sample case for observing some fairly atypical assembly coding signatures. We recommend following along in the original eEye disassembly, available at:
http://www.eeye.com/html/research/advisories/AL20010804.html
Things start off strange with the worm function's preamble, which uses an "ENTER 1C8h, 0" instruction to set up the EBP-based stack frame - standard practice is to perform the functionally equivalent sequence, "PUSH EBP / MOV EBP, ESP / SUB ESP, 1C8h", although this more typical approach is a few bytes larger. The "PUSHAD" instruction that immediately follows is a more common way to preserve multiple registers, although it is doubtful that a higher-level language compiler would ever generate it. Convention dictates that most Windows functions preserve EBX, ESI, EDI, and of course EBP upon entry, and restore them on exit, so it's possible that this instruction takes the place of a "PUSH EBX / PUSH ESI / PUSH EDI" trio.
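The size difference between the two preambles can be checked by writing out one plausible encoding of each. The sketch below assumes the common encodings ENTER = C8h iw ib, PUSH EBP = 55h, MOV EBP, ESP = 8Bh ECh, and SUB ESP, imm32 = 81h ECh id. An assembler may pick an equivalent opcode form for the MOV, but the lengths are the same.

```python
import struct

FRAME_SIZE = 0x1C8  # local frame size used by the CodeRed II preamble

# ENTER imm16, imm8
enter_form = bytes([0xC8]) + struct.pack("<HB", FRAME_SIZE, 0)

# PUSH EBP / MOV EBP, ESP / SUB ESP, imm32
# (0x1C8 does not fit in a sign-extended byte, so the imm32 form is needed)
classic_form = (
    bytes([0x55])
    + bytes([0x8B, 0xEC])
    + bytes([0x81, 0xEC]) + struct.pack("<I", FRAME_SIZE)
)

print(len(enter_form), len(classic_form))
```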
The third instruction is also interesting, a CALL over an INT 3 breakpoint and a tight infinite loop that will serve as the worm body's exception handler. This calling-over approach is used repeatedly throughout CodeRed II in order to push the starting address of the data in between the CALL instruction and its destination; it's a very common way for assembly programmers to allow self-relative code to retrieve a pointer into itself. In this case, the result is that the address of the INT 3 is pushed onto the stack as part of a Structured Exception Handling registration record; the technique is also used to retrieve string pointers, and rivals the alternate practice of building strings on the stack and using ESP as the string pointer (demonstrated in the Sapphire worm) in terms of popularity.
One more point worth mentioning is an observation on the assembler's behavior, possibly given certain explicit directives by the author: the SEH setup instructions (e.g., "PUSH FS:[0]") use a 67h prefix in order to shorten the offset from a 32-bit zero to a 16-bit zero, for a net saving of one byte. It might be useful information, or it might just be machine code trivia.
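The one-byte saving can be illustrated by spelling out both encodings of "PUSH FS:[0]". This is a sketch under the assumption that the instruction is encoded as FF /6 with an FS override (64h). The 67h address-size prefix switches the ModRM byte to 16-bit addressing, so the bare displacement shrinks from four bytes to two.

```python
# Default 32-bit addressing: 64 | FF 35 | disp32
default_form = bytes([0x64, 0xFF, 0x35]) + (0).to_bytes(4, "little")

# With the address-size prefix: 64 67 | FF 36 | disp16
prefixed_form = bytes([0x64, 0x67, 0xFF, 0x36]) + (0).to_bytes(2, "little")

# Two displacement bytes are dropped and one prefix byte is added.
saving = len(default_form) - len(prefixed_form)
print(default_form.hex(), prefixed_form.hex(), saving)
```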
In any case, die-hard assembly fans - "dorks" for short - will probably enjoy this next worm more. Many of the traits of interest have already been mentioned here, but it has some interesting aspects of its own.
Sapphire: Assembly with Machine Code Character Restriction
The Sapphire worm is a tiny piece of optimized malicious code that was unmistakably hand-written in assembly by a competent author. Despite its small size, the worm still has some worthwhile examples to offer, but first we will compare some of its traits to those called to attention in CodeRed and CodeRed II. Pull up the eEye disassembly from:
http://www.eeye.com/html/research/flash/sapphire.txt
It becomes apparent that there are a couple of problems with the Sapphire worm as a study of assembly coding traits. First, it's really small, and second, a lot of the subtle coding decisions are dictated by size optimization and other constraints of the worm. However, this drives us to look a little deeper into the code and notice some other points of interest.
We have been saving one critical fact about the Sapphire worm until now: the entirety of the code is written to avoid null machine code bytes. All the seemingly quirky practices of XOR-ing strange values with other strange values to produce the desired ones, using negative or 16-bit constants instead of positive 32-bit ones, and even constructing strings on the stack (without ever explicitly encoding a null character), are done because the payload must exist inside a null-terminated multibyte string. This restriction heavily influenced the coding style, but there are still other emergent traits.
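The XOR trick described above can be sketched in a few lines of Python: a 32-bit value that contains zero bytes is split into two constants, neither of which contains a zero byte, whose XOR reproduces the original. The target values and the key bytes 11h and 22h are hypothetical choices for illustration and are not taken from the worm.

```python
def has_null(value: int) -> bool:
    """True if any byte of the 32-bit little-endian encoding is zero."""
    return 0 in value.to_bytes(4, "little")


def split_null_free(target: int) -> tuple[int, int]:
    """Return (a, b) with a ^ b == target and no zero bytes in a or b."""
    key = 0
    for i in range(4):
        t = (target >> (8 * i)) & 0xFF
        # Pick a key byte that is non-zero and differs from the target byte,
        # so that both the key byte and key ^ target are non-zero.
        k = 0x11 if t != 0x11 else 0x22
        key |= k << (8 * i)
    return key, target ^ key


a, b = split_null_free(0x00000210)   # hypothetical constant with null bytes
print(hex(a), hex(b), hex(a ^ b))
```

In a payload, one constant would be loaded into a register and the other applied with XOR, so that neither immediate operand introduces a null byte into the machine code.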
A slight observation early on is that "XOR ECX, ECX / MOV CL, 18h" (four bytes) is used to initialize ECX as a counter instead of "PUSH 18h / POP ECX" (three bytes). It may sound like nit-picking, but because it's a sub-optimal approach in a rather size-optimized worm, it could serve to clue us in to the author's methodologies, or hint at incremental stages in the worm's development, or suggest that maybe the author was just in a hurry or was unaware of the slightly better technique.
Less trifling is the worm's call to the socket API. The third argument, protocol, is supplied as its proper value, IPPROTO_UDP (11h), but many payloads leave this as 0 because Winsock will fill in the default of TCP for SOCK_STREAM sockets or UDP for SOCK_DGRAM. It's very common to see calls like "socket(AF_INET, SOCK_STREAM, 0)", so it's significant that the author did not take this shortcut. (It could have saved an additional byte!)
About a dozen lines down is the PRNG for target IP addresses, and a neat observation. That block of LEA, SHL, ADD, and SUB instructions - roughly 20 bytes in all - is simply to accomplish the multiplication portion of the PRNG function, in place of the more straightforward MUL or IMUL. Perhaps the intention was a speed benefit within the very tight propagation loop at the cost of size, but our testing reveals this is not the case (it may have helped on a very old processor), and anyway the Winsock sendto call will obliterate any CPU cycles reclaimed from instruction-level optimizations like that. As an aside, compiled Visual C code has sometimes been observed to perform multiplication using a series of LEA instructions and simpler arithmetic.
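As a rough illustration of how a multiplication can be replaced by shifts and additions, the sketch below multiplies by 0x343FD (the multiplier quoted for Sapphire's PRNG) using one shifted addition per set bit. This is a generic decomposition, not the specific LEA / SHL / ADD / SUB sequence found in the worm, and the function name is ours.

```python
MASK32 = 0xFFFFFFFF


def mul_by_shifts(x: int, k: int) -> int:
    """Compute (x * k) mod 2**32 using only shifts and additions."""
    acc = 0
    shift = 0
    while k:
        if k & 1:
            acc = (acc + (x << shift)) & MASK32
        k >>= 1
        shift += 1
    return acc


seed = 0x12345678  # arbitrary example value
print(hex(mul_by_shifts(seed, 0x343FD)), hex((seed * 0x343FD) & MASK32))
```

A hand-tuned sequence would typically use fewer steps by combining LEA scaling with subtraction, but the result is the same 32-bit product.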
Slightly apart from the assembly level, but no less worth noticing, is the appearance of strings whose values are not rigidly defined by necessity. Specifically, the strings passed to LoadLibraryA are lowercase and include the ".dll" extension, although either point is flexible. Lowercase seems to be the norm among payload authors, although the presence of the extension is much more rare. In comparison, the export name strings passed to GetProcAddress are case sensitive with no extra fat around the edges, and therefore do not provide any leeway.
Perhaps the most important trait expressed in Sapphire's code, and one that appears in the Witty worm as well, is the practice of "import leaching" that takes function addresses from a loaded DLL's Import Address Table rather than locating them through the export directory of the module of interest. This technique makes for very small code, but at the cost of DLL version dependence. In this case, Sapphire takes the addresses of LoadLibraryA and GetProcAddress from SQLSORT.DLL, with which it then "imports" the remaining API functions.
Witty: More Clever Assembly Programming
The Witty worm resembles Sapphire in a suspicious number of ways. For starters, it's small, centered around fast propagation, and even exploits a UDP-based stack string buffer overflow, which gives it the same no-nulls character restriction. We'll cover more below, but first we recommend pulling up the eEye disassembly from:
http://www.eeye.com/html/research/flash/witty.txt
By now the many opportunities for a malcode author's preferences to manifest themselves are probably very clear, so rather than become too redundant, we'll end our analysis of the Witty worm with one final observation. At offset 226h within the worm packet payload, there is a "CMP EAX, -1" instruction that checks the return value from CreateFileA against the constant INVALID_HANDLE_VALUE. The instruction is five bytes long, as it uses the "CMP EAX, imm32" opcode, but every assembler we tested will only assemble this instruction using the three-byte "CMP r/m32, simm8" form. One guess might be that the author's assembler somehow allowed this 32-bit immediate to be forced, maybe unintentionally, but it's an idea short on corroboration. Determining the significance of this one very aberrant instruction is left as an exercise for the reader, because frankly, we don't know.
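For reference, the two encodings of "CMP EAX, -1" can be written out as follows. The sketch assumes the standard opcodes 3Dh id for the EAX-specific imm32 form and 83h /7 ib for the sign-extended byte form. The helper name immediate is ours.

```python
import struct

# CMP EAX, imm32: opcode 3D followed by a 32-bit immediate.
long_form = bytes([0x3D]) + struct.pack("<i", -1)

# CMP r/m32, simm8: opcode 83, ModRM F8 (/7, register EAX), 8-bit immediate.
short_form = bytes([0x83, 0xF8]) + struct.pack("<b", -1)


def immediate(encoded: bytes, opcode_len: int, fmt: str) -> int:
    """Decode the signed immediate that follows the opcode bytes."""
    return struct.unpack(fmt, encoded[opcode_len:])[0]


print(long_form.hex(), short_form.hex())
```

Both forms compare EAX against the same value, so the longer one is only distinguishable at the byte level, which is what makes it useful as a fingerprint.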
Register Choice as a Stylistic Trait
It's an arbitrary decision on the author's part that's vague enough to defy succinct analysis, but important enough to mandate its own separate discussion. The choice of which registers an author uses in what ways is a very pervasive, somewhat subtle, and perhaps very telling characteristic of any assembly code. Many registers have special uses and caveats due to architecture and platform specifics; we'll list some of those now for reference.
Figure 2. A programmer's register usage will likely be influenced by special treatment of certain registers on the part of the operating environment and some specific or optimized processor instructions.
Of all the dedicated uses, ESP's is likely the most implicit and absolute - ESP is never seen used as a data register, although the possibility exists. EBP is often but not always reserved for stack addressing as well. Outside of that, most code is a juggling act of registers going to and from memory and other registers, to get data from where it is produced to where, by convention, it will be consumed. Because data registers are so limited in number, sometimes interesting behaviors can emerge from shuffling registers around and trying to maximize register and value longevity.
Memory-to-memory transfers and multiple pointer dereferences are a couple of examples of places where a value's usefulness is very short-lived, but the use of a register is mandatory. It may be interesting to observe what "throwaway" registers are used in such scenarios - although the selection may simply fall into place out of necessity, it could also demonstrate a preference the author has for using one register or another. The string construction code in Sapphire and Witty shows an affinity for ECX, although other registers were left unused at the time.
Arbitrary Instruction Ordering
Our final, highest-level, and most theoretical consideration deals with the ordering of mutually inert instructions. Certainly a piece of data must be read before it can be operated upon and stored, and some sort of comparison must precede a conditional branch, but when each of a set of instructions (or abstractly speaking, operations in general) is unrelated to the other, their ordering is free-form. Higher-level language compilers often take advantage of multiple instruction pipelines, but will the assembly coder, and what about situations where even pipelining doesn't dictate order?
The Witty worm author seems to prefer storing an API function's return value immediately after the call, even before stack cleanup, but otherwise the Witty and Sapphire worms are both too small to be of much interest from this perspective. CodeRed is compiled C code (with optimizations disabled), so its basically demand-based ordering doesn't offer much educational benefit either. CodeRed II, like most human-coded assembly, also orders instructions mostly as soon as they become necessary or relevant, but a couple small observations can still be made.
A compulsive CLD is performed immediately before two of the three string instructions - the first is as close as possible to the LODSD that follows it, while staying just outside the loop, and the second is the instruction right before the pair of MOVSD instructions that copy the "\CMD.EXE" string. The last string instruction, a LODSB, is not preceded by a CLD - likely this strange and inconsistent attention to the Direction Flag tells us more than exactly where the instruction comes. Does it indicate an unfamiliarity or discomfort with the Windows execution environment, or an uncommon level of caution? The absence of a third CLD could indicate the vestiges of some flailing debugging attempts, or a level of competence, knowing that DF had already been cleared earlier in the thread.
There are surprisingly few other places in CodeRed II where anything noteworthy can be said about instruction ordering, or even places where the ordering is flexible. The conclusion we can draw is that the analysis of instruction order is yet another technique to keep in mind, but one that will rarely pay off except in the case of an assembly coder with a very distinct and pronounced style.
Bonus Material: Part I Revisited
Since this is the last article in the "Assembly Payload Forensics" series, let's take one last look at these four example worms and apply the tests proposed in Part I.
Figure 3. "Assembly Payload Forensics: Part I" assembler identification technique applied to the CodeRed, CodeRed II, Sapphire, and Witty worms. *Indicated by other evidence in the code.
Conclusion
There are infinitely many ways to write an assembly program, and although many of the instructions are coded around circumstances imposed on the author, the versatility of the architecture leaves plenty of room for variance. Our hope is that a malcode author's arbitrary decisions in these situations will allow him or her to be identified through comparison with other works, and that this article has proposed a diverse set of useful techniques for doing just that.
For those who missed the previous issue or would like to revisit the first part of this article, it is available at: http://www.eeye.com/html/resources/newsletters/vice/VI20050504.html#vexposed
Stylistic Features of Assembly Programming
A piece of code can be represented as a hierarchy of abstraction like the following:
- Purpose – What is the code intended to do?
- Implementation – How (algorithmically) does the code do this?
- [Highest-Level Language]
- [Intermediate Language(s)]
- Machine Byte Code
CodeRed: Visual C-Generated Assembly
| (a) Function preamble | | (b) Stack pointer check around a call | |
| push | ebp | mov | esi, esp |
| mov | ebp, esp | push | eax ; KERNEL32!LoadLibraryA |
| sub | esp, 218h | call | dword ptr [ebp-170h] |
| push | ebx | cmp | esi, esp |
| push | esi | nop | |
| push | edi | inc | ebx |
| lea | edi, [ebp-218h] | dec | ebx |
| mov | ecx, 86h | inc | ebx |
| mov | eax, 0CCCCCCCCh | dec | ebx |
| rep stosd | | | |
CodeRed (v1):
IP = ((thread_count * 0x50F0668D) * 0xCF3383) + 0x76BFE53;
if ( (IP & 0xFF) == 0x7F || (IP & 0xFF) == 0xE0 ) IP += 0x20DA9;
Because the set of target IP addresses is directly limited to the number of distinct values thread_count can assume, this is a fairly crippled pseudorandom IP generator, which is probably the reason why CodeRed II was released shortly thereafter with a "fix" as follows:
IP = (thread_count * 0x50F0668D) + (msec2 * 0xCD59E3) + (msec * 0x1E1B9);
IP = (IP * 0xCF3383) + 0x76BFE53;
if ( (IP & 0xFF) == 0x7F || (IP & 0xFF) == 0xE0 ) IP += 0x20DA9;
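The two formulas above can be transcribed into Python as a rough sketch. It assumes the arithmetic wraps at 32 bits, as it would in x86 registers, and it applies the low-byte adjustment to both variants. The function names are ours, and msec and msec2 are simply passed in as integers rather than read from a clock.

```python
MASK32 = 0xFFFFFFFF


def adjust(ip: int) -> int:
    """Add 0x20DA9 when the low byte is 0x7F (127) or 0xE0 (224)."""
    if (ip & 0xFF) in (0x7F, 0xE0):
        ip = (ip + 0x20DA9) & MASK32
    return ip


def codered_v1_ip(thread_count: int) -> int:
    ip = ((thread_count * 0x50F0668D) * 0xCF3383 + 0x76BFE53) & MASK32
    return adjust(ip)


def codered2_ip(thread_count: int, msec: int, msec2: int) -> int:
    ip = (thread_count * 0x50F0668D + msec2 * 0xCD59E3 + msec * 0x1E1B9) & MASK32
    ip = (ip * 0xCF3383 + 0x76BFE53) & MASK32
    return adjust(ip)


# The v1 output depends only on thread_count, so every run repeats the same
# targets; the later formula also mixes in two time-derived values.
print(hex(codered_v1_ip(1)), hex(codered2_ip(1, 250, 37)))
```

The low byte presumably corresponds to the first octet of the address, which would make the adjustment a way of stepping past loopback and multicast ranges, although the source does not say so explicitly.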
CodeRed II: Distinctive Hand-Written (?) Assembly
There are a vast number of little tricks, tweaks, and cachets that assembly authors may use, far too many to list here, but we'll briefly cover some of the more obvious ones from CodeRed II below.
- Use of SETcc. Although the use of these instructions is definitely not unheard of, their functionality can also be accomplished using comparison and conditional branching logic. The CMP instructions that precede them can be replaced with subtractions if the value in question does not need to be preserved (as Visual Studio does when compiling some switch statements), although CMP is definitely the norm.
- Use of IMUL. IMUL is common to see in compiled higher-level language code, probably because it allows any register as a destination (whereas MUL is restricted to returning the product in EDX:EAX), and for the same reason it's probably popular with assembly programmers as well. Overall this is not a very significant instruction choice.
- Mixture of EBP- and ESP-relative addressing. There are a few rare cases in the CodeRed II worm body function that use ESP to address stack contents, rather than EBP as the bulk of the EBP-based frame-bearing function does. In the first case, it retrieves the pointer to the "CodeRedII" atom string from the stack - without removing it - by reading from [ESP] rather than popping, so that it can also be passed as an argument to a function later. The second use of ESP is in a call to ioctlsocket, to pass the address of a temporary stack variable, but every other variable reference is relative to EBP.
- Register checking using OR. There are at least three equivalent ways to test the contents of a register that set SF and ZF meaningfully, and can always be done in two bytes of machine code: AND-ing, OR-ing, and TEST-ing a register with itself. TEST is the most common variant in a Visual C-dominated Windows world, although the choice is truly arbitrary. CodeRed II almost exclusively uses instructions such as "OR EAX, EAX"; however, it should be mentioned that if a mixture of these were seen within a single piece of code, it would be an extremely uncommon and therefore useful observation. The only slight aberration, which actually makes some contextual sense, is the "CMP AL, 0" instruction in the random octet generator - it's also two bytes and behaves identically to the instructions mentioned above.
- Use of XCHG. CodeRed II uses the XCHG instruction exactly once, in order to copy EAX into EBX. Of course, the reverse also happens, but the former value of EBX that EAX takes on is never used before it's lost. The benefit is that "XCHG reg, EAX" instructions are each only one byte, whereas "MOV reg, EAX" (or any source register for that matter) always takes two bytes. This single-byte optimization seems very odd in that it is almost alone as an optimization in the worm's code, so alternate theories of why it exists should be considered: Was the code borrowed from somewhere else? Could CodeRed II actually have been compiled in a higher-level language, of which this is an artifact? Is it a deliberate ruse, or a random whim more suggestive of the author's personality and/or somewhat boorish assembly coding?
- Method of setting a register to 0 and -1. Producing the values 0, 1, and -1 in a 32-bit register can be done in many ways, more than one of which tie for optimal solutions with respect to code size. To zero a register efficiently, it can be either XOR-ed or SUB-ed with itself (both are two-byte instructions), although XOR seems to be the more favored of the two. Because of their proximity to 0, there are a few ways to get 1 and -1 into a register using only three bytes: zeroing and then INC-ing or DEC-ing the register, pushing ("PUSH simm8") and then popping the register, and - specific to -1 - OR-ing ("OR reg, simm8") the register with -1. CodeRed II demonstrates its answer to both of these open-ended questions when it initializes ECX to -1 at the beginning of its "GetProcAddress" search loop, with "XOR ECX, ECX / DEC ECX". Again, the common theme of an arbitrary selection among multiple equally valid choices prevails.
- Method of multiplying a register by 2. In the same loop as mentioned above, the worm needs to double the value in ECX. It uses "SHL ECX, 1", although adding ECX to itself would be functionally equivalent and would also cost two bytes.
- Preference for unsigned over signed. The worm code contains three conditional jumps that deal with less than / greater than comparisons. Although signedness doesn't matter in any of the cases, the author displays a preference for unsigned (JNB and JNA) rather than signed (JNL and JNG) logic. The first pair of jumps and the use of MOVZX that accompanies them could be chalked up to data types in the SYSTEMTIME structure, but in the comparison checking the character in BL near the end of the worm body, the choice is harder to explain away.
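The size claims in the list above can be tabulated with a short sketch. The hex strings are one valid encoding of each idiom, written out by hand. Some instructions have an alternate opcode form of the same length that a given assembler might emit instead, which is exactly the kind of trace discussed in Part I. The table name and helper are ours.

```python
# One plausible encoding per idiom (assumed, hand-assembled).
ENCODINGS = {
    "TEST EAX, EAX": "85 C0",
    "OR EAX, EAX": "0B C0",
    "AND EAX, EAX": "23 C0",
    "CMP AL, 0": "3C 00",
    "XCHG EBX, EAX": "93",
    "MOV EBX, EAX": "8B D8",
    "XOR ECX, ECX": "33 C9",
    "SUB ECX, ECX": "2B C9",
    "XOR ECX, ECX / DEC ECX": "33 C9 49",
    "PUSH -1 / POP ECX": "6A FF 59",
    "OR ECX, -1": "83 C9 FF",
    "SHL ECX, 1": "D1 E1",
    "ADD ECX, ECX": "03 C9",
}


def size(idiom: str) -> int:
    """Length in bytes of the listed encoding."""
    return len(bytes.fromhex(ENCODINGS[idiom]))


for idiom in ENCODINGS:
    print(f"{size(idiom)} byte(s)  {idiom}")
```

Within each group of equivalents the sizes tie, so the choice among them carries no functional or size benefit and is therefore a candidate stylistic marker.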
Sapphire: Assembly with Machine Code Character Restriction
The Sapphire worm is a tiny piece of optimized malicious code that was unmistakably hand-written in assembly by a competent author. Despite its small size, the worm still has some worthwhile examples to offer, but first we will compare some of its traits to those called to attention in CodeRed and CodeRed II. Pull up the eEye disassembly from:
http://www.eeye.com/html/research/flash/sapphire.txt
- Method of zeroing a register. The more boring "XOR register by itself" technique is used.
- Preference for signed versus unsigned. No inequality comparisons are performed.
- Pseudorandom number generator. Its PRNG function is "seed = (seed * 0x343FD) + DLL_specific_value"; the addend is not consistent across all systems because of what appears to be a bug ("OR EBX, EBX" doesn't set EBX to -1), although indications in the code suggest that 0x269EC3 was the intended value. This is the Visual C run-time rand() PRNG function. Sapphire seeds the PRNG with the number of milliseconds since boot, as returned by GetTickCount.
- Register checking. The worm actually never inspects a register for a zero / non-zero value.
- Stack addressing. Stack variables are referenced relative to EBP, although the worm body is not really a function and does not establish an EBP-based stack frame. Most likely this is a size optimization, because any ESP-relative memory access requires an additional byte (the Scale-Index-Base or SIB postbyte). This also spares the writer the agony of being constantly mindful of the ever-changing stack pointer when coding memory references.
- String data handling. Strings are constructed on the stack by pushing their constituent doublewords in reverse order, rather than calling over them or using a string table.
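As a reference point for the PRNG item above, here is a minimal Python sketch of the Visual C run-time generator with the addend the code appears to have intended (0x269EC3). It is not a transcription of the worm: the function names are ours, and the worm as released ends up with a DLL-specific addend instead.

```python
MULTIPLIER = 0x343FD    # 214013
INCREMENT = 0x269EC3    # 2531011, the apparently intended addend
MASK32 = 0xFFFFFFFF     # emulate 32-bit register wraparound

def next_seed(seed):
    """One step of seed = (seed * 0x343FD) + 0x269EC3, modulo 2**32."""
    return (seed * MULTIPLIER + INCREMENT) & MASK32

def msvc_rand(seed):
    """Return (new_seed, value) as the Visual C run-time rand() would."""
    seed = next_seed(seed)
    return seed, (seed >> 16) & 0x7FFF

seed = 1                      # equivalent to srand(1)
seed, first = msvc_rand(seed)
seed, second = msvc_rand(seed)
print(first, second)          # prints: 41 18467
```

The first two values match what a Visual C program prints for rand() after srand(1), which is a quick way to confirm that a constant pair found in a disassembly belongs to this generator.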
It becomes apparent that there are a couple of problems with the Sapphire worm as a study of assembly coding traits. First, it's really small, and second, a lot of the subtle coding decisions are dictated by size optimization and other constraints of the worm. However, this drives us to look a little deeper into the code and notice some other points of interest.
We have been saving one critical fact about the Sapphire worm until now: the entirety of the code is written to avoid null machine code bytes. All the seemingly quirky practices of XOR-ing strange values with other strange values to produce the desired ones, using negative or 16-bit constants instead of positive 32-bit ones, and even constructing strings on the stack (without ever explicitly encoding a null character), are done because the payload must exist inside a null-terminated multibyte string. This restriction heavily influenced the coding style, but there are still other emergent traits.
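To make the XOR trick concrete, the sketch below shows one way to split a value that contains null bytes into two null-free constants whose XOR reproduces it. It is a Python illustration under our own assumptions: the helper names and the byte choices (01h, with 02h as a fallback) are hypothetical, and the constants it produces are not the ones found in Sapphire.

```python
def has_null(value):
    """True if any of the four bytes of a 32-bit value is zero."""
    return any(((value >> shift) & 0xFF) == 0 for shift in (0, 8, 16, 24))

def split_xor(target, preferred=0x01, fallback=0x02):
    """Return (key, other) with key ^ other == target and no null bytes in either."""
    key = 0
    for shift in (0, 8, 16, 24):
        t = (target >> shift) & 0xFF
        # The key byte must differ from both 0 and t, or one side gets a null.
        k = fallback if t == preferred else preferred
        key |= k << shift
    return key, target ^ key

target = 0x00000011           # e.g. IPPROTO_UDP, which has three null bytes
key, other = split_xor(target)
print("%08X ^ %08X = %08X" % (key, other, key ^ other))
```

A payload could then load one constant into a register and XOR it with the other, with neither immediate contributing a null byte to the packet.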
A slight observation early on is that "XOR ECX, ECX / MOV CL, 18h" (four bytes) is used to initialize ECX as a counter instead of "PUSH 18h / POP ECX" (three bytes). It may sound like nit-picking, but because it's a sub-optimal approach in a rather size-optimized worm, it could serve to clue us in to the author's methodologies, or hint at incremental stages in the worm's development, or suggest that maybe the author was just in a hurry or was unaware of the slightly better technique.
Less trifling is the worm's call to the socket API. The third argument, protocol, is supplied as its proper value, IPPROTO_UDP (11h), but many payloads leave this as 0 because Winsock will fill in the default of TCP for SOCK_STREAM sockets or UDP for SOCK_DGRAM. It's very common to see calls like "socket(AF_INET, SOCK_STREAM, 0)", so it's significant that the author did not take this shortcut. (It could have saved an additional byte!)
About a dozen lines down is the PRNG for target IP addresses, and a neat observation. That block of LEA, SHL, ADD, and SUB instructions - roughly 20 bytes in all - is simply to accomplish the multiplication portion of the PRNG function, in place of the more straightforward MUL or IMUL. Perhaps the intention was a speed benefit within the very tight propagation loop at the cost of size, but our testing reveals this is not the case (it may have helped on a very old processor), and anyway the Winsock sendto call will obliterate any CPU cycles reclaimed from instruction-level optimizations like that. As an aside, compiled Visual C code has sometimes been observed to perform multiplication using a series of LEA instructions and simpler arithmetic.
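The idea of replacing a multiply with shifts and adds can be sketched as follows. This Python example uses the naive one-term-per-set-bit decomposition, which is our simplification: the worm's actual sequence also uses LEA and SUB to get by with fewer steps, and the function names here are ours.

```python
MASK32 = 0xFFFFFFFF  # emulate 32-bit register wraparound

def shift_add_terms(multiplier):
    """Bit positions set in the multiplier, so x * m == sum(x << p)."""
    return [p for p in range(multiplier.bit_length()) if (multiplier >> p) & 1]

def multiply_by_shifts(x, multiplier=0x343FD):
    """Multiply x by a constant using only shifts and additions (mod 2**32)."""
    total = 0
    for p in shift_add_terms(multiplier):
        total = (total + (x << p)) & MASK32
    return total

x = 0x12345678
print(hex(multiply_by_shifts(x)), hex((x * 0x343FD) & MASK32))
```

Both printed values are the same, which is the whole point: the long instruction block and a single MUL or IMUL compute an identical result, so choosing between them is another decision left to the author.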
Slightly apart from the assembly level, but no less worth noticing, is the appearance of strings whose values are not rigidly defined by necessity. Specifically, the strings passed to LoadLibraryA are lowercase and include the ".dll" extension, although either point is flexible. Lowercase seems to be the norm among payload authors, although the presence of the extension is much more rare. In comparison, the export name strings passed to GetProcAddress are case sensitive with no extra fat around the edges, and therefore do not provide any leeway.
Perhaps the most important trait expressed in Sapphire's code, and one that appears in the Witty worm as well, is the practice of "import leaching" that takes function addresses from a loaded DLL's Import Address Table rather than locating them through the export directory of the module of interest. This technique makes for very small code, but at the cost of DLL version dependence. In this case, Sapphire takes the addresses of LoadLibraryA and GetProcAddress from SQLSORT.DLL, with which it then "imports" the remaining API functions.
Witty: More Clever Assembly Programming
The Witty worm resembles Sapphire in a suspicious number of ways. For starters, it is also small, centered around fast propagation, and even exploits a UDP-based stack string buffer overflow, which gives it the same no-nulls character restriction. We'll cover more below, but first we recommend pulling up the eEye disassembly from:
http://www.eeye.com/html/research/flash/witty.txt
- Character-restricted value construction. Whereas Sapphire showed a preference for XOR-ing one value by another, this worm uses subtraction. (It's probably slightly more convenient for the programmer to write "-value" instead of XOR-ing two numbers.)
- Importation. Witty uses the same "import leaching" technique as Sapphire and the same GetProcAddress entry point signature check, but uses "CALL DS:[offset]" instructions rather than "MOV ESI, offset / CALL [ESI]" to invoke the API. It's an interesting artifact of the assembler (or the author's ability to use it) that all of these CALL instructions have an explicit DS: prefix byte. Code borrowing should not be ruled out, but should not be assumed either.
- Method of multiplying a register by 2. The instruction "SHL ECX, 1" is used instead of "ADD ECX, ECX"; both are common and behave identically.
- Method of zeroing a register. To zero out a register, it is, as usual, XOR-ed by itself.
- Pseudorandom number generator. The PRNG function in Witty is also the same as the Visual C run-time rand(), except this one does not contain the same mistake as Sapphire's PRNG. Also unlike Sapphire, the Witty worm generates each 16-bit word of the target IP address separately, and generates random destination ports and packet sizes as well. (Sapphire uses the entire raw 32-bit output, essentially the random seed at each iteration of the propagation loop, as the target IP.) Also in contrast, Witty uses MUL rather than IMUL or elaborate LEA gymnastics to perform the multiplication. However, like the Sapphire worm, it also seeds its PRNG using the return value from GetTickCount.
- socket API call. The protocol parameter is passed as IPPROTO_UDP rather than 0.
- Stack addressing. The Witty worm doesn't really use stack variables, so the little stack addressing necessary to complete API calls is implemented using ESP.
- String variation. The library strings are all lowercase and do not include ".dll" extensions, whereas the "\\.\PHYSICALDRIVEn" string from the destructive portion is entirely uppercase. Both are case-insensitive. The latter string was possibly copied from an external source, with which its nuances are more likely to bring about a correlation than with other malicious code samples, thanks to its (fortunate) rarity in that realm.
- String data handling. Witty also constructs its strings on the stack, in exactly the same way as Sapphire, even using ECX to build the partially-used doublewords and null characters at the end of strings.
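The stack string construction shared by both worms can be sketched in a few lines. This Python example only computes the doublewords and the order in which they would be pushed; the function name is ours and "ws2_32" is just a sample library name, not a claim about the exact strings in either worm.

```python
import struct

def stack_string_pushes(text):
    """Return the doublewords to push, in push order, to build text on the stack."""
    data = text.encode("ascii") + b"\x00"      # strings need a terminating null
    data += b"\x00" * (-len(data) % 4)         # pad to a doubleword boundary
    dwords = struct.unpack("<%dI" % (len(data) // 4), data)
    # The stack grows downward, so the tail of the string is pushed first.
    return list(reversed(dwords))

pushes = stack_string_pushes("ws2_32")
print(["0x%08X" % d for d in pushes])   # ['0x00003233', '0x5F327377']
```

Note that the first value pushed contains null bytes, so under a no-nulls restriction it cannot appear as an immediate operand. That is why the partially-used doubleword has to be assembled in a register first, which is where the worms' shared choice of ECX shows up.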
By now the many opportunities for a malcode author's preferences to manifest themselves are probably very clear, so rather than become too redundant, we'll end our analysis of the Witty worm with one final observation. At offset 226h within the worm packet payload, there is a "CMP EAX, -1" instruction that checks the return value from CreateFileA against the constant INVALID_HANDLE_VALUE. The instruction is five bytes long, as it uses the "CMP EAX, imm32" opcode, but every assembler we tested will only compile this instruction using the three-byte "CMP mem, simm8" form. One guess might be that the author's compiler somehow allowed this 32-bit immediate to be forced, maybe unintentionally, but it's an idea short on corroboration. Determining the significance of this one very aberrant instruction is left as an exercise to the reader, because frankly, we don't know.
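For readers who want to see the difference, the two forms can be compared byte for byte. The Python sketch below uses the standard IA-32 encodings; the constant and function names are ours.

```python
# Two encodings of "CMP EAX, -1" in 32-bit code.
CMP_EAX_IMM32 = bytes([0x3D, 0xFF, 0xFF, 0xFF, 0xFF])  # opcode 3Dh + 32-bit immediate
CMP_EAX_SIMM8 = bytes([0x83, 0xF8, 0xFF])              # opcode 83h /7 + sign-extended byte

def immediate(code, opcode_len):
    """Decode the trailing immediate as a signed little-endian integer."""
    return int.from_bytes(code[opcode_len:], "little", signed=True)

for name, code, opcode_len in (("imm32", CMP_EAX_IMM32, 1), ("simm8", CMP_EAX_SIMM8, 2)):
    # Both forms compare EAX against the same value; only the length differs.
    print(name, len(code), "bytes, immediate =", immediate(code, opcode_len))
```

Both encodings are free of null bytes, so the character restriction alone does not explain the longer form.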
Register Choice as a Stylistic Trait
Register choice is an arbitrary decision on the author's part that's vague enough to defy succinct analysis, but important enough to warrant its own discussion. Which registers an author uses, and in what ways, is a pervasive, somewhat subtle, and perhaps very telling characteristic of any assembly code. Many registers have special uses and caveats due to architecture and platform specifics; we'll list some of those now for reference.
| Intel Architecture | |
| 16-bit addressing | BX, BP, SI, DI |
| 8-bit component registers | AX, CX, DX, BX |
| Additional addressing cost | ESP, (EBP) |
| Binary Coded Decimal support | AX |
| Dedicated shorter opcodes | EAX |
| LOOPcc and JECXZ | ECX |
| MUL and DIV | EAX, EDX |
| Stack pointer | ESP |
| Stack segment addressing | ESP, EBP |
| String instruction REP counter | ECX |
| String instruction destination | EDI |
| String instruction source | ESI |
| XLAT instruction | AL, EBX |
| Windows Calling Conventions | |
| __alloca_probe parameter | EAX |
| __chkesp parameters | ESI, ZF |
| Class 'this' pointer | ECX |
| __fastcall parameters | ECX, EDX |
| Function destroys | EAX, ECX, EDX |
| Function preserves | EBX, (ESP), EBP, ESI, EDI |
| Function return value | EAX, (EDX) |
| KiSystemService parameters | EAX, EDX |
Of all the dedicated uses, ESP's is likely the most implicit and absolute - ESP is never seen used as a data register, although the possibility exists. EBP is often but not always reserved for stack addressing as well. Outside of that, most code is a juggling act of registers going to and from memory and other registers, moving data from where it is produced to where, by convention, it will be consumed. Because data registers are so limited in number, interesting behaviors can sometimes emerge from shuffling registers around and trying to maximize register and value longevity.
Memory-to-memory transfers and multiple pointer dereferences are a couple examples of places where a value's usefulness is very short-lived, but the use of a register is mandatory. It may be interesting to observe what "throwaway" registers are used in such scenarios - although the selection may simply fall into place out of necessity, it could also demonstrate a preference the author has for using one register or another. The string construction code in Sapphire and Witty shows an affinity for ECX, although other registers were left unused at the time.
Arbitrary Instruction Ordering
Our final, highest-level, and most theoretical consideration deals with the ordering of mutually inert instructions. Certainly a piece of data must be read before it can be operated upon and stored, and some sort of comparison must precede a conditional branch, but when each of a set of instructions (or abstractly speaking, operations in general) is unrelated to the other, their ordering is free-form. Higher-level language compilers often take advantage of multiple instruction pipelines, but will the assembly coder, and what about situations where even pipelining doesn't dictate order?
The Witty worm author seems to prefer storing an API function's return value immediately after the call, even before stack cleanup, but otherwise the Witty and Sapphire worms are both too small to be of much interest from this perspective. CodeRed is compiled C code (with optimizations disabled), so its basically demand-based ordering doesn't offer much educational benefit either. CodeRed II, like most human-coded assembly, also orders instructions mostly as soon as they become necessary or relevant, but a couple small observations can still be made.
A compulsive CLD is performed immediately before two of the three string instructions - the first is as close as possible to the LODSD that follows it, while staying just outside the loop, and the second is the instruction right before the pair of MOVSD instructions that copy the "\CMD.EXE" string. The last string instruction, a LODSB, is not preceded by a CLD. This strange and inconsistent attention to the Direction Flag likely tells us more than the exact placement of the instructions does. Does it indicate an unfamiliarity or discomfort with the Windows execution environment, or an uncommon level of caution? The absence of a third CLD could indicate the vestiges of some flailing debugging attempts, or a level of competence, knowing that DF had already been cleared earlier in the thread.
There are surprisingly few other places in CodeRed II where anything noteworthy can be said about instruction ordering, or even places where the ordering is flexible. The conclusion we can draw is that the analysis of instruction order is yet another technique to keep in mind, but one that will rarely pay off except in the case of an assembly coder with a very distinct and pronounced style.
Bonus Material: Part I Revisited
Since this is the last article in the "Assembly Payload Forensics" series, let's take one last look at these four example worms and apply the tests proposed in Part I.
| Worm | "reg, reg" Form | Prefix Order | Possible Compiler |
| CodeRed | reg, mem | N/A | Visual C* |
| CodeRed II | reg, mem | FS: 67h | TASM |
| Sapphire | mem, reg | N/A | NASM? |
| Witty | mem, reg | N/A | NASM? |
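The "reg, reg" column above comes down to one opcode bit. As a rough Python sketch of how such a tally might be automated, assuming the instruction has already been located and split into opcode and ModRM bytes (the function names are ours, and only the basic ALU opcodes and MOV are handled):

```python
def has_direction_bit(opcode):
    """True for ADD/OR/ADC/SBB/AND/SUB/XOR/CMP (00h-3Bh forms) and MOV 88h-8Bh."""
    return (opcode & 0xC4) == 0x00 or 0x88 <= opcode <= 0x8B

def reg_reg_form(opcode, modrm):
    """Classify a register-to-register instruction by which opcode form was used."""
    if not has_direction_bit(opcode):
        raise ValueError("opcode has no direction bit")
    if (modrm >> 6) != 0b11:
        raise ValueError("ModRM does not encode a register operand")
    # Bit 1 of the opcode selects "reg, r/m" (set) or "r/m, reg" (clear).
    return "reg, mem" if opcode & 0x02 else "mem, reg"

print(reg_reg_form(0x33, 0xC9))   # XOR ECX, ECX encoded as 33h C9h
print(reg_reg_form(0x31, 0xC9))   # XOR ECX, ECX encoded as 31h C9h
```

The same instruction text yields "reg, mem" in the first case and "mem, reg" in the second, which is exactly the assembler fingerprint the table records.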
Conclusion
There are infinitely many ways to write an assembly program, and although many of the instructions are coded around circumstances imposed on the author, the versatility of the architecture leaves plenty of room for variance. Our hope is that a malcode author's arbitrary decisions in these situations will allow him or her to be identified through comparison with other works, and that this article has proposed a diverse set of useful techniques for doing just that.
For those who missed the previous issue or would like to revisit the first part of this article, it is available at: http://www.eeye.com/html/resources/newsletters/vice/VI20050504.html#vexposed
Source: Derek Soeder, Software Engineer
Q: What is the difference between trojans and viruses?
A: Generally, "trojans" are a class of "virus" and are also a class of "spyware". A good all-inclusive word for all categories is "malware". Largely, it is a semantical issue. The term "virus" indicates two qualities of a file: that it does intentional damage and that it is designed to replicate itself. This is the rigorous, religious answer.
That said, AV products would not be doing their job if they did not find and destroy all files which do intentional damage regardless of their replication method or apparent lack thereof; after all, all software is designed to be replicated and used elsewhere.
Trojans, per se, do not necessarily have automatic replication code within them, so semantically someone might say "a trojan is not a virus"; however, it is most realistic to consider trojans as a class of virus.
Finally, if you find a file that your AV does not pick up because it is a trojan and "not a virus", then you need to drop that AV product.
Q: Is LINUX really secure as compared to MS-WINDOWS? What seem to be the prospects of LINUX becoming a commercial success? And if the source code is open to all, doesn't it become all the more easy to exploit it?
A: The difference between the security of a Linux and an MS system depends on the administrator. There are a number of studies available today that support both operating systems and their levels of security, but in reality the security of the operating systems is completely dependent on the administrator responsible for those systems. If your organization has a number of well-trained and security-aware Windows administrators, it would not make sense from an operations and security perspective to force them to run Linux systems. Yes, each operating system has its benefits and its drawbacks, but it is difficult to pinpoint one as being more secure than another without looking at the complete picture. Take your pick.
As far as Linux becoming a commercial success, Red Hat is a profitable company. Doesn't that already make it a commercial success? Regarding open source making things easier to exploit, it also makes things easier to fix. Both operating systems continue to have security problems.
Have a question you would like answered? Send it to vice@eEye.com, and win an eEye t-shirt if we select your question for an upcoming newsletter.
"So I was doing an analysis on a SPAM process as an IT Auditor and a forensic investigator, and I realized that most corporate email policies increase SPAM at least 100% and they do that by reflecting the inbound, and detected spam emails to the bogus return address in the header...
"This is more than amusing, it may, in fact, be a federal crime under the Computer Fraud and Abuse Act and any number of anti-SPAM laws.
"Seriously do the walk through - SPAM hits email gateway and is detected by the gateway's filters. The next step is that the email gateway or Anti-Virus/Anti-Spam filter bounces the email to the return address. This is even more amusing when what's being deflected back to the return address contains malware. That for sure is a CFAA violation, and the real victim is the person lucky enough to have their name in the return address.
"The problem is that 99.999% of all email gateways are setup this way – i.e. to bounce the SPAM back to the return address, and they don't do any header parsing or validation since that would add significant overhead to handling mail if the domain names and header IP addresses were matched.
"Cool eh?"
Todd Glassey, CISM, CIFI
Firstly, thank you for sharing your recent research. This is an interesting method for indirectly delivering malware but in the end it's not a very useful technique, and I doubt it really creates legal liability for companies that enforce that policy. It would definitely be a smarter approach to reduce spamming to quietly ignore the emails, but that has problems with emails that are actually important and require follow-up if they were not delivered. I think most corporations would prefer to be sure that they are getting all their critical communications.
Robert Ross, Software Engineer