
Vice Newsletter - May 4, 2005

Welcome on behalf of the eEye Research team to another issue of VICE, a free technical newsletter featuring content from the team representing the foundation of eEye as a company and culture.

This month we continue the "no holds barred" technical content with a unique paper from Senior Research Engineer Derek Soeder describing his recent research on identifying the assembler used by a payload author. In the following section, our Research Team has provided expert comments on questions our readers submitted last month. As always, please keep the questions and comments coming: vice@eeye.com.

Finally, as for the Junior Research Position offered last month, we are pleased to announce that it has been filled by a motivated individual with a passion to learn; hopefully it won't be too long before his first contribution to VICE.

Enjoy!
eEye Team VICE

In This Issue

1. Vulnerability Exposed!
 • Assembly Payload Forensics: Part I

2. Ask Research
 • How Can an Attacker Bypass a Firewall?
 • How Can the Sender of a Fake Email Be Identified?

3. Etcetera
 • Publishing Exploit Code Ruled Illegal in France? - Reported by the Press
 • Publishing Exploit Code Ruled Illegal in France? - The Researcher's Story



Vulnerability Exposed!

Assembly Payload Forensics: Part I

With Microsoft's five-million-dollar bounty still hanging over the collective heads of worm and virus authors, it seems like a perfect time to discuss some techniques for sleuthing out where a payload came from. In the first of a two-part series, Derek Soeder will present the idea of inferring which assemblers were used to construct a payload based on subtle characteristics inherent in the machine code. The second part will look at stylistic traits of a few virus and worm authors on an assembly level that can help tie payloads and public exploits together. A working knowledge of Intel x86 assembly language is recommended.

Assembler Fingerprinting

Before the shameful era of Blaster and Bagle began, most major Internet worms had been coded in assembly language. Except in some very rare circumstances, shellcode for the majority of exploits is written in assembly as well. Not coincidentally, most worms and shellcodes are also built to run on the Intel x86 architecture, since it is far and away the most widespread mainstream architecture, which makes it accessible to malicious code authors and abundant among the population of potential targets.

Interestingly, there are at least a few ambiguities in x86 machine code that, like subtle clues from a crime, may help correlate malicious code specimens and give away details about the author. Especially important about these types of "clues" is that most payload authors don't know they're dropping them. It is our desire in this article to introduce the technique by example and lay the groundwork for further discussion and research into making this technique a truly useful addition to worm and payload forensics. To this end, we describe two traits that get imprinted into machine code by the tools a payload author uses, explaining where these traits can be spotted and surveying some of the most popular compilers and assemblers for the specific "fingerprints" they leave behind.

Register-to-Register Instructions

It's a happy coincidence that a major machine code ambiguity occurs in one of the more common genres of instructions: register-to-register operations. Given the prevalence of this type of instruction, then, it may be surprising to note that there are actually very few register-to-register opcodes that anyone really uses. Most common register-to-register instructions are based on opcodes that reference a register and a memory location, except they specify a special addressing mode that substitutes a second register for the memory operand. By browsing a list of opcodes, we can see that most mnemonics have memory-to-register and register-to-memory versions, and herein lies our first ambiguity.

As far as we are aware, there is no standard specifying which of these two opcodes to use for the register-to-register case, yet an assembler has to pick either one or the other. For example, the instruction "XOR ECX, EAX" can be compiled both as 31h/C1h (opcode for XOR with operands "mem, reg") and as 33h/C8h (opcode for XOR with operands "reg, mem"). So which one is right? Technically, they are both correct, and they both disassemble to the same assembly instructions, so really, it depends on whom you ask. The following table lists which variants six popular assemblers (DOS DEBUG.COM, MASM 6.11, NASM 0.98.38, SoftICE 4.3.0, TASM 5, and Visual C++'s __asm directive) choose when compiling a register-to-register instruction:

Assembler        Operand Choice    'XOR ECX, EAX' compiles as:
DOS DEBUG        mem, reg          31h/C1h
MASM 6.11        reg, mem          33h/C8h
NASM 0.98.38     mem, reg          31h/C1h
SoftICE 4.3.0    reg, mem          33h/C8h
TASM 5           reg, mem          33h/C8h
Visual C++       reg, mem          33h/C8h

Table 1. Some assemblers will compile register-to-register instructions differently, because there are two separate (but equal) ways to represent most register-to-register instructions.


It should be noted that a few mnemonics, notably TEST and XCHG, have register-to-memory but not memory-to-register opcodes, since having both types of opcode would be entirely redundant, but many major operations (ADC, ADD, AND, CMP, MOV, OR, SBB, SUB, and XOR) are available in both flavors.
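
As a rough illustration of the ambiguity described above, the following Python sketch builds both encodings of a 32-bit register-to-register instruction and reports which operand form a given two-byte encoding uses. The function names (encode_both, classify) and the lookup tables are our own for this example and do not come from any assembler; the sketch also assumes 32-bit operands with no prefixes, and is not a disassembler.

```python
# Register numbers as used in the ModR/M byte (32-bit general registers).
REGS = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]

# For each mnemonic: (opcode for "mem, reg" form, opcode for "reg, mem" form).
OPCODES = {
    "ADD": (0x01, 0x03), "OR":  (0x09, 0x0B), "ADC": (0x11, 0x13),
    "SBB": (0x19, 0x1B), "AND": (0x21, 0x23), "SUB": (0x29, 0x2B),
    "XOR": (0x31, 0x33), "CMP": (0x39, 0x3B), "MOV": (0x89, 0x8B),
}

def encode_both(mnemonic, dst, src):
    """Return (mem_reg_bytes, reg_mem_bytes) for 'mnemonic dst, src'."""
    mem_reg_op, reg_mem_op = OPCODES[mnemonic]
    d, s = REGS.index(dst), REGS.index(src)
    # ModR/M = mod(11) | reg << 3 | r/m; mod 11 selects register addressing.
    # "mem, reg": destination sits in the r/m field, source in the reg field.
    mem_reg = bytes([mem_reg_op, 0xC0 | (s << 3) | d])
    # "reg, mem": destination sits in the reg field, source in the r/m field.
    reg_mem = bytes([reg_mem_op, 0xC0 | (d << 3) | s])
    return mem_reg, reg_mem

def classify(code):
    """Name the operand form of a two-byte reg-to-reg encoding, or None."""
    opcode, modrm = code[0], code[1]
    if modrm >> 6 != 3:  # not register-to-register
        return None
    for mem_reg_op, reg_mem_op in OPCODES.values():
        if opcode == mem_reg_op:
            return "mem, reg"
        if opcode == reg_mem_op:
            return "reg, mem"
    return None

if __name__ == "__main__":
    for form in encode_both("XOR", "ECX", "EAX"):
        print(form.hex(), "->", classify(form))
```

For "XOR ECX, EAX" this reproduces the two byte pairs listed in Table 1. Tallying the classify results over all register-to-register instructions in a real sample would require a proper disassembler to locate instruction boundaries first.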

Prefix Ordering

Analyzing the machine code chosen to perform a register-to-register operation has its advantages and disadvantages – to its credit, it is almost inevitable that there will be a register-to-register instruction in any bit of serious machine code, but as the above table suggests, the fingerprint is fairly homogeneous across mainstream assemblers. Our second trait, the ordering of prefix bytes, is the opposite – much more diverse combinations are possible than simply a binary choice between "reg, mem" and "mem, reg", so results are likely to vary significantly between assemblers, but at the same time, samples on which the analysis can be performed are much more rare. Nevertheless, the trait is an interesting observation if for no other reason than to provoke thought on the subject, so here we go.

The x86 architecture provides a number of "prefixes" that can be applied to modify the behavior of an instruction, each in some specific way. Besides the more well-known segment override (CS:, DS:, ES:, FS:, GS:, and SS:) and REP/REPNE prefixes, there are also prefixes that toggle the operand size and memory addressing mode for the instruction that follows, between 16 and 32 bits (we will refer to them by their hexadecimal byte values, 66h and 67h, respectively). There is one more prefix, called LOCK, that deals with synchronizing access to the system bus, but it will probably never be encountered in the type of machine code in which we are interested.

A prefix byte is not itself an instruction, but will instead have an effect on the next (non-prefix) opcode encountered. In the case of an instruction with multiple prefix bytes, then, the order of the prefixes is irrelevant and depends only on the whims of the assembler. So, to explore this ambiguity, we crafted an instruction that makes use of every worthwhile prefix, and checked out what those same six assemblers produced. We lumped the segment override prefixes into one set and REP and REPNE into another, since x86 assembly language only allows one prefix from each group to be specified per instruction, so this gave us 66h, 67h, LOCK, REPxx, and seg: as the five elements to be tested. For these tests, we compiled the instruction "LOCK REP LODS WORD PTR CS:[SI]" (or as close to it as possible) in 32-bit mode, the outcome of which uses all of the five prefix classes listed above. Table 2 shows the results:

Assembler        Prefix Order
DOS DEBUG        {LOCK,REPxx,seg:}
MASM 6.11        {LOCK,REPxx} 67h 66h seg:
NASM 0.98.38     LOCK REPxx seg: 66h 67h
SoftICE 4.3.0    67h 66h seg: {LOCK,REPxx}
TASM 5           LOCK REPxx 66h seg: 67h
Visual C++       {LOCK,REPxx} 66h seg:

Table 2. Assemblers can choose to order machine code prefix bytes in arbitrary ways. Set {...} notation signifies prefixes whose order relative to one another is determined by the order in which the programmer supplies them. DOS DEBUG.COM, a rather archaic assembler, forces the user to enter all prefixes manually and completely lacks support for 32-bit extensions (which include prefixes 66h and 67h). MASM did not allow REPxx and LOCK to be combined for a single instruction, and Visual C++ did not provide any opportunities to test the 67h prefix's relative position.
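
To make the prefix-ordering trait concrete, here is a minimal Python sketch that reads the leading prefix bytes of a single instruction and compares their order against the two rows of Table 2 that are fully ordered (NASM and TASM). The names prefix_order and consistent_assemblers are hypothetical, chosen for this example. The sketch assumes the input starts at an instruction boundary, and it ignores the set-notation rows, which depend on the order the programmer typed the prefixes.

```python
# Map each legacy prefix byte to the five classes used in Table 2.
PREFIX_CLASS = {
    0xF0: "LOCK",
    0xF2: "REPxx", 0xF3: "REPxx",          # REPNE, REP/REPE
    0x26: "seg:", 0x2E: "seg:", 0x36: "seg:",  # ES:, CS:, SS:
    0x3E: "seg:", 0x64: "seg:", 0x65: "seg:",  # DS:, FS:, GS:
    0x66: "66h",                           # operand-size toggle
    0x67: "67h",                           # address-size toggle
}

# Only the rows of Table 2 with a fixed order for all five classes.
FULLY_ORDERED = {
    "NASM 0.98.38": ["LOCK", "REPxx", "seg:", "66h", "67h"],
    "TASM 5":       ["LOCK", "REPxx", "66h", "seg:", "67h"],
}

def prefix_order(code):
    """Return the prefix classes, in order, that precede the opcode."""
    order = []
    for byte in code:
        cls = PREFIX_CLASS.get(byte)
        if cls is None:  # first non-prefix byte is the opcode
            break
        order.append(cls)
    return order

def consistent_assemblers(observed):
    """List assemblers whose fixed order agrees with the observed order."""
    matches = []
    for name, reference in FULLY_ORDERED.items():
        # Keep only the classes actually present, in the reference order.
        expected = [p for p in reference if p in observed]
        if expected == observed:
            matches.append(name)
    return matches

if __name__ == "__main__":
    # F0 F3 2E 66 67 AD: LOCK, REP, CS:, 66h, 67h, then the LODS opcode.
    sample = bytes.fromhex("f0f32e6667ad")
    order = prefix_order(sample)
    print(order, consistent_assemblers(order))
```

An instruction carrying only one prefix is consistent with every row, so only instructions with two or more prefixes from different classes contribute evidence, which is why usable samples are rare.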


Conclusion

So what can we really do with observations such as these? Most obvious would be to attempt to ascertain which assembler (or assemblers) a payload author uses, in order to correlate different payloads – for instance, to match a public exploit published on an author's site, with a worm or a separate, privately-held exploit. This especially becomes possible if unusual patterns develop across multiple samples of code, such as an identifiable trait that always appears in a certain function of the payload, which might indicate the use of a "shellcode lab" or payload libraries favored by the author.

Ultimately, we hope that this article will inspire others to think about new techniques for analyzing and fingerprinting malicious code samples. Next time, we will examine some actual malicious code samples at a stylistic level, to help in identifying particular coding habits of different payload authors, and maybe get a little closer to claiming that bounty.

Source: Derek Soeder, Senior Software Engineer


Ask Research

Q: How Can an Attacker Bypass a Firewall?

A: If the ultimate goal is executing commands on a system behind a firewall, there are a number of very different approaches an attacker can take to accomplish this. The applicability of each depends on the configuration and network layout between the attacker and the target; we will address the key points here.

The most obvious way to penetrate a firewall is to look for a hole in its configuration that will allow your traffic to slip through. Possibly the best-known trick in this category is to send packets to the target host with a source port of 53. Because DNS queries are (connectionless) UDP packets often sent out from a variable high port, but always with a destination port of 53, sometimes the easiest way to allow the DNS responses to reenter the network is just to allow all incoming traffic with a source port of 53. Another example of a misconfiguration would be mistakenly allowing a firewalled server to receive connections to some of its services (e.g., SQL on port 1433) from any IP address, rather than a restricted "whitelist" of addresses. This type of misconfiguration is so egregious that there might as well not be a firewall installed at all.
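
The source-port-53 problem can be shown with a toy model. The Python sketch below is not firewall code and performs no network I/O; the Packet type, the stateless_allows rule, and the StatefulFilter class are all invented for this illustration, and the addresses are documentation examples. It contrasts a rule that admits anything arriving from port 53 with one that admits only replies matching an earlier outbound query.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int

def stateless_allows(pkt):
    """The misconfigured rule: admit any inbound packet from port 53."""
    return pkt.src_port == 53

class StatefulFilter:
    """Admit inbound port-53 traffic only as a reply to a recorded query."""

    def __init__(self):
        self._pending = set()

    def note_outbound(self, pkt):
        # Remember who asked which server, and from which local port.
        if pkt.dst_port == 53:
            self._pending.add((pkt.src_ip, pkt.src_port, pkt.dst_ip))

    def allows_inbound(self, pkt):
        key = (pkt.dst_ip, pkt.dst_port, pkt.src_ip)
        return pkt.src_port == 53 and key in self._pending

if __name__ == "__main__":
    # Unsolicited packet aimed at SQL on port 1433, sent from source port 53.
    probe = Packet("203.0.113.9", 53, "10.0.0.5", 1433)
    query = Packet("10.0.0.5", 40000, "198.51.100.1", 53)
    reply = Packet("198.51.100.1", 53, "10.0.0.5", 40000)

    fw = StatefulFilter()
    fw.note_outbound(query)
    print("stateless admits probe:", stateless_allows(probe))
    print("stateful admits probe: ", fw.allows_inbound(probe))
    print("stateful admits reply: ", fw.allows_inbound(reply))
```

A real stateful filter would also expire pending entries after a timeout; that detail is omitted here to keep the model short.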

A somewhat wilier approach is attempting to attack the firewall itself, by exploiting a software vulnerability or abusing a misconfiguration in its administrative settings (e.g., default password or SNMP community name). In the case of a NAT-enabled router, compromising the router would give the attacker a way to send packets directly to hosts on the internal network, an ability that may not otherwise be possible (for instance, in the case of a home user connected to a cable modem via a broadband router). Similarly, taking over any other device that bridges the internal and external networks, like a server with two NICs, would provide the same sort of stepping-stone to sending unsolicited traffic to internal hosts.

In contrast to sending "unsolicited traffic" into a protected network, a firewall could be bypassed in some circumstances by getting a protected host to "solicit" traffic from a malicious host on the outside, in which case the firewall would most likely allow the malicious host's response back through. For instance, an attacker could compromise a web site visited by the user of a protected host within the network, or could hijack or poison a DNS server and redirect that user's traffic to a malicious system. For a more specific example, consider the Windows XP SP2 firewall and its behavior regarding UDP broadcasts. A Windows XP SP2 system blocks all incoming traffic by default, but whenever it sends out a UDP broadcast packet, all UDP responses originating from the system's subnet, and with a destination port set to the broadcast packet's source port, are allowed back through for three seconds following the broadcast. Because default XP SP2 systems will send UDP/137 and UDP/138 broadcasts periodically on their own, an attacker could repeatedly send spoofed packets to a host running XP SP2 and the traffic would eventually be allowed through. If a vulnerability were discovered in a service related to UDP/137 (NetBIOS Name Service), UDP/138 (NetBIOS Datagram Service), or in SMB named pipe sessions (which the system may be tricked into establishing in response to crafted UDP/138 traffic), an exploit for that vulnerability would be able to penetrate the XP SP2 firewall using this approach.

Very often, the users themselves may allow attacks through the firewall by connecting to externally-accessible proxy servers or other conceptually similar services, including chat services (e.g., IM and IRC) and peer-to-peer networks, or by accepting external data onto their machines via web pages and e-mail. As has been painfully demonstrated, sending malicious e-mails, e-mail attachments, instant messages, and IM file transfers is an effective attack that allows an individual, rather than a computer, to be targeted and attacked via client software vulnerabilities and/or social engineering.

Finally, it's worth mentioning that alternative access and physical access both provide additional proven ways around firewalls and into protected networks. "Alternative access" here refers to phone lines and wireless media, which allow internal systems or devices (e.g., WAPs, laptops in ad-hoc wireless mode, computers with modems, and even fax machines) that are connected to both this "alternative network" and also the protected LAN to be accessed. Physical access obviously refers to an attacker who can physically tamper with the hardware that makes up the network -- by introducing rogue access points or modems, reconfiguring systems, rearranging cables, or adding a rogue PDA or even a Dreamcast with a specific malicious objective to the network. The attacker need not be able to communicate with the device once it's on the network, as long as its instructions are self-sufficient, or the device could "phone home" and accept further commands from its owner.

This is just a sample of different approaches to circumventing a firewall; undoubtedly there are more concepts and infinitely many "tricks of the trade" that can be used to bypass firewalls and other related forms of protection.


Q: How Can the Sender of a Fake Email Be Identified?

A: The real IP address used to send an email is recorded in the email's headers by the receiving mail server. It cannot be forged because of the nature of TCP, but the email can be sent from a proxied or otherwise compromised system.
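
As a small illustration of where that address lives, the Python sketch below uses the standard library's email parser to pull the bracketed IP address out of each Received header. The sample message, host names, and the received_ips helper are made up for this example, and Received header layouts vary between mail servers, so the regular expression is an assumption rather than a general parser.

```python
import re
from email import message_from_string

# Many servers record the connecting peer as "name [a.b.c.d]".
IP_IN_BRACKETS = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")

# Hypothetical message; each server prepends its own Received header,
# so the topmost one was written by the final receiving server.
SAMPLE = (
    "Received: from mail.example.net (mail.example.net [203.0.113.7])\n"
    "\tby mx.example.org with ESMTP; Wed, 4 May 2005 10:00:00 -0700\n"
    "Received: from workstation (unknown [192.0.2.44])\n"
    "\tby mail.example.net with SMTP; Wed, 4 May 2005 09:59:58 -0700\n"
    "From: someone@example.net\n"
    "Subject: test\n"
    "\n"
    "body\n"
)

def received_ips(raw_message):
    """Return one IP (or None) per Received header, newest hop first."""
    msg = message_from_string(raw_message)
    hops = []
    for header in msg.get_all("Received", []):
        match = IP_IN_BRACKETS.search(header)
        hops.append(match.group(1) if match else None)
    return hops

if __name__ == "__main__":
    for hop, ip in enumerate(received_ips(SAMPLE)):
        print(hop, ip)
```

Only the entries written by servers you control or trust are reliable. Headers further down the list were present before your server saw the message, so a sender can fabricate them.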

Tracking down the "real" sender can be very difficult. Obviously, for spam control you have little reason to do this outside of the ISP level. However, where the email contains death threats, carries a specialized attack, or raises similar concerns, one may have to track down the author. One possible method is to reply to the person with a message containing a web bug; when they view that email with an HTML-based client (most web-based email services, or Outlook by default), you can get their real IP address.

However, in some cases, there are anonymous remailers which strip the originating information from the email. These remailers essentially say, "I will be your proxy"; sometimes, for a price. This kind of remailing activity is similar to remailing services criminals (and others) use in the real world. It has the same drawbacks. The weakness is in the integrity of the remailer and his or her ISP. While the remailer may destroy all records with a secure wiping system -- ultimately, they still have an ISP to contend with.

Have a question you would like answered? Send it to vice@eeye.com, and win an eEye t-shirt if we select your question for an upcoming newsletter.


Etcetera

Publishing Exploit Code Ruled Illegal in France? - Reported by the Press

"Researchers that reverse engineer software to discover programming flaws can no longer legally publish their findings in France after a court fined a security expert on Tuesday."

Publishing Exploit Code Ruled Illegal in France? - The Researcher's Story

The real story on this issue was not the fact that the researcher published exploit code but the fact that he, according to the French courts, published counterfeit software. Here is an account of the entire issue by the researcher who was involved.


How to Subscribe
To subscribe to this and other eEye newsletters, please visit: http://www.eeye.com/html/resources/newsletters/subscribe.html

Feedback
The eEye newsletter staff welcomes any comments, questions or suggestions from our readers. We hope that you will not hesitate to contact us with any feedback you may have. Send all feedback to vice@eeye.com.

Disclaimer
The information within this newsletter may change without notice. Use of this information constitutes acceptance for use in an AS IS condition. There are NO warranties with regard to this information. In no event shall the author be liable for any damages whatsoever arising out of or in connection with the use or spread of this information. Any use of this information is at the user's own risk. Opinions expressed are not necessarily those of eEye Digital Security.

Notice
Permission is hereby granted for the redistribution of this newsletter electronically. It is not to be edited in any way without the express consent of eEye. If you wish to reprint the whole or any part of this newsletter in any other medium excluding electronic medium, please email vice@eeye.com for permission. Permission to reprint readers' comments or questions, and edit where necessary for length and clarity, is assumed unless explicitly forbidden.