dirtbox: a Highly Scalable x86/Windows Emulator

Presented at Black Hat USA 2010, July 29, 2010, 4:45 p.m. (75 minutes).

The growing volume of new malware each day not only pushes anti-virus companies to their limits in handling these samples and creating detection signatures; network security providers and administrators also find it increasingly difficult to learn how samples affect the networks they try to protect. Dynamic analysis of malware by execution in sandboxes has been applied successfully in both scenarios, but classic sandbox approaches suffer from severe scalability problems. Most rely on setting up a real target system, such as the Windows XP operating system, as a virtual machine with additional software that logs the actions performed. While these are easy to develop and set up, they require a separate virtual machine instance for each malware sample to be analyzed and therefore do not scale with today's malware growth.

Anti-virus vendors have tried to work around the performance issues of file analysis by developing custom emulators that can be deployed on a customer end host for detection and do not require a whole operating system inside a virtual machine. These emulators, however, are often software interpreters for the x86 instruction set and therefore run into execution speed limitations of their own. Additionally, they are detectable: they try to emulate every single Windows API but do so inaccurately.

dirtbox is an attempt to implement a highly scalable x86/Windows emulator that can be used both for simple malware detection and for detailed behavior analysis reports. Instead of emulating every single x86 instruction in software, malware instructions are executed directly on the host CPU, one basic block at a time. A disassembly pass over each basic block ensures that no privileged or control-flow-subverting instructions are executed. Guest virtual memory is kept separate from the emulator's memory by means of special LDT segments and by switching segment selectors before executing guest instructions.
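The per-block safety check can be sketched roughly as follows. This is a minimal illustration, not dirtbox's actual code: the tiny opcode table stands in for a real x86 disassembler, and the names `scan_basic_block`, `BasicBlock`, and `UnsafeInstruction` are hypothetical. It also assumes that the control-flow instruction ending a block is handed back to the emulator rather than executed natively.

```python
from dataclasses import dataclass

# Toy table for a few 32-bit x86 opcodes: byte -> (mnemonic, length, kind).
# A real implementation would use a full disassembler instead.
OPCODES = {
    0x90: ("nop", 1, "plain"),
    0x40: ("inc eax", 1, "plain"),
    0x50: ("push eax", 1, "plain"),
    0xC3: ("ret", 1, "control"),
    0xEB: ("jmp rel8", 2, "control"),
    0xE9: ("jmp rel32", 5, "control"),
    0xE8: ("call rel32", 5, "control"),
    0xF4: ("hlt", 1, "privileged"),
    0xFA: ("cli", 1, "privileged"),
}


@dataclass(frozen=True)
class BasicBlock:
    start: int       # offset of the first instruction
    body_len: int    # bytes of straight-line code considered safe to run natively
    terminator: str  # control-flow instruction left to the emulator (assumption)


class UnsafeInstruction(Exception):
    """Raised when a block contains something that must not run natively."""


def scan_basic_block(code: bytes, start: int) -> BasicBlock:
    """Walk instructions from `start` until the first control-flow instruction."""
    pos = start
    while pos < len(code):
        opcode = code[pos]
        if opcode not in OPCODES:
            raise UnsafeInstruction(f"unknown opcode {opcode:#04x} at {pos}")
        mnemonic, length, kind = OPCODES[opcode]
        if kind == "privileged":
            raise UnsafeInstruction(f"privileged instruction {mnemonic} at {pos}")
        if kind == "control":
            # The block ends here; everything before it is straight-line code.
            return BasicBlock(start, pos - start, mnemonic)
        pos += length
    raise UnsafeInstruction("code ended without a block terminator")
```

For example, `scan_basic_block(bytes([0x90, 0x40, 0x50, 0xC3]), 0)` yields a block with three bytes of body and `ret` as terminator, while a block containing `hlt` is rejected before any of it runs.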

High performance is achieved because no instrumentation such as instruction rewriting is performed, disassembler results can be cached per basic block, and all execution happens in the same process without context switches.
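To illustrate why per-block caching pays off, the sketch below memoizes scan results by block start address, so a block inside a hot loop is disassembled only once. The class `BlockCache` and the stand-in scanner `toy_scan` are hypothetical names used for illustration; how dirtbox keys or invalidates its cache is not stated in this abstract.

```python
from typing import Callable, Dict, Tuple

ScanResult = Tuple[int, str]  # (block length in bytes, terminator mnemonic)


class BlockCache:
    """Memoize per-block scan results, keyed by guest start address."""

    def __init__(self, scan: Callable[[bytes, int], ScanResult]) -> None:
        self._scan = scan
        self._blocks: Dict[int, ScanResult] = {}
        self.hits = 0
        self.misses = 0

    def lookup(self, code: bytes, start: int) -> ScanResult:
        cached = self._blocks.get(start)
        if cached is not None:
            self.hits += 1
            return cached
        # First visit: pay the disassembly cost once and remember the result.
        self.misses += 1
        result = self._scan(code, start)
        self._blocks[start] = result
        return result


def toy_scan(code: bytes, start: int) -> ScanResult:
    """Stand-in scanner: a block runs up to and including the first ret (0xC3)."""
    end = code.index(0xC3, start)
    return (end - start + 1, "ret")


cache = BlockCache(toy_scan)
loop_body = bytes([0x90, 0x90, 0xC3])
for _ in range(1000):
    cache.lookup(loop_body, 0)
print(cache.misses, cache.hits)
```

Because the guest code is not rewritten, a cached result stays valid as long as the underlying bytes do not change; a real emulator would also need to drop entries when the guest writes to code it has already scanned.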

The operating system is emulated at the syscall layer. While this layer is mostly undocumented and implementing it accurately is a challenging task in its own right, the fact that no register changes are leaked from Ring 0 thwarts many detection techniques. For use of the high-level APIs, the corresponding libraries are mapped directly into the virtual memory as well. Detection mechanisms such as examination of the ecx register after an SEH-protected API call, stolen bytes from an API library implementation, and direct reads and writes from the PEB or other static locations or libraries are supported automatically. Furthermore, process and heap layout resemble those of a genuine process, since the original ntdll PE loading and heap management code can be executed and used.
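Emulation at the syscall layer can be pictured as a dispatch table keyed by service number. The sketch below is a simplified illustration under several assumptions: the service number arrives in eax, as on 32-bit Windows; the number `0x01` assigned to `NtClose` is hypothetical, since real numbers differ between Windows versions; and the `Guest` class, `dispatch` function, and log format are invented for this example. The NTSTATUS constants are the standard values.

```python
from typing import Callable, Dict, List, Tuple

STATUS_SUCCESS = 0x00000000
STATUS_NOT_IMPLEMENTED = 0xC0000002
STATUS_INVALID_HANDLE = 0xC0000008


class Guest:
    """Minimal guest state: one register, a handle table, and a behavior log."""

    def __init__(self) -> None:
        self.regs: Dict[str, int] = {"eax": 0}
        self.handles: Dict[int, str] = {}
        self.log: List[Tuple[str, Tuple[int, ...], int]] = []


def nt_close(guest: Guest, args: List[int]) -> int:
    """Emulated NtClose: release a handle if the guest actually owns it."""
    handle = args[0]
    if handle not in guest.handles:
        return STATUS_INVALID_HANDLE
    del guest.handles[handle]
    return STATUS_SUCCESS


Handler = Callable[[Guest, List[int]], int]

# Hypothetical service numbers; real ones vary by Windows version.
SYSCALLS: Dict[int, Tuple[str, Handler]] = {
    0x01: ("NtClose", nt_close),
}


def dispatch(guest: Guest, args: List[int]) -> int:
    """Route a guest system call to its handler and record it for the report."""
    number = guest.regs["eax"]
    entry = SYSCALLS.get(number)
    if entry is None:
        name, status = f"unknown_{number:#x}", STATUS_NOT_IMPLEMENTED
    else:
        name, handler = entry
        status = handler(guest, args)
    guest.log.append((name, tuple(args), status))
    guest.regs["eax"] = status  # the NTSTATUS result goes back to the guest in eax
    return status
```

Since the user-mode libraries above this layer run unmodified, only the handlers in such a table need to be written by hand, and the log they produce can feed a behavior report.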

