Over the years, obfuscation has taken a significant place in the software protection field. The term generally covers any means of slowing down the analysis of a program, whether by a human analyst or by an automated algorithm. As such, it has gained a certain popularity in the video-game industry. Unfortunately, it has also become popular in the underground malware ecosystem, where it is used to delay the comprehension of malicious code, hence the need for deobfuscation techniques.
In the broadest sense, deobfuscation is any means of making the behaviour of a piece of malware more intelligible, given that recovering the original program is generally impossible. The first step towards understanding a binary program is to disassemble it in order to obtain a good representation of its Control-Flow Graph (CFG), so obfuscation techniques also aim at fooling existing disassembly tools and techniques. Static disassembly covers the whole program but is quickly fooled by obfuscations such as self-modification, while dynamic disassembly yields a real execution trace of the program but is limited to one or a few execution paths.
Symbolic execution has recently been proposed as an interesting alternative approach to deobfuscation: more robust than static analysis and more complete than dynamic analysis. Interestingly, the technique amounts to reasoning over the semantics of the program (which is not modified by obfuscation) rather than its syntax (which is heavily modified). Yet the technique is still young and can quickly suffer from scalability issues.
In this talk, we show how to successfully combine several state-of-the-art variants of symbolic execution with dynamic analysis and static analysis in order to recover a more precise CFG of the obfuscated code under analysis. In particular, dynamic analysis brings robustness to tricky obfuscations such as self-modification; variants of symbolic execution can both discover new parts of the obfuscated code and prove that other parts are spurious (e.g. unreachable because of protections such as opaque predicates or stack tampering); and standard static analysis can be guided in a safe way to extend the disassembly. We will explain the method in detail, show how it is implemented in the open-source platform BINSEC, and describe successful case studies on state-of-the-art packers and on the government-grade X-Tunnel malware, which the method allowed us to deobfuscate entirely.
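To make the notion of an opaque predicate concrete, here is a minimal Python sketch. It is an illustration only, not the BINSEC implementation: the function names, the 8-bit word size and the example conditions are all assumptions, and the exhaustive enumeration merely stands in for the solver query that a symbolic engine would issue on realistic word sizes.

```python
from typing import Callable

# Assumption: a tiny 8-bit word so that every input can be enumerated.
# A real symbolic engine would ask an SMT solver instead of enumerating.
BITS = 8
MASK = (1 << BITS) - 1


def opaque_condition(x: int) -> bool:
    """Hypothetical opaque predicate: x * (x + 1) is always even."""
    return ((x * (x + 1)) & MASK) % 2 == 0


def genuine_condition(x: int) -> bool:
    """Hypothetical ordinary condition that really depends on the input."""
    return (x & MASK) > 100


def classify_branch(cond: Callable[[int], bool]) -> str:
    """Classify a branch condition by checking it on every possible input."""
    outcomes = {cond(x) for x in range(1 << BITS)}
    if outcomes == {True}:
        return "always taken"    # the fall-through side is spurious
    if outcomes == {False}:
        return "never taken"     # the target side is spurious
    return "both feasible"       # a genuine conditional branch


if __name__ == "__main__":
    print(classify_branch(opaque_condition))
    print(classify_branch(genuine_condition))
```

A branch classified as "always taken" or "never taken" has one dead successor, which can be pruned from the recovered CFG. Such a verdict depends only on what the condition computes, not on how the obfuscator chose to write it.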
The end goal is to empower reverse engineers by giving them semantic information about the program, such as the obfuscations it contains, so that they hold all the cards for a better and deeper understanding of the binary being analysed.
Symbolic deobfuscation methods are a recent and hot topic, holding great promise but not easy to apply in practice. This talk aims to demystify these techniques by presenting them clearly, from basic concepts to the latest cutting-edge implementations, and by giving insights into their strengths and weaknesses, including potential counter-measures (together with possible mitigations). This line of work also continues other efforts to bring more semantic analysis into the anti-malware arsenal.