Code Deobfuscation: Intertwining Dynamic, Static and Symbolic Approaches

Presented at Black Hat Europe 2016, Nov. 3, 2016, 11:15 a.m. (60 minutes)

Over the years, obfuscation has taken a significant place in the software protection field. The term generally embraces any mean aiming at slowing down the analysis of a program, either by an analyst or an automated algorithm. As such, it has gained a certain popularity in the video-game industry. Unfortunately, it has also gained popularity in the malware underground ecosystem, leading to the need of deobfuscation techniques. The only property that should be preserved by obfuscation is the semantic of the program, i.e. its behavior. Hence, in the broadest sense, deobfuscation is the mean to make the behavior of the malware more intelligible, taken as a fact that recovering the original program is impossible in the general case. The first step toward understanding a binary program is to disassemble it in order to obtain a good representation of its Control-Flow Graph (CFG). As a consequence, obfuscation techniques (also) aim at fooling existing disassembly tools and techniques. Standard obfuscations usually target either static analysis (CFG flattening, self-modification, etc.) or dynamic analysis (anti-debugging tricks, VM detection or runtime monitoring). Thus, taking advantages of different approaches might become essential to handle obfuscated codes. While static analysis cover the whole program but is quickly fooled by obfuscations such as self-modification, dynamic analysis helps getting a real execution trace of the program but is limited to one or a few execution paths. In between, dynamic symbolic execution (DSE) -- aka concolic execution -- helps covering more new paths in the program using symbolic values and automatic solvers. This technique has already been fruitfully applied for various purposes such as test generation [1], vulnerability discovery [2] and more recently deobfuscation [3]. Unfortunately, the main problem is that it hardly scales on large obfuscated codes.<br><br> We show in this talk how to combine in a successful way several disassembly techniques -- namely dynamic analysis, several state-of-the-art variants of symbolic execution and static analysis, in order to help recovering a more precise CFG of the obfuscated code under analysis. Especially, dynamic analysis brings robustness to tricky obfuscations such as self-modification, variants of symbolic executions can answer both feasibility and infeasibility queries arising during the deobfuscation process, and standard static analysis can be guided in a safe way to extend the disassembly. These analyses are implemented in the open-source framework BINSEC. They are articulated around three components: * BINSEC/SE: the core dynamic symbolic engine * Pinsec: a Pin-based dynamic instrumenter * IDASec: an IDA plugin allowing to lift analysis data into IDA, making them straightly usable for the reverse-engineer. <br><br>This talk will explain in detail the method and how it is implemented in BINSEC. Practical examples will focus on the detection of opaque predicates and call stack tampering, with case-studies based on Tigress, o-llvm and several commercial packers. The end goal is to empower the reverse-engineering by giving the analyst semantic information about the program such as obfuscation in order to hold all the cards in hand for a better and deeper understanding of the binary being analyzed. <br>[1] {BHUSA2014} Contemporary Automatic Program Analysis. Julian Cohen <br>[2] {RECON14,HITB14, Shakacon2014} Fuzzing and Patch Analysis SAGEly Advice. Richard Johnson<br> [3] {CCS15} Symbolic Execution of Obfuscated Code. Babak Yadegari, Saumya Debray

Presenters:

  • Sébastien Bardin - Dr, CEA LIST
    Sébastien Bardin obtained his PhD in 2005 at ENS Cachan, France, in the field of formal methods. He joined CEA LIST, France, in 2006 as a full-time researcher, with research activities centered on program analysis and automatic software verification. For a few years now, Sébastien has been interested in automating software-level security analysis by lifting formal methods developed for the safety-critical industry. More especially, he focuses on binary-level formal methods, vulnerability detection & assessment, and malware deobfuscation. He leads the "binary-level security" group at CEA LIST as well as several related research projects, and he is one of the main designers of the (open-source) BINSEC platform for binary-level code analysis.
  • Robin David - Security Phd Student, CEA LIST,
    Robin David is a PhD student at CEA LIST (Atomic Energy Commission, France) working in collaboration with the High Security Lab of LORIA (CNRS, Inria and Lorraine University) under the supervision of Jean-Yves Marion. His research interests are security at large but more especially binary analysis, malware analysis and reverse-engineering. His PhD goal is to use formal methods for security purposes, more especially for malware deobfuscation. In order to do so, he has implemented the core dynamic symbolic engine in BINSEC along with the instrumenter (Pinsec) and an associated IDA plugin (IDASec). Before his PhD, he obtained his Msc at the Paris-Est University and a Bsc at the Napier University in Edinburgh.

Links:

Similar Presentations: