Cryptographic Function Identification in Obfuscated Binary Programs

Presented at REcon 2012, June 16, 2012, 1 p.m. (60 minutes)

Cryptographic function recognition constitutes an important problem forbinary program analysis because such functions usually contain crucialparts of the program internal logic.Moreover, cryptographic tasks are often implemented with well knownfunctions, whose reference implementations are publicly available.Identifying automatically these functions with their arguments allowsthe analyst to understand the code without actually studying it. Current techniques to identify well known functions in binary form areusually syntactic-centric and mainly rely on characteristics preservedin reasonable implementations, like particular constants or specificmachine instructions. Nevertheless, these features are no more reliablein obfuscated binary programs, because they can easilydisappear due to the obfuscation process.In contrast, the solution we will discuss in this talk tends to besyntactic-independent by leveraging the predefined input-outputrelationship of cryptographic functions.An interesting fact with cryptographic functions is that they utilizevery particular input values - typically a key and an input text - toproduce deterministically an also very particular output value - anoutput text. Hence a pair of input-output values (A,B) such that acertain cryptographic function F verifies F(A)=B constitutes a signaturefor this function (A being typically the pair (key, input text),sometimes with an initialization vector, whereas B is the output text).Indeed the false positive probability is remarkably low, as it is veryunlikely that another function - cryptographic or not - produces B whentaking A as input. Moreover, all F implementations, even obfuscatedones, have to respect this relationship by definition.Implementing this relatively simple idea to identify cryptographicfunctions in obfuscated programs is far from trivial. Indeed suchenvironment lacks natural abstractions that would allow to easilyconsider candidate parts of the code for identification. In particularthe common function "definition" in binary programs - based on callingconvention and prologue-epilogue code - is not reliable in obfuscatedcode. Secondly, the numerous implementation-dependent data manipulatedby machine instructions (control registers, return address...), combinedwith the absence of clear data structures, make the retrieval ofcryptographic input-output parameters complicated.Therefore we will discuss in this talk the way we implemented acryptographic function identification technique based on theinput-output relationship comparison for obfuscated binary programs. Wewill insist on the building process leading to the final tool, as webelieve it is a generic way of tackling such identification problems,whereas the tool itself is suitable for *some* hard-to-detectcryptographic functions in *some* obfuscated binary programs. Amongseveral examples we will show how we automatically identified algorithmssuch as RC4 - very often missed by existing tools - and XTEA in heavilyobfuscated binary programs, with the appreciable side-effect of knowingprecisely their arguments. Finally we will show that our techniqueallows the recognition of modified versions of well-known cryptographicalgorithms.

Presenters:

Links:

Similar Presentations: