Deep Learning on Disassembly

Presented at Black Hat USA 2015, Aug. 6, 2015, 3:50 p.m. (50 minutes).

Recently, the application of deep learning techniques to natural language processing has led to state-of-the-art results in speech recognition, language modeling, and language translation. To some degree, disassembly can be considered an extension or augmentation of natural language. As a loose example, many experienced reverse engineers can read through disassembled code and understand its meaning in one pass, much as they read text in natural languages.

In this talk, we show the effectiveness of applying deep learning techniques to disassembly in order to build models that identify malware. After a brief explanation of deep learning, we work through each stage of the pipeline: starting from a collection of raw binaries, extracting and transforming the disassembly data, and training a deep learning model. We conclude by presenting data on the efficacy of these models, followed by a live demo in which we evaluate the models against active malware feeds.
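As a rough illustration of the "extraction and transformation" stage, the sketch below turns a disassembly listing into a fixed-length sequence of integer IDs that a sequence model could consume. It is not the presenters' pipeline: the sample listing, the function names, the choice to keep only opcode mnemonics, and the sequence length are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical disassembly listing; real input would come from a disassembler.
SAMPLE = """\
push ebp
mov ebp, esp
sub esp, 0x10
mov eax, [ebp+8]
call 0x401000
ret
"""

def extract_mnemonics(listing):
    """Return the opcode mnemonic (first token) of each non-empty line."""
    return [line.split()[0].lower()
            for line in listing.splitlines() if line.strip()]

def build_vocab(sequences):
    """Map each mnemonic to an integer ID, most frequent first.

    ID 0 is reserved for padding and for mnemonics not in the vocabulary.
    """
    counts = Counter(tok for seq in sequences for tok in seq)
    return {tok: i + 1 for i, (tok, _) in enumerate(counts.most_common())}

def encode(seq, vocab, length):
    """Convert mnemonics to IDs, truncated or zero-padded to a fixed length."""
    ids = [vocab.get(tok, 0) for tok in seq[:length]]
    return ids + [0] * (length - len(ids))

mnemonics = extract_mnemonics(SAMPLE)
vocab = build_vocab([mnemonics])
encoded = encode(mnemonics, vocab, 8)
```

Dropping operands keeps the vocabulary small, at the cost of discarding information such as register usage and call targets; a fuller treatment might tokenize operands as well.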


Presenters:

  • Matt Wolff - Cylance
    Matt Wolff is a computer scientist with a research focus on data science, machine learning, and information security. He has over 10 years of experience in the security industry, researching and developing tools to attack and defend systems. He currently serves as the Chief Data Scientist for Cylance. He holds an MS in Computer Science from Georgia Tech, and is a PhD candidate in Computer Science at the University of Hawaii.
  • Andrew Davis
    Andrew Davis is a machine learning researcher specializing in the application of deep neural networks to various problem domains. He currently applies machine learning to malware analysis. Prior to working at Cylance, Andrew worked at Oak Ridge National Laboratory (ORNL), where he applied computer vision and machine learning methods to defense and neuroscience research. In addition to his work at ORNL, he worked with a small machine learning startup, applying algorithms to speaker recognition, vehicle fault prediction, and financial modelling.
