Getting insight out of and back into deep neural networks

Presented at BSidesLV 2017, July 25, 2017, 2:30 p.m. (55 minutes).

Deep learning has emerged as a powerful tool for classifying malicious software artifacts; however, the generic black-box nature of these classifiers makes it difficult to evaluate their results, diagnose model failures, or effectively incorporate existing knowledge into them. In particular, a single numerical output - either a binary label or a 'maliciousness' score - for some artifact offers no insight into what might be malicious about that artifact, nor any starting point for further analysis. This is particularly important when examining artifacts such as malicious HTML pages, which often have small portions of malicious content distributed among much larger amounts of completely benign content.

In this applied talk, we present the LIME method developed by Ribeiro, Singh, and Guestrin, and show - with numerous demonstrations - how it can be adapted from the relatively straightforward domain of "explaining" text or image classifications to the much harder problem of supporting analysts in performing forensic analysis of malicious HTML documents. In particular, we can not only identify features of the document that are critical to the performance of the model (as in the original work), but also use this approach to identify key components of the document that the model "thinks" are likely to contain malicious elements. This allows analysts both to assess the validity of the model's conclusion quickly and to identify regions that require additional inspection and evaluation. The deep learning model is thereby converted from a gnomic "black box" into a useful exploratory tool for malicious artifacts, even when the model itself labels the sample incorrectly.
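To make the idea concrete, the following is a minimal LIME-style sketch in Python. It is not the presenters' implementation. It assumes the document has already been split into segments, and it uses a hypothetical `toy_score` function as a stand-in for the deep learning model. The distance measure (fraction of segments removed), the kernel width, and the ridge penalty are illustrative choices only. The sketch randomly drops segments, queries the black-box score for each perturbed document, and fits a proximity-weighted linear model whose coefficients rank the segments by their contribution to the score.

```python
import numpy as np


def explain_segments(segments, score_fn, n_samples=500,
                     kernel_width=0.75, ridge=1e-3, seed=0):
    """Return one weight per segment; larger means more 'malicious'."""
    rng = np.random.default_rng(seed)
    d = len(segments)

    # Binary masks: 1 keeps a segment, 0 drops it.
    masks = rng.integers(0, 2, size=(n_samples, d))
    masks[0] = 1  # always include the unperturbed document

    # Query the black-box model on every perturbed document.
    scores = np.array([
        score_fn([s for s, keep in zip(segments, m) if keep])
        for m in masks
    ], dtype=float)

    # Perturbations closer to the original document get more weight.
    distance = 1.0 - masks.mean(axis=1)  # fraction of segments removed
    weights = np.exp(-(distance ** 2) / kernel_width ** 2)

    # Weighted ridge regression with an unpenalized intercept.
    X = np.hstack([np.ones((n_samples, 1)), masks.astype(float)])
    penalty = ridge * np.eye(d + 1)
    penalty[0, 0] = 0.0
    A = X.T @ (weights[:, None] * X) + penalty
    b = X.T @ (weights * scores)
    coef = np.linalg.solve(A, b)
    return coef[1:]  # drop the intercept


def toy_score(kept_segments):
    """Hypothetical stand-in for a real classifier's maliciousness score."""
    return 0.9 if any("eval(" in s for s in kept_segments) else 0.1


segments = [
    "<p>hello</p>",
    "<script>eval(unescape('%75'))</script>",
    "<div>footer</div>",
]
segment_weights = explain_segments(segments, toy_score)
most_suspicious = segments[int(np.argmax(segment_weights))]
```

In this toy setting the segment with the largest weight is the script block, which is the region an analyst would inspect first. A real application would replace `toy_score` with the trained model and would need a sensible way of splitting HTML into segments.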

We complement this work by showing how knowledge extracted by this method - as well as existing expert knowledge - can be readily re-incorporated into deep learning models.


Presenters:

  • Richard Harang - Principal Data Scientist - Sophos
    Richard Harang is a Principal Data Scientist at Sophos with over seven years of research experience at the intersection of computer security, machine learning, and privacy. Prior to joining Sophos, he served as a scientist at the U.S. Army Research Laboratory, where he led the research group investigating the applications of machine learning and statistical analysis to problems in network security. He received his PhD in Statistics from the University of California, Santa Barbara. Research interests include randomized methods in machine learning, adversarial machine learning, and ways to use machine learning to support human analysis.
