Generating YARA Rules by Classifying Malicious Byte Sequences

Presented at Black Hat USA 2021, Aug. 5, 2021, 2:30 p.m. (30 minutes).

While machine learning (ML) models have become an industry standard for heuristic malware detection, signature-based detection remains widespread. Signatures are easy to update, their detection logic is transparent, and they can run in compute-constrained environments. In this work, we propose an interpretable machine learning model that generates signatures tuned to maximize detection and minimize false positives on a given corpus of malware and benign samples. On a corpus of malicious and benign ELF executables targeting i386 and amd64, we observe detection rates in the 80% range and a false positive rate of 0% on the benign corpus, using a few hundred YARA rules.

The approach is file-type-agnostic and can be applied anywhere YARA rules can be used: simple static analysis of binaries, Cuckoo reports, network monitoring, or memory scanning. We will also share trained models, code to train the model and extract signatures from your own corpora of byte streams, and ready-to-use signatures for detecting recent PE, ELF, and Mach-O malware.
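The abstract does not describe the model's internals. The sketch below is therefore only a rough, hypothetical illustration of the selection goal it states: choose byte-sequence signatures that catch as much of a malware corpus as possible while matching nothing in a benign corpus, then emit them as YARA rules.

It deliberately swaps the presenter's learned classifier for a much simpler technique, a greedy set-cover heuristic over fixed-length byte n-grams. The function names (`ngrams`, `select_signatures`, `to_yara`, `evaluate`), the n-gram length `n=8`, the `max_rules` cap, and the toy corpora are all assumptions made for illustration.

```python
from collections import Counter
from typing import List, Set, Tuple


def ngrams(data: bytes, n: int) -> Set[bytes]:
    """Return the distinct length-n byte windows that occur in data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def select_signatures(
    malware: List[bytes], benign: List[bytes], n: int = 8, max_rules: int = 10
) -> List[bytes]:
    """Greedy set cover over byte n-grams (hypothetical stand-in, NOT the talk's ML model).

    Any n-gram seen in a benign sample is discarded, so the selected
    signatures produce zero false positives on the benign corpus by design.
    """
    benign_seen: Set[bytes] = set()
    for sample in benign:
        benign_seen |= ngrams(sample, n)

    # Candidate signatures per malware sample, with benign n-grams removed.
    candidates = [ngrams(sample, n) - benign_seen for sample in malware]
    uncovered = set(range(len(malware)))
    chosen: List[bytes] = []

    while uncovered and len(chosen) < max_rules:
        # Count how many still-undetected malware samples each candidate would catch.
        counts: Counter = Counter()
        for idx in uncovered:
            counts.update(candidates[idx])
        if not counts:
            break  # the remaining samples have no benign-free n-gram
        # Highest coverage wins; ties are broken by byte order for determinism.
        best, _ = min(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        chosen.append(best)
        uncovered = {i for i in uncovered if best not in candidates[i]}
    return chosen


def to_yara(name: str, signature: bytes) -> str:
    """Render one byte signature as a YARA rule that uses a hex string."""
    hex_bytes = " ".join(f"{b:02X}" for b in signature)
    return (
        f"rule {name}\n"
        "{\n"
        "    strings:\n"
        f"        $sig = {{ {hex_bytes} }}\n"
        "    condition:\n"
        "        $sig\n"
        "}\n"
    )


def evaluate(
    rules: List[bytes], malware: List[bytes], benign: List[bytes]
) -> Tuple[float, float]:
    """Return (detection rate on malware, false positive rate on benign)."""
    def hit(sample: bytes) -> bool:
        return any(sig in sample for sig in rules)

    tpr = sum(hit(s) for s in malware) / len(malware)
    fpr = sum(hit(s) for s in benign) / len(benign)
    return tpr, fpr


if __name__ == "__main__":
    # Toy byte strings standing in for real executables (hypothetical data).
    malware = [b"\x00\x01EVILCODE\x02", b"xxEVILCODEyy", b"\x7fELFbackdoor!!"]
    benign = [b"hello world, nothing to see", b"\x7fELF normal program"]
    sigs = select_signatures(malware, benign)
    for i, sig in enumerate(sigs):
        print(to_yara(f"generated_{i}", sig))
    print(evaluate(sigs, malware, benign))
```

On the toy data, the first selected signature is `EVILCODE`, which is shared by two of the three malware samples. A second signature then covers the remaining sample.

Exhaustive n-gram sets grow quickly with corpus size, so this sketch is only practical on small inputs. It is not meant to reproduce the detection rates reported above.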

Presenters:

  • Andrew Davis - Principal Data Scientist, Elastic
    Andrew Davis is a Security Data Scientist specializing in using machine learning methods to detect malware. He currently works on the Security Data Science team at Elastic, where he focuses on building interpretable deep learning models that can be used to create signatures. Previously, he worked at Sophos and Cylance, where he focused on static machine learning models for detecting maliciousness in a variety of file formats. He co-wrote "Introduction to Artificial Intelligence for Security Professionals" and has participated in the DEF CON AI Village. You can follow him on Twitter @gradientjanitor.
