Detecting Randomly Generated Strings; A Language Based Approach

Presented at DEF CON 23 (2015), Aug. 7, 2015, 4:30 p.m. (30 minutes).

Numerous botnets employ domain generation algorithms (DGAs) to dynamically generate a large number of random domain names, from which a small subset is selected for command and control. The vast majority of DGAs create random sequences of characters. In this work we present a novel language-based technique for detecting strings that are generated by chaining random characters. To evaluate the randomness of a given string (a domain name in this context), we look up substrings of the string in a dictionary that we built for this technique, and then calculate a randomness score for the string based on several factors, including the length of the string and the number of languages that cover the substrings. This score is used to determine whether the given string is a random sequence of characters. To evaluate the performance of this technique, we use 9 known DGAs to create random domain names as DGA domains, and domain names from the Alexa 10,000 as likely non-DGA domains. The results show that our technique is more than 99% accurate in detecting random and non-random domain names.
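The substring lookup described in the abstract can be illustrated with a minimal sketch. This is not the presenter's implementation: the dictionary contents, the `MIN_LEN` cutoff, the function names, and the scoring rule (the fraction of characters not covered by any dictionary word) are all assumptions made for illustration. The abstract says the real score also weighs string length and the number of covering languages, but it does not say how, so the sketch only reports the matching languages and does not fold them into the score.

```python
# Hypothetical toy dictionary; the multi-language dictionary built for the
# talk is not reproduced here.
WORDS = {
    "english": {"face", "book", "mail", "news", "web", "shop", "cloud"},
    "spanish": {"casa", "libro", "tienda", "correo"},
}
MIN_LEN = 3  # assumed cutoff: very short substrings match by chance


def covered_positions(label):
    """Return (covered character positions, languages that matched)."""
    covered, languages = set(), set()
    n = len(label)
    # Try every substring of at least MIN_LEN characters.
    for start in range(n):
        for end in range(start + MIN_LEN, n + 1):
            piece = label[start:end]
            for lang, words in WORDS.items():
                if piece in words:
                    covered.update(range(start, end))
                    languages.add(lang)
    return covered, languages


def randomness_score(domain):
    """Assumed score: share of characters not covered by any dictionary word.

    0.0 means fully covered by known words, 1.0 means no substring matched.
    Only the first label of the domain is examined (a simplification).
    """
    label = domain.lower().split(".")[0]
    if not label:
        return 1.0
    covered, _languages = covered_positions(label)
    return 1.0 - len(covered) / len(label)


if __name__ == "__main__":
    for name in ("facebook.com", "casashop.com", "xkqzvwpt.net"):
        print(name, round(randomness_score(name), 2))
```

A threshold on this score would then separate likely DGA names from likely human-chosen ones. Where to place that threshold, and how to combine the score with the other factors, would have to be tuned against labeled data such as the DGA and Alexa sets mentioned in the abstract.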


Presenters:

  • Mahdi Namazifar - Senior Data Scientist, Talos Team, Cisco Systems
    Mahdi Namazifar is currently a Senior Data Scientist with the Talos team at Cisco Systems' San Francisco Innovation Center (SFIC). He received his PhD in Operations Research from the University of Wisconsin-Madison in 2011. His PhD work was on theoretical and computational aspects of mathematical optimization. During his PhD, Mahdi was also affiliated with the Wisconsin Institute for Discovery (WID) and the French Institute for Research in Computer Science and Automation (INRIA). He was also a National Science Foundation (NSF) Grantee at the San Diego Supercomputer Center in 2007 and a Research Intern at IBM T.J. Watson Research Lab in 2008. After graduate school and before his current position at Cisco, he was a Scientist at Opera Solutions, working on applications of machine learning to a variety of problems from industries such as healthcare and finance.
