At their core, botnet domains are not necessarily distinguished by lexical features or by query volume. Botnets are graph-based. From a sequence of DNS graphs, one can mine subgraphs distinctive of botnets that spread malware, harvest credentials, or deliver DDoS attacks to cripple high-value online assets.
Typical methods for detecting botnets using DNS leverage either 1) lexical characteristics of domain names (n-gram entropy, perplexity), or 2) traffic and static graph properties (measuring bursts in query volume and similarity of inter-client traffic, respectively). These insights build on the characteristics of algorithmically generated domains (AGDs) but miss the temporal nature of machines surfing the internet, i.e. graphs change from one time window to the next.
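To make the lexical baseline concrete, here is a minimal sketch of one such feature: the Shannon entropy of a domain label's character n-grams. The function name `ngram_entropy` and the default of bigrams are illustrative assumptions, not details taken from the talk.

```python
import math
from collections import Counter


def ngram_entropy(label: str, n: int = 2) -> float:
    """Shannon entropy (in bits) of the character n-gram distribution of a label.

    Hypothetical helper for illustration; real lexical detectors typically
    combine several such features.
    """
    grams = [label[i:i + n] for i in range(len(label) - n + 1)]
    if not grams:
        # Label shorter than n: no n-grams, so define the entropy as zero.
        return 0.0
    counts = Counter(grams)
    total = len(grams)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A label made of repeated characters scores zero, while a label whose n-grams are all different scores the maximum for its length. This is why random-looking AGDs tend to score high, and also why dictionary-based AGDs can slip past a purely lexical test.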
In this talk, we propose a novel method unifying the interactions between client machines, hostnames and hosting IPs by building a tripartite graph consisting of tens of millions of vertices and edges. We then propose methods to represent a sequence of graphs as signals to be mined in order to detect botnet attacks and online threats in general.
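As a rough illustration of the idea, the sketch below groups DNS records into time windows, stores each window's tripartite graph as two edge sets (client to hostname, hostname to IP), and reduces the sequence of graphs to a simple per-window signal. The record layout, the one-hour window, and the edge-count signal are assumptions chosen for brevity; the talk does not specify them.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class DnsRecord(NamedTuple):
    # Hypothetical record layout for one resolved DNS query.
    ts: int          # epoch seconds
    client: str      # client machine identifier
    hostname: str    # queried hostname
    ip: str          # hosting IP returned for the hostname


def build_windows(records: Iterable[DnsRecord], window_seconds: int = 3600) -> dict:
    """Build one tripartite graph per time window, stored as two edge sets."""
    graphs = defaultdict(lambda: {"client_host": set(), "host_ip": set()})
    for r in records:
        window = r.ts // window_seconds
        graphs[window]["client_host"].add((r.client, r.hostname))
        graphs[window]["host_ip"].add((r.hostname, r.ip))
    return dict(graphs)


def edge_count_signal(graphs: dict) -> list:
    """Turn the graph sequence into (window, #client-host, #host-ip) tuples."""
    return [
        (window, len(g["client_host"]), len(g["host_ip"]))
        for window, g in sorted(graphs.items())
    ]
```

Edge counts are only the simplest possible signal. Any per-window graph statistic could be substituted in `edge_count_signal` without changing the windowing step.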
As our first use case, we ignore lexical features and move beyond traditional degree and centrality graph metrics. Instead, we pair client machines with hostnames and reveal that a bot in a botnet has three trademarks: 1) the variety of hostnames it queries, 2) the popularity of those hostnames, and 3) the frequency with which the bot repeats itself. Using Hadoop technologies, we show that these signals are scalable (to the millions) and distinguish Necurs, Conficker, Suppobox, PykSpa, and more.
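The three signals can be sketched on a single machine as follows. The exact definitions here are assumptions made for illustration: variety is the number of distinct hostnames a client queries, popularity is the mean number of distinct clients querying those hostnames, and repetition is the mean number of queries per distinct hostname. The talk may define them differently, and at scale the same aggregation would run as a distributed job rather than in memory.

```python
from collections import Counter, defaultdict


def client_signals(queries):
    """Compute (variety, popularity, repetition) for each client.

    queries: iterable of (client, hostname) pairs, one pair per DNS query.
    The three definitions are illustrative assumptions, not the talk's.
    """
    per_client = defaultdict(Counter)      # client -> hostname -> query count
    clients_per_host = defaultdict(set)    # hostname -> clients that queried it
    for client, host in queries:
        per_client[client][host] += 1
        clients_per_host[host].add(client)

    signals = {}
    for client, hosts in per_client.items():
        variety = len(hosts)
        popularity = sum(len(clients_per_host[h]) for h in hosts) / variety
        repetition = sum(hosts.values()) / variety
        signals[client] = (variety, popularity, repetition)
    return signals
```

Under these definitions, a client that queries many hostnames that almost no other client asks for would show high variety and low popularity.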
In our second use case, we tackle the difficult task of predicting the number of domains within a family of AGDs. Introducing a measure involving the popularity of domains and repetition of a bot, we can approximate the number of domains an ideal classifier should catch.
We also show how we combine botnet detection derived from monitoring hosting IP space (e.g. fast flux) with detection based on infected clients' behaviour. This provides a unified model to track botnet threats. In closing, we'll explain how to monitor these threats using various forms of cohort analysis and analysis of variance techniques.
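As one example of an analysis of variance technique, the sketch below computes the one-way ANOVA F statistic across cohorts, where each cohort is a list of observations of some monitored metric (for instance, a per-client signal measured in different weeks). The cohort definition and the choice of one-way ANOVA are assumptions; the talk does not say which variant it uses.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of numeric groups.

    F = (between-group mean square) / (within-group mean square).
    Assumes at least two groups, more observations than groups,
    and non-zero within-group variance.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more observations than groups")

    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        raise ValueError("within-group variance is zero; F is undefined")

    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

A large F value suggests that the cohort means differ by more than within-cohort noise would explain. In a monitoring setting, that could flag a cohort whose behaviour has shifted.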
This talk will be useful to data analysts and security researchers, as our new methods have proved efficient and scalable at uncovering internet-scale trends and tracking massive, highly dispersed threats.