Domain Generation Algorithms (DGAs) have been key to the success of many generations of malware. Traditionally, defenders handle them by reverse engineering a malware sample to extract the seed, along with the DGA itself if the algorithm is not already known. The seed and algorithm are then combined to generate a list of domains, which is added to a block list. This outdated technique can leave an infected system vulnerable for days, when irreversible damage can be done in minutes. By turning the DGA against itself, we can use live log data to identify new seed configurations and block them in near real time, closing the gap between threat discovery and mitigation.
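To make the traditional workflow concrete, here is a minimal sketch in Python. It assumes a purely hypothetical DGA that hashes a seed, a date, and a counter; the names `toy_dga` and `build_blocklist`, the `.example` TLD, and the hashing scheme are illustrative assumptions, not taken from any real malware family.

```python
import hashlib
from datetime import date, timedelta


def toy_dga(seed: str, day: date, count: int = 5, tld: str = ".example") -> list[str]:
    """Hypothetical DGA: derive `count` domains from a seed and a date."""
    domains = []
    for i in range(count):
        material = f"{seed}|{day.isoformat()}|{i}".encode()
        digest = hashlib.sha256(material).hexdigest()
        # Map the first 12 hex digits onto the letters a-p to form the label.
        label = "".join(chr(ord("a") + int(c, 16)) for c in digest[:12])
        domains.append(label + tld)
    return domains


def build_blocklist(seed: str, start: date, days: int, per_day: int = 5) -> set[str]:
    """Pre-generate every domain for a date range, as a block list would need."""
    blocklist: set[str] = set()
    for offset in range(days):
        blocklist.update(toy_dga(seed, start + timedelta(days=offset), per_day))
    return blocklist
```

The block list is only as good as the extracted seed: a sample configured with a different seed yields a disjoint set of domains, and the whole extraction has to be repeated before those can be blocked.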
To a malware author, a DGA is attractive because it can produce a large set of domains that appear as random as the seed used to create them. The benefit to malware researchers and analysts is that, for a DGA to work, the algorithm must be deterministic and cannot incorporate any truly random data; otherwise the malware and its operator could not independently arrive at the same domains. This is the Achilles’ heel of a DGA, and it is rarely exploited. A single domain, viewed in isolation, can only be flagged as a potential candidate for a given DGA, but access to the sequence of domains, even as little as a single pair, greatly increases the value of the domain data. From verifying that a domain came from a particular DGA to cracking a seed without brute force, reversing the flow of the DGA can be used to block an entire set of configurations of a malware family without delay.
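The idea of running a DGA in reverse can be illustrated with a deliberately simplified sketch. It assumes a hypothetical DGA built on a linear congruential generator whose entire 32-bit state is encoded in each domain label; the constants, the `.example` TLD, and all function names are illustrative assumptions. Real families often hash or truncate their state, so the direct inversion shown here should be read as an idealized case rather than a general method.

```python
import string

# Toy LCG parameters (hypothetical; not taken from any real malware family).
A, C, M = 1664525, 1013904223, 2**32
A_INV = pow(A, -1, M)  # modular inverse exists because A is odd (Python 3.8+)
ALPHABET = string.ascii_lowercase
LABEL_LEN = 7          # 26**7 > 2**32, so every state fits in one label
TLD = ".example"


def step(state: int) -> int:
    """Advance the generator by one domain."""
    return (A * state + C) % M


def unstep(state: int) -> int:
    """Run the generator backwards by one domain."""
    return (A_INV * (state - C)) % M


def encode(state: int) -> str:
    """Write the state as base-26 letters, least significant digit first."""
    chars = []
    for _ in range(LABEL_LEN):
        state, digit = divmod(state, 26)
        chars.append(ALPHABET[digit])
    return "".join(chars) + TLD


def decode(domain: str) -> int:
    """Read the state back out of a domain; raise ValueError if it cannot fit."""
    if not domain.endswith(TLD) or len(domain) != LABEL_LEN + len(TLD):
        raise ValueError("not shaped like a toy DGA domain")
    state = 0
    for ch in reversed(domain[:LABEL_LEN]):
        state = state * 26 + ALPHABET.index(ch)  # ValueError on non-letters
    if state >= M:
        raise ValueError("label does not encode a valid state")
    return state


def toy_dga(seed: int, count: int) -> list[str]:
    """Forward direction: seed -> sequence of domains."""
    state, out = seed, []
    for _ in range(count):
        state = step(state)
        out.append(encode(state))
    return out


def is_consecutive_pair(first: str, second: str) -> bool:
    """Verify that `second` is what the DGA would emit right after `first`."""
    try:
        return step(decode(first)) == decode(second)
    except ValueError:
        return False


def recover_seed(domain: str, position: int) -> int:
    """Walk back `position` steps (1-based index, assumed known) to the seed."""
    state = decode(domain)
    for _ in range(position):
        state = unstep(state)
    return state
```

In this toy setting, a pair of adjacent domains seen in a log confirms the family with a single forward step. If the position of a domain in the sequence is known or can be guessed, the seed follows from a handful of backward steps instead of a search over the seed space.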