In spring 2019, the Emotet trojan began propagating by reusing old email conversation threads siphoned from infected hosts. Attacker-controlled servers sent spoofed emails, made to look like a continuation of these threads, to the contacts involved. Each email carried the trojan through a malicious URL or attachment.
This method of propagating malware was first reported in 2017, and our nodes first saw traces of it in April 2018. We had noticed that users were releasing from quarantine some emails that looked like unrelated legitimate mail. On closer examination, these emails turned out to carry malware.
Because these propagating messages were sent in waves of increasing volume, as is the case for most spam, the waves became detectable by applying clustering techniques to the email flow. The emails could then be seen as part of a larger malspam campaign. This appears to have been the case for an Emotet campaign on 2 August 2019, when several of our customers started receiving a sizeable number of emails with identically named attachments. One infected host had contacts at several of our customers and was the source of the first noticeable burst. Although the trojan was unknown to our anti-virus, we were able to block part of the campaign because some of the emails contained executable file types. The intelligence provided by clustering enabled us to refine our filters further.
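As a rough illustration of how a burst of identically named attachments might surface in an email flow, the following minimal Python sketch groups messages by attachment name and flags names that reach several recipient domains within a short time window. The function name, the record fields (`time`, `attachment`, `recipient_domain`) and the thresholds are all assumptions made for this example; it is not the detection scheme described in the presentation.

```python
from collections import defaultdict
from datetime import timedelta


def find_attachment_bursts(emails, window=timedelta(hours=1),
                           min_count=3, min_recipient_domains=2):
    """Return {attachment_name: largest burst size} for suspicious names.

    `emails` is an iterable of dicts with the hypothetical keys
    'time' (datetime), 'attachment' (str) and 'recipient_domain' (str).
    """
    # Group messages by case-insensitive attachment name.
    by_name = defaultdict(list)
    for email in emails:
        by_name[email["attachment"].lower()].append(email)

    bursts = {}
    for name, group in by_name.items():
        group.sort(key=lambda e: e["time"])
        start = 0
        # Sliding window over the time-ordered messages sharing this name.
        for end in range(len(group)):
            while group[end]["time"] - group[start]["time"] > window:
                start += 1
            in_window = group[start:end + 1]
            domains = {e["recipient_domain"] for e in in_window}
            if len(in_window) >= min_count and len(domains) >= min_recipient_domains:
                bursts[name] = max(bursts.get(name, 0), len(in_window))
    return bursts
```

Grouping on a single exact-match feature is the simplest possible form of clustering. A production scheme would presumably combine several characteristics and tolerate near-matches.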
Because such a clustering-based campaign detection scheme can help block a variety of malspam, the presentation will describe the steps needed to develop it. We will explain which email characteristics we found to be most relevant for clustering, and which metric we use to evaluate clustering results. We will also discuss how the combination of those characteristics can be optimized automatically from corpora of email metadata.
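To make the idea of evaluating a clustering and tuning the combination of characteristics more concrete, here is a minimal Python sketch under stated assumptions. The abstract does not name the metric or the optimization procedure, so this example substitutes a pairwise F1 score against hand-labelled campaigns and an exhaustive grid search over feature weights. The feature names, the distance function and the threshold are likewise hypothetical.

```python
from itertools import product

# Hypothetical metadata features; the presentation's actual set is not given here.
FEATURES = ("attachment", "subject", "sender_domain")


def distance(a, b, weights):
    """Weighted fraction of features on which two emails differ."""
    total = sum(weights)
    if total == 0:
        return 0.0  # no feature selected: all emails look identical
    return sum(w for f, w in zip(FEATURES, weights) if a[f] != b[f]) / total


def cluster(emails, weights, threshold=0.5):
    """Single-link clustering: join any two emails closer than `threshold`."""
    parent = list(range(len(emails)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(emails)):
        for j in range(i + 1, len(emails)):
            if distance(emails[i], emails[j], weights) < threshold:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(emails))]


def pairwise_f1(assignments, labels):
    """F1 over email pairs: same cluster (predicted) vs. same campaign (labelled)."""
    tp = fp = fn = 0
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            same_cluster = assignments[i] == assignments[j]
            same_label = labels[i] == labels[j]
            if same_cluster and same_label:
                tp += 1
            elif same_cluster:
                fp += 1
            elif same_label:
                fn += 1
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_weights(emails, labels, grid=(0, 1, 2)):
    """Grid-search the feature weights that maximize pairwise F1."""
    best = (-1.0, None)
    for weights in product(grid, repeat=len(FEATURES)):
        if sum(weights) == 0:
            continue  # skip the degenerate all-zero combination
        score = pairwise_f1(cluster(emails, weights), labels)
        if score > best[0]:
            best = (score, weights)
    return best
```

A pairwise score is used here because it penalizes both over-merging and over-splitting. On real corpora, a grid search would likely give way to a more scalable optimizer.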