Weaponizing Data Science for Social Engineering: Automated E2E Spear Phishing on Twitter

Presented at Black Hat USA 2016, Aug. 4, 2016, 12:10 p.m. (50 minutes)

Historically, machine learning for information security has prioritized defense: think intrusion detection systems, malware classification, and botnet traffic identification. Offense can benefit from data just as well. Social networks, especially Twitter with its access to extensive personal data, bot-friendly API, colloquial syntax, and prevalence of shortened links, are the perfect venues for spreading machine-generated malicious content. We present a recurrent neural network that learns to tweet phishing posts targeting specific users. The model is trained on spear phishing pen-testing data and, to make a click-through more likely, is dynamically seeded with topics extracted from the timeline posts of both the target and the users they retweet or follow. We augment the model with clustering to identify high-value targets based on their level of social engagement, such as their number of followers and retweets, and measure success using the click rates of IP-tracked links. Taken together, these techniques enable the world's first automated end-to-end spear phishing campaign generator for Twitter.
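
The clustering step mentioned in the abstract can be illustrated, in isolation, with a generic sketch. The abstract does not say which clustering algorithm or feature scaling the presenters used, so everything here is an assumption: plain k-means (Lloyd's algorithm), log-scaled follower and retweet counts, and a handful of synthetic accounts with made-up names and numbers.

```python
import math

# Synthetic, made-up engagement counts (assumption: not data from the talk).
accounts = [
    {"name": "a", "followers": 120, "retweets": 3},
    {"name": "b", "followers": 95, "retweets": 1},
    {"name": "c", "followers": 48000, "retweets": 900},
    {"name": "d", "followers": 150, "retweets": 5},
    {"name": "e", "followers": 61000, "retweets": 1500},
]


def log_features(account):
    """Log-scale raw counts so very large accounts do not dominate distances."""
    return (math.log1p(account["followers"]), math.log1p(account["retweets"]))


def kmeans(points, k=2, iterations=20):
    """Plain Lloyd's algorithm for k >= 2.

    Centroids start at evenly spaced points of the sorted input, which keeps
    the result deterministic (an illustrative choice, not a recommendation).
    """
    ordered = sorted(points)
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
              for p in points]
    return labels, centroids


points = [log_features(a) for a in accounts]
labels, centroids = kmeans(points, k=2)

# The cluster whose centroid has the larger follower coordinate is the
# high-engagement group in this toy data.
high = max(range(len(centroids)), key=lambda c: centroids[c][0])
high_engagement = [a["name"] for a, lab in zip(accounts, labels) if lab == high]
print(high_engagement)
```

With real data, the number of clusters and the choice of features would need validation; the two-cluster split here only works because the synthetic accounts are well separated.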

Presenters:

  • John Seymour / Delta Zero - ZeroFOX   as John Seymour
    John Seymour is a Data Scientist at ZeroFOX, Inc. by day and a Ph.D. student at the University of Maryland, Baltimore County by night. He researches the intersection of machine learning and InfoSec in both roles. He's mostly interested in avoiding, and helping others avoid, some of the major pitfalls in machine learning, especially in dataset preparation (seriously, do people still use malware datasets from 1998?). He has spoken at both DEF CON and BSides, and aims to add Black Hat to the list in the near future.
  • Philip Tully / KingPhish3r - ZeroFOX   as Philip Tully
    Philip Tully is a Senior Data Scientist at ZeroFOX, a social media security company based in Baltimore. He employs natural language processing and computer vision techniques to develop predictive models for combating threats emanating from social media. His pivot into the realm of InfoSec is recent, but his depth of knowledge in machine learning and artificial neural networks is not. Rather than learning patterns within text and image data, his previous work focused on learning patterns of spikes in large-scale recurrently connected neural circuit models. He is an all-but-defended computer science Ph.D. student, in the final stages of completing a joint degree at the Royal Institute of Technology (KTH) and the University of Edinburgh.
