Taming the Beast: Inside the Llama 3 Red Team Process

Presented at DEF CON 32 (2024), Aug. 9, 2024, 3:30 p.m. (45 minutes).

In this presentation, the core AI Red Team at Meta will take you on a journey through the story of red teaming the Llama 3 large language model. This talk is perfect for anyone eager to delve into the complexities of advanced model red teaming and safety, or to learn how to perform their own research to find new attacks. We’ll begin by exploring what AI red teaming is truly about before examining Meta’s process and approaches to the topic. The team will detail our methodology for discovering new risks within complex AI capabilities, how emergent capabilities may breed emergent risks, what types of attacks we look to perform across different model capabilities, and how and why those attacks even work. We’ll also explore which lessons from decades of security expertise can and cannot be applied as we venture into a new era of AI trust and safety.

The team will then move on to how we used automation to scale attacks up, our novel approach to multi-turn adversarial AI agents, and the systems we built to benchmark safety across a set of high-risk areas. We also plan to discuss advanced cyberattacks (both human and automated), Meta’s open benchmark CyberSecEval, and red teaming for the national security threats presented by state-of-the-art models. For each of these areas we’ll touch on assessment and measurement challenges, ending with where we see gaps in the AI red teaming industry and where AI safety is heading at a rapid pace.
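The multi-turn adversarial agent idea mentioned above can be illustrated in miniature. The following is a minimal sketch only, not Meta’s implementation: it assumes hypothetical attacker_model, target_model, and judge callables, and shows the basic attacker–target–judge loop that automated multi-turn red teaming systems of this kind build on.

    # Illustrative sketch only -- not Meta's actual system. An attacker LLM
    # tries to steer a target LLM toward a policy-violating response over
    # several turns, while a judge model scores each target reply.
    def run_multi_turn_attack(attacker_model, target_model, judge,
                              objective, max_turns=5):
        """Return (conversation, success) for one automated attack attempt.

        The three callables are hypothetical stand-ins:
          attacker_model(objective, history) -> next adversarial user message
          target_model(history)              -> the target's reply
          judge(objective, reply)            -> score in [0, 1]; 1 = violation
        """
        history = []  # chat transcript as {"role", "content"} dicts
        for _ in range(max_turns):
            # The attacker conditions on the whole dialogue so it can
            # escalate, roleplay, or pivot after a refusal rather than
            # simply repeating itself.
            attack_msg = attacker_model(objective, history)
            history.append({"role": "user", "content": attack_msg})

            reply = target_model(history)
            history.append({"role": "assistant", "content": reply})

            # Stop early once the judge deems the objective achieved.
            if judge(objective, reply) >= 0.9:
                return history, True
        return history, False

Scaling such an approach up would amount to running many loops like this in parallel over a library of attack objectives and aggregating the judge’s scores into per-risk-area safety benchmarks.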

Presenters:

  • Maya Pavlova - Software Engineer, GenAI Trust & Safety at Meta
    Currently a software engineer on Meta’s GenAI Trust & Safety team, Maya Pavlova’s main work these days is understanding how to bridge the gap between manual red teaming processes and automated solutions. Maya originally entered this world through the safety testing lens, previously working on scaling Responsible AI’s fairness evaluation platforms; she has since pivoted to the interesting problem of automating AI red teaming attacks to build robust adversarial stress-testing platforms.
  • Ivan Evtimov - Red Teaming Research Scientist, Gen AI Trust & Safety at Meta
    Ivan Evtimov is currently a red teaming research scientist at Meta GenAI Trust & Safety. He has been the tech lead for red teaming Llama 3, Code Llama, AudioBox, and Seamless, and has participated as a red teamer in many other model and product releases. Ivan has also carried out AI research on cybersecurity safety, robustness to spurious correlations, and fairness in AI systems. Before Meta, Ivan was a member of the Computer Security and Privacy Lab and the Tech Policy Lab at the University of Washington, carrying out research on adversarial machine learning. He has also been spotted on a bike in the general vicinity of New York City.
  • Joanna Bitton - Software Engineer, GenAI Trust & Safety at Meta
    Currently a software engineer on Meta’s GenAI Trust & Safety team, Joanna has been the lead for automation, safety, and red teaming across many internal projects at Meta. An original member of the Facebook AI Red Team, she has worked on critical Responsible AI issues for over five years. She is also the author of AugLy, a data augmentation library for audio, image, text, and video with over 5k GitHub stars, which can be used to bypass classifiers and perform other attacks. Joanna takes red teaming to heart, and can neither confirm nor deny that she was raised on a submarine.
  • Aaron “dyn” Grattafiori - Lead, AI Red Teaming at Meta
    Aaron “dyn” Grattafiori is currently a lead for AI Red Teaming at Meta, leading the fight against the machines. Previously, he spent over six years leading the “cyber” Red Team at Meta, performing full-scale operations against a wide array of objectives, from insider threats and edge-device compromises to simulated supply chain attacks, ransomware, custom rootkits, and malware. Before Meta, Aaron was a Principal Consultant at NCC Group for many years, performing application security assessments for leading software companies across web, mobile, cryptography, virtualization, and containers, as well as network security assessments. Aaron has spoken on a wide range of topics at security conferences such as Black Hat, DEF CON, Enigma, Toorcon, SOURCE Seattle, Red Team Summit, and more. When not hacking the LLM Gibson, Aaron can be found on the slopes, in the garage working on an old car, or hiking the Front Range in Colorado.
