The Unbelievable Insecurity of the Big Data Stack: An Offensive Approach to Analyzing Huge and Complex Big Data Infrastructures

Presented at DEF CON 29 (2021), Aug. 6, 2021, 4 p.m. (45 minutes)

Honoring the term, the variety of technologies in the Big Data stack is hugely BIG. Many complex components in charge of transport, storing, and processing millions of records make up Big Data infrastructures. The speed at which data needs to be processed and how quickly the implemented technologies need to communicate with each other make security lag behind. Once again, complexity is the worst enemy of security. Today, when conducting a security assessment on Big Data infrastructures, there is currently no methodology for it and there are hardly any technical resources to analyze the attack vectors. On top of that, many things that are considered vulnerabilities in conventional infrastructures, or even in the Cloud, are not vulnerabilities in this stack. What is a security problem and what is not a security problem in Big Data infrastructures? That is one of the many questions that this research answers. Security professionals need to count on a methodology and acquire the necessary skills to competently analyze the security of such infrastructures. This talk presents a methodology, and new and impactful attack vectors in the four layers of the Big Data stack: Data Ingestion, Data Storage, Data Processing and Data Access. Some of the techniques that will be exposed are the remote attack of the centralized cluster configuration managed by ZooKeeper; packet crafting for remote communication with the Hadoop RPC/IPC to compromise the HDFS; development of a malicious YARN application to achieve RCE; interfering data ingestion channels as well as abusing the drivers of HDFS-based storage technologies like Hive/HBase, and platforms to query multiple data lakes as Presto. In addition, security recommendations will be provided to prevent the attacks explained. REFERENCES: I plan to release a white paper at the conference, in the white paper there will be all the references. Anyway, as the attacks are novel, the references are related to infrastructure stuff mostly, not so much about security.

Presenters:

  • Sheila A. Berta - Head of Research at Dreamlab Technologies
    Sheila A. Berta is an offensive security specialist who started at 12 years-old by learning on her own. At the age of 15, she wrote her first book about Web Hacking, published in several countries. Over the years, Sheila has discovered vulnerabilities in popular web applications and software, as well as given courses at universities and private institutes in Argentina. She specializes in offensive techniques, reverse engineering, and exploit writing and is also a developer in ASM (MCU and MPU x86/x64), C/C++, Python and Go. The last years she focused on Cloud Native and Big Data security. As an international speaker, she has spoken at important security conferences such as Black Hat Briefings, DEF CON, HITB, Ekoparty, IEEE ArgenCon and others. Sheila currently works as Head of Research at Dreamlab Technologies. @UnaPibaGeek

Links:

Similar Presentations: