Hadoop: Apache's Open Source Implementation of Google's MapReduce Framework

Presented at DEF CON 17 (2009), Aug. 1, 2009, 4 p.m. (50 minutes)

This presentation will begin with a brief overview of Google's Map/Reduce framework, which is built to analyze extremely large datasets. We will first look at what a Mapper and a Reducer are, the inputs they take, and the outputs they generate. From there, we will look at Hadoop, the Apache project's open source, Java-based implementation of the Map/Reduce framework. Because Hadoop itself is Java-based, we will then show how to use the framework to build Mappers and Reducers in Python, and how to run Mappers written in the AWK scripting language. A brief comparison of compile times and efficiencies among the three languages will be shown, along with the results from running our code on ASU's Saguaro Cluster. After that, we will touch on HBase, the Hadoop equivalent of Google's BigTable, a non-relational database for Map/Reduce. Finally, we will look at some demo code, including a machine learning algorithm based on the Netflix Prize Dataset, a 2-gigabyte dataset of movie ratings from the Netflix database. We will not present them, but we will also include source code from other teams' projects: Map/Reduce programs for image analysis and recognition, for analyzing air traffic data, for analyzing package delivery systems for use with swarm theory, and for analyzing patterns in large bodies of literature as a response to "The Bible Code". Most of these use public datasets as inputs.
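
To make the Mapper and Reducer roles concrete, here is a minimal Python sketch. It is not the presenters' code. The word-count task and the names map_lines, reduce_pairs, and run_local are assumptions chosen for illustration. The Mapper turns each input line into (key, value) pairs, and the Reducer receives those pairs sorted by key and combines the values for each key.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper: emit a (word, 1) pair for every word in the input lines."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def reduce_pairs(pairs):
    """Reducer: sum the counts for each word.

    Expects pairs already sorted by key, which is what the framework's
    sort phase hands to a reducer.
    """
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)


def run_local(lines):
    """Simulate map -> sort -> reduce in a single process (illustration only)."""
    return list(reduce_pairs(sorted(map_lines(lines))))
```

Calling run_local on a list of text lines returns (word, count) pairs in key order. On a real cluster the two functions would typically live in separate scripts that read standard input and write tab-separated key/value lines, with Hadoop performing the sort between them.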

Presenters:

  • Ryan Anguiano - Hacked Existence Team
    Ryan Anguiano is a member of the Hacked Existence Team. He is also a web application developer at The Forum Agency (theforumagency.com), with over 7 years of experience in the industry. Ryan is currently an undergraduate student at ASU and leads the web development team for MH+L Magazine (mhlmag.com).
  • Joey Calca - Hacked Existence Team
    Joey Calca is a member of the Hacked Existence Team. He is a recent graduate of Arizona State University with a Business Degree in Computer Information Systems and a background in Computer Systems Engineering. He has worked on various consulting projects, including the design and implementation of large-scale multimedia delivery systems, and is currently pursuing Cloud Computing as the next big technology boom.
