Entropy-based data organization tricks for log and packet capture browsing

Presented at DEF CON 15 (2007), Aug. 5, 2007, 2 p.m. (50 minutes)

I will show how entropy, a measure of information content defined by Shannon in 1948, can provide useful ways of organizing and analyzing log data. In particular, we use entropy and mutual information heuristics to group syslog records and packet captures so that anomalies stand out and the overall structure of each data set is summarized. I will show a modification of Ethereal based on these heuristics, and a separate tool for browsing syslogs. Our data organization heuristics produce decision trees that can be saved and reused to build views of other data sets. Our tools also let the user mark records by relevance, and they use this feedback to improve the data views. Our tools and algorithm descriptions can be found at http://kerf.cs.dartmouth.edu
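
To make the idea of entropy and mutual information as grouping heuristics concrete, here is a minimal sketch in Python. It is not the code of the tools described in this talk. The record fields (host, program, user), the sample values, and the function names are all hypothetical, and ranking fields by ascending entropy is only one plausible heuristic, assumed here for illustration.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) = H(X) + H(Y) - H(X,Y), in bits."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def rank_fields(records):
    """Return (field, entropy) pairs sorted from lowest to highest entropy.

    Assumed heuristic: a low-entropy field splits the records into a few
    large groups, so it is a candidate for the top level of a tree view.
    """
    fields = records[0].keys()
    scores = {f: entropy([r[f] for r in records]) for f in fields}
    return sorted(scores.items(), key=lambda kv: kv[1])


# Hypothetical, already-parsed syslog records (made-up values).
records = [
    {"host": "web1", "program": "sshd", "user": "alice"},
    {"host": "web1", "program": "sshd", "user": "bob"},
    {"host": "web1", "program": "cron", "user": "root"},
    {"host": "db1", "program": "sshd", "user": "carol"},
]

if __name__ == "__main__":
    for field, h in rank_fields(records):
        print(f"{field}: {h:.3f} bits")
    hosts = [r["host"] for r in records]
    programs = [r["program"] for r in records]
    print(f"I(host; program) = {mutual_information(hosts, programs):.3f} bits")
```

In this sketch, a field whose values are all distinct (here, user) has the highest entropy and is a poor first grouping key, while a field pair with high mutual information carries largely redundant information, so grouping by both adds little structure to the view.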


  • Sergey Bratus - Department of Computer Science, Institute for Security Technology Studies, Dartmouth College
    Sergey Bratus: For the past five years, my research at Dartmouth's Institute for Security Technology Studies has focused on applying information theory and machine learning to log analysis and other security topics. Before that, I worked as a research scientist at BBN Technologies, applying similar techniques to natural language processing of English text and speech.