Keeping the Good Stuff In: Confidential Information Firewalling with the CRM114 Spam Filter & Text Classifier

Presented at Black Hat USA 2010, July 29, 2010, 10 a.m. (60 minutes)

In this whitepaper we consider the problem of outbound filtering of email to prevent accidental leakage of confidential information. We examine how to do this with CRM114, a GPL-licensed open-source spam filter, and test its accuracy against a corpus of more than 10,000 hand-classified Japanese emails, both confidential and non-confidential. We look at the moving parts involved in these filters and how they can be set up. The results show that a hybrid of multiple CRM114 filters outperforms a human-crafted regular-expression filter by nearly 100x in recall, detecting more than 99.9% of confidential documents with a simultaneous false alarm rate of less than 5.3%. Because the programmers who created the machine-learning programs cannot read or write Japanese, this problem is an almost ideal case of Searle's "Chinese Room" problem.
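
To make the two reported metrics concrete, the following is a minimal Python sketch of how recall and false alarm rate could be computed for a single filter and for a hybrid of several filters. It is an illustration only: the abstract does not say how the CRM114 filters are combined, so the "flag if any filter flags" rule is an assumption, and the names `hybrid`, `recall_and_false_alarm`, `TOY_DOCS`, and the two toy filters are hypothetical stand-ins, not CRM114 code or data from the paper.

```python
import re
from typing import Callable, List, Sequence, Tuple

# A classifier returns True when it flags a document as confidential.
Classifier = Callable[[str], bool]


def hybrid(filters: Sequence[Classifier]) -> Classifier:
    """Combine filters with an OR rule (an assumption, not the paper's method)."""
    def combined(text: str) -> bool:
        return any(f(text) for f in filters)
    return combined


def recall_and_false_alarm(
    clf: Classifier, docs: List[Tuple[str, bool]]
) -> Tuple[float, float]:
    """Return (recall, false alarm rate) over (text, is_confidential) pairs."""
    tp = fn = fp = tn = 0
    for text, is_confidential in docs:
        flagged = clf(text)
        if is_confidential:
            if flagged:
                tp += 1  # confidential and caught
            else:
                fn += 1  # confidential but leaked
        else:
            if flagged:
                fp += 1  # harmless but blocked (false alarm)
            else:
                tn += 1  # harmless and passed
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    false_alarm = fp / (fp + tn) if (fp + tn) else 0.0
    return recall, false_alarm


# Hypothetical toy corpus and filters, for illustration only.
TOY_DOCS: List[Tuple[str, bool]] = [
    ("internal budget forecast Q3", True),
    ("secret merger plan", True),
    ("lunch menu for friday", False),
    ("budget airline tickets on sale", False),
]


def regex_filter(text: str) -> bool:
    """Stand-in for a hand-written regular-expression filter."""
    return re.search(r"\bsecret\b", text) is not None


def keyword_filter(text: str) -> bool:
    """Stand-in for a second, independently built filter."""
    return "budget" in text


hybrid_filter = hybrid([regex_filter, keyword_filter])

for name, clf in [("regex", regex_filter), ("hybrid", hybrid_filter)]:
    r, fa = recall_and_false_alarm(clf, TOY_DOCS)
    print(f"{name}: recall={r:.2f} false_alarm={fa:.2f}")
```

Under an OR rule, adding filters can only raise recall and can only raise the false alarm rate, which is why the two figures are reported together.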
