Social engineering is a big problem but very little progress has been made in stopping it, aside from the detection of email phishing. Social engineering attacks are launched via many vectors in addition to email, including phone, in-person, and via messaging. Detecting these non-email attacks requires a content-based approach that analyzes the meaning of the attack message.
We observe that any social engineering attack must either ask a question whose answer is private, or command the victim to perform a forbidden action. Our approach uses natural language processing (NLP) techniques to detect questions and commands in the messages and determine whether or not they are malicious.
Question answering approaches, a hot topic in information extraction, attempt to provide answers to factoid questions. Although the current state-of-the-art in question answering is imperfect, we have found that even approximate answers are sufficient to determine the privacy of an answer. Commands are evaluated by summarizing their meaning as a combination of the main verb and its direct object in the sentence. The verb-object pairs are compared against a blacklist to see if they are malicious.
We have tested this approach with over 187,000 phishing and non-phishing emails. We discuss the false positives and false negatives and why this is not an issue in a system deployed for detecting non-email attacks. In the talk, demos will be shown and tools will be released so that attendees can explore our approach for themselves.