TLS fingerprinting maps data contained within the TLS ClientHello to a set of possible applications or TLS libraries, such as Chrome 74.0 or OpenSSL 1.1.0k. We have developed a system that continuously fuses endpoint and network data from real-world networks and a malware analysis sandbox to automatically generate up-to-date and representative TLS fingerprint databases. Each fingerprint has a list of the processes observed using it, where each process object contains the SHA-256 hash, the process name, a sorted list of destinations with counts, a sorted list of operating systems with counts, and any antivirus signatures associated with the SHA-256 hash.
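To make the shape of a database entry concrete, the following is a minimal sketch of one fingerprint record. The class names, field names, and the `observe` helper are hypothetical and chosen for illustration; they are not taken from the project's actual schema.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ProcessRecord:
    """One process observed using a fingerprint (hypothetical field names)."""
    sha256: str
    name: str
    destinations: Counter = field(default_factory=Counter)  # destination -> count
    oses: Counter = field(default_factory=Counter)          # OS name -> count
    av_signatures: list = field(default_factory=list)       # antivirus labels, if any

    def sorted_destinations(self):
        # Highest count first, matching the "sorted list of destinations/counts".
        return self.destinations.most_common()

    def sorted_oses(self):
        return self.oses.most_common()


@dataclass
class FingerprintEntry:
    """A fingerprint string plus every process seen using it."""
    fingerprint: str
    processes: dict = field(default_factory=dict)  # sha256 -> ProcessRecord

    def observe(self, sha256, name, destination, os_name):
        # Fuse one endpoint/network observation into the entry.
        rec = self.processes.setdefault(sha256, ProcessRecord(sha256, name))
        rec.destinations[destination] += 1
        rec.oses[os_name] += 1
        return rec
```

Keeping the counts in `Counter` objects means the sorted views are derived on demand, so new observations can be merged without re-sorting stored lists.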
Recently, TLS fingerprinting has gained traction as a means to efficiently identify encrypted malicious traffic. In this talk, we use our databases to highlight some limitations of TLS fingerprint-only malware detection, which produces a large number of false positives when malicious and benign applications use the same TLS libraries. To overcome these limitations, we have developed a simple and explainable method using naïve Bayes that incorporates destination information and leverages the additional details introduced by our TLS fingerprint database. Finally, we show how to generalize these techniques by defining equivalence classes for the destinations, e.g., by mapping destination IP addresses to autonomous systems. Real-world examples and results based on our open-source project will be presented throughout the talk.
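As a rough illustration of the idea, the sketch below scores the processes that share a single fingerprint using naïve Bayes with one destination feature and Laplace smoothing. It is an assumption-laden toy, not the method presented in the talk: the function `classify`, the parameter names, the sample counts, and the IP-to-AS table are all hypothetical, and a fuller model would likely combine several destination features.

```python
import math


def classify(processes, destination, alpha=1.0, dest_class=lambda d: d):
    """Posterior over processes sharing ONE fingerprint, given a destination.

    processes:  {process name: {destination: count}} (hypothetical layout)
    dest_class: maps a destination to its equivalence class (identity by default)
    """
    key = dest_class(destination)
    # Re-bucket every observed destination into its equivalence class.
    bucketed = {}
    for name, dests in processes.items():
        counts = {}
        for dest, count in dests.items():
            cls = dest_class(dest)
            counts[cls] = counts.get(cls, 0) + count
        if counts:  # skip processes with no observations
            bucketed[name] = counts
    vocab = {cls for counts in bucketed.values() for cls in counts} | {key}
    total = sum(sum(counts.values()) for counts in bucketed.values())

    log_scores = {}
    for name, counts in bucketed.items():
        n = sum(counts.values())
        prior = n / total  # P(process | fingerprint)
        # Smoothed P(destination class | process)
        likelihood = (counts.get(key, 0) + alpha) / (n + alpha * len(vocab))
        log_scores[name] = math.log(prior) + math.log(likelihood)

    # Normalize in log space to obtain probabilities that sum to 1.
    peak = max(log_scores.values())
    weights = {name: math.exp(s - peak) for name, s in log_scores.items()}
    norm = sum(weights.values())
    return {name: w / norm for name, w in weights.items()}


# Made-up counts for two processes that share a TLS library (same fingerprint).
observed = {
    "browser.exe": {"198.51.100.10": 90, "203.0.113.5": 10},
    "malware.exe": {"192.0.2.77": 5},
}
# Hypothetical IP-to-AS table using documentation-reserved addresses and AS numbers.
AS_TABLE = {
    "198.51.100.10": "AS64496",
    "203.0.113.5": "AS64497",
    "192.0.2.77": "AS64500",
    "192.0.2.78": "AS64500",
}


def asn_of(ip):
    # Fall back to the raw address when no mapping is known.
    return AS_TABLE.get(ip, ip)


by_ip = classify(observed, "192.0.2.78")
by_asn = classify(observed, "192.0.2.78", dest_class=asn_of)
```

In this toy data the fingerprint alone favors the far more common benign process. A previously unseen address gives the raw-IP model nothing to work with, whereas bucketing by autonomous system lets the earlier observation of a neighboring address carry over, which is the kind of generalization the equivalence classes are meant to provide.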