Beyond the Blacklists: Detecting Malicious URL Through Machine Learning

Presented at Black Hat Asia 2017, March 31, 2017, 10:15 a.m. (60 minutes)

Many types of modern malware use HTTP-based communications. For malware detection, network-level behavioral signatures and models have some advantages over traditional AV signatures and system-level behavioral models. Here we present a novel malware detection method based on URL behavioral modeling. The method takes advantage of the code reuse that is common among many types of malware. From a large data set of known malware samples, we distill concise feature models that capture the similarities shared by many different malware connection behaviors; the models can then be used to detect unknown malware variants that share the same network traits.

We focus on HTTP connections because HTTP is the protocol that malicious software most often uses to phone home, fetch updates, and receive commands to start an attack. Examining traits at the HTTP connection level has proved to be an efficient way to detect malicious connections.

In our next-generation firewall appliance, we had algorithms that examine the connection's domain name, URL path, and user-agent, using static blacklists and signatures to identify malicious user-agents and URL paths. Combined with a machine learning algorithm for DGA domain detection, this achieved a fairly good malicious URL detection rate. However, the most complex and challenging part is the dynamic content in the URL query string. Static signature rules become less effective because those strings are so diverse that they can be virtually anything. The variance and evolution of connection parameters make signature generation time-consuming, and the signature library needs frequent updates to keep up with emerging connection features.

The novel clustering algorithm we present in this talk is highly efficient: it detects not only known malicious URLs but also new variants that have yet to be exposed (0-day). The model was learned from 800,000 URLs from malware samples, with weekly updates of about 10k.
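
To make the query-string problem concrete, here is a minimal Python sketch of one way to group URLs whose parameter values differ but whose structure is the same. It is not the clustering algorithm from the talk, which the abstract does not describe; it is a deliberately simple stand-in that reduces each query string to a structural template and groups URLs by that template. The function names (`value_shape`, `query_template`, `group_by_template`), the shape classes, and the choice to ignore the host are all assumptions made for illustration.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit, parse_qsl


def value_shape(value: str) -> str:
    """Collapse a parameter value into a coarse shape (hypothetical classes)."""
    if value == "":
        return "empty"
    if re.fullmatch(r"[0-9]+", value):
        return f"num{len(value)}"
    # Require at least 8 characters so short words like "cafe" are not
    # mistaken for hex identifiers.
    if re.fullmatch(r"[0-9a-fA-F]{8,}", value):
        return f"hex{len(value)}"
    if re.fullmatch(r"[A-Za-z]+", value):
        return "alpha"
    if re.fullmatch(r"[A-Za-z0-9+/=_-]+", value):
        return "token"
    return "other"


def query_template(url: str) -> str:
    """Return the path plus sorted 'key=shape' pairs, ignoring the host."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    shape = "&".join(f"{key}={value_shape(val)}" for key, val in sorted(pairs))
    return f"{parts.path}?{shape}"


def group_by_template(urls):
    """Group URLs that share the same structural template."""
    groups = defaultdict(list)
    for url in urls:
        groups[query_template(url)].append(url)
    return dict(groups)
```

With this sketch, `http://a.example/gate.php?id=0123456789abcdef&ver=102` and `http://b.example/gate.php?ver=311&id=fedcba9876543210` fall into the same group even though no literal value matches, which is the kind of similarity a static signature on the raw string would miss. Encoding value length into the shape is a design choice: it keeps groups tight but splits variants whose identifiers change length, so a real system would likely need a softer similarity measure.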

Presenters:

  • Jin Shang - Chief Scientist & Technical Fellow, Hillstone Networks
    Dr. Shang has more than 15 years of network security industry experience at companies such as Hillstone Networks, Juniper, Fortinet, and SonicWall, and is now the chief scientist at Hillstone Networks. In recent years, he has led the research team in applying machine learning, statistics, and data analysis to malware detection and network anomaly behavior analysis. He has also led research and product development in cloud security and micro-segmentation technology, has published several security research papers, and holds 9 US network security patents.
  • Hao Dong - Senior Principal Engineer, Hillstone Networks
    Hao Dong is a network security engineer at Hillstone Networks. His interests include malware research and network and application vulnerabilities. Prior to Hillstone, Hao worked at Juniper Networks developing high-end security gateway products.
  • David Yu - Engineer, Hillstone Networks
    David Yu is a distinguished engineer at Hillstone Networks Inc., a company that specializes in developing network security products, including firewalls, next-generation firewalls, and other products that detect and protect against malware and attacks using machine learning and data analysis. He has over 20 years of experience in the networking and security industry, including developing high-performance next-generation firewalls and advanced solutions against today’s sophisticated malware attacks.
  • Chenghuai Lu - Senior Principal Software Engineer, Hillstone Networks
    Chenghuai Lu is a senior principal software engineer at Hillstone Networks Inc., where he leads efforts to develop solutions that defend against advanced threats in next-generation firewalls.
