"$ echo Internet $>_...": Towards Practical Internet-wide Probing and Crawling

Presented at VB2016, Oct. 5, 2016, 2 p.m. (30 minutes)

With the technical evolution of large-scale computing and data collection, Internet-wide probing and crawling have gained increasing acceptance in the security community. For instance, we can probe or scan network servers with carefully constructed requests to expose C&C servers. Crawling specific URLs can also help us identify compromised websites and Internet-wide malicious campaigns. Unlike passive monitoring and detection of ongoing attacks, probing and crawling aim at an active, progressive search for malicious infrastructure.

However, alongside awareness of its dark side, the effectiveness and efficiency of the methodology have been seriously questioned in practice. Some fundamental challenges include: how to generate a good request that detects malicious infrastructure; how to conduct large-scale probing in a polite and effective way without disturbing normal Internet traffic; how to avoid being fingerprinted by malicious entities; and how to coordinate the probing operation in a distributed fashion.

Based on our two years of experience conducting Internet-wide operations, in this paper we discuss the challenges, methodology, system design and evaluation schemes of practical probing and crawling. In particular:

1. We discuss the general limitations and challenges of Internet-wide probing and crawling. More importantly, we show the problems we encountered in practice and propose our solutions.
2. We propose a set of systematic approaches to generate probing and crawling requests. With the design of a novel feedback system, we iteratively improve the effectiveness of our probing in the long run.
3. We design a coordinated and distributed infrastructure to overcome multiple practical limitations. In particular, our design follows the principle of conducting polite probing without disturbing normal traffic.
4. We reveal our most recent probing and crawling results and discuss the insights gained from them.
Our detection results show that our system can effectively track and expose the malicious infrastructure of infamous campaigns.
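
To make the idea of "polite" probing concrete, the following is a minimal sketch and not the authors' system. It assumes one plausible policy that the abstract does not specify: targets are shuffled, the overall send rate is capped, and a minimum gap is enforced between probes to the same /24 network. The function name `polite_schedule` and its parameters (`global_rate`, `per_net_gap`, `prefix`, `seed`) are hypothetical. The code only computes send times and sends no traffic.

```python
import ipaddress
import random
from collections import defaultdict


def polite_schedule(targets, global_rate=100.0, per_net_gap=1.0, prefix=24, seed=0):
    """Return a list of (send_time_seconds, ip) pairs.

    Hypothetical policy (an assumption, not taken from the talk):
      * shuffle targets so probes are spread across networks,
      * send at most `global_rate` probes per second overall,
      * wait at least `per_net_gap` seconds between probes that
        fall in the same /`prefix` network.
    """
    rng = random.Random(seed)
    order = list(targets)
    rng.shuffle(order)

    step = 1.0 / global_rate          # minimum spacing between any two probes
    next_free = defaultdict(float)    # network -> earliest allowed send time
    clock = 0.0                       # earliest time the next probe may go out
    schedule = []

    for ip in order:
        net = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
        send_at = max(clock, next_free[net])
        schedule.append((send_at, ip))
        next_free[net] = send_at + per_net_gap
        clock = send_at + step

    return schedule


if __name__ == "__main__":
    # Documentation-only address ranges (RFC 5737), used purely as examples.
    demo = [f"192.0.2.{i}" for i in range(1, 4)] + [f"198.51.100.{i}" for i in range(1, 4)]
    for when, ip in polite_schedule(demo, global_rate=10, per_net_gap=1.0):
        print(f"{when:6.2f}s  {ip}")
```

In this sketch, a target whose network is still "cooling down" delays everything queued behind it. A real scheduler would more likely keep a priority queue and send whichever target is eligible first.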


  • Kyle Sanders - Palo Alto Networks
    Kyle Sanders has worked in the IT industry for the last 11 years and is currently the team lead for malware research at Palo Alto Networks. His research interests are in automated malware detection, network forensics and code analysis.
  • Wei Xu - Palo Alto Networks
    Wei Xu is a security researcher at Palo Alto Networks. His current research interests include web security, network security and security data analysis. His past research has been published in both academic and industry circles. He was a speaker at VB 2012/2014/2015 and Black Hat 2013. He received his B.S. and M.S. degrees in electrical engineering from Tsinghua University, Beijing, China, in 2005 and 2007 respectively. He obtained his Ph.D. degree in computer science from Penn State University in 2013.
  • Yuchen Zhou - Palo Alto Networks
    Yuchen Zhou is a web security researcher in the Internet Security Research group at Palo Alto Networks. His current research interests cover the web-based threat landscape, including malicious JavaScript analysis, exploit kit detection, malvertising, and browser emulation. Before joining Palo Alto Networks, Yuchen obtained his Ph.D., specializing in security and privacy of web applications and single sign-on systems, from the University of Virginia with Prof. David Evans.
  • Jun Wang - Palo Alto Networks
    Jun Wang is a security researcher at Palo Alto Networks. His research interests include systems and network security. He earned his Ph.D. degree from Penn State University in 2015 and his B.S. degree from Nanjing University, China, in 2010. His past research has been published in major systems and security conferences including USENIX ATC and USENIX Security.
  • Zhaoyan Xu - Palo Alto Networks
    Zhaoyan Xu is a research engineer at Palo Alto Networks, CA, United States. He joined Palo Alto Networks in 2014 and works in the area of Internet security. He earned his Ph.D. degree from Texas A&M University, College Station in 2014. His research interests include web security, malware analysis and detection, and system security.