Effective Vulnerability Discovery with Machine Learning

Presented at Black Hat Europe 2020 Virtual, Dec. 10, 2020, 11:20 a.m. (30 minutes)

Software Composition Analysis (SCA) products report vulnerabilities in third-party dependencies by comparing the libraries detected in an application against a database of known vulnerabilities. These databases typically draw on multiple sources, such as bug tracking systems, source code commits, and mailing lists, and must be curated by security researchers to maximize accuracy.

We designed and implemented a machine learning system that features a complete pipeline, from data collection, model training, and prediction on data items to validation of new models before deployment. The process runs iteratively to produce better models as new labels arrive, and it incorporates self-training to grow its training dataset automatically.

The deployed model automatically predicts whether each data item is related to a vulnerability. This lets us discover vulnerabilities effectively across the open-source library ecosystem.

To keep performance stable, our methodology also includes an additional evaluation step that automatically estimates how well the model from a new iteration would fare. In particular, the evaluation measures how closely the new model agrees with the old one, while aiming to improve metrics such as precision and/or recall.

This is the first study of its kind across a variety of data sources, and our paper was recently awarded the ACM SIGSOFT Distinguished Paper Award at the Mining Software Repositories Conference (MSR) 2020.
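The evaluation step described in the abstract can be pictured as an acceptance gate between model iterations. The Python below is a minimal sketch under stated assumptions, not the presenters' implementation: it assumes binary labels (1 = vulnerability-related), a labeled validation set scored by both the old and new models, and a hypothetical `min_agreement` threshold. The function names `precision_recall`, `agreement_rate`, and `accept_new_model` are illustrative only.

```python
from typing import Sequence, Tuple

# Assumption: labels are binary, 1 = vulnerability-related, 0 = not related.


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Precision and recall of y_pred measured against ground-truth labels y_true."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def agreement_rate(old_pred: Sequence[int], new_pred: Sequence[int]) -> float:
    """Fraction of items on which the old and new models predict the same label."""
    if len(old_pred) != len(new_pred):
        raise ValueError("prediction lists must have equal length")
    if not old_pred:
        return 1.0
    return sum(o == n for o, n in zip(old_pred, new_pred)) / len(old_pred)


def accept_new_model(
    y_true: Sequence[int],
    old_pred: Sequence[int],
    new_pred: Sequence[int],
    min_agreement: float = 0.85,  # hypothetical threshold, not from the talk
) -> bool:
    """Hypothetical gate: deploy the new model only if it stays close to the old
    model, does not regress on precision or recall, and improves at least one."""
    old_p, old_r = precision_recall(y_true, old_pred)
    new_p, new_r = precision_recall(y_true, new_pred)
    agrees = agreement_rate(old_pred, new_pred) >= min_agreement
    no_regression = new_p >= old_p and new_r >= old_r
    improves = new_p > old_p or new_r > old_r
    return agrees and no_regression and improves


if __name__ == "__main__":
    y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
    old = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]
    new = [1, 0, 1, 1, 0, 1, 1, 0, 0, 0]
    print(precision_recall(y_true, old))       # (0.75, 0.75)
    print(precision_recall(y_true, new))       # (0.8, 1.0)
    print(agreement_rate(old, new))            # 0.9
    print(accept_new_model(y_true, old, new))  # True
```

Requiring no regression on either metric is only one reading of "precision and/or recall"; a real pipeline might instead accept a small drop in one metric in exchange for a gain in the other, or measure agreement on a larger unlabeled pool rather than the labeled validation set.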

Presenters:

  • Asankhaya Sharma - Director, Software Engineering, Veracode
    Asankhaya Sharma is a cyber security expert and technology leader with over a decade of experience in creating security products for industry, academia, and the open-source community. He is passionate about building high-performing teams and taking innovative products to market.
  • Ming Yi Ang - Security Research Engineer, Veracode
    Ming Yi Ang is a lead researcher at Veracode. He designs and implements security automation tools to discover new security issues.