Mining Vulnerability and Code Repositories to Study Software Security

  • João Rafael Henriques University of Coimbra
  • José D’Abruzzo Pereira University of Coimbra
  • Marco Vieira UNC Charlotte

Abstract


Software vulnerabilities are present in most software applications. However, vulnerability detection techniques usually suffer from the same two issues: they report items that are not actual vulnerabilities, or they fail to detect all vulnerabilities. Datasets exist to support the development of new vulnerability detection techniques. Nevertheless, their data are usually frozen and must be frequently updated with newly disclosed vulnerabilities. Hence, we propose an automated solution to mine vulnerability data. To do that, we build on a known vulnerability database containing static data from open-source C/C++ projects (Linux Kernel, Mozilla, Xen, httpd, and Glibc). The novel automated solution also allows adding vulnerability information from other open-source projects (e.g., Cassandra, MongoDB, MySQL, Neo4J, Postgres, Django). A dashboard was created to support the analysis of the database. We investigate why vulnerability data change after the vulnerabilities are fixed, and we compare the original dataset with the current version. Results show that changes in the information held by online vulnerability databases can affect the vulnerability data over the years. Since the release of the original database, an additional 6,617 vulnerabilities have been collected, covering both the projects originally in the database and the new projects.
Keywords: Software Vulnerability, Software Security, Mining Software Repositories, Software Metrics
Published
26/11/2024
HENRIQUES, João Rafael; PEREIRA, José D’Abruzzo; VIEIRA, Marco. Mining Vulnerability and Code Repositories to Study Software Security. In: LATIN-AMERICAN SYMPOSIUM ON DEPENDABLE COMPUTING (LADC), 13., 2024, Recife/PE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 11–16.