A Study on Training Reuse in Vulnerability Prediction Models

  • Matheus Vinícius Todescato
  • Guilherme Dal Bianco (UFFS)

Abstract


Finding bugs or code failures in software systems can be a complex and costly task. One way to reduce the effort required of the user is to apply a Vulnerability Prediction Model (VPM). A VPM uses machine learning techniques to identify parts of the code that are likely to contain bugs. To do so, it needs training data (source code files known to contain bugs) to build a prediction model. When no such data is available for a project, the model faces what is known as the cold-start problem: it has no information with which to begin identifying bugs. In this work, we experimentally evaluate reusing training data across projects to reduce the manual cost of the process when the goal is to identify all (or nearly all) buggy code.
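To make the cold-start problem and the idea of training reuse concrete, here is a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions: each source file is described by a small vector of normalized code metrics, the classifier is a hand-rolled logistic regression, and the review follows an active-learning-style loop in which the reviewer inspects the highest-ranked file, its true label is revealed, and the model is retrained. Manual cost is measured as the number of files inspected until a target recall (here 95%) of the buggy files is reached. The names `train_logreg`, `score`, `files_inspected`, and the toy data are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def train_logreg(X: List[Vector], y: List[int], epochs: int = 500, lr: float = 0.1) -> List[float]:
    """Fit logistic regression by batch gradient descent; the last weight is the bias."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-score(w, xi)))  # predicted P(buggy)
            err = p - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def score(w: List[float], x: Vector) -> float:
    """Linear score: bias + w . x (zip stops before the bias weight)."""
    return w[-1] + sum(wj * xj for wj, xj in zip(w, x))


def files_inspected(target_X, target_y, seed_X, seed_y, recall_target: float = 0.95) -> int:
    """Count files a reviewer inspects until recall_target of the buggy files is found.

    seed_X/seed_y are labeled files reused from another project (empty = cold start).
    Assumption: until the training set holds both classes, no model can be trained,
    so files are inspected in their stored order.
    """
    total_bugs = sum(target_y)
    train_X, train_y = list(seed_X), list(seed_y)
    remaining = list(range(len(target_X)))
    found = inspected = 0
    while found < recall_target * total_bugs:
        if len(set(train_y)) < 2:
            nxt = remaining[0]  # cold start: no usable model yet
        else:
            w = train_logreg(train_X, train_y)
            nxt = max(remaining, key=lambda i: score(w, target_X[i]))
        remaining.remove(nxt)
        inspected += 1
        found += target_y[nxt]
        # The inspected file's label becomes new training data (retrain next round).
        train_X.append(target_X[nxt])
        train_y.append(target_y[nxt])
    return inspected


# Hypothetical toy data: two normalized metrics per file; label 1 = buggy.
SOURCE_X = [[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1], [0.15, 0.3], [0.3, 0.2]]
SOURCE_Y = [1, 1, 0, 0, 0, 0]
TARGET_X = [[0.1, 0.1], [0.2, 0.3], [0.3, 0.1], [0.25, 0.2],
            [0.1, 0.3], [0.2, 0.2], [0.85, 0.9], [0.9, 0.7]]
TARGET_Y = [0, 0, 0, 0, 0, 0, 1, 1]

if __name__ == "__main__":
    print("cold start:", files_inspected(TARGET_X, TARGET_Y, [], []))
    print("with reuse:", files_inspected(TARGET_X, TARGET_Y, SOURCE_X, SOURCE_Y))
```

On this toy data, the cold-start run has no model until it has seen both a clean and a buggy file, so it walks the files in their stored order and inspects all 8; seeding the model with another project's labeled files lets it rank the two buggy files first, so only 2 are inspected. Real projects are far noisier, which is why the benefit of reuse has to be measured experimentally.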

Published
2021-09-13
TODESCATO, Matheus Vinícius; DAL BIANCO, Guilherme. Um estudo sobre reutilização de treinamento em Modelos de Previsão de Vulnerabilidade. In: REGIONAL DATABASE SCHOOL (ERBD), 16., 2021, Santa Maria. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. p. 99-108. ISSN 2595-413X. DOI: https://doi.org/10.5753/erbd.2021.17243.