A machine learning approach to detect misuse of cryptographic APIs in source code
Abstract
Cryptography is an indispensable tool for meeting software security requirements. However, most software developers lack sufficient knowledge of the proper use of cryptography and its APIs, which leads to incorrect use and exploitable vulnerabilities in software applications. Here, we propose an approach based on machine learning techniques to detect different kinds of cryptographic misuse in known Java source code representations, achieving an average improvement of 52 percentage points over previous work.
References
Alon, U., Zilberstein, M., Levy, O., and Yahav, E. (2019). Code2vec: Learning distributed representations of code. Proc. ACM Program. Lang., 3(POPL):40:1–40:29.
Antunes, N. and Vieira, M. (2014). Assessing and comparing vulnerability detection tools for web services: Benchmarking approach and examples. IEEE Transactions on Services Computing, 8(2):269–283.
Braga, A. and Dahab, R. (2016). Mining cryptography misuse in online forums. In 2016 IEEE International Conference on Software Quality, Reliability and Security Companion (QRS-C), pages 143–150. IEEE.
Braga, A. and Dahab, R. (2017). A longitudinal and retrospective study on how developers misuse cryptography in online communities. XVII Simpósio Brasileiro em Segurança da Informação e de Sistemas Computacionais (SBSeg’17), Brasília, DF, Brazil.
Braga, A., Dahab, R., Antunes, N., Laranjeiro, N., and Vieira, M. (2017). Practical evaluation of static analysis tools for cryptography: Benchmarking method and case study. In 2017 IEEE 28th International Symposium on Software Reliability Engineering (ISSRE), pages 170–181. IEEE.
Braga, A., Dahab, R., Antunes, N., Laranjeiro, N., and Vieira, M. (2019). Understanding how to use static analysis tools for detecting cryptography misuse in software. IEEE Transactions on Reliability, 68(4):1384–1403.
Dam, H. K., Pham, T., Ng, S. W., Tran, T., Grundy, J., Ghose, A., Kim, T., and Kim, C.-J. (2018). A deep tree-based model for software defect prediction. arXiv preprint arXiv:1802.00921.
Díaz, G. and Bermejo, J. R. (2013). Static analysis of source code security: Assessment of tools against SAMATE tests. Information and Software Technology, 55(8):1462–1476.
Fischer, F., Xiao, H., Kao, C.-Y., Stachelscheid, Y., Johnson, B., Razar, D., Fawkesley, P., Buckley, N., Böttinger, K., Muntean, P., et al. (2019). Stack overflow considered helpful! deep learning security nudges towards stronger cryptography. In 28th USENIX Security Symposium (USENIX Security 19), pages 339–356.
Forgy, E. W. (1965). Cluster analysis of multivariate data: efficiency versus interpretability of classifications. Biometrics, 21:768–769.
Giry, F. (2020). Keylength: NIST report on cryptographic key length and recommendation. URL: https://www.keylength.com/en/4/.
Goseva-Popstojanova, K. and Perhinschi, A. (2015). On the capability of static code analysis to detect security vulnerabilities. Information and Software Technology, 68:18–33.
Hagberg, A., Schult, D., and Swart, P. (2020). Networkx network analysis in python. URL: https://networkx.github.io/.
Lazar, D., Chen, H., Wang, X., and Zeldovich, N. (2014). Why does cryptographic software fail?: a case study and open problems. In Proceedings of 5th Asia-Pacific Workshop on Systems, page 7. ACM.
Long, F. and Rinard, M. (2016). Automatic patch generation by learning correct code. In ACM SIGPLAN Notices, volume 51, pages 298–312. ACM.
Mogensen, T. Æ. (2017). Introduction to compiler design. Springer.
Nadi, S., Krüger, S., Mezini, M., and Bodden, E. (2016). Jumping through hoops: Why do java developers struggle with cryptography apis? In Proceedings of the 38th International Conference on Software Engineering, pages 935–946. ACM.
Navarro, L. C., Navarro, A. K., Grégio, A., Rocha, A., and Dahab, R. (2018). Leveraging ontologies and machine-learning techniques for malware analysis into android permissions ecosystems. Computers & Security, 78:429–453.
Oracle (2020). Java cryptography architecture (jca) reference guide. URL: [link].
Parr, T. (2013). The definitive ANTLR 4 reference. Pragmatic Bookshelf.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., and Duchesnay, E. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., and Duchesnay, E. (2020). Scikit: Tuning the hyper-parameters of an estimator. URL: https://scikitlearn.org/stable/modules/grid_search.html.
Shippey, T., Bowes, D., and Hall, T. (2019). Automatically identifying code features for software defect prediction: Using ast n-grams. Information and Software Technology, 106:142–160.
Silva, F. B. et al. (2014). Bag of graphs: definition, implementation, and validation in classification tasks. URL: http://repositorio.unicamp.br/handle/REPOSIP/275527.
Silva, F. B., Werneck, R. d. O., Goldenstein, S., Tabbone, S., and Torres, R. d. S. (2018). Graph-based bag-of-words for classification. Pattern Recognition, 74:266–285.
Published
13/10/2020
How to Cite
RODRIGUES, Gustavo Eloi de P.; BRAGA, Alexandre M.; DAHAB, Ricardo. A machine learning approach to detect misuse of cryptographic APIs in source code. In: SIMPÓSIO BRASILEIRO DE SEGURANÇA DA INFORMAÇÃO E DE SISTEMAS COMPUTACIONAIS (SBSEG), 20., 2020, Petrópolis. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2020. p. 1-14. DOI: https://doi.org/10.5753/sbseg.2020.19223.