Vulnerability detection in Ethereum smart contract bytecode via CodeBERT embeddings
Abstract
Ethereum is a cryptocurrency platform that supports the execution of smart contracts, autonomous programs that run on a decentralized network. Vulnerabilities in these contracts pose major financial and security risks to blockchain ecosystems, which motivates automating their detection. This work studies vulnerability detection in Ethereum smart contracts using embeddings derived from bytecode. Embeddings are vector representations generated by language models that capture structural characteristics of text. These representations were used as input to logistic regression, decision tree, and random forest classifiers in order to detect which contracts contain vulnerabilities. The results show that the embeddings contain information useful for distinguishing vulnerable from non-vulnerable contracts. The study also finds that altering the original data distribution during training significantly affects performance, highlighting the sensitivity of embedding-based approaches to sampling strategies.
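The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. Everything below is an assumption: the `microsoft/codebert-base` checkpoint, the first-token pooling, the truncation to 512 tokens, the stratified split, the choice of the Matthews correlation coefficient as the score, and the helper names `embed_bytecode` and `fit_and_score`. Synthetic vectors stand in for real embeddings so the classifier part runs without downloading a model.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import matthews_corrcoef
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def embed_bytecode(hex_bytecode, model_name="microsoft/codebert-base"):
    """Return one embedding vector for a bytecode string (hypothetical helper)."""
    # Imported lazily so the classifier part runs without torch/transformers.
    import torch
    from transformers import AutoModel, AutoTokenizer

    # Loaded on every call for brevity; a real pipeline would load these once.
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()
    # Assumption: long bytecode is simply truncated to the 512-token limit.
    inputs = tokenizer(hex_bytecode, truncation=True, max_length=512,
                       return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # Assumption: the first token's hidden state represents the whole contract.
    return outputs.last_hidden_state[0, 0].numpy()


def fit_and_score(X, y, seed=0):
    """Train the three classifiers named in the abstract; return test MCC each."""
    # Stratifying keeps the original class ratio in both train and test splits.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "decision_tree": DecisionTreeClassifier(random_state=seed),
        "random_forest": RandomForestClassifier(n_estimators=100,
                                                random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = matthews_corrcoef(y_test, model.predict(X_test))
    return scores


if __name__ == "__main__":
    # Synthetic stand-in for embeddings: 768 dimensions, about 20% "vulnerable".
    rng = np.random.default_rng(0)
    y = (rng.random(300) < 0.2).astype(int)
    X = rng.normal(size=(300, 768)) + 0.5 * y[:, None]
    print(fit_and_score(X, y))
```

With real data, `X` would be built by stacking the output of `embed_bytecode` for each contract. Any resampling of the training split would be applied only to `X_train` and `y_train`, so the test split keeps the original class distribution.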
Published
25/05/2026
How to Cite
OLIVEIRA, Pedro Henrique F. S.; BERNARDINO, Heder S.; VILLELA, Saulo Moraes; SILVA, Edelberto Franco; SOUZA, Jairo Francisco de; VIEIRA, Alex B. Detecção de vulnerabilidades em bytecodes de contratos inteligentes no Ethereum via embeddings do CodeBERT. In: SIMPÓSIO BRASILEIRO DE REDES DE COMPUTADORES E SISTEMAS DISTRIBUÍDOS (SBRC), 44., 2026, Praia do Forte/BA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2026. p. 631-644. ISSN 2177-9384. DOI: https://doi.org/10.5753/sbrc.2026.19804.
