Building a Labeled Smart Contract Dataset for Evaluating Vulnerability Detection Tools’ Effectiveness

  • Ryan Weege Achjian USP
  • Marcos Antonio Simplicio Junior USP

Abstract


In recent years, surveys of vulnerability detection tools for Solidity-based smart contracts have shown that many of them perform poorly. One cause of these deficiencies is the lack of high-quality benchmarking datasets in which the bugs typically found in smart contracts are both plentiful and accurately labeled. VulLab aims to help tackle this issue as a framework that incorporates both state-of-the-art vulnerability insertion tools and state-of-the-art vulnerability detection tools. These capabilities let users generate benchmark-ready datasets from collected contracts, then use them to validate novel analysis tools and compare them accurately against current state-of-the-art solutions. Starting from 50 smart contracts collected from the Ethereum mainnet, the framework generated an annotated dataset of more than 300 entries covering 20 unique vulnerabilities, and used it to compare 14 analysis tools in approximately 24 hours. VulLab is open source and available at https://github.com/lsRyan/vullab.

Published
1 September 2025
ACHJIAN, Ryan Weege; SIMPLICIO JUNIOR, Marcos Antonio. Building a Labeled Smart Contract Dataset for Evaluating Vulnerability Detection Tools’ Effectiveness. In: SALÃO DE FERRAMENTAS - SIMPÓSIO BRASILEIRO DE CIBERSEGURANÇA (SBSEG), 25., 2025, Foz do Iguaçu/PR. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 1-10. DOI: https://doi.org/10.5753/sbseg_estendido.2025.11380.