Evaluating Zero-shot Reasoning with Agentic LLMs for Smart Contract Vulnerability Detection

Eduardo Sardenberg Tavares; Antonio José G. Busson; Sérgio Colcher

doi:10.5753/webmedia_estendido.2025.15858

Eduardo Sardenberg Tavares (PUC-Rio)
Antonio José G. Busson (BTG Pactual)
Sérgio Colcher (PUC-Rio)

Abstract

Smart contracts are fundamental to blockchain ecosystems but remain susceptible to security vulnerabilities that can cause severe financial losses. Recent agentic AI systems, powered by large language models (LLMs), can analyze code and make decisions autonomously, without task-specific supervision or fine-tuning. In this work, we evaluate how effectively agentic LLM-based approaches identify vulnerabilities, using prompt engineering and zero-shot reasoning, on a curated dataset of Solidity smart contracts. Our best-performing configuration, which combines zero-shot reasoning with the Tree of Thoughts framework, achieved an F1-score of 73.66%. These findings highlight the limitations of current LLMs in automated vulnerability detection and offer insight into their practical applicability for securing decentralized applications.

Keywords: Smart contracts, Large Language Models, Vulnerability detection, Solidity, Prompt engineering, Zero-shot
