An empirical evaluation of fuzz targets using mutation testing

Bruno E. R. Garcia; Simone R. S. Souza

Bruno E. R. Garcia USP
Simone R. S. Souza USP

DOI: https://doi.org/10.5753/sast.2025.14183

Abstract

Fuzzing has gained widespread adoption for discovering security vulnerabilities, yet questions remain about its effectiveness in detecting subtle behavioral faults. This paper presents an empirical evaluation at the intersection of fuzzing and mutation testing, examining how well fuzz targets perform when evaluated through mutation analysis. We conducted a systematic study using Bitcoin Core as our subject system, analyzing 10 fuzz targets across various modules and evaluating their ability to detect 726 generated mutants. Our methodology involved executing fuzz targets with existing seed corpora and measuring mutation scores both with and without assertion statements to understand the role of explicit oracles in fault detection. Our findings reveal that, contrary to previous studies suggesting that fuzzing has limited effectiveness in mutation testing, several fuzz targets achieved high mutation scores, with two targets reaching 100% mutant detection rates. We identified three key design patterns that significantly enhance mutant detection: (1) round-trip testing approaches that verify data integrity through serialization-deserialization cycles, (2) mathematical oracles that implement exact behavioral verification through redundant calculations, and (3) metamorphic relations that validate expected relationships between inputs and outputs. Our analysis demonstrates a positive correlation between assertion density in fuzz targets and mutation scores, with assertion removal causing substantial drops in detection rates across all targets. The study contributes empirical evidence that well-designed fuzz targets can detect subtle behavioral faults beyond traditional crash-based vulnerabilities. Our results suggest that incorporating explicit oracles, metamorphic properties, and round-trip verification mechanisms into fuzz target design significantly improves their mutation testing performance. These findings have practical implications for improving fuzzing methodologies and developing more comprehensive automated testing strategies for safety-critical software systems.
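
The three oracle patterns named in the abstract, and the comparison of mutation scores with and without assertions, can be illustrated with a toy example. The sketch below is not taken from Bitcoin Core or from the study: `encode`, `decode`, `checksum`, the two mutants, and the seed corpus are all hypothetical stand-ins chosen only to make the idea concrete.

```python
from typing import Callable, Iterable, List

Checksum = Callable[[bytes], int]

def encode(data: bytes) -> str:
    """Hypothetical serializer under test (plain hex encoding)."""
    return data.hex()

def decode(text: str) -> bytes:
    """Hypothetical deserializer under test."""
    return bytes.fromhex(text)

def checksum(data: bytes) -> int:
    """Hypothetical function under test: sum of bytes modulo 256."""
    total = 0
    for b in data:
        total = (total + b) & 0xFF
    return total

def mutant_minus(data: bytes) -> int:
    """Mutant: '+' replaced by '-'."""
    total = 0
    for b in data:
        total = (total - b) & 0xFF
    return total

def mutant_mask(data: bytes) -> int:
    """Mutant: mask constant 0xFF replaced by 0x7F."""
    total = 0
    for b in data:
        total = (total + b) & 0x7F
    return total

def fuzz_target(data: bytes, impl: Checksum, with_oracles: bool = True) -> None:
    restored = decode(encode(data))
    value = impl(data)
    if not with_oracles:
        return  # crash-only mode: a mutant is detected only if it raises
    # (1) Round-trip: deserializing the serialized input gives the input back.
    assert restored == data
    # (2) Mathematical oracle: redundant, independent calculation of the result.
    assert value == sum(data) % 256
    # (3) Metamorphic relation: the checksum does not depend on byte order.
    assert value == impl(data[::-1])

def is_killed(impl: Checksum, corpus: Iterable[bytes], with_oracles: bool) -> bool:
    """A mutant is killed if any seed makes the fuzz target fail."""
    for seed in corpus:
        try:
            fuzz_target(seed, impl, with_oracles)
        except Exception:
            return True
    return False

def mutation_score(mutants: List[Checksum], corpus: List[bytes],
                   with_oracles: bool) -> float:
    """Fraction of mutants killed by the fuzz target on the given corpus."""
    killed = sum(is_killed(m, corpus, with_oracles) for m in mutants)
    return killed / len(mutants)

CORPUS = [b"", b"\x01\x02", b"\xff"]   # hypothetical seed corpus
MUTANTS = [mutant_minus, mutant_mask]

if __name__ == "__main__":
    print(mutation_score(MUTANTS, CORPUS, with_oracles=True))
    print(mutation_score(MUTANTS, CORPUS, with_oracles=False))
```

In this toy setting neither mutant raises an exception on its own, so a crash-only target kills none of them, while the asserting target kills both through the redundant calculation. The metamorphic check alone would miss both mutants here, because the mutated sums are still independent of byte order, which is one reason a target may combine several oracle styles.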

Keywords: mutation testing, fuzzing, empirical software engineering
