Leveraging N-Version Testing to Define Approximate Oracles for Performance Testing

Abstract


Performance testing plays a critical role in maintaining software quality by ensuring systems meet their expected efficiency and responsiveness. However, defining precise test oracles for performance testing remains a significant challenge. As a result, many software projects lack reliable performance test oracles, hindering the development of comprehensive test suites. Approximate test oracles have emerged as a promising alternative, offering a practical means of validation in the absence of exact specifications. In this work, we explore the use of n-version testing, a technique traditionally used for fault detection through the comparison of multiple system versions, as a foundation for constructing approximate performance test oracles. Our approach leverages the performance history of recent versions of the system under test (SUT) to define an acceptable performance range. Testers configure key parameters such as the number of prior versions to consider, the strategy for computing reference performance, and the tolerance margin. When the current version’s performance falls outside the derived tolerance band, an alert is raised to trigger further investigation. In our preliminary investigation using a real-world proprietary software system (an image gallery application), we used historical performance data to demonstrate that our proposed approach would have been capable of detecting a known performance bug, previously confirmed by the development team.
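To make the oracle construction concrete, the sketch below shows one way such an approximate oracle could be evaluated, assuming a simple list of per-version performance measurements (e.g., mean response time in milliseconds). The function name approximate_performance_oracle and the parameters window, strategy, and tolerance are illustrative stand-ins for the configurable number of prior versions, the reference-computation strategy, and the tolerance margin described in the abstract; they are not taken from the paper's artifact.

```python
from statistics import mean, median

def approximate_performance_oracle(history, current, window=5,
                                   strategy="median", tolerance=0.10):
    """Flag a performance deviation of the current version against recent versions.

    history   -- measurements of prior versions (e.g., mean response time in ms),
                 ordered from oldest to newest
    current   -- measurement of the version under test
    window    -- how many of the most recent versions to consider
    strategy  -- how the reference value is computed ("mean" or "median")
    tolerance -- allowed relative deviation around the reference (0.10 = 10%)
    Returns (alert, reference, lower_bound, upper_bound).
    """
    recent = history[-window:]
    reference = mean(recent) if strategy == "mean" else median(recent)
    lower, upper = reference * (1 - tolerance), reference * (1 + tolerance)
    alert = not (lower <= current <= upper)
    return alert, reference, lower, upper

# Illustrative usage: the last five releases measured around 200 ms,
# while the new build measures 260 ms and therefore triggers an alert.
alert, ref, lo, hi = approximate_performance_oracle(
    [198, 202, 205, 199, 201], current=260,
    window=5, strategy="median", tolerance=0.10)
if alert:
    print(f"Performance outside tolerance band [{lo:.1f}, {hi:.1f}] ms "
          f"(reference {ref:.1f} ms) -- investigate.")
```

In practice, each entry in history would itself aggregate repeated benchmark runs of one released version, and the tolerance would be tuned to the natural run-to-run variance of the SUT.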
Keywords: Performance, Test Oracle, N-version testing

Published: 2025-09-22
MARCELINO, Críssia; MIRANDA, Breno. Leveraging N-Version Testing to Define Approximate Oracles for Performance Testing. In: BRAZILIAN SYMPOSIUM ON SOFTWARE ENGINEERING (SBES), 39., 2025, Recife/PE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 818-824. ISSN 2833-0633. DOI: https://doi.org/10.5753/sbes.2025.11583.