ABSTRACT
Flaky tests are tests with non-deterministic behavior: they can pass or fail inconsistently without any change to the code under test. While researchers have investigated flaky tests in traditional unit tests, less is known about their occurrence in user interface (UI) tests. This study presents an empirical analysis of 24 open-source projects to identify the main causes of flaky tests in UI tests and the strategies used to correct them. Our analysis identified 8 categories of causes and 7 categories of correction strategies, with Race Condition, Logic Issues, and Test Dependency being the main causes of flaky tests in UI tests. Addition of Wait, Correction of Logic, and Ignored Test were the most commonly applied correction categories. Specifically, 89% of flaky tests caused by a Race Condition were corrected by adding a Wait, while 100% of flaky tests caused by Test Logic and Dependency Issues were fixed by patching the test logic. Our results provide insights into the occurrence of flaky tests in UI tests and can help test analysts and testers develop higher-quality UI test automation projects. This study contributes to the literature on flaky tests by providing empirical evidence of their occurrence in UI tests and by identifying their main causes and correction strategies.
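To make the Race Condition and Addition of Wait pairing concrete, the following is a minimal sketch and not code from any of the studied projects. It assumes a hypothetical `FakePage` class that stands in for a UI whose button renders asynchronously, and a hypothetical `wait_until` helper that polls a condition until a timeout. Real UI frameworks ship their own wait utilities; the names and timings here are illustrative assumptions only.

```python
import threading
import time


def wait_until(condition, timeout=2.0, interval=0.01):
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = condition()
        if result:
            return result
        time.sleep(interval)
    raise TimeoutError("condition not met within timeout")


class FakePage:
    """Hypothetical stand-in for a UI page whose button renders asynchronously."""

    def __init__(self, render_delay):
        self.button_visible = False
        # Simulate asynchronous rendering on a background timer thread.
        timer = threading.Timer(render_delay, self._render)
        timer.daemon = True
        timer.start()

    def _render(self):
        self.button_visible = True


def flaky_check(page):
    # Race condition: reads the UI state immediately, so the outcome
    # depends on whether rendering happened to finish first.
    return page.button_visible


def stable_check(page):
    # Addition of Wait: block until the button is visible or the timeout expires.
    return wait_until(lambda: page.button_visible)
```

In this sketch the immediate check depends on timing, whereas the polling check only fails if the element never appears within the timeout, which turns a timing accident into a genuine failure signal.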