Beyond Green Tests: Removing Smells From Natural Language Tests

Manoel Aranda III; Márcio Ribeiro

doi:10.5753/sbqs.2025.15884

Manoel Aranda III UFAL http://orcid.org/0000-0001-9540-1605
Márcio Ribeiro UFAL https://orcid.org/0000-0002-4293-4261


Abstract

Test smells signal design flaws in tests, harming maintainability and reliability. While smells in automated tests are well studied, smells in natural language tests remain underexplored. Prior work identified 13 such smells but lacked systematic removal strategies and automated tools. We bridge this gap by presenting a catalog of transformations for seven key natural language test smells and an NLP-based tool for automated detection and correction. We evaluated our approach through a survey of 15 professionals and an empirical analysis of Ubuntu OS test cases. Results show high professional acceptance (91.43%) and strong tool performance (83.70% F-measure). Our work is the first to systematically address natural language test smell removal.

Keywords: Natural Language Test, Test Smells, Software Testing
