Smart Prediction for Test Smell Refactorings
Abstract
Software plays a critical role in modern society, making software testing essential for ensuring quality. Poor testing practices, known as test smells, reduce the quality and maintainability of test code. Although refactoring is widely used to address these issues, little is known about when developers choose to refactor test smells and whether doing so actually improves test code quality. This study introduces a machine learning approach to guide test smell refactoring. First, we mine refactorings performed by developers to derive a catalog of test-specific refactorings; the findings show that the evolution of testing frameworks helps address test smells. Second, we investigate whether developers target low-quality test code when refactoring and how those refactorings affect code quality; the findings reveal that developers refactor structurally low-quality test code more often. Third, we investigate whether it is possible to learn if developers would refactor a test and which refactorings they would apply to improve test code quality. Results show that Support Vector Machine models predicted refactoring decisions with 30–100% accuracy.
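To make the prediction step more concrete, the following is a minimal sketch of a linear Support Vector Machine that labels a test class as "refactor" or "do not refactor". It is not the study's pipeline: the abstract does not state the features, kernel, or training procedure used, so the two feature names (`smell_score`, `size_score`), the training data, and the hyperparameters below are all invented for illustration. The model is trained with plain sub-gradient descent on the regularized hinge loss, which is one common way to fit a linear SVM.

```python
"""Toy linear SVM for a 'refactor / do not refactor' decision (illustrative only)."""

# Hypothetical, invented training data. Each sample is
# ((smell_score, size_score), label): two standardized (z-scored) test-class
# metrics and a label of +1 (test was refactored) or -1 (it was not).
TRAIN = [
    ((1.0, 0.5), 1), ((-1.0, -0.5), -1),
    ((1.5, 1.0), 1), ((-1.5, -1.0), -1),
    ((0.8, 1.2), 1), ((-0.8, -1.2), -1),
    ((1.2, 0.9), 1), ((-1.2, -0.9), -1),
]


def decision_value(w, b, x):
    """Signed score of x relative to the separating hyperplane w.x + b = 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_linear_svm(samples, epochs=200, lr=0.01, reg=0.01):
    """Minimize the L2-regularized hinge loss with per-sample sub-gradient steps."""
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in samples:
            # Margin is measured with the weights as they stand before this step.
            margin = y * decision_value(w, b, x)
            # Regularization step: shrink the weights slightly.
            w = [wi * (1.0 - lr * reg) for wi in w]
            # Hinge-loss step: only samples inside the margin move the model.
            if margin < 1.0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    """Return +1 (refactoring predicted) or -1 (no refactoring predicted)."""
    return 1 if decision_value(w, b, x) >= 0.0 else -1


def accuracy(w, b, samples):
    """Fraction of samples whose predicted label matches the true label."""
    hits = sum(1 for x, y in samples if predict(w, b, x) == y)
    return hits / len(samples)


if __name__ == "__main__":
    w, b = train_linear_svm(TRAIN)
    print("training accuracy:", accuracy(w, b, TRAIN))
    print("prediction for a smelly, large test:", predict(w, b, (2.0, 2.0)))
```

Accuracy measured on such a small, cleanly separable training set says nothing about real projects. The accuracy range reported in the abstract comes from the study itself, not from this sketch, and a realistic evaluation would score the model on held-out data.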
Published
2025-09-22
How to Cite
MARTINS, Luana; COSTA, Heitor; PALOMBA, Fabio; MACHADO, Ivan. Smart Prediction for Test Smell Refactorings. In: SOFTWARE ENGINEERING DOCTORAL AND MASTER THESES COMPETITION (DOCTORAL) - BRAZILIAN CONFERENCE ON SOFTWARE: THEORY AND PRACTICE (CBSOFT), 16., 2025, Recife/PE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 37-51. DOI: https://doi.org/10.5753/cbsoft_estendido.2025.12026.
