A Method for Regression Testing Plan Ordering for Non-Automated Executions in Black Box Testing

Vinicius Hernandes; André Carvalho; Eulanda Santos; Yan Soares; Hygo Oliveira; Adamor Barros; Ronaldo Soares; Alexandre Lima; Raoni Ferreira; Gabriel Martins; Lucas Carvalho; Nicolas Assumpção; José Nascimento; Eliane Collins; Silvia Ascate; Mateus Souza

doi:10.5753/cibse.2025.35296

Vinicius Hernandes UFAM
André Carvalho UFAM
Eulanda Santos UFAM
Yan Soares UFAM
Hygo Oliveira UFAM
Adamor Barros UFAM
Ronaldo Soares UFAM
Alexandre Lima UFAM
Raoni Ferreira UFAC
Gabriel Martins UFAC
Lucas Carvalho UFAC
Nicolas Assumpção Motorola Mobility LLC
José Nascimento Motorola Mobility LLC
Eliane Collins INDT
Silvia Ascate INDT
Mateus Souza INDT

Abstract

In this paper, we propose a method for prioritizing regression test cases by their probability of detecting software execution failures, without analyzing source code. The method uses the SentenceBERT model to extract embeddings from the textual information in development commits and test scripts. Machine learning models then use these embeddings to predict the probability that each test case detects a failure. On the APFD (Average Percentage of Faults Detected) metric, the proposed method matches or outperforms human experts in 92.52% to 94.24% of scenarios. It achieves an overall gain of 10% in mean APFD and a potential gain of up to 6.03% in test plan prioritization when counting cases.
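The method described above ends in two steps that can be stated concretely. First, test cases are ordered by predicted failure probability. Second, the resulting order is scored with APFD.

The sketch below makes several assumptions:

- The probabilities are taken as already predicted. The SentenceBERT embedding and classifier stage is not reproduced.
- It uses the textbook APFD definition, APFD = 1 − (TF_1 + … + TF_m)/(n·m) + 1/(2n), where n is the number of test cases, m is the number of faults, and TF_i is the 1-based position of the first test that reveals fault i. The paper may use a different variant.
- The function names `prioritize` and `apfd` are hypothetical.
- The test IDs, probabilities, and fault matrix are invented for illustration.

```python
from typing import Dict, List, Set


def prioritize(fail_prob: Dict[str, float]) -> List[str]:
    """Hypothetical helper: order test cases by descending predicted failure
    probability, breaking ties by test ID so the order is deterministic."""
    return sorted(fail_prob, key=lambda test: (-fail_prob[test], test))


def apfd(order: List[str], faults: Dict[str, Set[str]]) -> float:
    """Textbook APFD (assumed definition): 1 - sum(TF_i)/(n*m) + 1/(2n).

    order:  test case IDs in execution order.
    faults: fault ID -> set of test case IDs that reveal that fault.
    """
    n, m = len(order), len(faults)
    if n == 0 or m == 0:
        raise ValueError("APFD needs at least one test case and one fault")
    position = {test: i + 1 for i, test in enumerate(order)}  # 1-based positions
    total = 0
    for fault, detecting in faults.items():
        hits = [position[t] for t in detecting if t in position]
        if not hits:
            raise ValueError(f"fault {fault!r} is not revealed by any test in the order")
        total += min(hits)  # TF_i: position of the first test revealing fault i
    return 1 - total / (n * m) + 1 / (2 * n)


if __name__ == "__main__":
    probs = {"T1": 0.10, "T2": 0.85, "T3": 0.40, "T4": 0.05}  # hypothetical model outputs
    faults = {"F1": {"T2"}, "F2": {"T3", "T4"}}               # hypothetical fault matrix
    order = prioritize(probs)
    print(order)                                   # ['T2', 'T3', 'T1', 'T4']
    print(apfd(order, faults))                     # 0.75
    print(apfd(["T1", "T2", "T3", "T4"], faults))  # 0.5
```

With this invented data, ranking by probability moves the fault-revealing tests T2 and T3 to the front. That raises APFD from 0.5 for the original order to 0.75.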

Keywords: Regression Test, Test Plan, APFD, Functionality Test, Black-Box Testing
