An Evaluation of Ranking-to-Learn Approaches for Test Case Prioritization in Continuous Integration
Keywords: Test Case Prioritization, Continuous Integration environments, Ranking-to-Learn
Continuous Integration (CI) is a practice adopted by most organizations that allows frequent integration of software changes, making software evolution faster and more cost-effective. CI environments require dynamic Test Case Prioritization (TCP) approaches that adapt better to test budgets and to the frequent addition and removal of test cases. Ranking-to-Learn approaches have been proposed for this purpose and are better suited to CI constraints: by observing past prioritizations and guided by reward functions, they learn the best prioritization for a given commit. To contribute to their improvement and to guide future research, this work evaluates how far the solutions produced by these approaches are from the optimal solutions produced by a deterministic approach (ground truth). To this end, we consider two learning-based approaches: i) RETECS, which is based on Reinforcement Learning; and ii) COLEMAN, which is based on Multi-Armed Bandit. The evaluation was conducted with twelve systems, three test budgets, two reward functions, and six measures concerning fault detection effectiveness, early fault detection, test time reduction in the CI cycles, prioritization time, and accuracy. Our findings have implications for the application of the approaches and the choice of reward function. The approaches are applicable in real scenarios and produce solutions very close to the optimal ones, respectively, in 92% and 75% of the cases. Both approaches have limitations when learning from little historical test data (a small number of CI cycles) and when dealing with large test case sets in which many failures are distributed over many test cases.
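The learning loop described above (observe past CI cycles, collect a reward, re-rank the tests for the next commit, and keep only what fits the test budget) can be illustrated with a minimal sketch. It assumes a UCB1 multi-armed bandit in which each test case is an arm and the reward is 1.0 when the test fails. This is not the exact policy of COLEMAN, nor the Reinforcement Learning agent of RETECS; the class `BanditPrioritizer`, the exploration weight `c`, the binary failure reward, and the test data are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class BanditPrioritizer:
    """Hypothetical UCB1-style test prioritizer in which each test case is an arm.

    Illustrative only: not the exact policy of COLEMAN or RETECS.
    """

    c: float = 2.0  # exploration weight (assumed value)
    pulls: Dict[str, int] = field(default_factory=dict)  # test id -> times executed
    rewards: Dict[str, float] = field(default_factory=dict)  # test id -> summed reward
    total_pulls: int = 0  # executions across all tests and cycles

    def _score(self, test_id: str) -> float:
        n = self.pulls.get(test_id, 0)
        if n == 0:
            # Never executed (e.g. a newly added test): rank it first to explore.
            return math.inf
        mean = self.rewards[test_id] / n
        return mean + math.sqrt(self.c * math.log(self.total_pulls) / n)

    def prioritize(self, available: List[str], durations: Dict[str, float],
                   budget: float) -> List[str]:
        """Rank the tests available in this commit, then keep those fitting the budget."""
        # sorted() is stable, so ties keep the order given in `available`.
        ranked = sorted(available, key=self._score, reverse=True)
        selected: List[str] = []
        used = 0.0
        for test_id in ranked:
            # Skip a test that would overflow the budget, but keep trying shorter ones.
            if used + durations[test_id] <= budget:
                selected.append(test_id)
                used += durations[test_id]
        return selected

    def update(self, executed: Iterable[str], failed: Set[str]) -> None:
        """Observe one CI cycle: reward 1.0 for each failing test, 0.0 otherwise."""
        for test_id in executed:
            self.pulls[test_id] = self.pulls.get(test_id, 0) + 1
            reward = 1.0 if test_id in failed else 0.0
            self.rewards[test_id] = self.rewards.get(test_id, 0.0) + reward
            self.total_pulls += 1


# Two simulated CI cycles with a budget of 4 time units (hypothetical data).
prioritizer = BanditPrioritizer()
durations = {"t1": 3.0, "t2": 1.0, "t3": 2.0}
cycle1 = prioritizer.prioritize(["t1", "t2", "t3"], durations, budget=4.0)
prioritizer.update(cycle1, failed={"t2"})
cycle2 = prioritizer.prioritize(["t1", "t2", "t3"], durations, budget=4.0)
print(cycle1, cycle2)  # ['t1', 't2'] ['t3', 't2']
```

In this sketch, tests with no history receive an infinite score, so newly added test cases are tried before the learned ranking takes over; this is one simple way to cope with the frequent addition of tests in CI. The reward function is the main lever: swapping the binary failure reward for another signal changes what the bandit learns to prioritize.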
Copyright (c) 2023 Jackson Antônio do Prado Lima, Silvia Regina Vergilio
This work is licensed under a Creative Commons Attribution 4.0 International License.