Property-based Testing for Machine Learning Models
Abstract
Interest in machine learning has grown because of its potential to address problems that would otherwise be difficult to solve, and machine learning-based programs have consequently become mainstream. Given this widespread adoption, it is imperative to develop automated approaches for assessing the quality of machine learning-based solutions. Although significant research has been devoted to automated test input generation for machine learning programs, some promising approaches to test data generation have received limited attention. This paper introduces a property-driven approach to test data generation that trains an interpretable model, specifically a decision tree, to predict the behavior of the model under test. The tree structure of the resulting interpretable model provides insight into the behavior of the model under test. These insights are then transformed into executable properties, which enable the generation of test data. A primary advantage of property-based testing is its capacity to generate a vast number of inputs from a single property, thereby offering a more rigorous evaluation of machine learning models. The results of our experiment suggest that our property-driven approach has the potential to generate test data that examine models more thoroughly than more widely used methods for evaluating the performance and generalizability of machine learning models.
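The pipeline described above (fit an interpretable surrogate to the predictions of the model under test, turn each leaf into a property, then generate many inputs per property) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: `model_under_test` is a hypothetical two-feature binary classifier, the surrogate is a depth-1 decision tree (a stump) fitted by brute force rather than a full decision tree, and the standard-library `random` module stands in for a property-based testing library such as Hypothesis. All function names and the input bounds are invented for the example.

```python
import random


def model_under_test(x):
    """Hypothetical black-box binary classifier standing in for the ML model."""
    return 1 if 0.6 * x[0] + 0.4 * x[1] > 2.0 else 0


def fit_stump(samples, labels):
    """Fit a depth-1 decision tree by minimising misclassified samples.

    Assumes binary labels 0/1. Returns (feature, threshold, left, right):
    inputs with x[feature] <= threshold are predicted `left`, others `right`.
    """
    best = None
    for feature in range(len(samples[0])):
        for threshold in sorted({s[feature] for s in samples}):
            for left in (0, 1):
                right = 1 - left
                errors = sum(
                    (left if s[feature] <= threshold else right) != y
                    for s, y in zip(samples, labels)
                )
                if best is None or errors < best[0]:
                    best = (errors, feature, threshold, left, right)
    return best[1:]


def stump_to_properties(stump, bounds):
    """Turn each leaf into a property: (per-feature input ranges, expected label)."""
    feature, threshold, left, right = stump
    lo, hi = bounds[feature]
    left_ranges, right_ranges = list(bounds), list(bounds)
    left_ranges[feature] = (lo, threshold)
    # The boundary value itself belongs to the left leaf; sampling it exactly
    # from a continuous range is unlikely, so the sketch ignores that case.
    right_ranges[feature] = (threshold, hi)
    return [(left_ranges, left), (right_ranges, right)]


def check_property(prop, model, rng, trials=200):
    """Generate inputs satisfying the property; collect those the model contradicts."""
    ranges, expected = prop
    counterexamples = []
    for _ in range(trials):
        x = tuple(rng.uniform(lo, hi) for lo, hi in ranges)
        if model(x) != expected:
            counterexamples.append(x)
    return counterexamples


rng = random.Random(0)
bounds = [(0.0, 7.0), (0.0, 3.0)]  # assumed valid input domain per feature

# 1. Label sampled inputs with the predictions of the model under test.
train = [tuple(rng.uniform(lo, hi) for lo, hi in bounds) for _ in range(100)]
predictions = [model_under_test(x) for x in train]

# 2. Fit the interpretable surrogate and derive one property per leaf.
stump = fit_stump(train, predictions)
properties = stump_to_properties(stump, bounds)

# 3. Generate many inputs from each property and report disagreements.
results = [check_property(p, model_under_test, rng) for p in properties]
for (ranges, expected), failures in zip(properties, results):
    print(f"ranges={ranges} expected={expected} disagreements={len(failures)}")
```

Inputs on which the model contradicts a property are not necessarily defects: they mark regions where the surrogate and the model under test disagree, which makes them candidates for closer inspection. A deeper tree would yield one property per root-to-leaf path, with ranges narrowed on several features at once.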
Keywords:
Software testing, property-based testing, machine learning
Published
30/09/2024
How to Cite
DURELLI, Vinicius H. S.; MONTEIRO, Ricardo; DURELLI, Rafael S.; ENDO, Andre T.; FERRARI, Fabiano C.; SOUZA, Simone R. S. Property-based Testing for Machine Learning Models. In: SIMPÓSIO BRASILEIRO DE TESTES DE SOFTWARE SISTEMÁTICO E AUTOMATIZADO (SAST), 9., 2024, Curitiba/PR. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 39-48. DOI: https://doi.org/10.5753/sast.2024.3791.