On the Implementation of OS-Specific Tests: The CPython Case
Abstract
Modern software systems are frequently developed and tested across multiple platforms (e.g., Windows, Linux, and macOS). In software testing, practitioners adapt tests so that they behave differently depending on the target platform. Such tests, which must identify the platform on which they are executed, are referred to as OS-specific tests. In this paper, we present an empirical study of how developers implement OS-specific tests in CPython, the reference implementation of the Python programming language. We mine this project and assess its OS-specific tests quantitatively, guided by three research questions on the frequency, location, and issues related to OS-specific tests. Our results show that OS-specific tests are common in CPython: 13% of the analyzed test files contain OS-specific tests (RQ1). OS Identification APIs are the most frequently used in test code (53.46%), and the decorator @unittest.skipUnless is the most common way to skip tests depending on the platform (RQ2). We also find 170 issues related to OS-specific tests in CPython, with Windows being the most targeted platform (RQ3). Lastly, we discuss practical implications for practitioners and researchers: based on our findings, we emphasize the importance of testing across multiple platforms and examine the relationship between issues and OS-specific tests, among other insights.
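To make the notion of an OS-specific test concrete, here is a minimal sketch written in the style of Python's unittest. It is not taken from the CPython test suite; the class and test names are hypothetical, and sys.platform and os.name are used here only as familiar examples of OS identification APIs. It shows the three patterns mentioned above: a test guarded by @unittest.skipUnless, a test guarded by @unittest.skipIf, and a test that checks the platform inline to choose its expected value.

```python
import os
import sys
import unittest


class PathSeparatorTests(unittest.TestCase):
    """Hypothetical OS-specific tests illustrating common skip patterns."""

    @unittest.skipUnless(sys.platform == "win32", "requires Windows")
    def test_windows_separator(self):
        # Runs only when the interpreter reports a Windows platform.
        self.assertEqual(os.sep, "\\")

    @unittest.skipIf(sys.platform == "win32", "POSIX-only behavior")
    def test_posix_separator(self):
        # Skipped on Windows; runs on Linux, macOS, and other POSIX systems.
        self.assertEqual(os.sep, "/")

    def test_line_separator(self):
        # Inline OS check: the expected value depends on the platform.
        expected = "\r\n" if os.name == "nt" else "\n"
        self.assertEqual(os.linesep, expected)
```

Saved as a module (for example, a hypothetical test_os_specific.py), it can be run with `python -m unittest test_os_specific`. On any single platform, exactly one of the two decorated tests is reported as skipped, which is why such tests only receive full coverage when the suite runs on several operating systems.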
Keywords:
software testing, mining software repositories, test smells, Python
References
Vincent Aranega, Julien Delplanque, Matias Martinez, Andrew P. Black, Stéphane Ducasse, Anne Etien, Christopher Fuhrman, and Guillermo Polito. 2021. Rotten green tests in Java, Pharo and Python. Empirical Software Engineering 26, 6 (2021), 130.
Lívia Barbosa and Andre Hora. 2022. How and why developers migrate Python tests. In International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE, 538–548.
Gabriele Bavota, Abdallah Qusef, Rocco Oliveto, Andrea De Lucia, and David Binkley. 2012. An empirical analysis of the distribution of unit test smells and their impact on software maintenance. In International Conference on Software Maintenance (ICSM). 56–65.
Alexandru Bodea. 2022. Pytest-Smell: A Smell Detection Tool for Python Unit Tests. In International Symposium on Software Testing and Analysis. ACM, 793–796.
Barisha Chowdhury, Md Fazle Rabbi, S. M. Mahedy Hasan, and Minhaz F. Zibran. 2025. Insights into Dependency Maintenance Trends in the Maven Ecosystem. In International Conference on Mining Software Repositories (MSR). 280–284.
CVE - Common Vulnerabilities and Exposures. June, 2025. [link].
Julien Delplanque, Stéphane Ducasse, Guillermo Polito, Andrew P. Black, and Anne Etien. 2019. Rotten Green Tests. In International Conference on Software Engineering (ICSE). IEEE, 500–511.
Moritz Eck, Fabio Palomba, Marco Castelluccio, and Alberto Bacchelli. 2019. Understanding Flaky Tests: The Developer’s Perspective. In European Software Engineering Conference and Symposium on the Foundations of Software Engineering. ACM, 830–840.
GitHub-hosted runners. June, 2025. [link].
CPython Glossary. June, 2025. [link].
Negar Hashemi, Amjed Tahir, and Shawn Rasheed. 2022. An Empirical Study of Flaky Tests in JavaScript. In International Conference on Software Maintenance and Evolution (ICSME). IEEE, 24–34.
Andre Hora. 2023. Excluding code from test coverage: practices, motivations, and impact. Empirical Software Engineering 28, 1 (2023), 1–33.
Andre Hora, Romain Robbes, Marco Tulio Valente, Nicolas Anquetil, Anne Etien, and Stephane Ducasse. 2018. How do Developers React to API Evolution? A Large-Scale Empirical Study. Software Quality Journal 26, 1 (2018), 161–191.
Ricardo Job and Andre Hora. 2024. Availability and Usage of Platform-Specific APIs: A First Empirical Study. In International Conference on Mining Software Repositories. 27–31.
Ricardo Job and Andre Hora. 2024. How and Why Developers Implement OS-Specific Tests. Empirical Software Engineering 30 (2024), 33.
Ricardo Job and Andre Hora. 2025. OSTDetector: An automated tool for extracting OS-specific Tests from Git repositories written in Python. DOI: 10.5281/zenodo.10120045
Ricardo Job and Andre Hora. July, 2025. On the Implementation of OS-Specific Tests: The CPython Case. DOI: 10.5281/zenodo.15794483
Maxime Lamothe, Yann-Gaël Guéhéneuc, and Weiyi Shang. 2021. A Systematic Review of API Evolution Literature. ACM Computing Surveys (CSUR) 54, 8 (2021), 1–36.
Can Li, Jingxuan Zhang, Yixuan Tang, Zhuhang Li, and Tianyue Sun. 2024. Boosting API Misuse Detection via Integrating API Constraints from Multiple Sources. In International Conference on Mining Software Repositories. 14–26.
Qingzhou Luo, Farah Hariri, Lamyaa Eloussi, and Darko Marinov. 2014. An empirical analysis of flaky tests. In International Symposium on Foundations of Software Engineering. ACM, 643–653.
Matias Martinez and Bruno Gois Mateus. 2022. Why Did Developers Migrate Android Applications From Java to Kotlin? IEEE Transactions on Software Engineering 48 (2022), 4521–4534.
Gerard Meszaros. 2007. xUnit test patterns: Refactoring test code. Pearson Education.
Costain Nachuma, Md Mosharaf Hossan, Asif K. Turzo, and Minhaz F. Zibran. 2025. Decoding Dependency Risks: A Quantitative Study of Vulnerabilities in the Maven Ecosystem. In International Conference on Mining Software Repositories (MSR). 270–274.
Romulo Nascimento, Eduardo Figueiredo, and Andre Hora. 2021. JavaScript API Deprecation Landscape: A Survey and Mining Study. IEEE Software 39, 3 (2021), 96–105.
National Institute of Standards and Technology. June, 2025. [link].
Fabio Palomba, Dario Di Nucci, Annibale Panichella, Rocco Oliveto, and Andrea De Lucia. 2016. On the Diffusion of Test Smells in Automatically Generated Test Code: An Empirical Study. In International Workshop on Search-Based Software Testing. ACM, 5–14.
Anthony Peruma, Khalid Almalki, Christian D. Newman, Mohamed Wiem Mkaouer, Ali Ouni, and Fabio Palomba. 2019. On the distribution of test smells in open source Android applications: an exploratory study. In International Conference on Computer Science and Software Engineering. IBM Corp., 193–202.
Anthony Peruma, Khalid Almalki, Christian D. Newman, Mohamed Wiem Mkaouer, Ali Ouni, and Fabio Palomba. 2020. TsDetect: An Open Source Test Smells Detection Tool. In European Software Engineering Conference and Symposium on the Foundations of Software Engineering. ACM, 1650–1654.
Piotr Przymus, Mikolaj Fejzer, Jakub Narebski, Krzysztof Rykaczewski, and Krzysztof Stencel. 2025. Out of Sight, Still at Risk: The Lifecycle of Transitive Vulnerabilities in Maven. In International Conference on Mining Software Repositories (MSR). 329–333.
Pytest. June, 2025. [link].
Md Shafiullah Shafin, Md Fazle Rabbi, S. M. Mahedy Hasan, and Minhaz F. Zibran. 2025. Faster Releases, Fewer Risks: A Study on Maven Artifact Vulnerabilities and Lifecycle Management. In International Conference on Mining Software Repositories (MSR). 275–279.
Mehedi Hasan Shanto, Muhammad Asaduzzaman, Manishankar Mondal, and Shaiful Chowdhury. 2025. Dependency Dilemmas: A Comparative Study of Independent and Dependent Artifacts in Maven Central Ecosystem. In International Conference on Mining Software Repositories (MSR). 304–308.
Mina Shehata, Saidmakhmud Makhkamjonoov, Mahad Syed, and Esteban Parra. 2025. Cascading Effects: Analyzing Project Failure Impact in the Maven Central Ecosystem. In International Conference on Mining Software Repositories (MSR). 309–313.
CPython source. June, 2025. [link].
Unittest. June, 2025. [link].
Tongjie Wang, Yaroslav Golubev, Oleg Smirnov, Jiawei Li, Timofey Bryksin, and Iftekhar Ahmed. 2021. PyNose: A Test Smell Detector For Python. In International Conference on Automated Software Engineering (ASE). IEEE, 593–605.
Hao Xia, Yuan Zhang, Yingtian Zhou, Xiaoting Chen, Yang Wang, Xiangyu Zhang, Shuaishuai Cui, Geng Hong, Xiaohan Zhang, Min Yang, et al. 2020. How Android developers handle evolution-induced API compatibility issues: a large-scale study. In International Conference on Software Engineering. 886–898.
Published: 2025-09-22
How to Cite
JOB, Ricardo; HORA, Andre. On the Implementation of OS-Specific Tests: The CPython Case. In: BRAZILIAN SYMPOSIUM ON SYSTEMATIC AND AUTOMATED SOFTWARE TESTING (SAST), 10., 2025, Recife/PE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 46-54. DOI: https://doi.org/10.5753/sast.2025.13918.
