An Empirical Study of Bugs in COVID-19 Software Projects
Keywords: bugs, categorization, coronavirus, covid-19, defects, empirical study, github, mining software repositories, software development
The dire consequences of the COVID-19 pandemic have influenced the development of COVID-19 software, i.e., software used for analysis and mitigation of COVID-19. Bugs in COVID-19 software can be consequential, as COVID-19 software projects can impact public health policy and user data privacy. The goal of this paper is to help practitioners and researchers improve the quality of COVID-19 software through an empirical study of open source software projects related to COVID-19. We use 129 open source COVID-19 software projects hosted on GitHub to conduct our empirical study. We then apply qualitative analysis to 550 bug reports from the collected projects to identify bug categories. We identify 8 bug categories, which include data bugs, i.e., bugs that occur during mining and storage of COVID-19 data. The identified bug categories appear in 7 categories of software projects, including (i) projects that use statistical modeling to perform predictions related to COVID-19, and (ii) medical equipment software that is used to design and implement medical equipment, such as ventilators. Based on our findings, we advocate for robust statistical model construction through better synergies between data science practitioners and public health experts. The existence of security bugs in user tracking software necessitates the development of tools that detect data privacy violations and security weaknesses.
Copyright (c) 2021 Akond Ashfaque Ur Rahman, Effat Farhana
This work is licensed under a Creative Commons Attribution 4.0 International License.