An Empirical Study of Bugs in COVID-19 Software Projects

The dire consequences of the COVID-19 pandemic have influenced development of COVID-19 software, i.e., software used for analysis and mitigation of COVID-19. Bugs in COVID-19 software can be consequential, as COVID-19 software projects can impact public health policy and user data privacy. The goal of this paper is to help practitioners and researchers improve the quality of COVID-19 software through an empirical study of open source software projects related to COVID-19. We use 129 open source COVID-19 software projects hosted on GitHub to conduct our empirical study. Next, we apply qualitative analysis on 550 bug reports from the collected projects to identify bug categories. We identify 8 bug categories, which include data bugs, i.e., bugs that occur during mining and storage of COVID-19 data. The identified bug categories appear for 7 categories of software projects including (i) projects that use statistical modeling to perform predictions related to COVID-19, and (ii) medical equipment software that is used to design and implement medical equipment, such as ventilators. Based on our findings, we advocate for robust statistical model construction through better synergies between data science practitioners and public health experts. The existence of security bugs in user tracking software necessitates development of tools that will detect data privacy violations and security weaknesses.


Introduction
The novel Coronavirus disease (COVID-19) is a worldwide pandemic that spreads through droplets generated from coughs or sneezes and by touching contaminated surfaces (John Hopkins University, 2020). As of May 31, 2020, COVID-19 has caused 370,247 deaths across the world (John Hopkins University, 2020). Apart from causing thousands of deaths and creating long-term health repercussions for vulnerable populations, COVID-19 has severely impacted the economic sector. According to a recent study (Erin Duffin, 2020), due to COVID-19 gross domestic product (GDP) growth will decrease from 3.0% to 2.4% worldwide. As of May 28, 2020, nearly 41 million citizens reported unemployment in the USA alone (Mitchell Hartman, 2020). More than 3.9 billion people around the world were under some form of stay-at-home order due to COVID-19 (Alasdair Sandford, 2020).
Health care professionals are at the frontline of combating COVID-19. Practitioners from other domains, such as software engineering, have also joined forces to analyze and mitigate the negative consequences of COVID-19. For example, statistical modeling was used to build software that identifies pneumonia caused by COVID-19 from lung scan images (Tom Simonite, 2020). The software was used in 34 Chinese hospitals (Tom Simonite, 2020). In response to the food insecurity caused by COVID-19, practitioners have created interactive visualization software that displays free meal sites across the USA (Why Hunger, 2020). The creators of the software envision building a social movement to eradicate hunger and address economic inequalities. As another example, Apple and Google have jointly announced the creation of a software framework that will help practitioners build tools to trace the COVID-19 infection status of mobile app users (Apple, 2020). The abovementioned examples show that COVID-19 software, i.e., software used for analysis and mitigation of COVID-19, has near-term and long-term effects on public health and society.
Despite the abovementioned advancements, COVID-19 software projects are susceptible to bugs. Let us consider Figure 1 in this regard. Figure 1 provides a snapshot of a bug report related to statistical modeling (Begley, 2020a). We observe that when implementing a statistical model the practitioners did not consider the correlation between intensive care unit (ICU) bed availability and death rate prediction. Furthermore, the number of ICU beds is incorrectly assumed to be 40,000 instead of 1,000.
We hypothesize that systematic analysis can reveal bug categories, including statistical modeling bugs similar to Figure 1. In prior work, researchers (Garcia et al., 2020; Rahman et al., 2020; Linares-Vásquez et al., 2017; Catolino et al., 2019; Thung et al., 2012; Wan et al., 2017) have documented the importance of bug categorization. For example, for autonomous vehicle software Garcia et al. 2020 stated that categorization of bugs can help to construct bug detection and testing tools. Linares-Vásquez et al. 2017 stated that categorizing vulnerabilities can help Android practitioners "in focusing their verification and validation activities". According to Catolino et al. 2019, "understanding the bug type represents the first and most time-consuming step to perform in the process of bug triage".
In prior work, researchers have categorized bugs for infrastructure as code (IaC), autonomous vehicle (Garcia et al., 2020), and machine learning (Thung et al., 2012; Islam et al., 2019) software. However, COVID-19 software is different from previously studied software in the following aspects: (i) development context: unlike previously studied software projects, COVID-19 software is developed in response to a pandemic that infected 6.1 million individuals in five months (John Hopkins University, 2020), and (ii) public health: unlike previously studied software projects, COVID-19 software has direct implications on public health.
In response to the pandemic, researchers have conducted studies related to modeling (Dehning et al., 2020; Yang and Wang, 2020; Tamm, 2020), biological science (Jin et al., 2020; Wang et al., 2020; De Clercq, 2006; Helms et al., 2020), social science (Van Bavel et al., 2020; Pulido et al., 2020; Evans et al., 2020; Will, 2020; Jarynowski et al., 2020), and policy making (Corey et al., 2020; Mello and Wang, 2020; Rourke et al., 2020; Kraemer et al., 2020). However, characterization of bugs in COVID-19 software remains an unexplored area.
The scope of our paper is to get a systematic understanding of bugs in COVID-19 software projects. In our paper, we refer to COVID-19 software projects as software projects that were created to analyze and mitigate the consequences of COVID-19. These projects were created in response to a global pandemic that created a worldwide impact on public health, economy, and societal activities. Our hypothesis is that the utility of COVID-19 software projects and the urgency associated with these projects can influence (i) the manifestation of bugs unique to the COVID-19 reality, and (ii) bug resolution time. Furthermore, our empirical analysis can reveal what categories of bugs appear for what types of COVID-19 software projects.
The goal of this paper is to help practitioners and researchers improve the quality of COVID-19 software through an empirical study of open source software projects related to COVID-19.
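We answer the following research questions:
• RQ1: What categories of open source COVID-19 software projects exist?
• RQ2: What categories of bugs exist in open source COVID-19 software projects? How frequently do the identified bug categories appear? What is the resolution time for the identified bug categories?
• RQ3: How similar are the identified bug categories to those of previously studied software projects?
We organize the rest of the paper as follows: We discuss related work in Section 2. We provide the methodology to answer the three research questions in Section 3 and provide the results in Section 4. We discuss our results with a summary of our findings in Section 5. We provide the limitations of our paper in Section 6. Finally, we conclude the paper in Section 7. Our constructed dataset is available as a public, citable repository.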

Overview of the Empirical Study
An overview of our paper is available in Figure 2. First, we mine software projects related to COVID-19 from GitHub by applying filtering criteria based on number of issues, number of developers, etc. Next, we apply a qualitative analysis technique called open coding (Saldana, 2015) on the README files of the collected open source software (OSS) projects to identify what categories of OSS projects exist related to COVID-19. After characterizing the collected software projects, we again apply open coding on 550 bug reports from the collected OSS projects to identify bug categories. We also quantify the frequency and resolution time of each bug category across the identified project categories. Finally, we conduct a scoping review (Munn et al., 2018) to find the similarities in bug categories between COVID-19-related software projects and other categories of software projects.

Related Work
Our paper is related to prior research that has focused on categorization of bugs in OSS projects. Mockus et al. 2002 studied the nature of contributions in the OSS projects Apache and Mozilla. They (Mockus et al., 2002) observed that contributors who submit bug reports are approximately 8.2 times higher in number than contributors who address bugs in bug reports. Ma et al. 2017 investigated Python GitHub projects that are used in the scientific domain, and observed developers to use stack traces, as well as communicate with upstream developers, to identify root causes of bugs. Zhang et al. 2019 examined bug reports for mobile and desktop software hosted on GitHub, and identified differences in how the reports are constructed. Ray et al. 2014 studied the correlations between bugs and the language in which the software is developed, and reported a modest correlation using an empirical study of 729 GitHub projects. Categorization of domain-specific OSS bugs has also been investigated: Thung et al. 2012, Garcia et al. 2020, Wan et al. 2017, Islam et al. 2019, and Rahman et al. 2020 in separate research papers used OSS projects to classify bug categories for machine learning, autonomous vehicle, blockchain, deep learning, and IaC software, respectively.
Our paper is also related to publications that have investigated the impact of COVID-19 on software development. Ralph et al. 2020 surveyed 2,225 practitioners and reported that fear related to COVID-19 affects the productivity of software practitioners. Butler and Jaffe 2020 conducted a diary study with 435 practitioners and reported that practitioners face challenges, such as having too many meetings and feeling overworked, while working from home due to COVID-19. Oliveira et al. 2020 surveyed 413 practitioners from Brazil and reported practitioners' perceived productivity to increase due to fewer interruptions.
From the abovementioned discussion we observe bugs in software projects related to COVID-19 to be an under-explored area. While there exist several bug categorization studies (Thung et al., 2012; Garcia et al., 2020; Wan et al., 2017; Islam et al., 2019; Rahman et al., 2020), no studies exist for COVID-19-related projects. The bug categorization studies for IaC, blockchain, and deep learning motivated us to derive bug categories and quantify the identified bug categories. Wan et al. 2017's paper on blockchain bugs motivated us to study bug resolution time for each identified bug category. In our paper, we study COVID-19 software bugs in the following manner:
• categories of bugs;
• frequency of identified bug categories;
• resolution time of identified bug categories; and
• categories of software projects.

Methodology
In this section we provide the methodology to answer our three research questions: RQ1, RQ2, and RQ3.

Methodology for RQ1: What categories of open source COVID-19 software projects exist?
We define COVID-19 software projects as software projects used for analysis and mitigation of COVID-19. We hypothesize multiple categories of COVID-19 software projects to exist in the OSS domain. We validate our hypothesis by systematically categorizing COVID-19 software projects. Our categorization will provide insights on how the software development community has responded to the COVID-19 pandemic. We answer RQ1 by completing the following steps:

Dataset Collection
We conduct our empirical analysis by collecting COVID-19 software projects hosted on GitHub. To collect these projects we use GitHub's search utility (GitHub, 2020c), where we first identify projects tagged as 'covid19'. We use the search string 'covid19', as it is a topic designated for COVID-19 by GitHub (GitHub, 2020a). Our assumption is that by using a GitHub-designated tag we can collect OSS projects hosted on GitHub that are related to COVID-19. OSS projects hosted on GitHub are susceptible to quality issues, as GitHub users often host repositories for personal purposes that are not reflective of real-world software development (Munaiah et al., 2017). Upon collection of the projects we apply a set of filtering criteria so that we can identify projects that contain sufficient data for analysis. We describe the filtering criteria below:
• Criterion-1: The project must have at least 2 developers. Our assumption is that this criterion will filter out projects used for personal purposes.
• Criterion-2: The project must have at least 5 open issues. We use this filtering criterion to identify projects that are actively maintained. Our assumption is that by using this criterion we will be able to identify COVID-19 software projects that are not used for personal purposes as well as projects that are active. Prior research (Agrawal et al., 2018) has also used the count of issues to filter OSS projects hosted on GitHub to conduct empirical studies.
• Criterion-3: The project must have at least two commits per month. Munaiah et al. 2017 used the threshold of at least two commits per month to determine which projects have enough development activity for software organizations. We use this threshold to filter out projects with short development activity.
• Criterion-4: The README of the project is written in English. READMEs of projects related to COVID-19 can be non-English. We do not include non-English projects as the raters who perform categorization are not familiar with non-English languages, such as Spanish and Cantonese.
• Criterion-5: The project is related to COVID-19. We use the 'topic' feature of GitHub to search and identify COVID-19 software projects. However, practitioners can mislabel projects using the 'topic' feature of GitHub, potentially including projects in our dataset that are not related to COVID-19. For example, from manual inspection we observe the 'RehanSaeed/Schema.NET' project to be tagged as 'covid19', even though it is not related to COVID-19. In fact, the project is used to convert blob objects into C# classes.

[Figure 2: Overview of the empirical study. Panel titles: Characterization of COVID-19 Projects; Characterization of COVID-19 Software Bugs]
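
The five filtering criteria above can be read as a conjunction of simple checks on per-project metadata. The following is a minimal sketch of that reading, not the authors' tooling. The `Project` record, its field names, and all example repositories are hypothetical; the last two fields stand in for the outcome of manual inspection of the README language and of COVID-19 relevance.

```python
from dataclasses import dataclass


@dataclass
class Project:
    """Hypothetical metadata record for one candidate GitHub project."""
    name: str
    developers: int
    open_issues: int
    commits_per_month: float
    readme_is_english: bool   # assumed to come from manual inspection
    covid_related: bool       # assumed to come from manual inspection


def passes_filters(p: Project) -> bool:
    """Return True only if the project satisfies all five criteria."""
    return (
        p.developers >= 2             # Criterion-1: at least 2 developers
        and p.open_issues >= 5        # Criterion-2: at least 5 open issues
        and p.commits_per_month >= 2  # Criterion-3: at least 2 commits/month
        and p.readme_is_english       # Criterion-4: English README
        and p.covid_related           # Criterion-5: related to COVID-19
    )


# Hypothetical candidates, invented purely for illustration.
candidates = [
    Project("example-org/covid-dashboard", 4, 12, 9.0, True, True),
    Project("example-user/personal-notes", 1, 0, 0.5, True, True),
    Project("example-org/schema-tool", 6, 20, 15.0, True, False),
    Project("example-org/rastreador", 3, 8, 5.0, False, True),
]

selected = [p.name for p in candidates if passes_filters(p)]
print(selected)
```

In this sketch only the first candidate survives: the second fails the developer, issue, and commit thresholds, the third is tagged with the topic but unrelated to COVID-19, and the fourth has a non-English README.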

Qualitative Analysis of README files
We apply a qualitative analysis technique called open coding (Saldana, 2015) on the content of README files for each of the downloaded projects from Section 3.1.1. README files describe the content of the project and give GitHub users an overview of the software project (Prana et al., 2019). We hypothesize that by systematically analyzing the content of the README files we can derive what types of software projects are developed that are related to COVID-19.
In open coding a rater identifies and synthesizes patterns within unstructured text (Saldana, 2015). We select open coding because we can obtain detailed information on the software project categories. We use a hypothetical example to demonstrate our process of open coding in Figure 3. First, we collect text from the README files for each of the collected projects from Section 3.1.1. Next, we extract text snippets that describe the purpose of the software project. For example, from the raw text 'The COVID19 Vulnerability Index (CV19 Index) is a predictive model that identifies people who are likely to have a heightened vulnerability to severe complications from COVID19' we extract the text snippet 'a predictive model', as the extracted text snippet describes the purpose of the software project. Next, from the text snippets 'a predictive model' and 'modelling estimated deaths' we generate an initial category called 'Models to predict'. Two initial categories, 'Models to predict' and 'Models to understand', are combined into one category 'Statistical modeling', as they both indicate the descriptions of the software projects to be related to statistical modeling.
The first and second authors conduct the open coding process separately. Both authors use Excel spreadsheets to conduct the open coding process manually. The first and second authors respectively have 10 and 6 years of experience in software engineering, and both have experience in conducting open coding on software project artifacts, such as commit messages and Stack Overflow posts (Farhana et al., 2019). Upon completion of the open coding process, the first and second authors identify agreements and disagreements. Disagreements are resolved through discussion, and the agreement rate is calculated using Cohen's Kappa (Cohen, 1960). During the discussion phase both authors present their justification, and recheck the category derivation based on the discussion and by revisiting the content. The mapping determined upon discussion is considered final. One project can map to multiple categories.

Closed Coding
We apply closed coding (Crabtree and Miller, 1999) to identify which project maps to the identified categories from Section 3.1.2. Closed coding is the qualitative analysis technique where a rater maps an artifact to a predefined category by inspecting the artifact (Crabtree and Miller, 1999). The first and second author separately conduct closed coding on the collected README files. Both authors use Excel spreadsheets to conduct closed coding. After completing the closed coding process the first and second authors identify agreements and disagreements. Agreement rate is recorded using Cohen's Kappa (Cohen, 1960). Disagreements are resolved using discussion. During the discussion phase both authors present their justification for disagreements. Next, based on the discussion the authors recheck the labeling based on the justification and content analysis. The categorization determined upon discussion is considered final.
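
Cohen's Kappa corrects the raw agreement rate between two raters for the agreement expected by chance. The sketch below computes it from two parallel lists of labels. It is an illustration only: it assumes each artifact receives exactly one label per rater (whereas in the study one project can map to multiple categories), and the function name and example labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa for two raters who each assign one label per item."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: sum over labels of the product of the two raters'
    # marginal label proportions.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label for every item.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical closed-coding output for four README files.
rater_1 = ["Aggregation", "Mining", "Aggregation", "Education"]
rater_2 = ["Aggregation", "Mining", "Mining", "Education"]

kappa = cohens_kappa(rater_1, rater_2)
print(round(kappa, 3))
```

For these hypothetical labels the observed agreement is 3/4 and the chance agreement is 5/16, so Kappa is (0.75 - 0.3125) / (1 - 0.3125) = 7/11, roughly 0.636.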

Rater Verification
The derived categories are susceptible to the bias of the first and second author. We mitigate the limitation by allocating an additional rater who applies closed coding for a subset of the README files. The additional rater, who is not an author of the paper, is a fourth year PhD candidate in the Department of Computer Science at Tennessee Technological University. The rater has 2 years of professional experience in software engineering and has conducted qualitative analysis on software artifacts, such as bug reports. We randomly allocate a set of 100 README files mined from 100 projects to the rater. The rater applies closed coding on the content of the README files to identify the mapping between each project and the identified categories. Upon completion of closed coding we calculate Cohen's Kappa (Cohen, 1960) between the rater and the first author, as well as between the rater and the second author, separately.

[Figure 3: Hypothetical example of the open coding process for README files. Columns: README excerpt, Raw Text, Initial Category, Category]

Methodology for RQ2: What categories of bugs exist in open source COVID-19 software projects? How frequently do the identified bug categories appear? What is the resolution time for the identified bug categories?
In this section, we answer "RQ2: What categories of bugs appear in COVID-19 software projects? How frequently do the identified bug categories appear? What is the resolution time for each bug category?" A categorization of bugs for COVID-19 software projects can inform practitioners and researchers about how software related to COVID-19 is developed and in which areas they can help. Furthermore, educators can learn about the software bugs that occur in software related to a pandemic and disseminate these findings in the classroom. Frequency of the identified bug categories can help us understand what categories of software tend to contain what types of software bugs and provide quality improvement suggestions accordingly. Quantifying the resolution time for bugs in software projects can help software engineering researchers provide actionable guidelines to practitioners. For example, Wan et al. 2017 observed that for blockchain software projects security bugs can take longer to fix compared to other bug categories. Based on their findings, Wan et al. 2017 recommended that blockchain project maintainers adopt security analysis and repair tools to fix security bugs quickly. We provide the methodology to identify bug categories, quantify bug category frequency, and quantify bug resolution time below:
Methodology to Identify Bug Categories: We identify bug categories using the following steps:
• Step#1-Filtering: We collect the 4,405 issue reports from the 129 projects and manually inspect each issue report. We do not rely on automated approaches, such as keyword search or using bug labels, as automated approaches tend to generate false positives, which may bias research results (Herzig et al., 2013). While inspecting each issue report we use the following IEEE definition for bugs: "an imperfection that needs to be replaced or repaired" (IEEE, 2010), similar to prior work (Rahman et al., 2020). By completing this step we obtain a set of closed issue reports that correspond to bugs. We use closed reports because open bug reports are often incomplete and may not help in identifying bugs (Wan et al., 2017). The first and second author individually inspect the issue reports to identify which issue reports correspond to bugs. We record the agreement rate and Cohen's Kappa (Cohen, 1960) between the first and second author. Disagreements between the first and second author are resolved through discussions. The process is subjective and susceptible to the bias of the first and second author. We mitigate the bias by using an additional rater, who inspected 100 randomly selected issue reports and classified them as bug reports and non-bug reports. The additional rater is the fourth year PhD candidate at Tennessee Technological University who is also involved in rater verification for RQ1.
• Step#2-Open coding: We apply open coding (Saldana, 2015) on the content of the collected bug reports from Step#1. Our open coding process is illustrated in Figure 4 using an example. First, we extract raw text from bug report titles and descriptions, from which we generate initial categories. Next, we merge initial categories based on their commonalities and generate categories. Similar to deriving project categories, the first and second author separately apply the process of open coding to generate bug categories. Upon completion of the process we quantify the agreement rate and measure Cohen's Kappa (Cohen, 1960). For disagreements we conduct discussion. The categories generated upon discussion are considered final.

\[
\mathrm{BugPropAll}(x) = \frac{\text{\# of bug reports labeled as category } x}{\text{total \# of bug reports}} \times 100\% \tag{1}
\]

\[
\mathrm{BugPropCateg}(x, y) = \frac{\text{\# of bug reports labeled as } x \text{, of project type } y}{\text{\# of bug reports for project type } y} \times 100\% \tag{2}
\]

Methodology to Quantify Bug Category Frequency:
We apply the following steps to quantify the frequency of identified bug categories:
• Step#1-Closed coding: We apply closed coding (Crabtree and Miller, 1999) to map each identified category to the bug reports that we study. The first and second author separately apply closed coding for the collected bugs from Step#1. Upon completion, we calculate the agreement rate and Cohen's Kappa (Cohen, 1960). Disagreements are resolved using discussion.
• Step#2-Metric calculation: We quantify the frequency of the identified bug categories using two metrics: 'BugPropAll' and 'BugPropCateg'. We use Equations 1 and 2 to respectively calculate 'BugPropAll' and 'BugPropCateg'. The 'BugPropAll' metric refers to the proportion of bugs across all projects, and provides a holistic overview of the frequency of identified bug categories. The 'BugPropCateg' metric refers to the proportion of bugs for a certain project category, and provides a granular overview of bug category frequency for each software project type identified in Section 4.1.2.
• Step#3-Rater verification: The use of the first and second author as raters to conduct closed coding is susceptible to rater bias. We mitigate this limitation by allocating an additional rater. We assign 250 randomly selected bug reports to the additional rater, who applies closed coding. We provide the additional rater with a document that provides definitions of each identified category with examples. Similar to our process of rater verification for project categorization, the additional rater is the fourth year PhD candidate in the Department of Computer Science at Tennessee Technological University. The fourth year PhD candidate is involved in the rater verification process for identifying project categories and labeling issue reports as bug reports.
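
The two frequency metrics are plain percentages over labeled bug reports, as the following minimal sketch illustrates. It is not the authors' script: the record layout with `category` and `project_type` keys, the function names, and the four example reports are hypothetical, and each report is assumed to carry exactly one bug category and one project type.

```python
def bug_prop_all(bug_reports, category):
    """BugPropAll(x): percentage of all bug reports labeled as `category`."""
    if not bug_reports:
        return 0.0
    matching = sum(1 for b in bug_reports if b["category"] == category)
    return 100.0 * matching / len(bug_reports)


def bug_prop_categ(bug_reports, category, project_type):
    """BugPropCateg(x, y): percentage of the bug reports of `project_type`
    that are labeled as `category`."""
    of_type = [b for b in bug_reports if b["project_type"] == project_type]
    if not of_type:
        return 0.0
    matching = sum(1 for b in of_type if b["category"] == category)
    return 100.0 * matching / len(of_type)


# Hypothetical labeled bug reports, invented for illustration.
bug_reports = [
    {"category": "data", "project_type": "Aggregation"},
    {"category": "data", "project_type": "Mining"},
    {"category": "security", "project_type": "User tracking"},
    {"category": "security", "project_type": "Aggregation"},
]

print(bug_prop_all(bug_reports, "data"))
print(bug_prop_categ(bug_reports, "data", "Aggregation"))
print(bug_prop_categ(bug_reports, "security", "User tracking"))
```

With these hypothetical reports, data bugs account for 2 of 4 reports overall (50%), for 1 of the 2 reports in aggregation projects (50%), and security bugs account for the single user tracking report (100%). The denominators differ between the two metrics, which is why a category can be rare overall yet dominant within one project type.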

Methodology to Quantify Bug Resolution Time
We use the opening and closing timestamps for each closed bug report in our dataset to quantify the resolution time for each bug category, similar to Wan et al. 2017. We calculate bug resolution time by computing the number of hours that have elapsed between when the bug report is opened and when it is closed and not reopened again, as per our dataset, which was downloaded on April 04, 2020. We report bug resolution time for all bug categories, as well as for bug reports that belong to certain categories of software projects.
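
The resolution time computation reduces to a timestamp difference expressed in hours. The sketch below shows one way to do this with Python's standard library. The record layout and example timestamps are hypothetical, timestamps are assumed to be ISO 8601 strings with an explicit UTC offset, and the per-category median is chosen here only for illustration; it is an assumption rather than the summary statistic confirmed by the text.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median


def resolution_hours(opened_at: str, closed_at: str) -> float:
    """Hours elapsed between opening and final closing of a bug report."""
    opened = datetime.fromisoformat(opened_at)
    closed = datetime.fromisoformat(closed_at)
    return (closed - opened).total_seconds() / 3600.0


# Hypothetical closed bug reports, invented for illustration.
closed_reports = [
    {"category": "data",
     "opened": "2020-03-20T10:00:00+00:00", "closed": "2020-03-21T16:30:00+00:00"},
    {"category": "data",
     "opened": "2020-03-22T08:00:00+00:00", "closed": "2020-03-22T10:00:00+00:00"},
    {"category": "security",
     "opened": "2020-03-25T00:00:00+00:00", "closed": "2020-03-27T00:00:00+00:00"},
]

# Group resolution times by bug category.
hours_by_category = defaultdict(list)
for report in closed_reports:
    hours = resolution_hours(report["opened"], report["closed"])
    hours_by_category[report["category"]].append(hours)

# Summarize each category with its median (an illustrative choice).
median_hours = {cat: median(vals) for cat, vals in hours_by_category.items()}
print(median_hours)
```

The same grouping can be keyed on a (bug category, project category) pair to report resolution time for bug reports that belong to certain categories of software projects.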

Methodology to Answer RQ3: How similar are the identified bug categories to those of previously studied software projects?
We conduct a scoping review of publications related to software bug categorization. Using a scoping review, researchers can synthesize results using a limited search (Anderson et al., 2008). According to Munn et al. 2018, "Researchers may conduct scoping reviews instead of systematic reviews where the purpose of the review is to identify knowledge gaps, scope a body of literature, clarify concepts or to investigate research conduct." Unlike a systematic literature review, a scoping review is less comprehensive, and can be used as a precursor to a systematic literature review. A scoping review can be useful to collect emerging evidence, which eventually can be used to inform further research decisions (Anderson et al., 2008). For example, if a researcher is inexperienced in the domain of software fuzzing, and wants to get an understanding of existing topics such as practices and techniques to implement fuzzing, then a scoping review could be useful to that researcher.
We conduct a scoping review by identifying well-known venues where software engineering research is published. We select five conferences: International Conference on Software Engineering (ICSE), Symposium on Foundations of Software Engineering (FSE), International Conference on Automated Software Engineering (ASE), International Conference on Mining Software Repositories (MSR), and International Symposium on Software Testing and Analysis (ISSTA). We select these conferences because they are considered reputable venues to publish literature related to software engineering (Emery Berger, 2021), and are sponsored by special interest groups of the Association for Computing Machinery (ACM). We select conferences as they tend to have a shorter review cycle and are more likely to include recent advances in the field of interest (Vardi, 2009).
We conduct the review by applying the following steps:
• Step-1: We download all papers from 2010 to 2020 for each of the five conferences. We select papers from 2010 to 2020 to identify and synthesize state-of-the-art bug taxonomies and categories used for a wide range of software projects. Papers that studied bug categories prior to 2010 may not give us an understanding of the state of the art. Our hypothesis is that by identifying papers from the last 10 years we will get a better overview of what types of bugs appear for a wide range of software projects.
• Step-2: We read the title, abstract, and keywords to determine if the downloaded papers are related to software bug categorization.
• Step-3: Upon completion of Step-2, one rater reads each collected paper, and identifies topics discussed in the paper of interest using qualitative analysis. For each paper the rater determines if the paper focuses on bug categorization. If so, the rater documents the bug categories for the reported software project.

Upon completion of the abovementioned steps, we derive reported bug categories for multiple software projects.

Results
In this section, we provide answers to the three research questions, RQ1, RQ2, and RQ3.

Answer to RQ1: What categories of open source COVID-19 software projects exist?
We answer RQ1 by first providing summary statistics of our dataset in Section 4.1.1. Next, we report categories of the projects in Section 4.1.2.

Summary of Dataset
Altogether we download 129 projects for analysis. Using the search feature we identify 3,276 public projects upon which we apply our filtering criteria. A complete breakdown of our filtering criteria is available in Table 1. Attributes of the projects are available in Table 2. 'Languages' in Table 2 corresponds to the count of main programming languages of the collected projects as determined by GitHub's linguist tool (GitHub, 2020b). Example languages include JavaScript, Python, and R. A temporal evolution of the 129 COVID-19 software projects based on creation date is available in Figure 5. We observe a sharp increase in project creation after Feb 29, 2020.

Categorization of COVID-19 Software Projects
We identify 7 categories of COVID19 software projects. We describe each of the categories below in alphabetic order: I: Aggregation:: This category includes software projects that curate data related to COVID19 and present collected COVID19 data in an aggregated format using vi sualizations. The purpose of these projects is to help users un derstand the spread of the COVID19 disease over time and location. Software projects that belong to this category can be country specific as done in 'juanmnl/covid19monitor' (juan mnl, 2020) and 'dsfsi/covid19za' (Marivate and Combrink, 2020) respectively, for Ecuador and South Africa. Aggrega tion of COVID19 data can also be at a global level, for ex ample, 'boogheta/coronaviruscountries' (boogheta, 2020) is a software that aggregates COVID19 data across the world and allows software users to compare the reported cases on a countrybycountry basis. II: Education:: This category includes projects that pro vide utilities on educating people about COVID19. Lack of knowledge related to infections and symptoms can con tribute to rapid spreading of COVID19. The purpose of these projects is to build software, where users can ask questions and obtain answers. We observe two categories of software: first, question and answer websites similar to Stack Overflow 3 , such as 'nthopinion/covid19' (nthopinion, 2020), where users can ask questions about COVID19, and other users answer such questions. Second, we observe bot specific software, such as 'deepsetai/COVIDQA' (deepset ai, 2020) that provides answers for questions related to COVID19 automatically.
III: Medical equipment:: This category includes projects to curate and maintain source code for the design and implementation of medical equipment used to treat COVID 19. The purpose of these projects is to create designs of COVID19 related medical equipment, such as ventilators at scale, so that the growing need of medical equipment in hos pitals is satisfied. One example of such repository is 'makers forlife/makair' (makersfor life, 2020), which states the fol lowing in it's README page: "Aims at helping hospitals cope with a possible shortage of professional ventilators dur ing the outbreak. Worldwide. ... We target a perunit cost well under 500 EUR, which could easily be shrunk down to 200 EUR or even 100 EUR per ventilator given proper economies of scale, as well as choices of cheaper ontheshelf compo nents". The project includes design of the proposed ventila tors as CAD files, as well as relevant firmware available as C++ code files.
Another example is the 'popsolutions/openventila tor ' (popsolutions, 2020), which aims to provide cheap but reliable ventilators to treat COVID19 in economically underdeveloped regions of the world. The software project initiated from a Facebook group called 'Open Source COVID19 Medical Supplies' 4 , where members discussed the scarcity of ventilators and the importance of creating cheap ventilators through efficient design. In the project we notice developers to create, build, and share designs using OpenSCAD scripts. OpenSCAD is an open source tool to build computeraided design (CAD) objects 5 .
IV: Mining:: This category includes projects that provide APIs to mine COVID19 data from data sources, such as the US Center for Disease Control and Prevention (CDC) 2020, the World Health Organization (WHO) 2020, and data reported from local institutions. The purpose of this category of software is to provide utilities for software developers so that they can get real-time access to COVID19 data to build aggregation software, discussed above. Because of the nature of the pandemic, access to real-time data is pivotal for accurate aggregation and analysis, and mining tools give developers that access. Mining software can be location specific. For example, 'dsfsi/covid19africa' is dedicated to curating and collating COVID19 related data for African countries.
V: User tracking:: This category includes software projects that collect information from users regarding their COVID19 infection status. Tracking of user information can happen voluntarily, where the user self-reports COVID19 infection status. The 'enigmampc/SafeTrace' (enigmampc, 2020) software is an example where users self-report their infection status as well as location history. Tracking of user information can also be done using inference, as done in 'OpenMined/covidalert' (OpenMined, 2020), where the software collects a user's location information to predict if the user is in a location with high infection density. One utility of these projects is to identify high-risk locations so that users can understand which nearby locations to avoid. Self-reporting software has yielded benefits for China and South Korea.

VI: Statistical modeling::
This category includes software that uses statistical models to predict attributes related to COVID19. The purpose of these projects is to make predictions for the future based on existing data. Example usages of statistical models include (i) predicting death rate, as done in 'ImperialCollegeLondon/covid19model' (ImperialCollegeLondon, 2020), (ii) automating the process of lung segmentation with computerized tomography (CT) scans, as done in 'JoHof/lungmask' (JoHof, 2020), (iii) predicting the impact of the COVID19 pandemic on hospital demands, as done in 'neherlab/covid19_scenarios' (neherlab, 2020), and (iv) predicting the presence of COVID19 from X-ray images using deep learning, as done in 'elcronos/COVID19' (elcronos, 2020).
VII: Volunteer management:: This category includes software used to efficiently manage volunteering effort. The purpose of this software is to build platforms where users can volunteer and participate in activities to help distressed families and communities. One example is the 'covidvolunteers' (helpwithcovid, 2020) software, which provides a web portal where users can sign up for 650 projects that include donation of masks, personal protective equipment (PPEs), and testing of COVID19 6 . Platforms can be global, such as 'covidvolunteers', and also regional. For example, 'Applifting/pomuzeme.si' (Applifting, 2020) creates a web portal so that people in the Czech Republic can volunteer.

Frequency of the Identified Categories
Based on project count, aggregation is the most frequent category. Along with project count, we provide summary statistics of the projects that belong to each category in Table 3. We also observe that, on average, user tracking projects are released more frequently than other project types.
We identify four software projects that belong to multiple categories. As an example, the 'soroushchehresa/awesomecoronavirus' (soroushchehresa, 2020) project belongs to the categories aggregation, mining, and statistical modeling.

Rater Agreement
We report agreement rate for three steps: open coding, closed coding, and rater verification.
Open coding: After completing open coding, the first and second author identified 7 and 10 categories, respectively. The agreement rate is 70.0%, and the Cohen's Kappa is 0.7, indicating 'substantial' agreement (Landis and Koch, 1977). The authors disagreed on 'Volunteering software related to local communities', 'Education bots', and 'Aggregated visualizations', three additional categories identified by the second author.
Disagreements were resolved through discussion. Both authors provided justifications for their categorization. The first author pointed out that the category 'Education bots' can be merged with 'Education', as the category 'Education' encompasses all categories of knowledge software, such as bots and web applications. The first author also pointed out that 'Volunteering software related to local communities' can be merged with 'Volunteer management', as the category is an extension of 'Volunteer management'. Furthermore, the first author pointed out that 'Aggregated visualizations' can be merged with 'Aggregation', as 'Aggregation' includes software that aggregates COVID19 data and displays the aggregated data with visualizations. The second author was convinced by the first author's justification and updated her derived list of categories.
Closed coding: During closed coding the first and second authors mapped each of the 129 projects to an existing category. The agreement rate is 93.8%. The Cohen's Kappa is 0.92. The authors disagreed on the labeling of 8 projects, which were resolved through discussion. During the discussion phase both authors agreed to present their justification, and recheck the labeling based on the justification and content analysis. The categorization determined upon discussion is considered final.
Rater verification: We also measured the agreement rate between an additional rater and the authors for categorizing README files of projects. Cohen's Kappa between the additional rater and the first author for a randomly selected set of 50 README files is 0.73, indicating 'substantial' agreement (Landis and Koch, 1977). Cohen's Kappa between the additional rater and the second author for a randomly selected set of 50 README files is 0.73, indicating 'substantial' agreement (Landis and Koch, 1977). The agreement rate between the additional rater and the first and second author is 78.0% and 76.0%, respectively.
6 https://helpwithcovid.com/medical
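The two agreement measures reported above can be illustrated with a minimal sketch. The function names and the toy labels are hypothetical and are not taken from our replication material. The sketch assumes that agreement rate is the fraction of items to which two raters assign the same label, and that Cohen's Kappa corrects this fraction for chance agreement. The interpretation bands follow Landis and Koch (1977).

```python
from collections import Counter


def agreement_rate(labels_a, labels_b):
    """Fraction of items that both raters labeled identically."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("raters must label the same non-empty set of items")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa: observed agreement corrected for chance agreement."""
    n = len(labels_a)
    p_observed = agreement_rate(labels_a, labels_b)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    # Chance agreement: probability that both raters pick the same label
    # if each rater labels independently with their own label frequencies.
    p_expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used one identical label for every item
    return (p_observed - p_expected) / (1.0 - p_expected)


def landis_koch(kappa):
    """Map a Kappa value to the Landis and Koch (1977) agreement band."""
    if kappa < 0:
        return "poor"
    bands = [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
             (0.80, "substantial"), (1.00, "almost perfect")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical example: two raters categorize four README files.
rater_1 = ["aggregation", "aggregation", "education", "education"]
rater_2 = ["aggregation", "aggregation", "education", "aggregation"]
print(agreement_rate(rater_1, rater_2), cohens_kappa(rater_1, rater_2))
```

In this toy example the raters agree on three of four items, while the chance-corrected Kappa is lower than the raw agreement rate, which is why we report both measures.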

Answer to RQ2: What categories of bugs exist in open source COVID19 software projects? How frequently do the identified bug categories appear? What is the resolution time for the identified bug categories?
We answer RQ2 by first providing a breakdown of how we obtained our bug reports in Tables 4 and 5. As shown in Table 5, the categories with the most and least bug reports are aggregation and medical equipment, respectively. One project can belong to multiple categories, and that is why the total count of bug reports does not total 550. Next, we describe the identified bug categories in Section 4.2.1 by applying open coding on the collected 550 bug reports. The frequency of the identified bug categories is provided in Section 4.2.2. We provide details of rater verification in Section 4.2.3. Finally, we provide the bug resolution time in Section 4.2.4.

Bug Categories of COVID19 Projects
We identify 8 bug categories, which we describe below alphabetically:
I: Algorithm:: This category corresponds to bugs where the implementation of an algorithm does not follow expected behavior. An algorithm is a sequence of computational steps that transform input into output (Cormen et al., 2009). We observe algorithm bugs to include two subcategories: (i) bugs related to statistical modeling algorithms, where statistical modeling results are incorrect due to incorrect assumptions and/or implementations, and (ii) bugs related to incorrect logic implemented in the software.

Example:
We provide examples for the two subcategories:
• Statistical modeling: In a bug report titled "Death rates should increase when ICU's are overwhelmed" (Begley, 2020a), a practitioner describes how an incorrect assumption can result in incorrect modeling behavior. The practitioner discusses that bed space is correlated with the estimation of fatality rate. When the bed space of hospitals is exhausted, hospitals will not be able to treat new COVID19 patients, which could potentially increase the fatality rate. The bug report provides evidence that if the context of COVID19 is not correctly incorporated in statistical models, those models will provide incorrect results. Incorrect statistical models can be consequential (Begley, 2020b).
• Incorrect logic: In a bug report titled "Fix Prefecture Sorting" (reustle, 2020), a practitioner describes a sorting bug that occurs when trying to visualize COVID19 cases based on prefectures in Japan. A prefecture is an administrative jurisdiction in a country, similar to a state or province (Hu and Qian, 2017). The bug occurred due to incorrect logic that did not perform sorting by prefectures.

II: Data::
This category corresponds to bugs that occur during mining and storage of COVID19 data. As discussed in Section 4.1.2, we observed our dataset to include projects that mine and aggregate COVID19 data. We observe four subcategories of data bugs: (i) storage: bugs that occur while storing data in a database, (ii) mining: bugs that occur while retrieving data from data APIs, (iii) location: bugs where location information in stored data is incorrect, and (iv) time series: bugs that correspond to missing data for a certain time period.
Example: We provide examples for each of these subcategories below:
• Storage: In a bug report titled "Temperature data not saved in the backend" (pavel ilin, 2020), a practitioner describes a bug where patient temperature data is inserted in the frontend but not stored in the database.
• Mining: Bugs occur when COVID19 related data is being mined. A practitioner describes a mining bug in a bug report titled "CDC Children scraper is outdated" (Timoeller, 2020). The mining tool mines data related to children affected by COVID19.
• Location: In a bug report titled "Rajasthan District names are wrong", a practitioner describes that the inserted location data for an Indian state called 'Rajasthan' is wrong (SinghRajenM, 2020).
• Time series: Missing data was reported for a project in a bug report titled "Data has a gap between 2020311 and 2020324" (zbraniecki, 2020).

III: Dependency::
This category corresponds to bugs that occur when execution of the software is dependent on a software artifact that is either missing or incorrectly specified. For COVID19 projects, an artifact can be an API or a build artifact.
Example: In a bug report titled "Missing PostGIS" (vaclavpavlicek, 2020), a practitioner describes that installation and execution of the software is blocked because a software package called 'PostGIS' is missing. PostGIS is used to store spatial and geographic measurements, such as area, distance, polygon, and perimeter, in PostgreSQL databases.
IV: Documentation:: This category corresponds to bugs that occur when incorrect and/or incomplete information is specified in release notes, maintenance notes, and documentation files, such as README files.
Example: In a bug report titled "Missing code of conduct", a practitioner reports that a 'CODE_OF_CONDUCT.md' file is missing from a Markdown file that describes how practitioners can contribute to the project (mdeous, 2020).
V: Performance:: This category corresponds to bugs that cause performance discrepancies for the software. Performance bugs are manifested in slow response of the web or mobile app.
Example: In a bug report titled "Cluster animation slowing down the browser. It also takes much time", a practitioner describes how a performance bug related to an animation feature is slowing down a Firefox browser on Windows 10 (Subratappt, 2020). The performance bug was reported for a website called 'covid19india.org' 7 , which aggregates COVID19 data for India and displays it.
VI: Security:: This category corresponds to bugs that violate confidentiality, integrity, or availability for the software.
Example: In a bug report titled "Fix password reset procedure" (landovsky, 2020), a practitioner describes a password reset bug, where the password reset procedure ends arbitrarily after 500 login attempts.
VII: Syntax:: This category corresponds to bugs related to the syntax of the programming languages used to develop the software.
Example: We notice bugs related to data types in 'neherlab/covid19_scenarios'. In the bug report titled "Fix types and linting errors" (ivan aksamentov, 2020), a practitioner describes how linting and type checking were disabled for the project, which led to bugs related to linting and type checking.
VIII: UI:: This category corresponds to bugs that involve the user interface (UI) of the software. UI bugs include navigation-related bugs on web pages, bugs related to accessibility, display of incorrect images, links, and color, and responsiveness.
Example: In a bug report titled "accessibility fixes" (abquirarte, 2020), a practitioner describes a UI bug related to accessibility. According to the bug report, a screen reader incorrectly renders check marks and crosses in front of the "Do's and Don'ts" as M's and N's.

Frequency of Identified Bug Categories
Based on the 'Proportion of Bugs Across All Projects (BugPropAll)' metric we observe UI bugs to be the most frequent category, whereas documentation is the least frequent category. We provide a complete breakdown of the metric in Table 6. Data bugs have four subcategories: storage, mining, location, and time series. The frequency for storage, mining, location, and time series is 4.7%, 5.8%, 87.2%, and 2.3%, respectively. Algorithm bugs have two subcategories: statistical modeling and incorrect logic. The frequency for statistical modeling and incorrect logic is 42.3% and 57.7%, respectively.
We observe bug category frequency to vary for different categories of projects. We provide the 'Proportion of Bugs For a Certain Project Category (BugPropCat)' values for each project category in Table 7. 'AGG', 'MINE', 'STA', 'EDU', 'TRAK', 'VOL' and 'EQU' correspond to the seven project categories: aggregation, mining, statistical modeling, education, user tracking, volunteer management system, and medical equipment, respectively.
According to Table 7, except for mining and medical equipment software, the dominant bug category is UI. One possible explanation is that the analyzed software projects have UIs, which may have contributed to the frequency of UI bugs. For mining software the dominant bug category is data bugs i.e., bugs that occur due to storing and processing of COVID19 data. For medical equipment software the dominant bug category is dependency. We also notice algorithm bugs to be the second most frequent bug category for statistical modeling software. Similar to prior work on machine learning (Thung et al., 2012), we expected algorithm bugs to be the most dominant category for statistical modeling. Statistical modeling software also has UIs for user interaction, and the count of UI bugs may have overshadowed the count of algorithm bugs.
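The two frequency metrics can be sketched as follows. This is an illustration only: we assume here that BugPropAll is the percentage of all bug reports that fall in a bug category, and that BugPropCat is the same percentage computed within one project category. The record layout (`bug_category`, `project_categories`) and the sample records are hypothetical. A bug report from a project that belongs to multiple project categories is counted once for each of them.

```python
from collections import Counter, defaultdict


def bug_prop_all(bug_reports):
    """Percentage of all bug reports per bug category (assumed BugPropAll)."""
    counts = Counter(report["bug_category"] for report in bug_reports)
    total = len(bug_reports)
    return {category: 100.0 * n / total for category, n in counts.items()}


def bug_prop_cat(bug_reports):
    """Percentage of bug reports per bug category, computed separately
    within each project category (assumed BugPropCat)."""
    per_project_category = defaultdict(Counter)
    for report in bug_reports:
        # A project may belong to several project categories.
        for project_category in report["project_categories"]:
            per_project_category[project_category][report["bug_category"]] += 1
    result = {}
    for project_category, counts in per_project_category.items():
        total = sum(counts.values())
        result[project_category] = {
            category: 100.0 * n / total for category, n in counts.items()
        }
    return result


# Hypothetical labeled bug reports (not taken from our dataset).
sample_reports = [
    {"bug_category": "UI", "project_categories": ["AGG"]},
    {"bug_category": "UI", "project_categories": ["AGG", "MINE"]},
    {"bug_category": "Data", "project_categories": ["MINE"]},
    {"bug_category": "Data", "project_categories": ["MINE"]},
]
print(bug_prop_all(sample_reports))
print(bug_prop_cat(sample_reports))
```

In the sample, UI and data bugs are equally frequent overall, yet data bugs dominate within the hypothetical mining projects, mirroring how the dominant bug category can differ between the overall view and a single project category.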

Rater Agreement and Verification
We report agreement rate for four steps: issue labeling, open coding, closed coding, and rater verification.
Labeling issues as bugs: While labeling collected issue reports as bug reports and non-bug reports, the agreement rate is 96.5% and the Cohen's Kappa is 0.9.
Open coding to identify bug categories: The first and second author identified 9 and 10 categories, respectively. The agreement rate is 72.7%, and the Cohen's Kappa is 0.70, indicating 'substantial' agreement (Landis and Koch, 1977). The first author identified 'database' as a category not identified by the second author. Upon discussion both authors agreed that 'database' is related to data storage and belongs to the data category. The second author identified two additional categories, 'Public health data' and 'Type errors'. After discussing the definitions of all categories, both authors agreed that 'Public health data' and 'Type errors' can be merged with data and syntax, respectively.
Closed coding to quantify bug category frequency: During closed coding the first and second author mapped each bug report to an existing category. The agreement rate is 95.1% and the Cohen's Kappa is 0.93. The authors disagreed on the labeling of 27 bug reports, which were resolved through discussion.
Rater verification: For the randomly selected 250 issue reports we allocate an additional rater who manually identified which of the issue reports are bug reports and non-bug reports. The Cohen's Kappa between the additional rater and the first author is 0.80, indicating 'substantial' agreement (Landis and Koch, 1977). The Cohen's Kappa between the additional rater and the second author is 0.84, indicating 'almost perfect' agreement (Landis and Koch, 1977). The agreement rate between the additional rater and the first and second author is 89.0% and 93.0%, respectively.
We have also measured the agreement rate between an additional rater and the authors for categorizing bug reports. Cohen's Kappa between the additional rater and the first author for a randomly selected set of 250 bug reports is 0.65, indicating 'substantial' agreement (Landis and Koch, 1977). Cohen's Kappa between the additional rater and the second author for a randomly selected set of 250 bug reports is 0.68, indicating 'substantial' agreement (Landis and Koch, 1977). The agreement rate between the additional rater and the first and second author is 78.0% and 81.6%, respectively.

Resolution Time of Identified Bug Categories
We provide bug resolution time, measured in hours, for all bug categories in Table 8. From Table 8 we observe that, based on minimum and median bug resolution times, security bugs take the longest to resolve, followed by algorithm bugs. We also observe data bugs to take as long as 548 hours to resolve.
A breakdown of bug resolution time across the seven project categories is provided in Table 9. The 'All' row in Table 9 shows the minimum, median, and maximum bug resolution time for all bug categories, measured in hours.
In Table 9 we observe four instances where the minimum bug resolution time is less than 6 minutes (< 0.1 hours). One possible explanation can be practitioners' habit of opening a bug report after they have developed the fix for a bug (Wan et al., 2017; Thung et al., 2012). In such cases, practitioners notice the bug early, construct the fix for the bug, and then submit the bug report by opening and closing the bug report promptly.
Median bug resolution duration for each project type and bug category is provided in Table 10. 'AGG', 'MINE', 'STA', 'EDU', 'TRAK', 'VOL' and 'EQU' correspond to the seven project categories: aggregation, mining, statistical modeling, education, user tracking, volunteer management system, and medical equipment, respectively. We observe median bug resolution time to vary across bug categories as well as across project categories.
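The summary statistics in Tables 8-10 can be illustrated with a short sketch. We assume here that resolution time is the elapsed time between the timestamp at which a bug report was opened and the timestamp at which it was closed, and that timestamps use the ISO 8601 UTC format returned by GitHub (for example `2020-04-01T10:00:00Z`). The function names and sample records are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # assumed GitHub-style UTC timestamps


def resolution_hours(opened_at, closed_at):
    """Elapsed hours between opening and closing a bug report."""
    opened = datetime.strptime(opened_at, TIMESTAMP_FORMAT)
    closed = datetime.strptime(closed_at, TIMESTAMP_FORMAT)
    return (closed - opened).total_seconds() / 3600.0


def summarize_resolution_time(bug_reports):
    """Return {bug category: (min, median, max)} of resolution time in hours."""
    hours_per_category = defaultdict(list)
    for category, opened_at, closed_at in bug_reports:
        hours_per_category[category].append(
            resolution_hours(opened_at, closed_at))
    return {
        category: (min(hours), median(hours), max(hours))
        for category, hours in hours_per_category.items()
    }


# Hypothetical closed bug reports: (bug category, opened, closed).
sample_bugs = [
    ("Security", "2020-04-01T00:00:00Z", "2020-04-02T00:00:00Z"),
    ("Security", "2020-04-01T00:00:00Z", "2020-04-01T12:00:00Z"),
    ("UI", "2020-04-01T10:00:00Z", "2020-04-01T10:03:00Z"),
]
for category, stats in summarize_resolution_time(sample_bugs).items():
    print(category, stats)
```

The hypothetical UI bug report is closed three minutes after it is opened, which corresponds to the kind of sub-0.1-hour minimum discussed above.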

Answer to RQ3: How similar are the identified bug categories to those of previously studied software projects?
We report our findings in Table 11. The 'Bug category' column presents the bug categories identified for COVID19 software projects, whereas the 'Other software projects' column presents the software projects for which the bug category was observed according to papers identified from our scoping review. We observe the bug categories of COVID19 software projects to also be observable for other categories of software projects, such as deep learning and autonomous vehicles.

Discussion
In this section, we first provide a summary of our findings in Section 5.1. Next, we provide a discussion on the implications of our findings in Section 5.2.
Table 11. Comparison of bug categories of COVID19 software projects with that of other software project categories.

Bug category | Other software projects
Security | IaC, OSS GitHub projects (Ray et al., 2014)
Syntax | IaC, deep learning

Implications
We discuss the implications of our findings below:
Security and privacy implications of user tracking software: From Table 3 we observe 9 projects to be related to user tracking. While the benefits of user tracking software have been documented for countries such as Russia and South Korea (Crowell Morning, 2020), this category of software can have negative impacts on the privacy of end users. Data generated from user tracking software can be leveraged for marketing purposes. We make the following recommendations to preserve the privacy of user data in user tracking software:
• Policymakers should construct policies specific to COVID19 software that collects user data.
• Practitioners who develop user tracking software should leverage existing privacy policy frameworks, such as the 'National Institute of Standards and Technology (NIST) Privacy Framework' 2020.
• Privacy researchers can build tools that will automatically detect and report privacy policy violations.
Evidence from Table 7 shows that security bugs exist for user tracking software. We advocate for security researchers to systematically investigate if user tracking software includes security bugs. Recent news articles suggest that user tracking software, such as contact tracing apps, may become more and more prevalent, as Apple and Google are already providing frameworks to build software that tracks user data (Apple, 2020). Our hypothesis is that the availability of these frameworks will facilitate rapid development and deployment of mobile apps that collect user data. Security weaknesses in these apps can give malicious users the opportunity to conduct large-scale data breaches. We notice anecdotal evidence in this regard: a researcher has identified vulnerabilities in a user tracking app that could leak user location data (Greenberg, 2020). Panelists at EuroCrypt 2020, a cryptography research conference, discussed limitations of user tracking mobile apps for COVID19 with respect to API design, indoor location tracking, and informing users about privacy risks (EuroCrypt, 2020a; EuroCrypt, 2020b).
Towards constructing correct statistical models: From Section 4.2.1 we have observed statistical modeling bugs to exist. Bugs related to statistical modeling can be consequential because policymakers enforce public health policies based on the predictions generated by statistical models. One possible explanation for buggy statistical models is the quality of the datasets with which statistical models are built (Koerth et al., 2020). For example, fatality prediction models that are built using the 'Diamond Princess Cruise Ship Dataset' may not be applicable to a specific geographic region with low population density. Another possible explanation is a lack of public health specific context and knowledge, which hinders model builders from identifying appropriate independent variables to construct the models. Incorrect estimation of hospital beds from our discussion in Section 4.2.1 is one example. Other examples of independent variables related to public health include staff availability, count of known cases, and hospitalization rate (Attia, 2020). According to a health expert (Attia, 2020), statistical models that predicted 2.4 million US residents to die assumed a hospitalization rate of 15-20%, which in reality was 5%.
Based on our findings and the above-mentioned explanations we make two recommendations:
• Automated testing for COVID19 modeling: We hope to see novel research in the domain of COVID19 that will test, in an automated manner, the correctness of statistical models used in forecasting. In recent years, we have seen research efforts that test deep learning models (Tian et al., 2018; Pei et al., 2017; Ma et al., 2018). We expect similar research pursuits for COVID19 statistical modeling.
• Better synergies between data science and public health practitioners: Construction and verification of COVID19 statistical models should involve practitioners from public health and data science. Public health practitioners within a specific locality can provide necessary context that data scientists can incorporate in their statistical models.

Implications for Educators:
Our findings have implications for educators involved in teaching the following topics:
• Data science: Educators who teach data science can use the examples of statistical modeling bugs to highlight the value of considering the full context and related limitations that accompany statistical modeling.
• Information security and privacy: User tracking software can be discussed in information security and privacy courses to demonstrate the value of protecting user data. Such discussion can also include privacy policy frameworks that are already in place, such as the NIST Privacy framework (National Institute of Standard and Technology, 2020).
• Software engineering: Our categorization of bugs related to COVID19 software development can be discussed to demonstrate that understanding and repair of bugs requires contextualization.
Benchmark for practitioners and researchers: Tables 6-10 can be used as a measuring stick by practitioners and researchers who are involved with COVID19 software projects. Practitioners can estimate their bug resolution efforts by comparing median resolution times for bugs in their COVID19 software projects to those of Tables 8, 9, and 10.
Compared to prior work related to blockchain and machine learning (Thung et al., 2012; Wan et al., 2017), median bug resolution time is lower for COVID19 software projects. We provide two possible explanations. One possible explanation can be related to the sense of urgency. Practitioners may have realized that bugs in COVID19 software projects could hamper the analysis or mitigation of COVID19, and therefore need immediate attention. Another possible explanation can be the limitations of our dataset. The age of our software projects does not exceed four months, and that may have biased median bug resolution time. We advocate for future research that will confirm or refute our explanations.
Recurrence-related implications: Researchers (Kissler et al., 2020; Chen et al., 2020) have provided evidence that supports the recurring nature of COVID19. About the recurrence of COVID19, Kissler et al. (2020) stated that "a resurgence in contagion could be possible as late as 2024." We hypothesize that COVID19's recurrence will lead to more COVID19 software building. Whether or not our findings hold for this newly constructed COVID19 software can be validated through a replication of our paper. We expect to observe more categories of COVID19 software projects as well as more bug categories.

Differences between COVID19 Software Projects and Other Software Projects
We discuss the differences that we have noticed between COVID19 software projects and other software projects in the following subsections:

Differences in Bug Manifestation
A non-COVID19 software project does not have the context of public health consequences that are associated with a COVID19 software project. We define a COVID19 software project to be a software project that is related to analyzing and mitigating the consequences of COVID19. By definition, we include software projects that directly capture the consequences related to public health, a context that is absent from a traditional software project. We observe empirical evidence that shows the unique context of COVID19 to yield differences in bugs and bug resolution time when compared with other software projects.
Let us consider the case of algorithm bugs. Algorithm bugs manifest in COVID19 projects as well as in machine learning and autonomous vehicle projects. A machine learning project that uses statistical modeling can have algorithm bugs that generate erroneous predictions. For a COVID19 software project that predicts death rates, a bug related to the modeling algorithm can have serious consequences, as public health policies are derived based on these models, as occurred with the incorrect estimation of hospitalization rate (Attia, 2020). As discussed in Section 4.3, algorithm-related bugs also appear for autonomous vehicles, but such bugs manifest in components unique to autonomous vehicle projects, such as lane positioning and navigation, and traffic light processing.
We have observed that data bugs appear for both deep learning projects and COVID19 software projects. The difference is that for COVID19 we have the concept of location, as practitioners tend to miss important location-related data for COVID19, e.g., not being able to identify states in India that are observing an outbreak of COVID19. In the case of deep learning projects, data bugs are related to the structure and type of training data.
As another example, dependency-related bugs appear for both IaC scripts and COVID19 software projects. In the case of IaC, dependency-related bugs are related to an IaC-related artifact, such as a Puppet manifest, class, or module, upon which execution of an IaC script depends. For COVID19 software projects, dependencies are related to APIs and build artifacts, such as Maven dependencies. This difference with respect to dependent artifacts also highlights the differences between COVID19 software projects and IaC-based software projects.
In short, our findings suggest that while commonalities exist in bug categories between COVID19 software projects and other software projects, the manifestation and artifacts related to the bug categories are different from other categories of software projects.

Difference in Bug Resolution Time
Our findings indicate that median bug resolution time is lower for COVID19 software projects than for blockchain and machine learning projects. Based on our findings, we conjecture that the sense of urgency might have motivated practitioners to fix bugs in COVID19 software projects.

Differences with Existing Healthcare-related Software Projects
Our findings also demonstrate differences between COVID19 software projects and other projects related to the healthcare domain. To illustrate these differences we use the work of Janamanchi et al. (2009). Janamanchi et al. (2009) studied 174 open source software projects related to the health domain and identified 11 categories of software projects that do not include the three categories of projects that we have identified for COVID19 software projects: volunteer management, user tracking, and education. The inception and spread of COVID19 have motivated software practitioners to create a wide range of software projects, such as projects related to user tracking and volunteer management, so that people are aware of the consequences and hygiene practices related to COVID19. In the context of COVID19 software projects, projects related to user tracking focus on tracking user location data emitted from smartphones to assess the proximity of individuals who might be exposed to COVID19. Software projects related to volunteer management are concerned with managing volunteers to address COVID19 related societal issues, such as food banking. A pandemic of this nature was not experienced by health professionals prior to 2020. Existing research related to software projects that belong to the health domain was not able to characterize COVID19 software projects and identify project categories unique to COVID19. Janamanchi et al. (2009) did not systematically study the types of bugs that appear in healthcare software projects. Our paper complements the work of Janamanchi et al. (2009) by studying healthcare-related projects that are related to COVID19, characterizing the bugs and the types of COVID19 software projects in which the bugs appear.

Threats to Validity
We describe the limitations of our paper as follows:
Conclusion validity: We have used raters who derived the software and bug categories. Both raters are authors of the paper. Our derived categories are susceptible to the authors' bias. We mitigate this limitation by allocating another rater, who is not an author of the paper, to verify our ratings.
Our categories might not be comprehensive because our categorization of projects and bugs is limited to the dataset that we collected. The observed bug resolution times could also be limited, as our dataset includes projects that have a duration of four months.
We use the topic 'covid19' to identify and filter COVID19 software projects from GitHub. Any software project that is not labeled with the topic 'covid19' is not included in our dataset.
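The effect of this topic-based filter can be illustrated with a minimal sketch. The sketch assumes repository metadata is available as dictionaries with a `topics` list; the helper names `has_topic` and `filter_by_topic` and the three repository records are hypothetical and are not taken from our dataset or tooling.

```python
from typing import Dict, List


def has_topic(repo: Dict, topic: str = "covid19") -> bool:
    """Return True if the repository metadata carries the given topic label."""
    topics = repo.get("topics") or []  # a missing or empty topic list means no label
    return topic.lower() in (t.lower() for t in topics)


def filter_by_topic(repos: List[Dict], topic: str = "covid19") -> List[Dict]:
    """Keep only repositories labeled with the exact topic string."""
    return [repo for repo in repos if has_topic(repo, topic)]


# Hypothetical records, invented purely for illustration.
repos = [
    {"full_name": "example/case-dashboard", "topics": ["covid19", "dashboard"]},
    {"full_name": "example/ventilator-firmware", "topics": ["ventilator"]},
    {"full_name": "example/contact-tracing", "topics": ["covid-19"]},
]

selected = filter_by_topic(repos)
print([repo["full_name"] for repo in selected])
```

In this sketch only the first repository is selected. The second is excluded because it carries no COVID19 topic at all, and the third because its label is spelled 'covid-19' rather than 'covid19', which is the kind of omission this threat describes.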
Our datasets have a limited lifetime, as COVID19 was discovered in December 2019, and the lack of maturity in our datasets may influence our analysis. We mitigate this limitation by applying filtering criteria so that we select projects with sufficient development activity.
Internal validity: For RQ1 and RQ2 we use ourselves, the authors of the paper, as raters who conduct open and closed coding on README files and bug reports. Our research is susceptible to mono-method bias, as our categorization and labeling may be influenced by the authors' implicit expectations and hypotheses about the study.
External validity: Our findings are not comprehensive. We have not analyzed projects hosted outside GitHub or private projects hosted on GitHub. We mitigate this limitation by analyzing 129 software projects that belong to 7 categories. Also, as we have used open coding to determine categories, other raters may not arrive at the same findings. We mitigate this limitation by conducting rater verification, where we use a rater who is not an author of the paper.

Conclusion
The COVID19 pandemic has impacted people all over the world causing thousands of deaths. Software practitioners have joined the fight in combating the spread and mitigating the dire consequences of COVID19. An understanding of COVID19 software categories and software bugs can give us clues on how the software engineering community can help even further in combating COVID19.
We conduct an empirical study with 129 COVID19 software projects hosted on GitHub. We identify 7 categories of software projects: aggregation, mining, statistical models, education, volunteer management, user tracking, and medical equipment. By applying open coding on 550 bug reports, we identify 8 categories of bugs: algorithm, data, dependency, documentation, performance, security, syntax, and UI. We observe that bug category frequency varies with project category, e.g., for mining projects, data-related bugs are the most frequently occurring category.
Our findings have implications for educators, practitioners, and researchers. Educators can use our categorization of COVID19 software projects and related bugs to educate students about the security and privacy implications of COVID19 software. Privacy researchers can build tools that check whether user tracking software related to COVID19 leaks user data. Practitioners in the data science domain can learn from our categorization of statistical modeling bugs to understand the limitations of constructed statistical models and to verify the underlying assumptions that accompany them. Based on our findings, we also advocate for better synergies between data scientists and public health experts so that statistical modeling bugs can be mitigated. We hope our paper will advance further research in the domain of COVID19 software.