How COVID-19 Impacted Data Science: a Topic Retrieval and Analysis from GitHub Projects’ Descriptions
Abstract
We present a data-driven study of data-science-oriented code repositories. The goal is to compare their topics of interest, and how those topics evolved over the COVID-19 pandemic, by analyzing Jupyter Notebook and Python projects from the year before the pandemic and from the pandemic period itself. We employ a state-of-the-art algorithm to find topics based on the repositories' descriptions, and compare the performance obtained when tuning its hyperparameters for better accuracy.