Open-Source Software Projects Curating Model for Empirical Software Engineering Studies
Software projects are common inputs in Empirical Software Engineering (ESE) studies, yet they are often selected without a systematic strategy, which leads to biased samples. To avoid this problem, researchers often use publicly available datasets instead of selecting the projects themselves. However, some of these datasets are no longer maintained and contain outdated versions of projects, or even deprecated projects. Because development practices and technologies have changed substantially over time, such datasets may not be representative of current software. The main goal of this research is to develop a procedural model for constructing and maintaining a dataset of software projects, together with their product quality metrics, to support ESE studies.