Revisiting Aristotle vs. Ringelmann: The influence of biases on measuring productivity in Open Source software development

Christian Gut (USP); Alfredo Goldman (USP)

doi:10.5753/sbes.2024.3275

Abstract

Aristotle vs. Ringelmann was a debate between two research teams at ETH Zürich over whether the productivity of Open Source software projects scales sublinearly or superlinearly with team size. The debate revolved around two publications that used apparently similar techniques: sampling projects on GitHub and running regression analyses to test for superlinearity. Despite these similar methods, the team around Ingo Scholtes concluded that projects scale sublinearly, while the team around Didier Sornette found a superlinear relationship between team size and productivity. In subsequent publications, both teams argued that the opposite conclusions may be attributed to differences in project populations: 81.7% of Sornette's projects have fewer than 50 contributors, whereas Scholtes specifically sampled projects with more than 50 contributors. This publication compares the research of both teams by replicating their findings, which makes it possible to evaluate how much project sampling actually accounts for the differences between Scholtes' and Sornette's results. The replication revealed that sampling bias only partially explains the discrepancies between the two studies. Further analysis uncovered instrumentation biases that drove the regression coefficients in opposite directions. These findings were then consolidated into a quantitative analysis, which indicates that instrumentation biases contributed more to the differences between Scholtes' and Sornette's work than the sampling bias suggested by both teams.
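The question of sublinear versus superlinear scaling comes down to a single exponent. The following is a minimal sketch, assuming productivity is modelled as output = c · n^β for a project with n contributors and that β is estimated by ordinary least squares on log-transformed data (β > 1 superlinear, β < 1 sublinear). The exact regression specifications used by Scholtes and Sornette may differ. The helper names `scaling_exponent` and `toy_output` and the piecewise, noise-free toy data (β = 1.5 up to 50 contributors, β = 0.8 above) are hypothetical and invented purely for illustration. They are not data from either study. The sketch only shows how restricting the sample to projects with more than 50 contributors can, on its own, flip the conclusion.

```python
import math


def scaling_exponent(team_sizes, outputs):
    """Estimate beta in output = c * n**beta by OLS on log-transformed data.

    Hypothetical helper: fits log(output) = log(c) + beta * log(n) and
    returns the slope beta (beta > 1 superlinear, beta < 1 sublinear).
    """
    xs = [math.log(n) for n in team_sizes]
    ys = [math.log(y) for y in outputs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def toy_output(n, knee=50, small_beta=1.5, large_beta=0.8):
    """Invented, noise-free output: superlinear up to the knee, sublinear above.

    The two power laws meet at the knee, so the curve is continuous.
    """
    if n <= knee:
        return n ** small_beta
    return knee ** small_beta * (n / knee) ** large_beta


# Toy population: 1, 2, 4, ..., 256 contributors (illustrative only).
team_sizes = [2 ** k for k in range(9)]
outputs = [toy_output(n) for n in team_sizes]

# Fit on all projects, then only on projects with more than 50 contributors.
beta_all = scaling_exponent(team_sizes, outputs)
large = [(n, y) for n, y in zip(team_sizes, outputs) if n > 50]
beta_large = scaling_exponent([n for n, _ in large], [y for _, y in large])

print(f"all projects:      beta = {beta_all:.3f}")
print(f"> 50 contributors: beta = {beta_large:.3f}")
```

With these toy numbers the full sample yields β ≈ 1.334 (superlinear), while the subset with more than 50 contributors yields β = 0.8 (sublinear). Since the reference list cites SciPy's `linregress`, the same slope could be obtained with `scipy.stats.linregress(xs, ys).slope` on the log-transformed values. In the replication itself, sampling only partially explained the real discrepancy, so this toy effect should not be read as the paper's result.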

Keywords: Mining Software Repositories, Open Source, Empirical Software Engineering, Software Development Productivity, GitHub, Git, Economies of Scale, Diseconomies of Scale, Replication Study, Sampling Bias, Instrumentation Bias

References

Barry Boehm, Bradford Clark, Ellis Horowitz, Chris Westland, Ray Madachy, and Richard Selby. 1995. Cost models for future software life cycle processes: COCOMO 2.0. Annals of Software Engineering 1, 1 (Dec. 1995), 57–94. DOI: 10.1007/BF02249046

Frederick P. Brooks. 1995. The mythical man-month: essays on software engineering (anniversary ed.). Addison-Wesley Pub. Co, Reading, Mass.

Wladmir Araujo Chapetta and Guilherme Horta Travassos. 2020. Towards an evidence-based theoretical framework on factors influencing the software development productivity. Empirical Software Engineering 25, 5 (Sept. 2020), 3501–3543. DOI: 10.1007/s10664-020-09844-5

Andy Cockburn, Pierre Dragicevic, Lonni Besançon, and Carl Gutwin. 2020. Threats of a Replication Crisis in Empirical Computer Science. Communications of the ACM.

The SciPy community. 2008. linregress. SciPy v1.14.0 Manual.

Carlos Henrique C. Duarte. 2022. Software Productivity in Practice: A Systematic Mapping Study. Software 1, 2 (May 2022), 164–214. DOI: 10.3390/software1020008

Nicole Forsgren, Margaret-Anne Storey, Chandra Maddila, Thomas Zimmermann, Brian Houck, and Jenna Butler. 2021. The SPACE of Developer Productivity: There’s more to it than you think. Queue 19, 1 (Feb. 2021), 20–48. DOI: 10.1145/3454122.3454124

Christoph Gote, Pavlin Mavrodiev, Frank Schweitzer, and Ingo Scholtes. 2022. Big data = big insights? Operationalising Brooks’ law in a massive GitHub data set. In Proceedings of the 44th International Conference on Software Engineering. ACM, Pittsburgh, Pennsylvania, 262–273. DOI: 10.1145/3510003.3510619

Christoph Gote, Ingo Scholtes, and Frank Schweitzer. 2019. git2net - Mining Time-Stamped Co-Editing Networks from Large git Repositories. In 2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR). IEEE, Montreal, QC, Canada, 433–444. DOI: 10.1109/MSR.2019.00070

Christoph Gote and Christian Zingg. 2021. gambit – An Open Source Name Disambiguation Tool for Version Control Systems. arXiv:2103.05666 [physics].

J. L. Hodges. 1958. The significance probability of the Smirnov two-sample test. Arkiv för Matematik 3, 5 (Jan. 1958), 469–486. DOI: 10.1007/BF02589501

Ciera Jaspan and Caitlin Sadowski. 2019. No Single Metric Captures Productivity. In Rethinking Productivity in Software Engineering, Caitlin Sadowski and Thomas Zimmermann (Eds.). Apress, Berkeley, CA, 13–20. DOI: 10.1007/978-1-4842-4221-6_2

Amy J. Ko. 2019. Why We Should Not Measure Productivity. In Rethinking Productivity in Software Engineering, Caitlin Sadowski and Thomas Zimmermann (Eds.). Apress, Berkeley, CA, 21–26. DOI: 10.1007/978-1-4842-4221-6_3

Luigi Lavazza, Sandro Morasca, and Davide Tosi. 2018. An Empirical Study on the Factors Affecting Software Development Productivity. e-Informatica Software Engineering Journal 12 (2018), 27–49. DOI: 10.5277/E-INF180102

Thomas Maillart and Didier Sornette. 2019. Aristotle vs. Ringelmann: On superlinear production in open source software. Physica A: Statistical Mechanics and its Applications 523 (June 2019), 964–972. DOI: 10.1016/j.physa.2019.04.130

Goran Murić, Andres Abeliuk, Kristina Lerman, and Emilio Ferrara. 2019. Collaboration Drives Individual Productivity. Proceedings of the ACM on Human-Computer Interaction 3, CSCW (Nov. 2019), 1–24. DOI: 10.1145/3359176

Ingo Scholtes, Pavlin Mavrodiev, and Frank Schweitzer. 2016. From Aristotle to Ringelmann: a large-scale analysis of team productivity and coordination in Open Source Software projects. Empirical Software Engineering 21, 2 (April 2016), 642–683. DOI: 10.1007/s10664-015-9406-4

S. S. Shapiro and M. B. Wilk. 1965. An analysis of variance test for normality (complete samples). Biometrika 52, 3-4 (Dec. 1965), 591–611. DOI: 10.1093/biomet/52.3-4.591

Martin Shepperd, Nemitari Ajienka, and Steve Counsell. 2018. The role and value of replication in empirical software engineering results. Information and Software Technology 99 (July 2018), 120–132. DOI: 10.1016/j.infsof.2018.01.006

Forrest J. Shull, Jeffrey C. Carver, Sira Vegas, and Natalia Juristo. 2008. The role of replications in Empirical Software Engineering. Empirical Software Engineering 13, 2 (April 2008), 211–218. DOI: 10.1007/s10664-008-9060-1

Didier Sornette, Thomas Maillart, and Giacomo Ghezzi. 2014. How Much Is the Whole Really More than the Sum of Its Parts? 1 + 1 = 2.5: Superlinear Productivity in Collective Group Actions. PLoS ONE 9, 8 (Aug. 2014), e103023. DOI: 10.1371/journal.pone.0103023

Davide Spadini, Maurício Aniche, and Alberto Bacchelli. 2018. PyDriller: Python framework for mining software repositories. In Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2018). Association for Computing Machinery, New York, NY, USA, 908–911. DOI: 10.1145/3236024.3264598

Pauli Virtanen, Ralf Gommers, Travis E. Oliphant, Matt Haberland, Tyler Reddy, David Cournapeau, Evgeni Burovski, Pearu Peterson, Warren Weckesser, Jonathan Bright, Stéfan J. Van Der Walt, Matthew Brett, Joshua Wilson, K. Jarrod Millman, Nikolay Mayorov, Andrew R. J. Nelson, Eric Jones, Robert Kern, Eric Larson, C J Carey, İlhan Polat, Yu Feng, Eric W. Moore, Jake VanderPlas, Denis Laxalde, Josef Perktold, Robert Cimrman, Ian Henriksen, E. A. Quintero, Charles R. Harris, Anne M. Archibald, Antônio H. Ribeiro, Fabian Pedregosa, Paul Van Mulbregt, SciPy 1.0 Contributors, Aditya Vijaykumar, Alessandro Pietro Bardelli, Alex Rothberg, Andreas Hilboll, Andreas Kloeckner, Anthony Scopatz, Antony Lee, Ariel Rokem, C. Nathan Woods, Chad Fulton, Charles Masson, Christian Häggström, Clark Fitzgerald, David A. Nicholson, David R. Hagen, Dmitrii V. Pasechnik, Emanuele Olivetti, Eric Martin, Eric Wieser, Fabrice Silva, Felix Lenders, Florian Wilhelm, G. Young, Gavin A. Price, Gert-Ludwig Ingold, Gregory E. Allen, Gregory R. Lee, Hervé Audren, Irvin Probst, Jörg P. Dietrich, Jacob Silterra, James T Webber, Janko Slavič, Joel Nothman, Johannes Buchner, Johannes Kulick, Johannes L. Schönberger, José Vinícius De Miranda Cardoso, Joscha Reimer, Joseph Harrington, Juan Luis Cano Rodríguez, Juan Nunez-Iglesias, Justin Kuczynski, Kevin Tritz, Martin Thoma, Matthew Newville, Matthias Kümmerer, Maximilian Bolingbroke, Michael Tartre, Mikhail Pak, Nathaniel J. Smith, Nikolai Nowaczyk, Nikolay Shebanov, Oleksandr Pavlyk, Per A. Brodtkorb, Perry Lee, Robert T. McGibbon, Roman Feldbauer, Sam Lewis, Sam Tygier, Scott Sievert, Sebastiano Vigna, Stefan Peterson, Surhud More, Tadeusz Pudlik, Takuya Oshima, Thomas J. Pingel, Thomas P. Robitaille, Thomas Spura, Thouis R. Jones, Tim Cera, Tim Leslie, Tiziano Zito, Tom Krauss, Utkarsh Upadhyay, Yaroslav O. Halchenko, and Yoshiki Vázquez-Baeza. 2020. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nature Methods 17, 3 (March 2020), 261–272. DOI: 10.1038/s41592-019-0686-2

Frank Wilcoxon. 1945. Individual Comparisons by Ranking Methods. Biometrics Bulletin 1, 6 (1945), 80–83. DOI: 10.2307/3001968