Tie Strength Metrics to Rank Pairs of Developers from GitHub

Authors

  • Natércia A. Batista, Universidade Federal de Minas Gerais
  • Guilherme A. Sousa, Universidade Federal de Minas Gerais
  • Michele A. Brandão, Instituto Federal de Minas Gerais
  • Ana Paula C. da Silva, Universidade Federal de Minas Gerais
  • Mirella M. Moro, Universidade Federal de Minas Gerais

DOI:

https://doi.org/10.5753/jidm.2018.1637

Keywords:

Metrics, Social Networks, Web Data, Web Software Repositories

Abstract

The Web provides huge volumes of data, which makes collecting and processing that data efficiently a difficult task. Software repositories are one example of such volumes: they are Web storage platforms for software and projects, together with their developers and companies. In this work, we first present a systematic literature review of topics related to such repositories. Then, we extract their data and enrich it by building a development network. Based on this network, we investigate, through a correlation analysis, whether tie strength metrics capture new information. We also use the metrics to rank pairs of developers with three different aggregation methods. Our experimental analysis shows different results for each ranking method when considering all pairs of developers, which reveals the difficulty of choosing the best way to rank pairs of developers. However, when considering only the 10 top-ranked pairs, two methods present similar results. In addition, combining tie strength metrics with rank aggregation methods makes it possible to identify important developers in the network and the strength of their collaboration.
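To make the idea of ranking developer pairs by aggregating several metrics concrete, the sketch below applies a Borda count, a classical rank aggregation scheme. The abstract does not name its three aggregation methods, so the choice of Borda count is an assumption made for illustration only. The metric names (metric_a, metric_b, metric_c), the developer names, and all scores are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def borda_aggregate(
    scores_by_metric: Dict[str, Dict[Pair, float]]
) -> List[Tuple[Pair, int]]:
    """Rank developer pairs by summing Borda points across metrics.

    Illustrative only: Borda count is an assumed aggregation method,
    not necessarily one of the three methods used in the paper.
    """
    points: Dict[Pair, int] = {}
    for scores in scores_by_metric.values():
        # Order pairs from strongest to weakest tie under this metric;
        # equal scores are broken by pair name to keep the result deterministic.
        ordered = sorted(scores, key=lambda p: (-scores[p], p))
        n = len(ordered)
        for position, pair in enumerate(ordered):
            # The top pair earns n - 1 points, the last pair earns 0.
            points[pair] = points.get(pair, 0) + (n - 1 - position)
    # Final ranking: most points first, pair name as tie-breaker.
    return sorted(points.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical tie strength scores for three developer pairs.
scores = {
    "metric_a": {("ana", "bob"): 0.9, ("ana", "carl"): 0.4, ("bob", "carl"): 0.1},
    "metric_b": {("ana", "bob"): 3.0, ("ana", "carl"): 5.0, ("bob", "carl"): 1.0},
    "metric_c": {("ana", "bob"): 0.7, ("ana", "carl"): 0.2, ("bob", "carl"): 0.5},
}

ranking = borda_aggregate(scores)
for pair, total in ranking:
    print(pair, total)
```

In this toy input the three metrics disagree on the order of the pairs, which mirrors the abstract's observation that the choice of ranking method affects the result. A different aggregation rule could order the same pairs differently.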

Author Biography

Mirella M. Moro, Universidade Federal de Minas Gerais

Mirella M. Moro is an assistant professor in the Computer Science Department at UFMG (Belo Horizonte, Brazil). She holds a Ph.D. in Computer Science (University of California Riverside - UCR, 2007), as well as an MSc and a BSc in Computer Science (UFRGS, Brazil, 2001 and 1999). She is the Education Director of SBC (Brazilian Computer Society) and the editor-in-chief of the new electronic magazine SBC Horizontes, which focuses on careers in Computer Science. She is also a member of the ACM Education Council, ACM SIGMOD, ACM SIGCSE, ACM-W, IEEE, IEEE WIE, and MentorNet. Mirella has been conducting research in the area of Databases since 1997. Her research interests include hybrid XML/relational modeling, XML query optimization, stream processing, content-based dissemination systems, temporal databases, versioning management, and schema evolution.

Published

2018-06-20

How to Cite

Batista, N. A., Sousa, G. A., Brandão, M. A., da Silva, A. P. C., & Moro, M. M. (2018). Tie Strength Metrics to Rank Pairs of Developers from GitHub. Journal of Information and Data Management, 9(1), 69. https://doi.org/10.5753/jidm.2018.1637

Section

SBBD 2017