Mining Experts from Source Code Analysis: An Empirical Evaluation


  • Johnatan Alves Oliveira, Federal University of Minas Gerais (UFMG)
  • Markos Viggiato, University of Alberta
  • Denis Pinheiro, Federal University of Minas Gerais (UFMG)
  • Eduardo Figueiredo, Federal University of Minas Gerais (UFMG)



Library Experts, Software Skills, Expert Identification, Mining Software Repositories


Third-party libraries have been widely adopted in modern software projects because of benefits such as code reuse and improved software quality. Software development is increasingly complex and requires specialists with knowledge of several technologies, including the libraries in current use. This complexity makes it extremely challenging to deliver quality software under time pressure. Building a good team, in both open source and proprietary systems, therefore requires identifying and hiring qualified developers, and enterprise and open source projects alike try to build teams of developers who are highly skilled in specific libraries. Developers with expertise in specific libraries may reduce the time spent on software development tasks and improve the quality of the final product. However, identifying them may not be trivial. In this paper, we first argue that source code activities can be used to identify the hard skills of software developers, such as library expertise. We then evaluate a mining-based strategy to identify library experts. To achieve our goal, we selected the 9 most popular Java libraries and evaluated the skills of more than 1.5 million developers in these libraries by analyzing their commits in 16,703 Java projects on GitHub. We validated the results by surveying 137 library expert candidates and observed an average precision of 88% for the applied strategy.








How to Cite

Oliveira, J. A., Viggiato, M., Pinheiro, D., & Figueiredo, E. (2021). Mining Experts from Source Code Analysis: An Empirical Evaluation. Journal of Software Engineering Research and Development, 9(1), 1:1 – 1:16.



Research Article
