Investigating the Use of Artificial Intelligence in Python Projects Hosted on GitHub

  • Luiz Andre do Nascimento Ubaldo (IFPR)
  • Jailton Coelho (IFPR)

Abstract


Artificial Intelligence (AI) has evolved significantly in recent years. As AI has grown more popular, has it also been incorporated into the development of open-source projects? Motivated by this question, a study of 15,770 Python repositories hosted on GitHub was conducted. The results show that the most used Python libraries for AI are TensorFlow, OpenCV, and Scikit-Learn, and that 12% of the projects have at least one dependency on an AI-related library. Finally, the countries with the highest number of AI-related Python projects are China, the United States, and Germany.
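The abstract does not state how dependencies were identified, so the following is only a minimal sketch of one plausible approach: scanning a project's requirements.txt-style dependency list for AI libraries. The watch list, the package names (`tensorflow`, `opencv-python`, `scikit-learn`), and the helpers `ai_dependencies` and `share_with_ai` are assumptions made for illustration, not the authors' method.

```python
import re

# Hypothetical watch list: the three libraries named in the abstract, keyed by
# the package names they are commonly installed under (an assumption here).
AI_PACKAGES = {
    "tensorflow": "TensorFlow",
    "opencv-python": "OpenCV",
    "scikit-learn": "Scikit-Learn",
}


def ai_dependencies(requirements_text):
    """Return the sorted AI libraries listed in requirements.txt-style text."""
    found = set()
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        # The package name ends at the first version specifier, extra, or marker.
        name = re.split(r"[<>=!~\[;\s]", line, maxsplit=1)[0]
        name = name.lower().replace("_", "-")
        if name in AI_PACKAGES:
            found.add(AI_PACKAGES[name])
    return sorted(found)


def share_with_ai(projects):
    """Fraction of projects (name -> requirements text) with any AI dependency."""
    if not projects:
        return 0.0
    hits = sum(1 for text in projects.values() if ai_dependencies(text))
    return hits / len(projects)
```

Applied to a whole sample, `share_with_ai` yields the kind of proportion reported in the abstract (the share of projects with at least one AI-related dependency). A real study would also need to handle other manifest formats, which this sketch ignores.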

Published
2024-09-30
UBALDO, Luiz Andre do Nascimento; COELHO, Jailton. Investigating the Use of Artificial Intelligence in Python Projects Hosted on GitHub. In: WORKSHOP ON SOFTWARE VISUALIZATION, EVOLUTION AND MAINTENANCE (VEM), 12., 2024, Curitiba/PR. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 13-22. DOI: https://doi.org/10.5753/vem.2024.3811.