Simplifying Bibliometric Analysis with Python Streamlit
Abstract
Research Context: Bibliometric analysis is a research methodology that has become increasingly popular in recent years across different domains of science. Many tools have been developed to help researchers make sense of bibliographic data.
Scientific and Practical Problem: The tools currently available for bibliometric analysis are mostly integrated only with Web of Science (WoS) and SCOPUS, which limits researchers who wish to diversify their sources. These tools also offer no support for data preprocessing. Because a single tool is rarely complete, many scientists use more than one to meet their needs.
Proposed Analysis: After studying the available software, packages, and articles on bibliometric analysis, we propose a new tool that can be adapted to multiple science databases, performs bibliometric analysis, supports Excel to facilitate data preprocessing, and is designed to be both user-friendly and accessible to a broad audience.
Related IS Theory: The related IS theories are organizational knowledge creation theory, information processing theory, and diffusion of innovations theory.
Research Method: Design Science Research (DSR) was applied to identify the requirements for bibliometric analysis that the available tools do not cover and to determine how to address these gaps efficiently.
Summary of Results: The resulting artifact is a tool that scholars in various fields can use. Its direct integration with Excel facilitates preprocessing and integration with many databases. It also facilitates the extraction of missing metadata and features a simple, intuitive interface.
Contributions and Impact to IS area: This study proposes a novel approach to performing bibliometric analysis that draws on multiple science databases and integrates all necessary visualization tools.
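To make the idea of adapting to multiple science databases more concrete, the following is a minimal sketch, not the tool's actual implementation. It assumes that each database export can be loaded into a pandas DataFrame (for an Excel file, `pd.read_excel` could do this) and that a per-source column mapping is enough to reach a common schema. The mapping table, the function names, and the title-based duplicate check are all illustrative assumptions.

```python
import pandas as pd

# Hypothetical per-source column mappings (assumed for illustration only).
COLUMN_MAPS = {
    "scopus": {"Title": "title", "Year": "year", "Authors": "authors"},
    "wos": {"TI": "title", "PY": "year", "AU": "authors"},
}


def normalize(df, source):
    """Rename source-specific columns to a shared schema."""
    mapping = COLUMN_MAPS[source]
    out = df.rename(columns=mapping)[list(mapping.values())].copy()
    # Years may arrive as text; coerce them to a nullable integer column.
    out["year"] = pd.to_numeric(out["year"], errors="coerce").astype("Int64")
    # Simplistic key for spotting the same record exported by two sources.
    out["title_key"] = out["title"].str.strip().str.lower()
    out["source"] = source
    return out


def merge_sources(frames):
    """Concatenate normalized frames and drop records with the same title key."""
    merged = pd.concat(frames, ignore_index=True)
    return merged.drop_duplicates(subset="title_key").reset_index(drop=True)


def publications_per_year(records):
    """Count records per publication year, ignoring rows without a year."""
    return records.dropna(subset=["year"]).groupby("year").size()


# In-memory stand-ins for two exports, so the sketch runs without any files.
scopus_df = pd.DataFrame(
    {"Title": ["Graph Mining ", "Topic Models"],
     "Year": [2020, 2021],
     "Authors": ["A; B", "C"]}
)
wos_df = pd.DataFrame(
    {"TI": ["graph mining", "Citation Networks"],
     "PY": ["2020", "2021"],
     "AU": ["A; B", "D"]}
)

records = merge_sources([normalize(scopus_df, "scopus"), normalize(wos_df, "wos")])
counts = publications_per_year(records)
print(counts)
```

Keeping the mapping in a plain dictionary means that, under these assumptions, supporting another database only requires one more entry rather than new parsing code. Matching on a lowercased title is deliberately naive; a real pipeline would more likely compare DOIs where they are available.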
