A Tool for Semantic Organization of Helpdesk Information through Topic Modeling
Abstract
In this work, we propose a tool for the semantic organization of textual data, designed to improve the automated analysis of large volumes of helpdesk tickets (service requests). It combines a Topic Modeling (TM) technique (CluWords) with a Large Language Model (LLM), Llama3, to identify recurring patterns and themes, aiming to improve ticket categorization and service efficiency. Through a case study with more than 9,000 tickets, we demonstrate its application in real-world scenarios, supporting team sizing, monitoring of emerging demands, and time consumption analysis. The results indicate that the tool improves analytical efficiency and supports decision-making.
Keywords:
Topic Modeling, Helpdesk, Large Language Models
References
Prafulla Bafna, Dhanya Pramod, and Anagha Vaidya. 2016. Document clustering: TF-IDF approach. In 2016 ICEEOT. IEEE, 61–66.
Jesús Bobadilla, Fernando Ortega, Antonio Hernando, and Abraham Gutiérrez. 2013. Recommender systems survey. Knowledge-based systems 46 (2013).
Yupeng Chang, Xu Wang, Jindong Wang, Yuan Wu, Linyi Yang, Kaijie Zhu, Hao Chen, Xiaoyuan Yi, Cunxiang Wang, Yidong Wang, et al. 2024. A survey on evaluation of large language models. ACM TIST 15, 3 (2024).
Vinicius HS Durelli, Rafael S Durelli, Andre T Endo, Elder Cirilo, Washington Luiz, and Leonardo Rocha. 2018. Please please me: does the presence of test cases influence mobile app users’ satisfaction?. In SBES. 132–141.
Maarten Grootendorst. 2022. BERTopic: Neural topic modeling with a class-based TF-IDF procedure. arXiv preprint arXiv:2203.05794 (2022).
Richard T Herschel and Nory E Jones. 2005. Knowledge management and business intelligence: the importance of integration. Journal of Knowledge Management (2005).
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of NAACL-HLT, Vol. 1. Minneapolis, Minnesota.
Washington Luiz, Felipe Viegas, Rafael Alencar, Fernando Mourão, Thiago Salles, Dárlinton Carvalho, Marcos Andre Gonçalves, and Leonardo Rocha. 2018. A feature-oriented sentiment rating for mobile app reviews. In WWW.
Meta. 2024. Introducing Llama 3.1: Our most capable models to date. [link]
Joshua Ofoeda, Richard Boateng, and John Effah. 2019. Application programming interface (API) research: A review of the past to inform the future. IJEIS (2019).
Antônio Pereira, Felipe Viegas, Marcos André Gonçalves, and Leonardo Rocha. 2023. Evaluating the Limits of the Current Evaluation Metrics for Topic Modeling. In Proc. the 29th WebMedia 2023. 119–127.
Prabu Ravichandran, Jeshwanth Reddy Machireddy, and Sareen Kumar Rachakatla. 2024. Generative AI in Business Analytics: Creating Predictive Models from Unstructured Data. Hong Kong Journal of AI and Medicine 4, 1 (2024), 146–169.
Felipe Viegas, Sérgio Canuto, Christian Gomes, Washington Luiz, Thierson Rosa, Sabir Ribas, Leonardo Rocha, and Marcos Gonçalves. 2019. CluWords: exploiting semantic word clustering representation for enhanced topic modeling. In WSDM.
Hanna Wallach, David Mimno, and Andrew McCallum. 2009. Rethinking LDA: Why priors matter. Advances in neural information processing systems 22 (2009).
Xiaohui Yan, Jiafeng Guo, Yanyan Lan, and Xueqi Cheng. 2013. A biterm topic model for short texts. In WWW. 1445–1456.
Published
2025-11-10
How to Cite
CARVALHO, Daniel; PEREIRA, Antônio; CUNHA, Washington; TULER, Elisa; DIAS, Diego; ROCHA, Leonardo. A Tool for Semantic Organization of Helpdesk Information through Topic Modeling. In: WORKSHOP ON TOOLS AND APPLICATIONS - BRAZILIAN SYMPOSIUM ON MULTIMEDIA AND THE WEB (WEBMEDIA), 31., 2025, Rio de Janeiro/RJ. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 155-158. ISSN 2596-1683. DOI: https://doi.org/10.5753/webmedia_estendido.2025.13896.
