ALTES: An Automatic Topic Labeling Tool Using External Sources
Abstract
Interpreting the content of a large number of stored documents is challenging. Topic modeling is an unsupervised machine learning technique that supports this interpretation by identifying groups of words related to the same subject in sets of documents. However, interpreting the generated topics can be difficult because the grouped words lack a clear semantic context. To address this challenge, this paper presents ALTES, a labeling tool that supports the interpretation of topics generated by topic modeling by enriching them with data from external sources. ALTES finds words related to the terms that compose each topic and establishes associations between ideas or concepts that are not initially evident in the identified topics.
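The enrichment idea described in the abstract can be illustrated with a minimal sketch. This is not the ALTES implementation: the `RELATED_WORDS` dictionary is a hypothetical in-memory stand-in for an external source, the `enrich_topic` function and its voting rule are assumptions made for illustration, and the topic terms are invented examples. The sketch only shows how related words shared by several topic terms could surface a concept that no single term states explicitly.

```python
from collections import Counter

# Hypothetical stand-in for an external source (e.g. a thesaurus or knowledge base).
# The abstract does not specify which sources ALTES queries.
RELATED_WORDS = {
    "virus": ["disease", "infection", "health"],
    "vaccine": ["health", "immunization", "disease"],
    "hospital": ["health", "medicine", "care"],
    "goal": ["sport", "football", "score"],
    "team": ["sport", "group", "football"],
    "match": ["sport", "game", "football"],
}


def enrich_topic(topic_terms, related_words, top_n=2):
    """Return candidate labels: related words shared by the most topic terms."""
    counts = Counter()
    for term in topic_terms:
        # set() ensures each topic term votes at most once per related word
        for related in set(related_words.get(term, [])):
            if related not in topic_terms:
                counts[related] += 1
    # Sort by vote count (descending), then alphabetically for a stable order
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_n]]


# Invented example topics, as a topic model might output them
topics = {
    0: ["virus", "vaccine", "hospital"],
    1: ["goal", "team", "match"],
}
labels = {tid: enrich_topic(terms, RELATED_WORDS) for tid, terms in topics.items()}
print(labels)
```

In this toy setting, "health" is suggested for the first topic even though it appears in none of its terms, which is the kind of non-evident association the abstract refers to. A real tool would replace the dictionary with queries to actual external sources and likely use a more careful ranking.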
Keywords:
Topic modeling
Published
2023-09-25
How to Cite
AMORIM, Annie; MURRUGARRA-LLERENA, Nils; SILVA, Vítor; DE OLIVEIRA, Daniel; PAES, Aline. ALTES: An Automatic Topic Labeling Tool Using External Sources. In: DEMOS AND APPLICATIONS - BRAZILIAN SYMPOSIUM ON DATABASES (SBBD), 38., 2023, Belo Horizonte/MG. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 120-125. DOI: https://doi.org/10.5753/sbbd_estendido.2023.233252.
