ALTES: An Automatic Topic Labeling Tool Using External Sources

  • Annie Amorim, Fluminense Federal University
  • Nils Murrugarra-Llerena, Weber State University
  • Vítor Silva, Federal University of Rio de Janeiro
  • Daniel de Oliveira, Fluminense Federal University
  • Aline Paes, Fluminense Federal University

Abstract


Interpreting the content of a large collection of stored documents is challenging. Topic modeling is an unsupervised machine learning technique that supports this interpretation by identifying groups of words related to the same subject across sets of documents. However, the generated topics can be hard to interpret because the grouped words lack a clear semantic context. To address this challenge, this paper presents ALTES, a labeling tool that supports the interpretation of topics produced by topic modeling by enriching them with data from external sources. ALTES finds words related to the terms that compose each topic and establishes associations between ideas or concepts that are not initially evident in the identified topics.
Keywords: Topic modeling
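
The idea of enriching topic terms with related words from an external source can be illustrated with a minimal sketch. This is not the ALTES implementation: the abstract does not specify which external sources or ranking method the tool uses, so the in-memory dictionary `EXTERNAL_SOURCE`, the function `candidate_labels`, and the count-based ranking are all hypothetical stand-ins chosen for illustration.

```python
from collections import Counter

# Hypothetical stand-in for an external source: maps a term to related terms.
# A real tool would query an actual knowledge base or lexical resource here.
EXTERNAL_SOURCE = {
    "vaccine": ["health", "immunization", "medicine"],
    "virus": ["health", "infection", "medicine"],
    "hospital": ["health", "medicine", "building"],
}


def candidate_labels(topic_words, source, top_n=2):
    """Rank related terms by how many topic words they are linked to.

    Assumption: a term shared by many topic words is a plausible label.
    """
    counts = Counter()
    for word in topic_words:
        # set() so a related term counts at most once per topic word
        for related in set(source.get(word, [])):
            counts[related] += 1
    # Sort by descending count, then alphabetically for a deterministic order
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_n]]


topic = ["vaccine", "virus", "hospital"]
labels = candidate_labels(topic, EXTERNAL_SOURCE)
print(labels)
```

In this toy setting, terms that none of the topic words share with each other rank low, while terms linked to several topic words surface as label candidates, which mirrors the kind of non-evident association the abstract describes.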

Published
2023-09-25
AMORIM, Annie; MURRUGARRA-LLERENA, Nils; SILVA, Vítor; DE OLIVEIRA, Daniel; PAES, Aline. ALTES: An Automatic Topic Labeling Tool Using External Sources. In: DEMOS AND APPLICATIONS - BRAZILIAN SYMPOSIUM ON DATABASES (SBBD), 38., 2023, Belo Horizonte/MG. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 120-125. DOI: https://doi.org/10.5753/sbbd_estendido.2023.233252.