Adapting Large Language Models for Topic Modeling Tasks

  • Daniel Carvalho UFSJ
  • Antônio Pereira UFSJ
  • Elisa Tuler UFSJ
  • Diego Dias UFES
  • Washington Cunha UFMG
  • Leonardo Rocha UFSJ


This work presents a proposal for adapting Large Language Models (LLMs) to the unsupervised task of Topic Modeling (TM). The proposal consists of three stages: document summarization, topic characterization, and topic definition. We instantiated it with two LLMs, one open-source (Llama3) and one proprietary (GPT-3.5), and compared them with four state-of-the-art (SOTA) TM strategies. Our results show that the approach is promising: it defines topics as coherent as those produced by SOTA strategies, although there is still room for improvement in their organizational structure.

Keywords: Topic Modeling, Large Language Models, Natural Language Processing
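The abstract names three stages (document summarization, topic characterization, topic definition) but not the prompts, output parsing, or number of topics. The following is a minimal sketch of how such a pipeline could be chained, assuming a generic prompt-in/text-out `LLM` callable and a user-chosen topic count `k`. All function names, prompts, and the `fake_llm` stand-in are hypothetical and are not the authors' implementation; a real setup would wrap Llama3 or GPT-3.5 behind the same callable.

```python
from typing import Callable, List

# Hypothetical interface (assumption): any function mapping a prompt to a completion,
# e.g. a wrapper around Llama3 or GPT-3.5. Not the authors' API.
LLM = Callable[[str], str]


def summarize_documents(llm: LLM, docs: List[str]) -> List[str]:
    """Stage 1: condense each document into a short summary."""
    return [llm(f"Summarize the following document in one sentence:\n{doc}") for doc in docs]


def characterize_topics(llm: LLM, summaries: List[str]) -> List[str]:
    """Stage 2: ask for a short candidate topic label for each summary."""
    return [llm(f"Give a short topic label (2-3 words) for:\n{s}").strip() for s in summaries]


def define_topics(llm: LLM, labels: List[str], k: int) -> List[str]:
    """Stage 3: merge candidate labels into at most k final topics, one per line."""
    prompt = (
        f"Merge these candidate topic labels into at most {k} distinct topics, "
        "one per line:\n" + "\n".join(labels)
    )
    lines = [line.strip() for line in llm(prompt).splitlines()]
    return [line for line in lines if line][:k]


def llm_topic_modeling(llm: LLM, docs: List[str], k: int) -> List[str]:
    """Chain the three stages: summarize, characterize, then define topics."""
    summaries = summarize_documents(llm, docs)
    labels = characterize_topics(llm, summaries)
    return define_topics(llm, labels, k)


def fake_llm(prompt: str) -> str:
    """Deterministic stand-in for a real model, for illustration only."""
    if prompt.startswith("Merge"):
        # Return the unique candidate labels (lines after the instruction), order preserved
        return "\n".join(dict.fromkeys(prompt.splitlines()[1:]))
    if prompt.startswith("Give"):
        return "sports" if "match" in prompt else "politics"
    # Summarization stub: echo the document line as its own "summary"
    return prompt.splitlines()[-1]


docs = ["The match ended 2-1.", "Parliament passed the bill.", "A late goal won the match."]
print(llm_topic_modeling(fake_llm, docs, k=2))  # ['sports', 'politics']
```

Keeping each stage as a separate function makes it easy to swap the model behind `LLM` (open-source vs. proprietary) while holding the pipeline fixed, which mirrors the kind of comparison the abstract describes.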


