Evaluating Contextualized Embeddings for Topic Modeling in Public Bidding Domain

  • Henrique R. Hott UFMG
  • Mariana O. Silva UFMG
  • Gabriel P. Oliveira UFMG
  • Michele A. Brandão UFMG / IFMG
  • Anisio Lacerda UFMG
  • Gisele Pappa UFMG


Public procurement plays a crucial role in government operations, acquiring goods and services through competitive bidding processes. However, the growing volume of procurement data has made manual analysis impractical. Text clustering and topic modeling techniques are therefore widely used to uncover hidden patterns in unstructured text data. This paper applies BERT-based models to the analysis of public procurement data. Specifically, we employ BERTopic, a topic modeling technique based on BERT, to generate clusters that capture the underlying topics in procurement data. We also evaluate several sentence embedding models for representing procurement documents. By combining BERTopic with these sentence embeddings, we aim to improve the accuracy and interpretability of topic modeling in public procurement analysis. Our results provide insights into the underlying topics within the data, which can support decision-making and improve the efficiency of procurement operations.
HOTT, Henrique R.; SILVA, Mariana O.; OLIVEIRA, Gabriel P.; BRANDÃO, Michele A.; LACERDA, Anisio; PAPPA, Gisele. Evaluating Contextualized Embeddings for Topic Modeling in Public Bidding Domain. In: BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), 12., 2023, Belo Horizonte/MG. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 410-426. ISSN 2643-6264.