Mitigating the Limits of Current Evaluation Metrics for Topic Modeling Strategies

Antônio Pereira de Souza Júnior; Felipe Augusto Resende Viegas; Leonardo Chaves Dutra da Rocha

doi:10.5753/webmedia_estendido.2024.244137

Antônio Pereira de Souza Júnior (UFSJ)
Felipe Augusto Resende Viegas (UFMG)
Leonardo Chaves Dutra da Rocha (UFSJ)

Abstract

Topic Modeling (TM) helps extract and organize information from large amounts of textual data by discovering semantic topics in documents. This master's thesis examines topic quality evaluation, which drives advances in the TM field by assessing the overall quality of the topic generation process. Traditional TM metrics capture topic quality strictly by evaluating the words that make up the topics, either syntactically (e.g., NPMI, TF-IDF Coherence) or semantically (e.g., WEP). We therefore investigate whether we are approaching the limits of what current evaluation metrics can assess regarding TM quality. To this end, we perform a comprehensive experimental evaluation on three widely used datasets (ACM, 20News, and WOS) whose documents have a natural organization into semantic classes (topics). We contrast the quality of the topics generated by four traditional and state-of-the-art TM techniques (LDA, NMF, CluWords, and BERTopic) with each collection's "natural topic structure". Our results show that, despite their importance, the current metrics fail to capture an important idiosyncratic aspect of the TM task: the capability of the topics to induce a structural organization of the document space into distinct semantic groups. To mitigate this limitation, we propose incorporating metrics commonly used to evaluate clustering algorithms into the TM evaluation process, relying on commonalities between the TM and clustering tasks. The results highlight the effectiveness of clustering metrics in distinguishing the output of TM techniques from the datasets' ground truth (class organization). However, adopting additional evaluation metrics expands the analysis space. Thus, as a third contribution, we propose consolidating the various metrics into a unified framework, using Game Theory for decision-making, specifically Multi-Attribute Utility Theory (MAUT). Our experimental results demonstrate that MAUT allows a more precise assessment of TM quality.
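
To make the consolidation step more concrete, the following is a minimal sketch of an additive multi-attribute utility over several evaluation metrics. The abstract does not state the utility functions, weights, or clustering metrics used in the thesis, so everything here is an assumption for illustration: the function name `maut_scores`, the min-max normalization, the equal default weights, the use of silhouette as a stand-in clustering metric, and all numeric values are hypothetical.

```python
from typing import Dict, Optional


def maut_scores(
    metrics: Dict[str, Dict[str, float]],
    weights: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Combine several metrics into one additive utility per technique.

    `metrics` maps technique -> {metric name -> value}; higher is assumed better.
    Assumptions (not taken from the thesis): min-max marginal utilities and
    equal weights when none are given.
    """
    names = sorted(next(iter(metrics.values())))
    if weights is None:
        weights = {m: 1.0 / len(names) for m in names}
    total = sum(weights[m] for m in names)

    scores = {t: 0.0 for t in metrics}
    for m in names:
        values = [metrics[t][m] for t in metrics]
        lo, hi = min(values), max(values)
        for t in metrics:
            # Marginal utility in [0, 1]; a metric tied across all techniques adds nothing.
            u = (metrics[t][m] - lo) / (hi - lo) if hi > lo else 0.0
            scores[t] += (weights[m] / total) * u
    return scores


# Made-up values for illustration only; these are not results from the thesis.
example = {
    "technique_a": {"npmi": 0.10, "silhouette": 0.40},
    "technique_b": {"npmi": 0.20, "silhouette": 0.10},
    "technique_c": {"npmi": 0.18, "silhouette": 0.34},
}
ranking = sorted(maut_scores(example).items(), key=lambda kv: kv[1], reverse=True)
```

In this toy setting, a technique that is best on only one metric can rank below one that is consistently strong on both, which is the kind of trade-off a single aggregated score is meant to expose. Changing the weights or the marginal utility function would change the ranking, so those choices need to be justified for any real comparison.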

Keywords: Topic modeling, Topic modeling evaluation, Machine learning, NLP, Data mining
