Unsupervised Statistical Keyword Extraction Pipeline: Is LLM All You Need?

Fernando Rezende Zagatti; Daniel Lucrédio; Helena de Medeiros Caseli

Fernando Rezende Zagatti UFSCar
Daniel Lucrédio UFSCar
Helena de Medeiros Caseli UFSCar

Abstract

Keyword extraction is an important step in text interpretation: it identifies and highlights the most significant words or phrases in a text. This step is essential for applications such as summarization, indexing, and information retrieval. This paper presents a custom-built keyword extraction pipeline named USKE (Unsupervised Statistical Keyword Extraction) and compares its performance with that of large language models (LLMs). USKE delivers fast and simple results based on statistical methods, even when dealing with large datasets. Our evaluation shows that although LLMs can achieve good results on single sentences with minimal context, they require extensive post-processing and may produce inconsistent answers, while USKE excels in efficiency and scalability.

Springer (English)

Published

17/11/2024

How to Cite

ZAGATTI, Fernando Rezende; LUCRÉDIO, Daniel; CASELI, Helena de Medeiros. Unsupervised Statistical Keyword Extraction Pipeline: Is LLM All You Need?. In: BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), 13., 2024, Belém/PA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 460-475. ISSN 2643-6264.