Unsupervised Statistical Keyword Extraction Pipeline: Is LLM All You Need?
Abstract
Keyword extraction is an important step in text interpretation: it identifies and highlights the most significant words or phrases in a text. This step is essential for applications such as summarization, indexing, and information retrieval. This paper presents a custom-built keyword extraction pipeline named USKE (Unsupervised Statistical Keyword Extraction) and compares its performance with that of large language models (LLMs). Built on statistical methods, USKE delivers fast, simple results even on large datasets. Our evaluation shows that although LLMs can achieve good results on single sentences with minimal context, they require substantial post-processing and may produce inconsistent answers, whereas USKE excels in efficiency and scalability.
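The abstract does not describe USKE's internal steps. As a rough illustration of what "statistical" keyword extraction can mean, the sketch below ranks each document's terms by TF-IDF, a classic statistical baseline. It is **not** the USKE pipeline. The stopword list, the smoothed IDF term (the same form scikit-learn's `TfidfVectorizer` uses by default), and the function names `tokenize` and `tfidf_keywords` are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "is", "to", "for",
             "with", "on", "it", "this", "that", "as", "are", "be", "by"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text, keep alphabetic runs, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def tfidf_keywords(docs: list[str], top_k: int = 3) -> list[list[str]]:
    """Return the top_k highest-scoring TF-IDF terms for each document.

    Hypothetical helper: a generic statistical baseline, not USKE.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)

    # Document frequency: the number of documents each term appears in.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    results = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens) or 1
        scores = {
            # Normalized term frequency times a smoothed IDF, so terms that
            # occur in every document still receive a small positive weight.
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        }
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:top_k])
    return results


docs = [
    "keyword extraction finds keyword candidates in text",
    "large language models generate text",
    "statistical methods scale to large datasets",
]
print(tfidf_keywords(docs))
```

In the first document, "keyword" ranks first because it occurs twice. "text" drops out of the top three because it also appears in the second document, which lowers its IDF. Only two corpus-wide counting passes are involved, which hints at why purely statistical approaches scale well compared with querying an LLM per text.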
Published
17/11/2024
How to Cite
ZAGATTI, Fernando Rezende; LUCRÉDIO, Daniel; CASELI, Helena de Medeiros. Unsupervised Statistical Keyword Extraction Pipeline: Is LLM All You Need? In: BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), 13., 2024, Belém/PA. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 460-475. ISSN 2643-6264.