Reducing Costs in Large-Scale Classification: A Hybrid BERT–LLM Strategy
Abstract
The Brazilian National Graduate Education System (SNPG) faces the challenge of mapping thousands of researchers, including master's and doctoral degree holders and higher education faculty, whose work has strong affinity with a set of strategic themes. In this context, Large Language Models (LLMs) have proven highly effective at analyzing and classifying academic profiles, consistently identifying their areas of concentration. However, applying such models at scale entails substantial computational costs, specialized hardware requirements, and significant environmental impacts, such as increased carbon emissions. As an alternative, this work proposes a hybrid approach that combines a BERT-based pre-classification module with proprietary LLMs. The BERT module acts as a pre-filter: it estimates the affinity between academic outputs and strategic themes through embedding similarity measures and discards outputs with low relevance, thereby reducing the number of queries sent to the LLMs. Using labeled data provided by the Workshop on Lightweight and Efficient Deep Learning in High-Performance Computing (LeanDL-HPC 2025), we propose and evaluate different strategies for aggregating information from academic profiles and computing their similarity with the strategic themes. Results show that weighting the title and abstract of academic outputs more heavily, aggregating embeddings with the max operator, and using the Mpnet model slightly improve performance.
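The pre-filter described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the similarity threshold, and the toy vectors are all assumptions, and in practice the embeddings would come from a sentence encoder such as Mpnet rather than being supplied directly.

```python
import numpy as np

def aggregate_max(embeddings: np.ndarray) -> np.ndarray:
    """Aggregate a profile's per-output embeddings element-wise with max."""
    return embeddings.max(axis=0)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Standard cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def passes_prefilter(profile_embeddings, theme_embedding, threshold=0.5):
    """Return True if the profile is relevant enough to query the LLM.

    `threshold` is an illustrative cut-off: profiles whose aggregated
    embedding has low similarity to the strategic theme are discarded,
    so only the remaining profiles incur LLM cost.
    """
    profile_vec = aggregate_max(np.asarray(profile_embeddings))
    return cosine_similarity(profile_vec, theme_embedding) >= threshold
```

Max aggregation keeps, for each embedding dimension, the strongest signal across all of a researcher's outputs, which matches the abstract's finding that the max operator worked slightly better than alternatives such as averaging.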
Keywords:
Costs, Computational modeling, Large language models, High performance computing, Education, Estimation, Hardware, BERT-based models, Hybrid approach, Cost reduction, Pre-filter module
Published
28/10/2025
How to Cite
MIRANDA, Augusto Cesar Dalal; UTINO, Matheus Yasuo Ribeiro; GÔLO, Marcos Paulo Silva; SANTOS, Marcela Aparecida Aniceto Dos; SOUZA, Mariana Caravanti de. Reducing Costs in Large-Scale Classification: A Hybrid BERT–LLM Strategy. In: WORKSHOP ON LIGHTWEIGHT EFFICIENT DEEP LEARNING IN HPC ENVIRONMENTS (LEANDL-HPC) - INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD), 37., 2025, Bonito/MS. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 131-138.
