AI-PAVE-Br: Leveraging Large Language Models for Enhanced Product Attribute Value Extraction through a Golden Set Approach

Murilo Gazzola; Hugo Gobato Souto; Samuel Silva; Júlia Schubert Peixoto; Felipe Siqueira; André Luis Pedroso de Morais; Caio Gomes

doi:10.5753/stil.2025.37821

Murilo Gazzola LuizaLabs / UPM
Hugo Gobato Souto LuizaLabs / UPM
Samuel Silva LuizaLabs
Júlia Schubert Peixoto LuizaLabs
Felipe Siqueira LuizaLabs
André Luis Pedroso de Morais LuizaLabs
Caio Gomes LuizaLabs


Abstract

The rapid growth and complexity of product data in the dynamic Brazilian e-commerce landscape demand robust, specialized methods for structured information extraction. Traditional approaches to Product Attribute Value Extraction (PAVE) often struggle with the linguistic nuances and sheer diversity of product descriptions written in Portuguese. To address this gap, this paper makes two main contributions. First, we present AI-PAVE-Br, a system built on Large Language Models (LLMs) to perform high-accuracy PAVE specifically for Brazilian e-commerce catalogs. Second, to support reproducible research and provide a reference benchmark, we introduce and share the Golden Set, a new, carefully curated, manually annotated dataset for PAVE in Portuguese. We detail the creation process and the structure (Entity, Category, and Subcategories) of this reference set. Our experiments show that AI-PAVE-Br, using targeted prompt engineering, substantially outperforms conventional Named Entity Recognition (NER) baselines. This work delivers a scalable solution for a major non-English market and gives the NLP community a publicly available resource for future PAVE research.
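The abstract describes the approach only at a high level. The Python sketch below illustrates one plausible way to frame LLM-based PAVE against a golden set; it is not the authors' implementation. Everything beyond the Entity, Category, and Subcategories fields is an assumption for illustration: the `title` and `attributes` fields, the prompt wording, the helper names (`GoldenItem`, `build_prompt`, `parse_response`, `score`), the example product, and the case-insensitive exact-match scoring rule. The LLM call itself is omitted and replaced by a hard-coded reply.

```python
import json
from dataclasses import dataclass, field


@dataclass
class GoldenItem:
    """Hypothetical Golden Set record.

    Only entity, category and subcategories are named in the abstract;
    title and attributes are assumed fields for illustration.
    """
    title: str
    entity: str
    category: str
    subcategories: list[str] = field(default_factory=list)
    attributes: dict[str, str] = field(default_factory=dict)  # gold attribute -> value


def build_prompt(item: GoldenItem, attribute_names: list[str]) -> str:
    """Assemble a hypothetical zero-shot prompt that requests JSON output."""
    path = " > ".join([item.category, *item.subcategories])
    return (
        f"Product title: {item.title}\n"
        f"Category path: {path}\n"
        f"Extract the attributes [{', '.join(attribute_names)}]. "
        "Answer with a single JSON object and use null for absent attributes."
    )


def parse_response(raw: str) -> dict[str, str]:
    """Parse the model's JSON reply, dropping null or empty values."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {}  # malformed reply: treat as no extractions
    if not isinstance(data, dict):
        return {}
    return {k: str(v).strip() for k, v in data.items() if v not in (None, "")}


def score(predicted: dict[str, str], gold: dict[str, str]) -> tuple[float, float, float]:
    """Per-item precision, recall and F1 over (attribute, value) pairs.

    Values are compared with a case-insensitive exact match (an assumed rule).
    """
    pred_pairs = {(k, v.lower()) for k, v in predicted.items()}
    gold_pairs = {(k, v.lower()) for k, v in gold.items()}
    tp = len(pred_pairs & gold_pairs)
    precision = tp / len(pred_pairs) if pred_pairs else 0.0
    recall = tp / len(gold_pairs) if gold_pairs else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


# Usage with a hard-coded reply standing in for a real LLM call (not shown).
item = GoldenItem(
    title="Smartphone Samsung Galaxy A15 128GB Azul",
    entity="Smartphone",
    category="Celulares",
    subcategories=["Smartphones"],
    attributes={"brand": "Samsung", "storage": "128GB", "color": "Azul"},
)
prompt = build_prompt(item, list(item.attributes))
reply = '{"brand": "Samsung", "storage": "128GB", "color": null}'
p, r, f1 = score(parse_response(reply), item.attributes)
print(f"P={p:.2f} R={r:.2f} F1={f1:.2f}")  # P=1.00 R=0.67 F1=0.80
```

Scoring on exact (attribute, value) pairs keeps the comparison simple; a real benchmark would likely need value normalization (units, synonyms, spelling variants) before matching.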
