Detection of Vehicle Purchases in Various Invoices using Large-Scale Language Models

  • Gabriel V. Heisler UFSC
  • William J. Beckhauser UFSC
  • Vitória S. Santos UFSC
  • Renato Fileto UFSC

Abstract


Precisely identifying products in invoice item descriptions is crucial for applications such as auditing and fraud detection. However, the free-text descriptions of these items are often short, diverse, and inconsistent with other data fields, which makes product identification challenging and compromises the performance of existing solutions. In this work, we compare the performance and computational costs of large language models (LLMs) in the task of detecting vehicle descriptions in an invoice dataset from public purchases. Experimental results reveal that some state-of-the-art LLMs can reach high performance, even in noisy scenarios, and that lightweight models yield competitive performance at lower computational cost.

Published
12/11/2025
HEISLER, Gabriel V.; BECKHAUSER, William J.; SANTOS, Vitória S.; FILETO, Renato. Detection of Vehicle Purchases in Various Invoices using Large-Scale Language Models. In: ESCOLA REGIONAL DE APRENDIZADO DE MÁQUINA E INTELIGÊNCIA ARTIFICIAL DA REGIÃO SUL (ERAMIA-RS), 1., 2025, Porto Alegre/RS. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 164-167. DOI: https://doi.org/10.5753/eramiars.2025.16764.