Classification of NCM Codes Using Natural Language Processing (Classificação dos Códigos de NCM Usando Processamento de Linguagem Natural)

  • Pedro Pinheiro UFPA
  • Marcos Amaris UFPA

Abstract


This article develops a process for classifying product descriptions in electronic invoices (abbreviated NF-e in Portuguese). Each description is assigned to a Chapter (the first two digits of the code) of the Mercosul Common Nomenclature (NCM). Classification was performed with the Support Vector Machine algorithm on a database of 340,000 distinct products, which were preprocessed using Natural Language Processing techniques. An accuracy of 84% was obtained over a total of 50 classes.
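
As an illustration of the labeling step described above, the following minimal sketch derives the Chapter label from an NCM code and normalizes a product description. It is not the authors' implementation: the function names `ncm_chapter` and `normalize_description` are hypothetical, and the specific normalization (lowercasing, accent stripping, alphanumeric tokenization) is an assumption, since the abstract does not say which Natural Language Processing techniques were applied.

```python
import re
import unicodedata


def ncm_chapter(ncm_code: str) -> str:
    """Return the NCM Chapter, i.e. the first two digits of an 8-digit code."""
    # Drop separators so that "8471.30.12" and "84713012" are handled alike.
    digits = re.sub(r"\D", "", ncm_code)
    if len(digits) != 8:
        raise ValueError(f"expected an 8-digit NCM code, got {ncm_code!r}")
    return digits[:2]


def normalize_description(text: str) -> list[str]:
    """Lowercase, strip accents, and split into alphanumeric tokens.

    Assumed preprocessing for illustration only; the paper's actual
    pipeline may differ.
    """
    decomposed = unicodedata.normalize("NFKD", text.lower())
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.findall(r"[a-z0-9]+", no_accents)


if __name__ == "__main__":
    # Hypothetical invoice line: (description, NCM code).
    description, code = "Refrigerante Guaraná 2L", "2202.10.00"
    print(normalize_description(description), ncm_chapter(code))
```

Token lists of this kind could then be turned into numeric features and passed to a Support Vector Machine, with the two-digit Chapter as the target class.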

Keywords: Natural Language Processing, Machine Learning, Text Classification, Mercosul Common Nomenclature

Published
2021-11-18
PINHEIRO, Pedro; AMARIS, Marcos. Classificação dos Códigos de NCM Usando Processamento de Linguagem Natural. In: REGIONAL SCHOOL OF HIGH PERFORMANCE NORTH 2 (ERAD-NO2) AND REGIONAL SCHOOL OF MACHINE LEARNING AND ARTIFICIAL INTELLIGENCE NORTH 2 (ERAMIA-NO2), 1., 2021, Online. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. p. 9-12. DOI: https://doi.org/10.5753/erad-no2.2021.18671.