Classification of NCM Codes Using Natural Language Processing
Abstract
This article develops a process for classifying product descriptions found in electronic invoices (abbreviated NF-e in Portuguese). Each description is assigned to a Chapter of the Mercosur Common Nomenclature (NCM), identified by the first two digits of the NCM code. Classification was performed with the Support Vector Machine algorithm on a database of 340,000 distinct products, which were preprocessed using Natural Language Processing techniques. An accuracy of 84% was obtained over a total of 50 classes.
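As a minimal sketch of the data preparation implied above, the snippet below derives the Chapter label from a full NCM code and applies a simple text normalization to a product description. The function names `ncm_chapter` and `normalize_description` are hypothetical, and the specific normalization steps (lowercasing, accent stripping, keeping only letters) are assumptions made for illustration; the abstract does not state which Natural Language Processing techniques were actually used.

```python
import re
import unicodedata


def ncm_chapter(ncm_code: str) -> str:
    """Return the two-digit Chapter of a full 8-digit NCM code (hypothetical helper)."""
    # Drop separators such as dots or spaces, e.g. "8471.30.12" -> "84713012".
    digits = re.sub(r"\D", "", ncm_code)
    if len(digits) != 8:
        raise ValueError(f"expected 8 digits, got {ncm_code!r}")
    # The Chapter is the first two digits of the code.
    return digits[:2]


def normalize_description(text: str) -> str:
    """Assumed preprocessing: lowercase, strip accents, keep only letter tokens."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    # Remove the combining marks left behind by the decomposition (accents, cedillas).
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Keep runs of ASCII letters; digits and punctuation are discarded.
    return " ".join(re.findall(r"[a-z]+", no_accents))


if __name__ == "__main__":
    print(ncm_chapter("8471.30.12"))
    print(normalize_description("Açúcar Cristal 5KG"))
```

The normalized descriptions and Chapter labels produced this way could then be vectorized and passed to a Support Vector Machine classifier; that step is left out here because the abstract does not specify the features or kernel.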
