Automatic Classification of NCM Codes Using the Naïve Bayes Algorithm
Keywords: Machine Learning, Consumer Product Classification, NCM, Text Classification, Naïve Bayes Algorithm
This paper presents a classifier that automatically assigns product item descriptions to their appropriate Mercosur Common Nomenclature (NCM) codes. The classifier was built with the Naïve Bayes supervised learning algorithm and trained on consumer invoice items belonging to chapters 22 and 90 of the NCM. The results show that the model classifies most instances correctly. On the easy data set, drawn from chapter 22, it reached an accuracy of 98%; on the medium and difficult sets, drawn from chapters 22 and 90, it reached 90% and 83%, respectively.
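To make the approach concrete, the following is a minimal sketch of a Naïve Bayes text classifier for product descriptions. It is not the authors' implementation: it assumes the multinomial variant with Laplace smoothing and whitespace tokenization, it uses only the Python standard library, and the class name `MultinomialNaiveBayes`, the helper `tokenize`, and the six training descriptions are all invented for illustration. Labels are NCM chapter numbers only, not full NCM codes.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real preprocessing)."""
    return text.lower().split()


class MultinomialNaiveBayes:
    """Hypothetical minimal multinomial Naive Bayes with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant (assumed value)

    def fit(self, descriptions, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.token_counts = defaultdict(Counter)  # token frequencies per class
        self.vocabulary = set()
        for text, label in zip(descriptions, labels):
            tokens = tokenize(text)
            self.token_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def log_scores(self, text):
        """Return log P(class) + sum of log P(token | class) for each class."""
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, doc_count in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(doc_count / total_docs)
            for token in tokenize(text):
                if token in self.vocabulary:  # ignore tokens never seen in training
                    score += math.log((counts[token] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Invented toy data: (item description, NCM chapter).
train = [
    ("beer can 350ml", "22"),
    ("mineral water bottle 500ml", "22"),
    ("red wine bottle 750ml", "22"),
    ("digital thermometer clinical", "90"),
    ("sunglasses polarized lens", "90"),
    ("blood pressure monitor digital", "90"),
]
model = MultinomialNaiveBayes().fit(*zip(*train))
print(model.predict("wine bottle 750ml"))
print(model.predict("digital lens thermometer"))
```

On this toy data the first description is assigned to chapter 22 and the second to chapter 90. Scores are summed in log space to avoid numerical underflow when many small probabilities are multiplied.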