Intelligent Classification of Economic Activities from Free Text Descriptions

  • Elias Oliveira UFES
  • Patrick Marques Ciarelli UFES
  • Wallace F. Henrique UFES
  • Lucas Veronese Felipe Pedroni UFES
  • Alberto F. De Souza UFES

Abstract

We address the problem of automatically categorizing economic activities from free-text business descriptions. This information is vital to fundamental aspects of national government administration, such as taxation and short-, medium-, and long-term planning. Because the number of possible categories is very large (more than 1000 in the Brazilian scenario), this automatic text categorization problem is quite challenging. We apply and compare two techniques: the Vector Space Model in its classical form, used to represent the texts, and VG-RAM, a Weightless Neural Network.
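
As a rough illustration of the Vector Space Model side of the comparison, the sketch below represents each description as a TF-IDF weighted term vector and assigns a new description the category of its most similar training description under cosine similarity. This is a minimal sketch under stated assumptions, not the setup used in the paper: the nearest-neighbour decision rule, the particular TF-IDF weighting, the whitespace tokenizer, the three toy descriptions, and the category labels are all hypothetical and are not real CNAE codes.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no stemming or stop words.
    return text.lower().split()

def build_idf(token_lists):
    # Inverse document frequency: terms found in fewer descriptions weigh more.
    n_docs = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    return {term: math.log(n_docs / count) + 1.0 for term, count in doc_freq.items()}

def vectorize(tokens, idf):
    # Sparse TF-IDF vector; terms unseen during training are dropped.
    term_freq = Counter(tokens)
    return {term: count * idf[term] for term, count in term_freq.items() if term in idf}

def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def classify(description, training, idf):
    # Return the label of the training vector most similar to the query.
    query = vectorize(tokenize(description), idf)
    best_vector, best_label = max(training, key=lambda item: cosine(query, item[0]))
    return best_label

# Hypothetical toy data; the labels are illustrative, not CNAE categories.
TRAINING_TEXTS = [
    ("retail sale of bread and cakes", "bakery-retail"),
    ("manufacture of wooden furniture", "furniture-manufacturing"),
    ("software development and consulting", "software-services"),
]

TOKEN_LISTS = [tokenize(text) for text, _ in TRAINING_TEXTS]
IDF = build_idf(TOKEN_LISTS)
TRAINING = [
    (vectorize(tokens, IDF), label)
    for tokens, (_, label) in zip(TOKEN_LISTS, TRAINING_TEXTS)
]

if __name__ == "__main__":
    print(classify("custom software development", TRAINING, IDF))
    print(classify("sale of bread", TRAINING, IDF))
```

With more than 1000 categories and short, noisy descriptions, many categories would have few training examples, which is one reason the problem is harder than this toy example suggests.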

Published
30/06/2007
OLIVEIRA, Elias; CIARELLI, Patrick Marques; HENRIQUE, Wallace F.; PEDRONI, Lucas Veronese Felipe; SOUZA, Alberto F. De. Intelligent Classification of Economic Activities from Free Text Descriptions. In: SIMPÓSIO BRASILEIRO DE TECNOLOGIA DA INFORMAÇÃO E DA LINGUAGEM HUMANA (STIL), 5., 2007, Rio de Janeiro/RJ. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2007. p. 1635-1639.