Data Mining using Naive Bayes classifier: an application in short news

  • Thais Neubauer Universidade de São Paulo
  • Sarajane Peres Universidade de São Paulo


In the information age, a plethora of content is available on a wide range of subjects, and it must be organized so that it becomes more accessible and engaging. One such classification task arises in the Index project, developed by the Amsterdam-based company The Next Web. To address it, the Naive Bayes (NB) technique was applied to classify short news items into four topics. The resulting classifiers were evaluated through a series of cross-validation tests. The NB classifier performed satisfactorily, reaching an accuracy of about 70% in the best cases. In this paper, we present the context of the Index project and discuss the results obtained with the NB classifiers. Despite the good results, the project is still in progress, since variations such as other classification techniques and text representation approaches remain to be tested.

Keywords: Data mining, Text classification, Statistical classification, Naive Bayes
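
As a rough illustration of the approach summarized above, the following is a minimal sketch of a multinomial Naive Bayes text classifier evaluated with k-fold cross-validation. It is not the implementation used in the Index project: the whitespace tokenizer, the Laplace (add-one) smoothing, the fold assignment, the function names, the four topic labels, and the toy headlines are all assumptions made here for the example.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: the topic names and headlines are invented for illustration.
DOCS = [
    "new smartphone chip released",
    "software update fixes smartphone bug",
    "startup raises funding round",
    "company shares rise after funding",
    "team wins football final",
    "football coach praises team",
    "film festival opens tonight",
    "museum hosts film exhibition",
]
LABELS = ["tech", "tech", "business", "business",
          "sports", "sports", "culture", "culture"]


def tokenize(text):
    """Lowercase and split on whitespace (an assumed, very simple representation)."""
    return text.lower().split()


def train_nb(docs, labels):
    """Collect per-class document counts, per-class word counts, and the vocabulary."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, text):
    """Return the class maximizing log P(class) + sum of log P(word | class)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen during training
            # Laplace (add-one) smoothing avoids zero probabilities.
            likelihood = (word_counts[label][token] + 1) / (total_words + len(vocab))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def cross_validate(docs, labels, k):
    """Mean accuracy over k folds; fold f holds out items f, f+k, f+2k, ..."""
    n = len(docs)
    accuracies = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        train_docs = [docs[i] for i in range(n) if i not in test_idx]
        train_labels = [labels[i] for i in range(n) if i not in test_idx]
        model = train_nb(train_docs, train_labels)
        correct = sum(predict_nb(model, docs[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k


model = train_nb(DOCS, LABELS)
print(predict_nb(model, "smartphone software bug"))
print(cross_validate(DOCS, LABELS, k=2))
```

On a corpus this small the cross-validated accuracy says nothing about real performance; the sketch only shows how the training, prediction, and evaluation steps fit together.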


