A Temporal-Relational Model for Document Classification

  • Fernando Mourão UFMG
  • Wagner Meira Jr. UFMG

Abstract


Automatic Document Classification (ADC) is one of the most relevant research problems in information retrieval. Despite the large number of ADC techniques already proposed, there is still demand for techniques that account for relationships among terms both effectively and efficiently. In this paper we propose a new network-based model for textual documents and introduce a family of relational algorithms for ADC that consider the temporal evolution of documents. Experimental evaluation shows that these algorithms achieve results comparable to SVM on four real datasets. Further, their simplicity, efficiency, and lack of complex parameter tuning make them an interesting alternative to SVM.

Published
2010-07-20
MOURÃO, Fernando; MEIRA JR., Wagner. A Temporal-Relational Model for Document Classification. In: THESIS AND DISSERTATION CONTEST (CTD), 23., 2010, Belo Horizonte/MG. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2010. p. 65-72. ISSN 2763-8820.