Combining Rule-based and Statistical Methods for Named Entity Recognition in Portuguese

  • Eduardo Ferreira University of Lisbon
  • João Balsa University of Lisbon
  • António Branco University of Lisbon

Abstract


We present and discuss a tool for recognizing named entity expressions in Portuguese. The tool takes a rule-based approach to numbers, measures, time expressions and addresses, and a hybrid approach, combining rules with statistical methods, to names. Recognized expressions are delimited and semantically classified with XML-like markup. Evaluation results are presented.
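To make the rule-based side concrete, the following is a minimal sketch of how pattern rules can delimit and classify numeric expressions with XML-like markup. It is not the tool described here: the tag names `MEASURE` and `NUMBER`, the unit list, the regular expressions and the function `annotate` are all assumptions chosen for illustration.

```python
import re

# Hypothetical rules: each pairs an illustrative tag name with a regular
# expression. Neither the tag names nor the patterns come from the paper.
RULES = [
    # A number followed by a unit, e.g. "25 km" or "3,5 kg".
    ("MEASURE", r"\d+(?:[.,]\d+)?\s?(?:km|kg|cm|m|g)\b"),
    # A bare number, e.g. "1998" or "3,5".
    ("NUMBER", r"\d+(?:[.,]\d+)?"),
]

# One alternation of named groups; at a given position, earlier rules are
# tried first, so a measure is preferred over the bare number inside it.
PATTERN = re.compile("|".join(f"(?P<{tag}>{rx})" for tag, rx in RULES))


def annotate(text: str) -> str:
    """Wrap each rule match in an XML-like element named after its rule."""
    def wrap(match: re.Match) -> str:
        tag = match.lastgroup  # name of the rule that matched
        return f"<{tag}>{match.group(0)}</{tag}>"
    return PATTERN.sub(wrap, text)


print(annotate("O percurso tem 25 km e abriu em 1998."))
```

Ordering the rules from most to least specific is one simple way to resolve overlaps between patterns. Names are harder to capture with patterns alone, which is consistent with the hybrid treatment they receive in the tool.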

Published
30/06/2007
FERREIRA, Eduardo; BALSA, João; BRANCO, António. Combining Rule-based and Statistical Methods for Named Entity Recognition in Portuguese. In: SIMPÓSIO BRASILEIRO DE TECNOLOGIA DA INFORMAÇÃO E DA LINGUAGEM HUMANA (STIL), 5., 2007, Rio de Janeiro/RJ. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2007. p. 1615-1624.