Part-of-Speech Tagging for a Telecom Services FAQ
Abstract
This paper describes the development of a part-of-speech (POS) tagger for a frequently asked questions (FAQ) page about telecommunication services. The tagger is tuned through a series of adjustments to context rules, which take into account the words adjacent to each word in a sentence, so that words are classified grammatically with the precision this particular application requires. We summarize the lexical categories and their associated attributes and present relevant statistics from the system's Portuguese-language dictionary, whose lexical processing was supported by the open-source Flex lexical analyzer generator. This morphological lexical analyzer is the first component of a complete chatbot system that could replace the FAQ page and assist visitors in a more user-friendly, interactive way. Advances in Natural Language Processing (NLP) make such a system feasible, and NLP technologies can reduce operating costs in several areas, including customer service and sales.
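To make the idea of context rules concrete, the following is a minimal Python sketch of a rule-based tagger in the spirit of Brill (1992): each word first receives a default tag from a dictionary, and contextual rules then revise ambiguous tags according to the tag of an adjacent word. The lexicon, tag names, rules, and the `tag` function are hypothetical illustrations, not the system's actual dictionary or rule set, and plain whitespace splitting stands in for the Flex-generated scanner.

```python
from dataclasses import dataclass

# Hypothetical toy lexicon (NOT the paper's dictionary): word -> candidate
# tags, with the default (assumed most frequent) tag listed first.
LEXICON: dict[str, list[str]] = {
    "eu": ["PRON"],
    "o": ["ART", "PRON"],
    "cancelo": ["VERB"],
    "plano": ["NOUN", "ADJ"],
    "monitor": ["NOUN"],
    "um": ["ART", "NUM"],
}

# Assumption for this sketch: unknown words default to NOUN.
UNKNOWN = ["NOUN"]


@dataclass(frozen=True)
class ContextRule:
    """Change `from_tag` to `to_tag` when a neighbouring token has `context_tag`.

    offset=-1 inspects the previous token, offset=+1 the next one.
    """
    from_tag: str
    to_tag: str
    offset: int
    context_tag: str


# Hypothetical rules, applied in order.
RULES = [
    ContextRule("ART", "PRON", +1, "VERB"),  # "eu o cancelo": clitic before a verb
    ContextRule("NOUN", "ADJ", -1, "NOUN"),  # "monitor plano": modifier after a noun
]


def tag(sentence: str) -> list[tuple[str, str]]:
    # Whitespace tokenization stands in for a real lexical scanner.
    words = sentence.lower().split()
    # Step 1: dictionary lookup assigns each word its default tag.
    tags = [LEXICON.get(w, UNKNOWN)[0] for w in words]
    # Step 2: each contextual rule is applied left to right over the sentence.
    for rule in RULES:
        for i, word in enumerate(words):
            j = i + rule.offset
            if not 0 <= j < len(words):
                continue  # the neighbour would fall outside the sentence
            allowed = LEXICON.get(word, UNKNOWN)
            # Only switch to a tag the dictionary permits for this word.
            if (tags[i] == rule.from_tag
                    and tags[j] == rule.context_tag
                    and rule.to_tag in allowed):
                tags[i] = rule.to_tag
    return list(zip(words, tags))
```

For example, `tag("eu o cancelo")` tags "o" as PRON because it precedes a verb, whereas in `tag("cancelo o plano")` it keeps its default ART tag. Restricting every rule to tags listed in the dictionary prevents a context rule from assigning a category the word can never take.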
References
Aluísio, S. and Almeida, G. (2006). O que é e como se constrói um corpus? Lições aprendidas na compilação de vários corpora para pesquisa linguística. Calidoscópio 4(3), pages 156–178.
Amorim, M. T. C. F., Cury, D. and Menezes, C. (2012). Um Helpdesk Inteligente baseado em Ontologias. In: Anais do 23º Simpósio Brasileiro de Informática na Educação. CBIE, Rio de Janeiro.
Barbosa, C. R. S. C. de. (2004). Técnicas de Parsing para Gramática Livre de Contexto Lexicalizada da Língua Portuguesa. PhD thesis, CPG da Engenharia Eletrônica e Computação, Instituto Tecnológico de Aeronáutica, São José dos Campos. 171 p.
Brill, E. (1992). A Simple Rule-Based Part of Speech Tagger. In: Proceedings of the Third Conference on Applied Natural Language Processing, pages 152–155. Association for Computational Linguistics, Trento.
Camara Junior, A. T. (2016). Processamento de Linguagem Natural para Indexação Automática Semântico-ontológica. Revista Ibero-Americana de Ciência da Informação 9(2), page 569, July–December.
Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K. and Kuksa, P. (2011). Natural Language Processing (almost) from Scratch. Journal of Machine Learning Research 12, pages 2493–2537.
Cutting, D., Kupiec, J., Pedersen, J. and Sibun, P. (1992). A practical part-of-speech tagger. In: Proceedings of the Third Conference on Applied Natural Language Processing, pages 133–140. Association for Computational Linguistics, Trento.
Fonseca, E. and Aluísio, S. (2016). Improving POS Tagging across Portuguese Variants with Word Embeddings. In: Proceedings of the International Conference on Computational Processing of the Portuguese Language, pages 227–232. Springer, Cham.
Ghosh, S., Ghosh, S. and Das, D. (2016). Part-of-speech Tagging of Code-Mixed Social Media Text. In: Proceedings of the Second Workshop on Computational Approaches to Code Switching, pages 90–97. Association for Computational Linguistics, Austin.
Guthrie, L., Pustejovsky, J., Wilks, Y. and Slator, M. (1996). The Role of Lexicons in Natural Language Processing. Communications of the ACM 39(1), pages 63–72.
Khurana, P., Agarwal, P., Shroff, G., Vig, L. and Srinivasan, A. (2017). Hybrid BiLSTM-Siamese network for FAQ Assistance. In: Proceedings of the ACM on Conference on Information and Knowledge Management, pages 537–545. ACM, Singapore.
Leonhardt, M. D. (2005). Doroty: um Chatterbot para Treinamento de Profissionais Atuantes no Gerenciamento de Redes de Computadores. Master's dissertation, CGCC, Universidade Federal do Rio Grande do Sul, Porto Alegre. 110 p.
Oliveira, C. and Freitas, M. (2006). Classes de Palavras e Etiquetagem na Lingüística Computacional. Calidoscópio 4(3), pages 179–188.
Scarton, C., Duran, M. and Aluísio, S. (2014). Using cross-linguistic knowledge to build VerbNet-style lexicons: results for a (Brazilian) Portuguese VerbNet. In: Proceedings of the International Conference on Computational Processing of the Portuguese Language, pages 149–160. Springer, Cham.
Strube de Lima, V. L. (1996). Processamento da Linguagem Natural: premissas e desafios. In: 4º Anais da Escola Regional de Informática, SBC/Regional Sul, pages 110–124. SBC, Canoas/Londrina.
The Fast Lexical Analyzer (2018). https://www.gnu.org/software/flex/
Thiele, P. F. O. (2015). Desambiguação de anotações morfossintáticas feitas por MTMDD. Master's dissertation, PPGCC, Pontifícia Universidade Católica do Rio Grande do Sul, Porto Alegre. 58 p.
