Uma Abordagem de Anotação Semântica Automática Direcionada a Sistemas de Perguntas e Respostas

Laura L. Dias; Luciano V. B. Espiridião; Anderson A. Ferreira

doi:10.5753/sbbd.2021.17873

Laura L. Dias, Universidade Federal de Ouro Preto (UFOP)
Luciano V. B. Espiridião, Universidade Federal de Ouro Preto (UFOP)
Anderson A. Ferreira, Universidade Federal de Ouro Preto (UFOP)


Abstract

The rapid growth of content repositories has created a need for better indexing and search mechanisms, including question answering systems. Users still struggle to navigate the large volume of information on the Web. Studies on automatic semantic annotation, however, make it possible to identify content in these repositories and support a variety of systems. This work proposes a question-processing method based on BERT that performs the semantic annotation task by adding DBpedia resources to the questions as context. Experimental results show improvements of up to 13% over the baseline.
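To illustrate what adding DBpedia resources as context to a question could look like, the sketch below lays out a question and a list of resource labels in the sentence-pair format that BERT-style models accept. It is only an assumption about the input layout, not the authors' method: the function `build_input`, the `max_resources` cutoff, the `"; "` separator, and the sample labels are all hypothetical, and no DBpedia lookup or BERT model is actually called.

```python
from typing import List

# Marker strings used by BERT-style models. A real BERT tokenizer inserts
# these itself; they are written out here only to make the layout visible.
CLS, SEP = "[CLS]", "[SEP]"


def build_input(question: str, resource_labels: List[str], max_resources: int = 3) -> str:
    """Join a question and resource labels into one sentence-pair string.

    Hypothetical helper: the labels stand in for DBpedia resources that an
    annotation step would have linked to the question.
    """
    seen = set()
    unique = []
    for label in resource_labels:
        cleaned = label.strip()
        key = cleaned.lower()
        # Skip empty labels and case-insensitive duplicates, keeping order.
        if key and key not in seen:
            seen.add(key)
            unique.append(cleaned)
    # Keep only the first few labels so the context stays short.
    context = "; ".join(unique[:max_resources])
    return f"{CLS} {question.strip()} {SEP} {context} {SEP}"


example = build_input(
    "Who wrote Dom Casmurro?",
    ["Dom Casmurro", "Machado de Assis", "dom casmurro"],
)
print(example)
```

In this layout the question is the first segment and the context is the second, so a model fine-tuned on such pairs can attend to both. How the resources are actually selected and ordered is not stated in the abstract.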

Keywords: Natural Language Processing, Information Retrieval, RDF and Linked Data, Question Processing, BERT
