Comparison of Relation Extraction Models for Generating Knowledge Graphs in the Oil Drilling Domain

  • Gabriel H. G. Ticianeli UNESP
  • Arnaldo Candido Junior UNESP
  • Ivan Rizzo Guilherme UNESP
  • Bruno Elias Penteado USP
  • Stephan Ribeiro Perrout Petrobras
  • Luis Henrique Morelli UNESP
  • Pedro Henrique Paiola UNESP
  • Gabriel Lino Garcia UNESP

Abstract

Knowledge Graphs are structures that provide explicit knowledge, symbolic reasoning, and interpretable results, and they can be improved over time. For this reason, the automatic extraction of relations between entities from unstructured text is an active research area in Natural Language Processing. In this paper, we select three state-of-the-art relation extraction models trained on general-domain data and compare their results on a dataset from the oil well drilling domain. The results show that general-domain models perform poorly when applied to the technical language of this domain.
Keywords: Natural Language Processing, Knowledge Graph, Relation Extraction, Oil Drilling

Published
17/11/2024
TICIANELI, Gabriel H. G.; CANDIDO JUNIOR, Arnaldo; GUILHERME, Ivan Rizzo; PENTEADO, Bruno Elias; PERROUT, Stephan Ribeiro; MORELLI, Luis Henrique; PAIOLA, Pedro Henrique; GARCIA, Gabriel Lino. Comparison of Relation Extraction Models for Generating Knowledge Graphs in the Oil Drilling Domain. In: ENCONTRO NACIONAL DE INTELIGÊNCIA ARTIFICIAL E COMPUTACIONAL (ENIAC), 21., 2024, Belém/PA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 707-718. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2024.245094.
