Joint Event Extraction with Contextualized Word Embeddings for the Portuguese Language

Anderson da Silva Brito Sacramento; Marlo Souza

Anderson da Silva Brito Sacramento UFBA https://orcid.org/0000-0002-2288-6899
Marlo Souza UFBA https://orcid.org/0000-0002-5373-7271

Abstract

Event Extraction (EE) is the task of identifying mentions of particular event types and their arguments in text, and it is an important and challenging problem within Information Extraction (IE). For the Portuguese language, however, very little work has been conducted on this topic. To help close this research gap, we propose a neural method for EE together with a data resource. We also present a data augmentation strategy for EE that employs an Open Information Extraction (OIE) system, aiming to overcome the shortage of annotated data for this problem in Portuguese. Our experimental results show that our method is able to predict event types and arguments automatically, and that the proposed data augmentation method contributes positively to the performance of the tested models on the argument role prediction subtask in one of the two evaluated samples. An implementation of our method and the models trained in our experiments are available to the community (https://github.com/FORMAS/TEFE).

Keywords: Event Extraction, Information Extraction, Natural Language Processing, Portuguese language