Contextual Augmentation and Delimitation for Named Entity Recognition and Relation Extraction in Legal Documents

  • Fabiano Muniz Belém Federal University of Minas Gerais (UFMG)
  • Marcelo Ganem Federal University of Minas Gerais (UFMG)
  • Celso França Federal University of Minas Gerais (UFMG)
  • Marcos Carvalho Federal University of Minas Gerais (UFMG)
  • Alberto H. F. Laender Federal University of Minas Gerais (UFMG)
  • Marcos André Gonçalves Federal University of Minas Gerais (UFMG)

Abstract


Transformer architectures have become the main component of various state-of-the-art methods for natural language processing tasks, such as Named Entity Recognition and Relation Extraction (NER+RE). As these architectures rely on semantic aspects of word sequences, they may fail to accurately identify and delimit entity spans when there is little semantic context surrounding the named entities. This is the case for entities composed only of digits and punctuation, such as IDs and phone numbers, as well as for long compound names. In this paper, we propose new pre- and post-processing techniques for contextual reinforcement and entity delimitation that provide a richer semantic context, improving SpERT, a state-of-the-art Span-based Entity and Relation Transformer. We evaluate our strategies using real data from public administration documents and court lawsuits. Our results show that our pre- and post-processing strategies, when used jointly, yield significant improvements in NER+RE effectiveness.
Keywords: Named Entity Recognition, Relation Extraction, Contextual Enhancement, Text Processing
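
To make the idea of contextual reinforcement and entity delimitation more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that digit-and-punctuation entities can be located with regular expressions. The pattern set, the bracketed hint format, and the function names `augment` and `snap_span` are all hypothetical and chosen only for illustration. The pre-processing step inserts a textual hint before each match, and the post-processing step widens a predicted span to the full match it overlaps.

```python
import re

# Hypothetical patterns for entities made only of digits and punctuation.
# Labels and regexes are illustrative assumptions, not taken from the paper.
PATTERNS = {
    "phone": re.compile(r"\(\d{2}\)\s?\d{4,5}-\d{4}"),
    "id": re.compile(r"\d{3}\.\d{3}\.\d{3}-\d{2}"),
}


def augment(text):
    """Pre-processing: prefix each pattern match with a textual hint."""
    matches = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            matches.append((m.start(), m.end(), label))
    # Insert from right to left so that earlier offsets stay valid.
    for start, _end, label in sorted(matches, reverse=True):
        text = text[:start] + f"[{label}] " + text[start:]
    return text


def snap_span(text, start, end):
    """Post-processing: widen a predicted span to the pattern match it overlaps."""
    for pattern in PATTERNS.values():
        for m in pattern.finditer(text):
            if m.start() < end and start < m.end():
                return min(start, m.start()), max(end, m.end())
    # No overlapping match: keep the predicted span unchanged.
    return start, end
```

In this sketch, a span predicted over only part of a phone number would be extended to cover the whole number, while spans that touch no pattern are left as predicted. How the actual strategies interact with SpERT is described in the full paper, not here.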

Published
2022-09-19
BELÉM, Fabiano Muniz; GANEM, Marcelo; FRANÇA, Celso; CARVALHO, Marcos; LAENDER, Alberto H. F.; GONÇALVES, Marcos André. Contextual Augmentation and Delimitation for Named Entity Recognition and Relation Extraction in Legal Documents. In: BRAZILIAN SYMPOSIUM ON DATABASES (SBBD), 37., 2022, Búzios. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2022. p. 292-303. ISSN 2763-8979. DOI: https://doi.org/10.5753/sbbd.2022.224650.