Named entity recognition in Portuguese legal documents using neural networks
Abstract
In recent years, information technology has been transforming the legal world, automating processes and consequently reducing the time needed to create and analyze digital legal documents. One of the most studied problems in this area is named entity recognition (NER) in unstructured text. Previous work has not addressed the detection of legal entities with the neural network models available in natural language processing libraries. In this paper, the spaCy and FLAIR libraries are analyzed in the context of NER in initial petitions. The models were trained with predefined architectures and evaluated on two corpora, one of which was developed as part of this work. The experiments yielded good results with both spaCy and FLAIR, with the best performance obtained by the BiLSTM-CRF with FLAIR embeddings.
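The abstract says the models were evaluated on two corpora but does not state the metric. As a minimal sketch, assuming BIO-tagged output and exact-match, entity-level precision, recall, and F1 (a common convention for NER, not confirmed by the source), the comparison between gold and predicted tags could look like the following. The function names, the label names `PESSOA` and `ORGANIZACAO`, and the toy tag sequences are hypothetical and serve only as illustration.

```python
from typing import List, Set, Tuple

# (label, start token index, end token index exclusive)
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert a BIO tag sequence into a set of entity spans.

    Assumption: a stray "I-X" with no preceding "B-X" (or following a
    different label) is treated leniently as the start of a new entity.
    """
    spans: Set[Span] = set()
    label, start = None, 0
    # The trailing "O" is a sentinel that closes an entity ending at the last token.
    for i, tag in enumerate(tags + ["O"]):
        if tag.startswith("I-") and label == tag[2:]:
            continue  # still inside the current entity
        if label is not None:
            spans.add((label, start, i))  # close the entity that just ended
            label = None
        if tag.startswith("B-") or tag.startswith("I-"):
            label, start = tag[2:], i  # open a new entity
    return spans


def span_f1(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Exact-match entity-level precision, recall, and F1."""
    gold_spans = bio_to_spans(gold)
    pred_spans = bio_to_spans(pred)
    true_positives = len(gold_spans & pred_spans)
    precision = true_positives / len(pred_spans) if pred_spans else 0.0
    recall = true_positives / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical 7-token sentence: a person name followed later by an organization.
gold = ["B-PESSOA", "I-PESSOA", "O", "O", "O", "B-ORGANIZACAO", "I-ORGANIZACAO"]
# The prediction truncates the organization, so only the person span matches exactly.
pred = ["B-PESSOA", "I-PESSOA", "O", "O", "O", "B-ORGANIZACAO", "O"]

print(span_f1(gold, pred))  # (0.5, 0.5, 0.5)
```

Under exact matching, a prediction that gets an entity's boundary wrong counts as both a false positive and a false negative, which is why the truncated organization lowers precision and recall together.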