A Two-Stage Architecture for NDA Analysis: LLM-based Segmentation and Transformer-based Clause Classification

  • Ana Begnini Instituto de Pesquisas Eldorado
  • Matheus Vicente Instituto de Pesquisas Eldorado
  • Leonardo Souza Instituto de Pesquisas Eldorado

Abstract


In business-to-business relations, it is common to establish Non-Disclosure Agreements (NDAs). However, these documents vary widely in format, structure, and writing style, making manual analysis slow and error-prone. We propose an LLM-based architecture to automate the segmentation and classification of clauses in these contracts. We employ two models: LLaMA-3.1-8B-Instruct for NDA segmentation (clause extraction) and a fine-tuned Legal-RoBERTa-large for clause classification. On the segmentation task we achieved a ROUGE F1 of 0.95 ± 0.0036; on classification, we obtained a weighted F1 of 0.85, demonstrating the feasibility and precision of the approach.
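The weighted F1 reported for the classification stage averages per-class F1 scores by each class's support in the gold labels, so frequent clause types weigh more heavily than rare ones. A minimal sketch of this metric (the clause labels below are hypothetical examples, not the paper's label set; in practice scikit-learn's `f1_score(average="weighted")` computes the same quantity):

```python
from collections import Counter

def weighted_f1(gold, pred):
    """Support-weighted F1: per-class F1 averaged by class frequency in gold."""
    classes = set(gold) | set(pred)
    support = Counter(gold)
    total = len(gold)
    score = 0.0
    for c in classes:
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        prec = tp / (tp + fp) if (tp + fp) else 0.0
        rec = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
        score += (support[c] / total) * f1  # weight by gold-label support
    return score

# Toy example with two hypothetical clause types
gold = ["confidentiality", "confidentiality", "term", "term"]
pred = ["confidentiality", "term", "term", "term"]
print(round(weighted_f1(gold, pred), 4))
```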

Published
2025-09-29
BEGNINI, Ana; VICENTE, Matheus; SOUZA, Leonardo. A Two-Stage Architecture for NDA Analysis: LLM-based Segmentation and Transformer-based Clause Classification. In: BRAZILIAN SYMPOSIUM IN INFORMATION AND HUMAN LANGUAGE TECHNOLOGY (STIL), 16., 2025, Fortaleza/CE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 52-65. DOI: https://doi.org/10.5753/stil.2025.37813.