Evaluating Large Language Models for Tax Law Reasoning

  • João Paulo Cavalcante Presa (UFG)
  • Celso Gonçalves Camilo Junior (UFG)
  • Sávio Salvarino Teles de Oliveira (UFG)

Abstract

The ability to reason over laws is essential for legal professionals, facilitating the interpretation and application of legal principles to complex real-world situations. Tax laws are crucial for funding government functions and shaping economic behavior, yet their interpretation poses challenges due to their complexity, constant evolution, and susceptibility to differing viewpoints. Large Language Models (LLMs) show considerable potential in supporting this reasoning process by processing extensive legal texts and generating relevant information. This study evaluates the performance of LLMs in legal reasoning within the domain of tax law for legal entities, using a dataset of real-world questions and expert answers in Brazilian Portuguese. We employed quantitative metrics (BLEU, ROUGE) and a qualitative assessment by a robust LLM acting as evaluator to check factual accuracy and relevance. A novel dataset was curated, comprising genuine tax-law questions from legal entities, answered by legal experts and accompanied by the corresponding legal texts. The evaluation covers both open-source and proprietary LLMs, providing an assessment of their effectiveness in legal reasoning tasks. The strong correlation between the robust LLM evaluator's scores and BERTScore F1 suggests that these metrics effectively capture semantic aspects pertinent to human-perceived quality.
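To make the evaluation setup concrete, the following is a minimal Python sketch of how answer quality metrics of this kind can be computed and correlated with LLM-judge ratings. The library choices (sacrebleu, rouge_score, bert_score, scipy) and the data layout are illustrative assumptions, not the authors' actual implementation.

# Minimal sketch of a BLEU / ROUGE / BERTScore evaluation, assuming the
# data is a list of (model_answer, expert_answer) string pairs.
# Libraries and function names are assumptions for illustration only.
import sacrebleu
from rouge_score import rouge_scorer
from bert_score import score as bert_score
from scipy.stats import pearsonr

def evaluate(pairs):
    hyps = [h for h, _ in pairs]
    refs = [r for _, r in pairs]

    # Corpus-level BLEU over all model answers.
    bleu = sacrebleu.corpus_bleu(hyps, [refs]).score

    # Average sentence-level ROUGE-L F-measure (reference first, then hypothesis).
    rs = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=False)
    rouge_l = sum(rs.score(r, h)["rougeL"].fmeasure for h, r in pairs) / len(pairs)

    # BERTScore F1 with a Portuguese-capable default model.
    _, _, f1 = bert_score(hyps, refs, lang="pt")
    return {"bleu": bleu, "rougeL": rouge_l,
            "bertscore_f1_mean": f1.mean().item(),
            "bertscore_f1_per_item": f1.tolist()}

def correlate_with_judge(bertscore_f1_per_item, judge_scores):
    # Pearson correlation between per-item BERTScore F1 and hypothetical
    # LLM-judge ratings (e.g. 1-5 scores for accuracy/relevance).
    r, p = pearsonr(bertscore_f1_per_item, judge_scores)
    return r, p

A correlation computed this way, between per-answer BERTScore F1 and judge ratings, is one plausible way to obtain the kind of metric agreement the abstract reports.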
Published
17/11/2024
PRESA, João Paulo Cavalcante; CAMILO JUNIOR, Celso Gonçalves; OLIVEIRA, Sávio Salvarino Teles de. Evaluating Large Language Models for Tax Law Reasoning. In: BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), 13., 2024, Belém/PA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 460-474. ISSN 2643-6264.