Comparing LLMs in business rule-following

  • Nikson Bernardes Fernandes Ferreira (INDT)
  • William Freitas (INDT)
  • Hallyson Melo (INDT)
  • Andre Carvalho (UFAM)
  • Thiago Borges (INDT)
  • Rodrigo Marques (INDT)

Abstract


Large Language Models (LLMs) have shown strong capabilities in language understanding and instruction following. However, to the authors' knowledge, no prior work has compared their performance with that of humans in real, internal industry rule-following scenarios. This R&D project analyzes whether LLMs can improve the efficiency of task analysis and scheduling through automatic team assignment that follows a set of internal business rules. The study was funded by SUFRAMA and is a collaboration between INDT and Motorola Mobility. The experimental results show that lightweight open LLMs are, on average, less accurate than the average worker (57.5% vs. 86.25%) and have a higher divergence rate (90% vs. 45%).
Keywords: LLM, LLMs, rule-following, Business-rules, open-source models

Published
May 12, 2025
FERREIRA, Nikson Bernardes Fernandes; FREITAS, William; MELO, Hallyson; CARVALHO, Andre; BORGES, Thiago; MARQUES, Rodrigo. Comparing LLMs in business rule-following. In: CONGRESSO IBERO-AMERICANO EM ENGENHARIA DE SOFTWARE (CIBSE), 28., 2025, Ciudad Real/Spain. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 368-371. DOI: https://doi.org/10.5753/cibse.2025.35327.