Comparing LLMs in business rule-following

Nikson Bernardes Fernandes Ferreira; William Freitas; Hallyson Melo; Andre Carvalho; Thiago Borges; Rodrigo Marques

doi:10.5753/cibse.2025.35327

Nikson Bernardes Fernandes Ferreira (INDT)
William Freitas (INDT)
Hallyson Melo (INDT)
Andre Carvalho (UFAM)
Thiago Borges (INDT)
Rodrigo Marques (INDT)

Abstract

Large Language Models (LLMs) have shown strong capabilities in language understanding and instruction following. However, to the authors' knowledge, no prior work has compared their performance with that of humans in real industry scenarios that require following internal rules. This R&D project analyzes the applicability of LLMs to improving the efficiency of task analysis and scheduling through automatic team assignment that follows a set of internal business rules. The study was funded by SUFRAMA and is a collaboration between INDT and Motorola Mobility. The experimental results show that lightweight open LLMs have, on average, lower accuracy than the average worker (57.5% vs. 86.25%) and a higher divergence rate (90% vs. 45%).

Keywords: LLM, LLMs, rule-following, business rules, open-source models
