Specializing Small Language Models into Business and Industry Idea Reviewer Experts with Supervised Fine-Tuning

  • Gabriel Braga Ladislau UFES / Aumo S.A.
  • Guilherme Ratti Moraes UFES / Aumo S.A.
  • Guilherme Goes Zanetti UFES / Aumo S.A.
  • Abner Grahan Jacobsen Aumo S.A.
  • Claudine Badue UFES
  • Alberto Ferreira De Souza UFES / Aumo S.A.
  • Thiago Oliveira-Santos UFES

Abstract


Research Context: The application of natural language models in industrial and business environments is rapidly expanding. While powerful, these models often require specialization to match the performance of human experts.

Practical Problem: Large Language Models (LLMs) face two major barriers to enterprise adoption: 1) the lack of the specific, private knowledge required for nuanced tasks, such as classifying internal company innovations, and 2) operational costs that are prohibitively high for long-term, large-scale use.

Proposed Solution: We propose a cost-effective alternative: fine-tuning Small Language Models (SLMs) and encoder models (BERTs) on business idea classification, transforming them into expert systems tailored to a company’s unique context.

Related IS Theory: This research is grounded in Task-Technology Fit (TTF) theory, examining the alignment between the task’s characteristics (classifying specialized ideas) and the technology’s attributes (general-purpose LLMs vs. fine-tuned SLMs and BERTs) to determine the optimal fit.

Research Method: The research develops and evaluates a training method for SLMs and BERTs using real-world data augmented with an artificial dataset, and also presents the pipeline used to create that artificial dataset. The performance of the resulting SLMs and BERTs is then compared against that of larger, general-purpose LLMs.

Results: The findings indicate that the fine-tuned SLMs and BERTs achieve superior performance on the specialized classification task compared to larger, non-fine-tuned LLMs, while significantly reducing operational costs. The results also show that augmenting scarce real-world data with diverse artificial data can yield a more robust and generalizable model.

Contributions: This work contributes a practical and economically viable method for creating specialized AI agents and for augmenting scarce real-world data with synthetically generated datasets. Its impact lies in enabling businesses to deploy tailored, high-performing AI solutions for specific, knowledge-intensive tasks without the high costs of large-scale, general-purpose models.
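To make the classification setup concrete, the toy sketch below shows the shape of the task only: labeled idea texts go in, a predicted category comes out. The labels and example ideas here are invented for illustration, and the word-overlap scorer is a deliberately simple stand-in; the paper's actual approach replaces it with supervised fine-tuning of SLMs and BERT-style encoders on a company's own (and synthetically augmented) idea data.

```python
# Toy word-overlap classifier: a stand-in for the task shape only.
# Labels and example ideas are invented; the paper's method instead
# fine-tunes SLMs / BERT encoders on real and synthetic idea data.

def classify(train, text):
    """Score each label by word overlap between `text` and its training ideas."""
    words = set(text.lower().split())
    scores = {}
    for idea, label in train:
        overlap = len(words & set(idea.lower().split()))
        scores[label] = scores.get(label, 0) + overlap
    return max(scores, key=scores.get)

# Hypothetical labeled business ideas (two invented categories).
TRAIN = [
    ("automate invoice approval with a rules engine", "process improvement"),
    ("reduce warehouse energy use with smart sensors", "process improvement"),
    ("new subscription tier for enterprise clients", "new product"),
    ("launch a mobile app for field technicians", "new product"),
]

print(classify(TRAIN, "smart sensors to automate energy monitoring"))
# prints "process improvement": it shares more words with that class
```

Even this trivial baseline illustrates why specialization matters: the decision boundary is defined entirely by the company's own labeled examples, which is exactly the private knowledge a general-purpose LLM lacks.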

Published
25/05/2026
LADISLAU, Gabriel Braga; MORAES, Guilherme Ratti; ZANETTI, Guilherme Goes; JACOBSEN, Abner Grahan; BADUE, Claudine; SOUZA, Alberto Ferreira De; OLIVEIRA-SANTOS, Thiago. Specializing Small Language Models into Business and Industry Idea Reviewer Experts with Supervised Fine-Tuning. In: SIMPÓSIO BRASILEIRO DE SISTEMAS DE INFORMAÇÃO (SBSI), 22., 2026, Vitória/ES. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2026. p. 654-673. DOI: https://doi.org/10.5753/sbsi.2026.248583.
