A Pipeline for Automated Supervised Tuning of LLMs

  • Diego D. Fernandes UECE
  • Thalyson G. N. Silva UECE / IFCE
  • Rafael S. Santos UECE
  • Felipe G. Marajo UECE
  • Cleilton L. Rocha Atlantic Institute
  • Ana Luiza B. P. Barros UECE
  • Gustavo A. L. Campos UECE

Abstract


The increasing complexity of language models and the growing diversity of Natural Language Processing (NLP) tasks have underscored the need for automated and efficient tuning processes. However, despite advances in Automated Machine Learning (AutoML) and parameter-efficient fine-tuning (PEFT) techniques, structured workflows adapted to the specific challenges of Large Language Models (LLMs) are still lacking. This work proposes a conceptual, modular Automated Large Language Model (AutoLLM) pipeline that integrates both full and lightweight supervised fine-tuning strategies under a unified optimization framework. The pipeline leverages simulation-based search methods, particularly bioinspired algorithms such as Genetic Algorithms (GA), to automate hyperparameter tuning in supervised LLM adaptation. Preliminary experiments were conducted using Quantized Low-Rank Adaptation (QLoRA) for lightweight fine-tuning on a summarization task. The results illustrate the adaptability and efficiency of the approach in optimizing LLMs for a downstream application while maintaining computational feasibility. The proposed pipeline offers a structured foundation for advancing AutoML in the context of language model tuning.
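To illustrate the kind of search such a pipeline would automate, the sketch below shows a genetic-algorithm loop over QLoRA fine-tuning hyperparameters. This is a minimal, assumption-laden sketch rather than the authors' implementation: the search space values, the GA settings (population size, elitism, mutation rate), and the placeholder function finetune_and_eval (which in a real pipeline would wrap an actual QLoRA fine-tuning run scored with a validation metric such as ROUGE on held-out summaries) are all illustrative.

```python
# Illustrative sketch only: a genetic-algorithm search over QLoRA fine-tuning
# hyperparameters. The search space, GA settings, and `finetune_and_eval` are
# assumptions for demonstration, not the configuration reported in the paper.
import random

SEARCH_SPACE = {
    "lora_r":        [4, 8, 16, 32],            # LoRA rank
    "lora_alpha":    [8, 16, 32, 64],           # LoRA scaling factor
    "lora_dropout":  [0.0, 0.05, 0.1],
    "learning_rate": [1e-5, 5e-5, 1e-4, 2e-4],
    "num_epochs":    [1, 2, 3],
}

def random_individual():
    # Sample one candidate configuration (one "individual") from the search space.
    return {name: random.choice(values) for name, values in SEARCH_SPACE.items()}

def finetune_and_eval(config):
    # Hypothetical placeholder for the expensive step: run QLoRA fine-tuning with
    # `config` and return a validation metric (e.g. ROUGE-L on held-out summaries).
    # A random score stands in here so the search loop itself is runnable.
    return random.random()

def crossover(parent_a, parent_b):
    # Uniform crossover: each hyperparameter is inherited from either parent.
    return {name: random.choice([parent_a[name], parent_b[name]]) for name in SEARCH_SPACE}

def mutate(individual, rate):
    # With probability `rate`, resample each hyperparameter from the search space.
    return {name: random.choice(SEARCH_SPACE[name]) if random.random() < rate else value
            for name, value in individual.items()}

def genetic_search(pop_size=8, generations=5, elite=2, mutation_rate=0.2):
    # Evolve a population of configurations: score them all, keep the `elite` best,
    # and refill the population with mutated offspring of those elites.
    population = [random_individual() for _ in range(pop_size)]
    best_config, best_score = None, float("-inf")
    for _ in range(generations):
        scored = sorted(((finetune_and_eval(c), c) for c in population),
                        key=lambda pair: pair[0], reverse=True)
        if scored[0][0] > best_score:
            best_score, best_config = scored[0]
        parents = [c for _, c in scored[:elite]]
        children = [mutate(crossover(*random.sample(parents, 2)), mutation_rate)
                    for _ in range(pop_size - elite)]
        population = parents + children
    return best_config, best_score

if __name__ == "__main__":
    config, score = genetic_search()
    print("best configuration:", config)
    print("validation score:", score)
```

The fitness call is the natural place to swap between full and lightweight fine-tuning strategies, since only the training step inside it changes while the search loop stays the same.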

Published
2025-09-29
FERNANDES, Diego D.; SILVA, Thalyson G. N.; SANTOS, Rafael S.; MARAJO, Felipe G.; ROCHA, Cleilton L.; BARROS, Ana Luiza B. P.; CAMPOS, Gustavo A. L. A Pipeline for Automated Supervised Tuning of LLMs. In: NATIONAL MEETING ON ARTIFICIAL AND COMPUTATIONAL INTELLIGENCE (ENIAC), 22., 2025, Fortaleza/CE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 1890-1901. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2025.14226.