Domain-Specific Fine-Tuning of Large Language Models for Pharmacological Question Answering
Abstract
Large Language Models (LLMs) perform well on general NLP tasks but struggle in specialized domains such as pharmacology. This study investigates whether fine-tuning on DrugBank data improves response reliability. We construct a question–answer dataset from the absorption and metabolism sections of DrugBank and fine-tune a LLaMA 3.1 8B model using parameter-efficient adaptation techniques. The fine-tuned model is evaluated against the original model using ROUGE-L, BLEU, and Exact Match, complemented by a qualitative analysis. The results show improvements over the base model and more domain-specific responses, indicating that fine-tuning can effectively adapt LLMs to pharmacological tasks.
References
Cao, D., Wang, J., Zhou, R., Li, Y., Yu, H., and Hou, T. (2012). ADMET evaluation in drug discovery. 11. PharmacoKinetics Knowledge Base (PKKB): a comprehensive database of pharmacokinetic and toxic properties for drugs. Journal of Chemical Information and Modeling, 52(5):1132–1137.
Dettmers, T., Pagnoni, A., Holtzman, A., and Zettlemoyer, L. (2023). QLoRA: Efficient finetuning of quantized LLMs. Advances in Neural Information Processing Systems, 36:10088–10115.
Fan, S., Yang, K., Lu, K., Dong, X., Li, X., Zhu, Q., Li, S., Zeng, J., and Zhou, X. (2024). DrugRepPT: a deep pretraining and fine-tuning framework for drug repositioning based on drug’s expression perturbation and treatment effectiveness. Bioinformatics, 40(12):btae692.
Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., Chen, W., et al. (2022). LoRA: Low-rank adaptation of large language models. ICLR, 1(2):3.
Kang, H., Li, J., Hou, L., Xu, X., Zheng, S., and Li, Q. (2025). Large language model–enhanced drug repositioning knowledge extraction via long chain-of-thought: Development and evaluation study. JMIR Medical Informatics, 13:e77837.
Kim, M., Kim, Y., Kang, H. J., Seo, H., Choi, H., Han, J., Kee, G., Park, S., Ko, S., Jung, H., et al. (2025). Fine-tuning LLMs with medical data: can safety be ensured? NEJM AI, 2(1):AIcs2400390.
Lin, C.-Y. (2004). ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out, pages 74–81.
Machado, J., Rodrigues, C., Sousa, R., and Gomes, L. M. (2025). Drug–drug interaction extraction-based system: An natural language processing approach. Expert Systems, 42(1):e13303.
Papanikolaou, N., Pavlopoulos, G. A., Theodosiou, T., Vizirianakis, I. S., and Iliopoulos, I. (2016). DrugQuest - a text mining workflow for drug association discovery. BMC Bioinformatics, 17(Suppl 5):182.
Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. (2002). BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311–318.
Ramasamy, S. R., Rathee, G., et al. (2025). Fine-tuning LLM for rare disease diagnosis. In 2025 International Conference on Sustainability, Innovation & Technology (ICSIT), pages 1–6. IEEE.
Sam, K. (2024). Llama 3.1: An in-depth analysis of the next-generation large language model. Available at SSRN 6139407.
Tosca, E. M., Aiello, L., De Carlo, A., and Magni, P. (2025). Pharmacometrics in the age of large language models: A vision of the future. Pharmaceutics, 17(10):1274.
Wang, C., Li, M., He, J., Wang, Z., Darzi, E., Chen, Z., Ye, J., Li, T., Su, Y., Ke, J., et al. (2024). A survey for large language models in biomedicine. arXiv preprint arXiv:2409.00133.
Wishart, D. S., Knox, C., Guo, A. C., Cheng, D., Shrivastava, S., Tzur, D., Gautam, B., and Hassanali, M. (2008). DrugBank: a knowledgebase for drugs, drug actions and drug targets. Nucleic Acids Research, 36(suppl 1):D901–D906.
Zhang, Y., Ren, S., Wang, J., Lu, J., Wu, C., He, M., Liu, X., Wu, R., Zhao, J., Zhan, C., et al. (2025). Aligning large language models with humans: a comprehensive survey of ChatGPT’s aptitude in pharmacology. Drugs, 85(2):231–254.
Published
01/06/2026
How to Cite
VEROL, Felipe; REGINO, Andre Gomes; ZAGATTI, Fernando Rezende; ROSA, Ferrucio de Franco; REIS, Julio Cesar Dos; BONACIN, Rodrigo. Domain-Specific Fine-Tuning of Large Language Models for Pharmacological Question Answering. In: SIMPÓSIO BRASILEIRO DE COMPUTAÇÃO APLICADA À SAÚDE (SBCAS), 26., 2026, Ouro Preto/MG. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2026. p. 870-881. ISSN 2763-8952. DOI: https://doi.org/10.5753/sbcas.2026.21567.
