From Sequence to Stability: Rational Insulin Design via Genetic Algorithms and Deep Learning Models in Structural Bioinformatics

Abstract


Insulin’s therapeutic efficacy is limited by its thermal instability, a major obstacle to diabetes treatment worldwide, especially in regions that lack reliable refrigeration. We present an in silico protein design pipeline that couples a multi-objective genetic algorithm with deep learning models to engineer insulin variants with enhanced thermostability and reduced aggregation propensity. The algorithm evolves populations of mutated insulin sequences; each candidate is scored by TemBERTure for thermostability and by Aggrescan3D for solubility, with ESMFold providing the predicted 3D structure. A Docker-based microservices architecture keeps execution scalable and efficient. The pipeline identifies candidate variants that maintain high sequence identity and preserve key functional motifs while showing improved predicted biophysical properties. These findings illustrate how combining evolutionary algorithms with protein language models can support rational, data-driven strategies in protein engineering. The source code and reproducible experiments are publicly available at https://github.com/gabrielfruet/protein-aggregation.
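
The evolutionary loop described above can be illustrated with a minimal sketch. This is not the authors' implementation: the two scoring functions are toy stand-ins for TemBERTure and Aggrescan3D with no biophysical meaning, and the protected positions, identity threshold, population size, and selection scheme (filling each generation front by front with non-dominated sequences) are assumptions chosen only to show how a multi-objective search over point mutations can be organised. The starting sequence is the human insulin B chain.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
WILD_TYPE = "FVNQHLCGSHLVEALYLVCGERGFFYTPKT"  # human insulin B chain
PROTECTED = frozenset({6, 18})  # assumption: keep the B-chain cysteines fixed
HYDROPHOBIC = set("AILMFVW")


def toy_stability(seq):
    """Toy stand-in for a thermostability predictor; higher is better."""
    return sum(aa in HYDROPHOBIC for aa in seq) / len(seq)


def toy_aggregation(seq):
    """Toy stand-in for an aggregation predictor; lower is better."""
    longest = run = 0
    for aa in seq:
        run = run + 1 if aa in HYDROPHOBIC else 0
        longest = max(longest, run)
    return longest / len(seq)


def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def mutate(seq, rng):
    """Substitute one residue at a random unprotected position."""
    pos = rng.choice([i for i in range(len(seq)) if i not in PROTECTED])
    new = rng.choice([aa for aa in AMINO_ACIDS if aa != seq[pos]])
    return seq[:pos] + new + seq[pos + 1:]


def objectives(seq):
    # Both objectives are expressed as "larger is better".
    return (toy_stability(seq), -toy_aggregation(seq))


def dominates(a, b):
    """True if a is at least as good as b everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and a != b


def pareto_front(seqs):
    """Return the sequences that no other sequence dominates."""
    scored = {s: objectives(s) for s in seqs}
    return [s for s in scored
            if not any(dominates(scored[t], scored[s]) for t in scored)]


def evolve(generations=30, pop_size=20, min_identity=0.85, seed=0):
    rng = random.Random(seed)
    population = [WILD_TYPE]
    for _ in range(generations):
        children = [mutate(rng.choice(population), rng)
                    for _ in range(pop_size)]
        # Constraint: discard variants that drift too far from wild type.
        pool = {s for s in population + children
                if identity(s, WILD_TYPE) >= min_identity}
        # Fill the next generation front by front.
        population, remaining = [], sorted(pool)
        while remaining and len(population) < pop_size:
            front = pareto_front(remaining)
            population.extend(front[:pop_size - len(population)])
            remaining = [s for s in remaining if s not in front]
    return pareto_front(population)


if __name__ == "__main__":
    for variant in evolve():
        print(variant, objectives(variant), round(identity(variant, WILD_TYPE), 3))
```

In a real pipeline the two toy functions would be replaced by calls to the actual predictors, and the sequence-identity filter and protected positions would be set from the functional motifs that must be preserved.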

Keywords: Protein Engineering, Insulin, Thermostability, Genetic Algorithm, Deep Learning

Published
29/09/2025
GOMES, Mateus R.; FRUET, Gabriel Vasconcelos; MEDEIROS, Ingryd; DA COSTA, Roner F.; BEZERRA, Eveline M.; G. GOMES, Danielo. From Sequence to Stability: Rational Insulin Design via Genetic Algorithms and Deep Learning Models in Structural Bioinformatics. In: SIMPÓSIO BRASILEIRO DE BIOINFORMÁTICA (BSB), 18., 2025, Fortaleza/CE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 96-105. ISSN 2316-1248. DOI: https://doi.org/10.5753/bsb.2025.14620.