From Sequence to Stability: Rational Insulin Design via Genetic Algorithms and Deep Learning Models in Structural Bioinformatics
Abstract
Insulin’s therapeutic efficacy is limited by its thermal instability, a major obstacle to diabetes treatment worldwide, especially in regions without reliable refrigeration. Here we present an in silico protein design pipeline that integrates a multi-objective genetic algorithm with deep learning models to engineer insulin variants with enhanced thermostability and reduced aggregation propensity. The algorithm evolves populations of mutated insulin sequences; each candidate is scored by TemBERTure for thermostability and by Aggrescan3D for solubility, with ESMFold providing the predicted 3D structure. A Docker-based microservices architecture provides scalable and efficient execution. Our results identify candidate variants that retain high sequence identity to the native sequence and preserve key functional motifs while showing improved predicted biophysical properties. These findings illustrate how combining evolutionary algorithms with protein language models can support rational, data-driven strategies in protein engineering. The source code and reproducible experiments are publicly available at https://github.com/gabrielfruet/protein-aggregation.
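The evolutionary loop described above can be illustrated with a minimal sketch. It is not the pipeline's actual implementation. The two scoring functions are hypothetical toy stand-ins for the TemBERTure and Aggrescan3D predictors, the structure-prediction step is omitted, and the choice to freeze cysteine positions as a proxy for "key functional motifs", the Pareto-front selection scheme, and all names and parameter values are assumptions made for illustration. The starting sequence is the human insulin B chain.

```python
import random

# Human insulin B chain, used here as the wild-type starting point.
WILD_TYPE = "FVNQHLCGSALVEALYLVCGERGFFYTPKT"
# Assumption: cysteines are frozen so the disulfide pattern is never altered.
FROZEN = {i for i, aa in enumerate(WILD_TYPE) if aa == "C"}
# Substitution alphabet without cysteine, so no new cysteines are introduced.
SUBSTITUTIONS = "ADEFGHIKLMNPQRSTVWY"


def toy_stability(seq):
    """Hypothetical stand-in for a thermostability predictor (higher is better)."""
    return sum(seq.count(aa) for aa in "EKRP") / len(seq)


def toy_aggregation(seq):
    """Hypothetical stand-in for an aggregation predictor (lower is better)."""
    return sum(seq.count(aa) for aa in "FILVWY") / len(seq)


def identity(seq):
    """Fraction of positions identical to the wild type."""
    return sum(a == b for a, b in zip(seq, WILD_TYPE)) / len(WILD_TYPE)


def objectives(seq):
    """Objective vector in which every component is maximised."""
    return (toy_stability(seq), -toy_aggregation(seq), identity(seq))


def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(population):
    """Return the sequences that no other sequence in the population dominates."""
    scored = [(seq, objectives(seq)) for seq in population]
    return [s for s, f in scored if not any(dominates(g, f) for _, g in scored)]


def mutate(seq, rng, rate=0.05):
    """Point-mutate each non-frozen position with probability `rate`."""
    out = list(seq)
    for i in range(len(out)):
        if i not in FROZEN and rng.random() < rate:
            out[i] = rng.choice(SUBSTITUTIONS)
    return "".join(out)


def crossover(a, b, rng):
    """Single-point crossover between two equal-length parents."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve(generations=20, pop_size=40, seed=0):
    """Run the toy multi-objective GA and return the final Pareto front."""
    rng = random.Random(seed)
    population = [WILD_TYPE] + [mutate(WILD_TYPE, rng) for _ in range(pop_size - 1)]
    for _ in range(generations):
        # Elitist selection: non-dominated sequences become the parents.
        parents = pareto_front(population)[:pop_size]
        children = []
        while len(children) < pop_size:
            a, b = rng.choice(parents), rng.choice(parents)
            children.append(mutate(crossover(a, b, rng), rng))
        # Deduplicate while preserving order so runs are reproducible.
        population = list(dict.fromkeys(parents + children))
    return pareto_front(population)


if __name__ == "__main__":
    for variant in evolve()[:5]:
        print(variant, objectives(variant))
```

Treating sequence identity as a third objective is one simple way to keep candidates close to the native sequence. A real pipeline would replace the toy scorers with calls to the actual predictors.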
References
Elnaggar, A., Heinzinger, M., Dallago, C., Rehawi, G., Wang, Y., Jones, L., Gibbs, T., Feher, T., Angerer, C., Steinegger, M., Bhowmik, D., and Rost, B. (2022). ProtTrans: Toward understanding the language of life through self-supervised learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(10):7112–7127.
Holland, J. H. (1992). Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control and Artificial Intelligence. MIT Press, Cambridge, MA, USA.
Ogurtsova, K., da Rocha Fernandes, J., Huang, Y., Linnenkamp, U., Guariguata, L., Cho, N., Cavan, D., Shaw, J., and Makaroff, L. (2017). IDF Diabetes Atlas: Global estimates for the prevalence of diabetes for 2015 and 2040. Diabetes Research and Clinical Practice, 128:40–50.
Polonsky, K. S. (2012). The past 200 years in diabetes. New England Journal of Medicine, 367:1332–1340.
Rives, A., Meier, J., Sercu, T., Goyal, S., Lin, Z., Liu, J., Guo, D., Ott, M., Zitnick, C. L., Ma, J., et al. (2021). Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proceedings of the National Academy of Sciences, 118(15):e2016239118. bioRxiv 10.1101/622803.
Rodella, C., Lazaridi, S., and Lemmin, T. (2024). TemBERTure: advancing protein thermostability prediction with deep learning and attention mechanisms. Bioinformatics Advances, 4(1):vbae103.
Sanger, F. (1959). Chemistry of insulin. Science, 129:1340–1344.
Schrödinger, LLC (2015). The PyMOL molecular graphics system, version 1.8.
Zambrano, R., Jamroz, M., Szczasiuk, A., Pujols, J., Kmiecik, S., and Ventura, S. (2015). Aggrescan3D (A3D): server for prediction of aggregation properties of protein structures. Nucleic Acids Research, 43(W1):W306–W313.
