Abstract
Drug development is often a complex and time-consuming process. In the initial phase alone, selecting a target for drug development can take many years. Essential genes and proteins are the biological entities responsible for the processes that sustain the survival and reproduction of organisms. Studies indicate that essential genes tend to be more highly expressed and to encode proteins that engage in more protein-protein interactions. These characteristics make essential proteins potential drug targets. This work therefore proposes using features based on protein-protein interactions to train and evaluate machine learning algorithms for identifying essential proteins. Experiments with the organism Saccharomyces cerevisiae indicate that the Random Forest algorithm combined with balancing techniques obtained better recall values.
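The pipeline outlined in the abstract (interaction-based features, class balancing, evaluation by recall) can be illustrated with a minimal, standard-library-only sketch. Everything specific here is an assumption rather than the authors' method: the protein IDs, edges, and labels are made-up toy data, degree and clustering coefficient are just two possible network features, random oversampling is only one possible balancing technique, and a simple degree-threshold rule stands in for a trained classifier such as Random Forest.

```python
import random
from collections import defaultdict

# Toy, made-up interaction network (hypothetical protein IDs, not real data).
EDGES = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P1", "P5"), ("P2", "P3"),
         ("P4", "P5"), ("P5", "P6"), ("P6", "P7"), ("P7", "P8")]
# Made-up labels: 1 = essential, 0 = non-essential (imbalanced on purpose).
LABELS = {"P1": 1, "P2": 0, "P3": 0, "P4": 0, "P5": 1, "P6": 0, "P7": 0, "P8": 0}


def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def clustering(adj, node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    nbrs = sorted(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in adj[nbrs[i]])
    return 2.0 * links / (k * (k - 1))


def oversample(rows, labels, seed=0):
    """Random oversampling: duplicate minority rows until classes are equal in size."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, y in zip(rows, labels):
        by_class[y].append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for y, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_rows.extend(group + extra)
        out_labels.extend([y] * target)
    return out_rows, out_labels


def recall(y_true, y_pred):
    """Recall for the positive (essential) class: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


adj = build_adjacency(EDGES)
nodes = sorted(LABELS)
# Feature vector per protein: (degree, clustering coefficient).
feats = {n: (len(adj[n]), clustering(adj, n)) for n in nodes}

X = [feats[n] for n in nodes]
y = [LABELS[n] for n in nodes]
# Balanced training set that a real classifier would be fitted on.
X_bal, y_bal = oversample(X, y)

# Stand-in for a trained model (assumption): call a protein essential if degree >= 3.
y_pred = [1 if deg >= 3 else 0 for deg, _ in X]
rec = recall(y, y_pred)
print("class counts after balancing:", y_bal.count(0), y_bal.count(1))
print("recall of the stand-in rule:", rec)
```

In a fuller experiment the balanced set `X_bal`, `y_bal` would be passed to a classifier, and recall would be measured on held-out proteins rather than on the data used for fitting.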
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
da Silva Costa, J., Rodrigues, J.G., Belloze, K. (2022). Evaluating Machine Learning Models for Essential Protein Identification. In: Scherer, N.M., de Melo-Minardi, R.C. (eds) Advances in Bioinformatics and Computational Biology. BSB 2022. Lecture Notes in Computer Science, vol 13523. Springer, Cham. https://doi.org/10.1007/978-3-031-21175-1_5
DOI: https://doi.org/10.1007/978-3-031-21175-1_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21174-4
Online ISBN: 978-3-031-21175-1
eBook Packages: Computer Science, Computer Science (R0)