MathPIP: Classification of Proinflammatory Peptides Using Mathematical Descriptors
Abstract
Proinflammatory peptides (PIPs) play an important role in the inflammatory response, which is often the immune system's first reaction to foreign bodies, for example in inflammation-inducing infections such as COVID-19. It is therefore essential to have reliable ways to classify and analyze new PIPs. Machine learning (ML) models have been widely employed for the classification of biological sequences and underpin most studies on large databases of biological information. However, most ML algorithms have difficulty dealing directly with these sequences. Relevant features must first be extracted from them, which makes feature extraction one of the key steps in applying ML algorithms to biological data. Different features have been proposed, many of them based on prior knowledge, such as molecular structures. However, many publicly available biological sequences do not come with such prior knowledge. To deal with this limitation, we propose to investigate the use of mathematical descriptors to extract features from PIP sequences. To assess how relevant the features extracted using mathematical descriptors are, we ran experiments applying three ML algorithms. In these experiments, we obtained a predictive accuracy of 0.7034, which is on par with current PIP classifiers.
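As an illustration of what a mathematical descriptor can look like, the sketch below computes the Shannon entropy of k-mer frequencies for a peptide sequence and stacks the values for several k into a feature vector. It is a minimal example under stated assumptions, not the authors' implementation: the function names, the choice of k values, and the example peptide are all hypothetical.

```python
import math
from collections import Counter


def kmer_counts(seq, k):
    """Count overlapping k-mers (substrings of length k) in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def shannon_entropy(seq, k=1):
    """Shannon entropy (in bits) of the k-mer frequency distribution."""
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    if total == 0:
        # Sequence shorter than k: no k-mers, so define the entropy as 0.
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_features(seq, max_k=3):
    """Feature vector with one entropy value per k in 1..max_k (assumed range)."""
    return [shannon_entropy(seq, k) for k in range(1, max_k + 1)]


if __name__ == "__main__":
    peptide = "GLFDIVKKVVGALGSL"  # hypothetical example sequence
    print(entropy_features(peptide))
```

A vector like this needs no structural or functional annotation of the peptide, only the sequence itself, and could be passed to any standard classifier.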
Keywords:
Feature extraction, Biological sequences, Mathematical descriptors, Machine learning
References
Tay, M.Z., Poh, C.M., Rénia, L., et al.: The trinity of COVID-19: immunity, inflammation and intervention. Nat. Rev. Immunol. 20, 363–374 (2020). https://doi.org/10.1038/s41577-020-0311-8
Bonidia, R.P., Sanches, D.S., de Carvalho, A.C.: MathFeature: feature extraction package for biological sequences based on mathematical descriptors. bioRxiv (2020)
Gupta, S., Madhu, M.K., Sharma, A.K., Sharma, V.K.: ProInflam: a webserver for the prediction of proinflammatory antigenicity of peptides and proteins. J. Transl. Med. 14(1), 178 (2016)
Manavalan, B., Shin, T.H., Kim, M.O., Lee, G.: PIP-EL: a new ensemble learning method for improved proinflammatory peptide predictions. Front. Immunol. 9, 1783 (2018)
Khatun, M.S., Hasan, M.M., Shoombuatong, W., Kurata, H.: ProIn-Fuse: improved and robust prediction of proinflammatory peptides by fusing of multiple feature representations. J. Comput.-Aided Mol. Des. 34(12), 1229–1236 (2020). https://doi.org/10.1007/s10822-020-00343-9
Bonidia, R.P.: Feature extraction approaches for biological sequences: a comparative study of mathematical features. Brief. Bioinform. 22(5), bbab011 (2021)
Cochran, W.T.: What is the fast Fourier transform? Proc. IEEE 55(10), 1664–1674 (1967)
Machado, J.T., Costa, A.C., Quelhas, M.D.: Shannon, Rényi and Tsallis entropy analysis of DNA using phase plane. Nonlinear Anal. Real World Appl. 12(6), 3135–3144 (2011)
Costa, L.D.F., Rodrigues, F.A., Cristino, A.S.: Complex networks: the key to systems biology. Genet. Mol. Biol. 31(3), 591–601 (2008)
Published
22/11/2021
How to Cite
CAVALCANTE, João Pedro Uchôa; GONÇALVES, Anderson Cardoso; BONIDIA, Robson Parmezan; SANCHES, Danilo Sipoli; DE CARVALHO, André Carlos Ponce de Leon Ferreira. MathPIP: Classification of Proinflammatory Peptides Using Mathematical Descriptors. In: SIMPÓSIO BRASILEIRO DE BIOINFORMÁTICA (BSB), 14., 2021, Online. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. p. 131-136. ISSN 2316-1248.