A multi-level approach using deep learning and transfer learning for classifying non-coding RNAs
Abstract
In this article, we present a new approach for classifying non-coding RNAs (ncRNAs) that combines deep learning (DL) with transfer learning (TL) in a multi-level scheme. In pre-training, DL was applied to data from seven ncRNA classes: CD-box, HACA-box, scaRNA, miRNA, tRNA, 5S rRNA, and 5.8S rRNA. In the case study, TL was used to classify riboswitches. The training and test data were carefully selected by searching for sequences across species trees to maximize taxonomic diversity. The approach was compared with other methods from the literature, and our results were better for small datasets. In addition, it can be applied to other classes of ncRNAs.
