circTIS: A Weighted Degree String Kernel with Support Vector Machine Tool for Translation Initiation Sites Prediction in circRNA

Abstract


Recent studies have shown that peptides translated from circRNAs participate in several biological processes, many of them related to human diseases. Translation initiation in circRNAs frequently occurs at non-AUG start codons, yet most existing computational tools for translation initiation site (TIS) prediction consider only the canonical AUG start codon. We therefore developed a new methodology for predicting both AUG and near-cognate TIS that accounts for the circularization of ORFs in circRNAs. First, we use the weighted degree string kernel to build a representation of the circRNA sequence fragments around candidate TIS. Next, we apply a support vector machine to compute a score representing the likelihood that a sequence fragment contains an actual TIS. We trained and tested the methodology on datasets of annotated TIS in circRNA sequences. The first experiment showed that the best value for the kernel's degree hyperparameter is the sequence fragment length. We then investigated the most suitable sequence fragment length. Finally, we compared our methodology with three tools: TITER, TIS Predictor, and TIS Transformer. For AUG TIS prediction, circTIS obtained an AUROC of 98.64%, while TITER, TIS Predictor, and TIS Transformer obtained 78.97%, 78.39%, and 81.3%, respectively. For near-cognate TIS prediction, circTIS obtained an AUROC of 96.84%, while TITER, TIS Predictor, and TIS Transformer obtained 81.37%, 72.68%, and 66.33%, respectively. The methodology is implemented in the circTIS tool, freely available at https://github.com/denilsonfbar/circTIS.
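The two ideas at the core of the abstract, extracting sequence fragments that wrap around the circular junction and comparing them with a weighted degree (WD) string kernel, can be illustrated with a minimal Python sketch. This is not the circTIS implementation. The function names, the window sizes, and the particular near-cognate codons listed are assumptions made for illustration only. The kernel follows the standard WD definition, with weights beta_d = 2(D - d + 1) / (D(D + 1)).

```python
# Illustrative sketch only; names and parameters are hypothetical,
# not taken from the circTIS code base.

def candidate_sites(seq, codons=("ATG", "CTG", "GTG", "TTG")):
    """Return start positions of candidate codons in a circular sequence.

    The codon set is an assumed example (AUG plus a few near-cognates,
    written in DNA alphabet). Codons spanning the junction are found by
    appending the first two bases to the end of the sequence.
    """
    extended = seq + seq[:2]
    return [i for i in range(len(seq)) if extended[i:i + 3] in codons]


def circular_fragment(seq, pos, upstream, downstream):
    """Return the window [pos - upstream, pos + downstream) of a circular
    sequence, wrapping around the junction with modular indexing."""
    n = len(seq)
    return "".join(seq[(pos + off) % n] for off in range(-upstream, downstream))


def wd_kernel(x, y, degree):
    """Weighted degree string kernel between two equal-length strings.

    For each substring length d = 1..degree, count the positions at which
    the length-d substrings of x and y are identical, and weight the count
    by beta_d = 2 * (degree - d + 1) / (degree * (degree + 1)).
    """
    if len(x) != len(y):
        raise ValueError("WD kernel requires fragments of equal length")
    length = len(x)
    total = 0.0
    for d in range(1, degree + 1):
        beta = 2.0 * (degree - d + 1) / (degree * (degree + 1))
        matches = sum(x[l:l + d] == y[l:l + d] for l in range(length - d + 1))
        total += beta * matches
    return total


if __name__ == "__main__":
    circ = "TGCCCA"  # toy circular sequence; ATG spans the junction
    for site in candidate_sites(circ):
        print(site, circular_fragment(circ, site, upstream=2, downstream=4))
    print(wd_kernel("ACGT", "ACGA", degree=2))
```

A Gram matrix built by evaluating `wd_kernel` on every pair of training fragments could then be passed to any SVM implementation that accepts precomputed kernels (for example, scikit-learn's `SVC(kernel="precomputed")`). Which SVM library and settings circTIS actually uses is not stated in the abstract.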

Keywords: circRNA, Translation initiation site prediction, Weighted degree string kernel, Support vector machine

References

Abe, N., Matsumoto, K., Nishihara, M., Nakano, Y., Shibata, A., Maruyama, H., Shuto, S., Matsuda, A., Yoshida, M., Ito, Y., Abe, H.: Rolling Circle Translation of Circular RNA in Living Human Cells. Scientific Reports 5, 1–9 (2015). https://doi.org/10.1038/srep16435

Aufiero, S., Reckman, Y.J., Pinto, Y.M., Creemers, E.E.: Circular RNAs open a new chapter in cardiovascular biology. Nature Reviews Cardiology 16(8), 503–514 (2019). https://doi.org/10.1038/s41569-019-0185-2

Chen, C.Y., Sarnow, P.: Initiation of Protein Synthesis by the Eukaryotic Translational Apparatus on Circular RNAs. Science 268(5209), 415–417 (1995). https://doi.org/10.1126/science.7536344

Clauwaert, J., McVey, Z., Gupta, R., Menschaert, G.: TIS Transformer: remapping the human proteome using deep learning. NAR Genomics and Bioinformatics 5(1), 1–8 (2023). https://doi.org/10.1093/nargab/lqad021

Fang, Y., Wang, X., Li, W., Han, J., Jin, J., Su, F., Zhang, J., Huang, W., Xiao, F., Pan, Q., Zou, L.: Screening of circular RNAs and validation of circANKRD36 associated with inflammation in patients with type 2 diabetes mellitus. International Journal of Molecular Medicine 42(4), 1865–1874 (2018). https://doi.org/10.3892/ijmm.2018.3783

Gleason, A.C., Ghadge, G., Chen, J., Sonobe, Y., Roos, R.P.: Machine learning predicts translation initiation sites in neurologic diseases with nucleotide repeat expansions. PLoS ONE 17(6), 1–30 (2022). https://doi.org/10.1371/journal.pone.0256411

Hanan, M., Soreq, H., Kadener, S.: CircRNAs in the brain. RNA Biology 14(8), 1028–1034 (2017). https://doi.org/10.1080/15476286.2016.1255398

Huang, W., Ling, Y., Zhang, S., Xia, Q., Cao, R., Fan, X., Fang, Z., Wang, Z., Zhang, G.: TransCirc: An interactive database for translatable circular RNAs based on multi-omics evidence. Nucleic Acids Research 49(D1), D236–D242 (2021). https://doi.org/10.1093/nar/gkaa823

Jeck, W.R., Sorrentino, J.A., Wang, K., Slevin, M.K., Burd, C.E., Liu, J., Marzluff, W.F., Sharpless, N.E.: Circular RNAs are abundant, conserved, and associated with ALU repeats. RNA 19(2), 141–157 (2013). https://doi.org/10.1261/rna.035667.112

Kristensen, L.S., Andersen, M.S., Stagsted, L.V., Ebbesen, K.K., Hansen, T.B., Kjems, J.: The biogenesis, biology and characterization of circular RNAs. Nature Reviews Genetics 20(11), 675–691 (2019). https://doi.org/10.1038/s41576-019-0158-7

Li, H., Li, K., Lai, W., Li, X., Wang, H., Yang, J., Chu, S., Wang, H., Kang, C., Qiu, Y.: Comprehensive circular RNA profiles in plasma reveals that circular RNAs can be used as novel biomarkers for systemic lupus erythematosus. Clinica Chimica Acta 480, 17–25 (2018). https://doi.org/10.1016/j.cca.2018.01.026

Memczak, S., Jens, M., Elefsinioti, A., Torti, F., Krueger, J., Rybak, A., Maier, L., Mackowiak, S.D., Gregersen, L.H., Munschauer, M., Loewer, A., Ziebold, U., Landthaler, M., Kocks, C., Le Noble, F., Rajewsky, N.: Circular RNAs are a large class of animal RNAs with regulatory potency. Nature 495(7441), 333–338 (2013). https://doi.org/10.1038/nature11928

Patop, I.L., Wüst, S., Kadener, S.: Past, present, and future of circRNAs. The EMBO Journal 38(16), 1–13 (2019). https://doi.org/10.15252/embj.2018100836

Qi, R., Guo, F., Zou, Q.: String kernels construction and fusion: a survey with bioinformatics application. Frontiers of Computer Science 16(6) (2022). https://doi.org/10.1007/s11704-021-1118-x

Rätsch, G., Sonnenburg, S.: Accurate Splice Site Detection for Caenorhabditis elegans. In: Kernel Methods in Computational Biology. The MIT Press (2004). https://doi.org/10.7551/mitpress/4057.003.0018

Reuter, K., Biehl, A., Koch, L., Helms, V.: PreTIS: A Tool to Predict Non-canonical 5’ UTR Translational Initiation Sites in Human and Mouse. PLoS Computational Biology 12(10), 1–22 (2016). https://doi.org/10.1371/journal.pcbi.1005170

Schölkopf, B., Smola, A.J.: Learning with Kernels. The MIT Press, Cambridge, Massachusetts (2018). https://doi.org/10.7551/mitpress/4175.001.0001

Shawe-Taylor, J., Cristianini, N.: Kernel Methods for Pattern Analysis. Cambridge University Press (2004). https://doi.org/10.1017/CBO9780511809682

Shi, Y., Jia, X., Xu, J.: The new function of circRNA: translation. Clinical and Translational Oncology 22(12), 2162–2169 (2020). https://doi.org/10.1007/s12094-020-02371-1

Sinha, T., Panigrahi, C., Das, D., Chandra Panda, A.: Circular RNA translation, a path to hidden proteome. Wiley Interdisciplinary Reviews: RNA 13(1), 1–15 (2021). https://doi.org/10.1002/wrna.1685

Sonnenburg, S., Rätsch, G., Henschel, S., Widmer, C., Behr, J., Zien, A., De Bona, F., Binder, A., Gehl, C., Franc, V.: The Shogun machine learning toolbox. Journal of Machine Learning Research 11, 1799–1802 (2010)

Vo, J.N., Cieslik, M., Zhang, Y., Shukla, S., Xiao, L., Zhang, Y., Wu, Y.M., Dhanasekaran, S.M., Engelke, C.G., Cao, X., Robinson, D.R., Nesvizhskii, A.I., Chinnaiyan, A.M.: The Landscape of Circular RNA in Cancer. Cell 176(4), 869–881.e13 (2019). https://doi.org/10.1016/j.cell.2018.12.021

Vromman, M., Vandesompele, J., Volders, P.J.: Closing the circle: Current state and perspectives of circular RNA databases. Briefings in Bioinformatics 22(1), 288–297 (2021). https://doi.org/10.1093/bib/bbz175

Wan, J., Qian, S.B.: TISdb: A database for alternative translation initiation in mammalian cells. Nucleic Acids Research 42(D1), 845–850 (2014). https://doi.org/10.1093/nar/gkt1085

Zhang, S., Hu, H., Jiang, T., Zhang, L., Zeng, J.: TITER: Predicting translation initiation sites by deep learning. Bioinformatics 33(14), i234–i242 (2017). https://doi.org/10.1093/bioinformatics/btx247

Published
13 June 2023
BARBOSA, Denilson Fagundes; OLIVEIRA, Liliane Santana; KASHIWABARA, André Yoshiaki. circTIS: A Weighted Degree String Kernel with Support Vector Machine Tool for Translation Initiation Sites Prediction in circRNA. In: SIMPÓSIO BRASILEIRO DE BIOINFORMÁTICA (BSB), 16., 2023, Curitiba/PR. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 14-24. ISSN 2316-1248.