circTIS: A Weighted Degree String Kernel with Support Vector Machine Tool for Translation Initiation Sites Prediction in circRNA


Recent studies have shown that peptides translated from circRNAs participate in several biological processes, many of them related to human diseases. Researchers have also observed that translation initiation in circRNAs frequently occurs at non-AUG start codons. However, most existing computational tools for translation initiation site (TIS) prediction consider only the canonical AUG start codon. We therefore developed a new methodology for predicting TIS at AUG and near-cognate start codons that accounts for the circularization of ORFs in circRNAs. First, we used the weighted degree string kernel to represent the circRNA sequence fragments surrounding candidate TIS. Next, we applied a support vector machine to compute a score reflecting how likely each fragment is to contain an actual TIS. We trained and tested our methodology on datasets of annotated TIS in circRNA sequences. The first experiment showed that setting the kernel's degree hyperparameter equal to the sequence fragment length gives the best results. Next, we investigated the most suitable sequence fragment length. Finally, we compared our methodology with three tools: TITER, TIS Predictor, and TIS Transformer. For AUG TIS prediction, circTIS obtained an AUROC of 98.64%, while TITER, TIS Predictor, and TIS Transformer obtained 78.97%, 78.39%, and 81.3%, respectively. For near-cognate TIS prediction, our method obtained an AUROC of 96.84%, while TITER, TIS Predictor, and TIS Transformer obtained 81.37%, 72.68%, and 66.33%, respectively. We implemented our methodology in the circTIS tool, freely available at
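
The abstract describes two steps: representing the sequence fragments around each candidate TIS with a weighted degree (WD) string kernel, and scoring them with a support vector machine. The pure-Python sketch below illustrates both steps under stated assumptions. It is not the circTIS code. It uses the standard WD kernel of Rätsch and Sonnenburg, k(x, x') = Σ_{k=1..d} β_k Σ_l I(u_{k,l}(x) = u_{k,l}(x')) with β_k = 2(d − k + 1)/(d(d + 1)), where u_{k,l}(x) is the k-mer of x starting at position l. It also makes several assumptions. Near-cognates are taken to be the nine codons one substitution away from AUG, and the paper's exact set may differ. The window sizes, the function names (`circular_fragment`, `candidate_fragments`, `wd_kernel`, `svm_score`) and the trained SVM parameters are hypothetical. Indices are taken modulo the circRNA length, so fragments and codons that cross the back-splice junction are still extracted; this is one possible way to model the circularization mentioned above.

```python
from itertools import product

AUG = "AUG"
# Assumption: near-cognates = every codon differing from AUG at exactly one position.
NEAR_COGNATES = {
    "".join(c) for c in product("ACGU", repeat=3)
    if sum(a != b for a, b in zip(c, AUG)) == 1
}


def circular_fragment(seq, pos, upstream, downstream):
    """Return `upstream` nt + the codon at `pos` + `downstream` nt,
    wrapping around the end of the sequence because the circRNA is circular."""
    n = len(seq)
    start = pos - upstream
    length = upstream + 3 + downstream
    return "".join(seq[(start + k) % n] for k in range(length))


def candidate_fragments(seq, upstream=8, downstream=8, codons=None):
    """Yield (position, codon, fragment) for every candidate start codon.
    Window sizes are hypothetical defaults; codons spanning the junction are included."""
    seq = seq.upper().replace("T", "U")  # accept DNA- or RNA-alphabet input
    codons = codons or ({AUG} | NEAR_COGNATES)
    n = len(seq)
    for pos in range(n):
        codon = "".join(seq[(pos + k) % n] for k in range(3))
        if codon in codons:
            yield pos, codon, circular_fragment(seq, pos, upstream, downstream)


def wd_kernel(x, y, degree):
    """Weighted degree string kernel: counts k-mers (k = 1..degree) that match
    at the same position in x and y, weighted by beta_k = 2(d-k+1)/(d(d+1))."""
    if len(x) != len(y):
        raise ValueError("the WD kernel compares equal-length fragments")
    L = len(x)
    total = 0.0
    for k in range(1, degree + 1):
        beta = 2.0 * (degree - k + 1) / (degree * (degree + 1))
        matches = sum(x[l:l + k] == y[l:l + k] for l in range(L - k + 1))
        total += beta * matches
    return total


def svm_score(fragment, support_vectors, dual_coefs, bias, degree):
    """Decision value f(x) = sum_i alpha_i * y_i * k(x_i, x) + b of an SVM that is
    assumed to be already trained; `dual_coefs` holds the products alpha_i * y_i."""
    return bias + sum(
        c * wd_kernel(sv, fragment, degree)
        for sv, c in zip(support_vectors, dual_coefs)
    )
```

With the degree set to the fragment length, which the abstract reports as the best setting, a call would look like `wd_kernel(a, b, degree=len(a))`. In practice the support vectors, dual coefficients and bias would come from training an SVM on labelled true and false TIS fragments. This sketch recomputes every k-mer comparison directly for clarity rather than speed.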

Keywords: circRNA, Translation initiation site prediction, Weighted degree string kernel, Support vector machine

BARBOSA, Denilson Fagundes; OLIVEIRA, Liliane Santana; KASHIWABARA, André Yoshiaki. circTIS: A Weighted Degree String Kernel with Support Vector Machine Tool for Translation Initiation Sites Prediction in circRNA. In: SIMPÓSIO BRASILEIRO DE BIOINFORMÁTICA (BSB), 16., 2023, Curitiba/PR. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 14-24. ISSN 2316-1248.