Comparison of computational fusion detection methods for short-read RNA-seq data

  • Lucas P. Oliveira UNICAMP
  • Victor Rigatto UNICAMP
  • Natacha A. Migita Centro Infantil Boldrini
  • José A. Yunes Centro Infantil Boldrini
  • João Meidanis UNICAMP

Abstract


Gene fusions are abnormal genetic events often correlated with oncogenesis. Detecting them from RNA-seq data with bioinformatics methods is therefore an important task in cancer research. Several tools have been developed for this task, but current benchmarks are inconclusive regarding their accuracy and are difficult to reproduce with new data. In this paper, we propose a computational pipeline that gathers fusion detection tools and compares them using standard classification metrics. The pipeline can also serve as an ensemble method that detects gene fusions by combining several tools. We applied it to simulated and real data, and the results supplement current benchmarks in the literature, helping users choose tools for their analyses.
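The two uses of the pipeline described above, scoring each tool with standard classification metrics and combining tools as an ensemble, can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper. Fusions are matched only by their ordered gene pair (5' partner, 3' partner), and breakpoints are ignored. The `GENE5--GENE3` input format, the function names, and the tool names are hypothetical. The ensemble is a simple vote threshold. The actual pipeline may match calls and combine tools differently.

```python
from collections import Counter

# Assumption: a fusion call is identified only by its ordered gene pair
# (5' partner, 3' partner); breakpoint coordinates are ignored here.
Fusion = tuple[str, str]


def parse_call(call: str) -> Fusion:
    """Parse a call written as 'GENE5--GENE3' (a hypothetical input format)."""
    five_prime, three_prime = call.split("--")
    return (five_prime.strip().upper(), three_prime.strip().upper())


def classification_metrics(predicted: set[Fusion], truth: set[Fusion]) -> dict[str, float]:
    """Compare one tool's calls with a truth set using standard metrics."""
    tp = len(predicted & truth)   # calls that are in the truth set
    fp = len(predicted - truth)   # calls absent from the truth set
    fn = len(truth - predicted)   # true fusions the tool missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


def ensemble_calls(per_tool: dict[str, set[Fusion]], min_tools: int) -> set[Fusion]:
    """Keep fusions reported by at least `min_tools` tools (simple voting)."""
    votes = Counter(f for calls in per_tool.values() for f in calls)
    return {f for f, n in votes.items() if n >= min_tools}


# Toy inputs for illustration only; not data or results from the paper.
truth = {parse_call(c) for c in ["BCR--ABL1", "ETV6--RUNX1", "TCF3--PBX1"]}
per_tool = {
    "tool_a": {parse_call(c) for c in ["BCR--ABL1", "ETV6--RUNX1", "GENEX--GENEY"]},
    "tool_b": {parse_call(c) for c in ["BCR--ABL1", "TCF3--PBX1"]},
    "tool_c": {parse_call(c) for c in ["BCR--ABL1", "ETV6--RUNX1"]},
}

for name, calls in per_tool.items():
    m = classification_metrics(calls, truth)
    print(name, {k: round(v, 3) for k, v in m.items()})

consensus = ensemble_calls(per_tool, min_tools=2)
print("ensemble", sorted(consensus))
```

With these toy inputs, the two-tool vote keeps BCR--ABL1 and ETV6--RUNX1 and drops the calls made by a single tool. Raising `min_tools` typically trades recall for precision, because fewer candidates survive the vote.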

Published
2024-12-02
OLIVEIRA, Lucas P.; RIGATTO, Victor; MIGITA, Natacha A.; YUNES, José A.; MEIDANIS, João. Comparison of computational fusion detection methods for short-read RNA-seq data. In: BRAZILIAN SYMPOSIUM ON BIOINFORMATICS (BSB), 17., 2024, Vitória/ES. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 24-35. ISSN 2316-1248. DOI: https://doi.org/10.5753/bsb.2024.245179.