Comparison of computational fusion detection methods for short-read RNA-seq data
Abstract
Gene fusions are abnormal genetic events frequently correlated with oncogenesis. Detecting them from RNA-seq data with bioinformatics methods is therefore an important task in cancer research. Several tools have been developed for this task, but current benchmarks are inconclusive about their accuracy and are difficult to reproduce with new data. In this paper, we propose a computational pipeline that brings together fusion detection tools and compares them using standard classification metrics. It can also be used as an aggregate method for detecting gene fusions with several tools. The pipeline was applied to simulated and real data, and it complements current benchmarks in the literature to help users choose tools for their analyses.
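To make the idea of comparing fusion callers with standard classification metrics concrete, the following is a minimal sketch and not the pipeline described in the paper. It assumes each tool's output has already been reduced to a list of gene-pair calls and that a truth set is available (for example, from simulated data). The function names, the tool names `tool_a` and `tool_b`, the `min_tools` threshold, and the choice to ignore partner order are all illustrative assumptions.

```python
from collections import Counter


def normalize(pair):
    """Map a gene pair to a canonical key.

    Assumption: partner order is ignored, so (A, B) and (B, A) count as
    the same fusion. A stricter comparison could keep 5'/3' orientation.
    """
    a, b = pair
    return tuple(sorted((a.upper(), b.upper())))


def classification_metrics(predicted, truth):
    """Precision, recall and F1 of predicted fusion calls against a truth set."""
    pred = {normalize(p) for p in predicted}
    true = {normalize(t) for t in truth}
    tp = len(pred & true)   # called and in the truth set
    fp = len(pred - true)   # called but not in the truth set
    fn = len(true - pred)   # in the truth set but missed
    precision = tp / (tp + fp) if pred else 0.0
    recall = tp / (tp + fn) if true else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


def consensus_calls(calls_by_tool, min_tools=2):
    """Aggregate calls: keep fusions reported by at least `min_tools` tools."""
    counts = Counter()
    for calls in calls_by_tool.values():
        # Use a set so a tool reporting the same fusion twice counts once.
        counts.update({normalize(p) for p in calls})
    return {pair for pair, n in counts.items() if n >= min_tools}


if __name__ == "__main__":
    # Toy data; GENE1/GENE2 is a made-up false positive.
    truth = [("BCR", "ABL1"), ("ETV6", "RUNX1"), ("TCF3", "PBX1")]
    calls_by_tool = {
        "tool_a": [("BCR", "ABL1"), ("ETV6", "RUNX1"), ("GENE1", "GENE2")],
        "tool_b": [("ABL1", "BCR"), ("TCF3", "PBX1")],
    }
    for name, calls in calls_by_tool.items():
        print(name, classification_metrics(calls, truth))
    print("consensus:", consensus_calls(calls_by_tool))
```

Matching on gene pairs alone is the simplest possible criterion; a real comparison might also need to reconcile gene name aliases or breakpoint coordinates, which this sketch does not attempt.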
Published
2 December 2024
How to Cite
OLIVEIRA, Lucas P.; RIGATTO, Victor; MIGITA, Natacha A.; YUNES, José A.; MEIDANIS, João. Comparison of computational fusion detection methods for short-read RNA-seq data. In: SIMPÓSIO BRASILEIRO DE BIOINFORMÁTICA (BSB), 17., 2024, Vitória/ES. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 24-35. ISSN 2316-1248. DOI: https://doi.org/10.5753/bsb.2024.245179.