RNA-seq Analysis as a Cloud Service: Toward AI-Driven Computational Resource Efficiency

Abstract


Analysis of RNA-seq data generated by high-throughput sequencing poses substantial computational challenges due to its scale and complexity, often exceeding the resources available in typical laboratory environments. In this work, we present a cloud-centric perspective on RNA-seq analysis by deploying a phytosanitary pipeline as a cloud service based on a conventional bioinformatics workflow and analyzing its computational characteristics. Building on this baseline, we investigate the potential of artificial intelligence to improve computational resource efficiency by introducing an attention-based neural network for early-stage read classification. Our results indicate that AI-based filtering can distinguish relevant reads and reduce the volume of data processed by downstream, resource-intensive stages. This suggests a means to reduce compute time and memory usage through selective data reduction, although full integration into the pipeline is left for future work. We discuss how the combination of cloud-native execution and AI-driven preprocessing can enable more resource-efficient and accessible RNA-seq analysis services.
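As an illustration only, the selective data reduction described above can be sketched as a filter placed in front of the resource-intensive stages. The scorer below is a hypothetical GC-content stand-in and does not reproduce the attention-based network. The names `toy_relevance_score`, `filter_reads`, and `reduction_ratio`, and the 0.5 threshold, are assumptions made for this sketch rather than part of the presented pipeline.

```python
from typing import Callable, Iterable, Iterator, Tuple

Read = Tuple[str, str]  # (read_id, nucleotide sequence)


def toy_relevance_score(sequence: str) -> float:
    """Hypothetical stand-in for a trained read classifier.

    Returns the GC fraction of the read in [0, 1]. A real deployment would
    call the trained model here instead.
    """
    if not sequence:
        return 0.0
    gc_count = sum(1 for base in sequence.upper() if base in "GC")
    return gc_count / len(sequence)


def filter_reads(
    reads: Iterable[Read],
    score: Callable[[str], float],
    threshold: float = 0.5,  # assumed cut-off, to be tuned on validation data
) -> Iterator[Read]:
    """Yield only the reads whose score reaches the threshold.

    Streaming with a generator keeps memory use independent of input size.
    """
    for read_id, sequence in reads:
        if score(sequence) >= threshold:
            yield read_id, sequence


def reduction_ratio(n_input: int, n_kept: int) -> float:
    """Fraction of reads removed before the downstream stages."""
    if n_input == 0:
        return 0.0
    return 1.0 - n_kept / n_input
```

Keeping the scorer as a plain callable means the same filter can wrap any classifier, so the downstream stages only ever see the retained reads and the achieved reduction can be reported separately.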

Published
19/07/2026
SILVA, Elisson; BLAWID, Rosana; BLAWID, Stefan. RNA-seq Analysis as a Cloud Service: Toward AI-Driven Computational Resource Efficiency. In: SIMPÓSIO DE INFRAESTRUTURA DIGITAL/NUVEM PARA PESQUISA (PESQUISA@NUVEM), 1., 2026, Gramado/RS. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2026. p. 127-133. DOI: https://doi.org/10.5753/pesquisanuvem.2026.23134.