CellHeap: A Workflow for Optimizing COVID-19 Single-Cell RNA-Seq Data Processing in the Santos Dumont Supercomputer

  • Conference paper
Advances in Bioinformatics and Computational Biology (BSB 2021)

Abstract

Several hundred terabytes of COVID-19 single-cell RNA-seq (scRNA-seq) data are currently available in public repositories. These data cover multiple tissues, comorbidities, and conditions. We expect this trend to continue, and it is realistic to predict that the volume of COVID-19 scRNA-seq data will grow to several petabytes in the coming years. However, thoughtful analysis of these data requires large-scale computing infrastructure and software systems optimized for such platforms to generate biological knowledge. This paper presents CellHeap, a portable and robust workflow for customizable scRNA-seq analyses, with quality control throughout the execution steps, that can be deployed on supercomputers. Furthermore, we present the deployment of CellHeap on the Santos Dumont supercomputer for analyzing COVID-19 scRNA-seq datasets, and discuss a case study that processed dozens of terabytes of raw COVID-19 scRNA-seq data.

Notes

  1. CellRanger is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, and perform clustering and other secondary analyses. CellRanger Count is executed once for each dataset, and CellRanger Aggregate is optionally executed to combine several datasets or tissues. Both CellRanger Count and CellRanger Aggregate generate a gene-count matrix; its contents depend on whether the analysis covers a single dataset or an aggregate of several.

  2. Supercomputer details at https://sdumont.lncc.br.

  3. Slurm details at https://slurm.schedmd.com.
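
The execution pattern described in Note 1 (one CellRanger Count run per dataset, followed by an optional CellRanger Aggregate run) can be sketched as a small command planner. This is an illustrative sketch only, not the CellHeap implementation: the function names, dataset names, paths, the run id "aggregated", and the file name "aggregation.csv" are all hypothetical, and the flags shown (`--id`, `--fastqs`, `--transcriptome`, `--csv`) may need additions depending on the CellRanger version in use. The sketch only builds the command lines; it does not run them or submit them to Slurm.

```python
from typing import Dict, List


def count_command(sample_id: str, fastq_dir: str, transcriptome: str) -> List[str]:
    """Build one `cellranger count` command line for a single dataset."""
    return [
        "cellranger", "count",
        f"--id={sample_id}",
        f"--fastqs={fastq_dir}",
        f"--transcriptome={transcriptome}",
    ]


def aggr_command(run_id: str, csv_path: str) -> List[str]:
    """Build one `cellranger aggr` command line.

    The CSV (hypothetical path) is assumed to list the outputs of the
    individual count runs; it is not generated here.
    """
    return ["cellranger", "aggr", f"--id={run_id}", f"--csv={csv_path}"]


def plan(datasets: Dict[str, str], transcriptome: str,
         aggregate: bool = False) -> List[List[str]]:
    """Return one count command per dataset, plus an optional aggr command."""
    commands = [
        count_command(sample_id, fastq_dir, transcriptome)
        for sample_id, fastq_dir in sorted(datasets.items())
    ]
    # Aggregation only makes sense when there is more than one dataset.
    if aggregate and len(datasets) > 1:
        commands.append(aggr_command("aggregated", "aggregation.csv"))
    return commands


if __name__ == "__main__":
    # Hypothetical datasets and reference path, for illustration only.
    example = {"lung": "/data/lung", "blood": "/data/blood"}
    for cmd in plan(example, "/refs/GRCh38", aggregate=True):
        print(" ".join(cmd))
```

On a cluster such as the one in Note 2, each generated command would typically be wrapped in its own batch job for the scheduler in Note 3, with the aggregate step submitted only after all count jobs finish.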

Acknowledgments

The authors acknowledge the National Laboratory for Scientific Computing (LNCC/MCTI, Brazil) for providing HPC resources of the SDumont supercomputer, which have contributed to the research results reported within this paper. URL: http://sdumont.lncc.br. The authors also acknowledge the INOVA-FIOCRUZ program (grant number VPPCB-005-FIO-20-2-34-52) for funding this research. M.E.M.T. Walter thanks CNPq for the research scholarship PQ 310785/2018-9.

Corresponding author

Correspondence to Fabrício A. B. Silva.

Copyright information

© 2021 Springer Nature Switzerland AG

Cite this paper

Silva, V.S. et al. (2021). CellHeap: A Workflow for Optimizing COVID-19 Single-Cell RNA-Seq Data Processing in the Santos Dumont Supercomputer. In: Stadler, P.F., Walter, M.E.M.T., Hernandez-Rosales, M., Brigido, M.M. (eds) Advances in Bioinformatics and Computational Biology. BSB 2021. Lecture Notes in Computer Science, vol 13063. Springer, Cham. https://doi.org/10.1007/978-3-030-91814-9_4

  • DOI: https://doi.org/10.1007/978-3-030-91814-9_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-91813-2

  • Online ISBN: 978-3-030-91814-9

  • eBook Packages: Computer Science (R0)
