
Make No Mistake! Why Do Tools Make Incorrect Long Non-coding RNA Classification?

  • Conference paper
Advances in Bioinformatics and Computational Biology (BSB 2023)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 13954)

Abstract

Long non-coding RNAs (lncRNAs) play important roles in various biological processes, and their accurate identification is essential for understanding their functions and potential therapeutic applications. In a previous study, we assessed the impact of short- and long-read sequencing technologies on the computational identification of lncRNAs in human and plant data, and provided evidence of where and how lncRNA classification approaches could be improved. In this follow-up study, we investigate the sequences misclassified by five machine learning tools for lncRNA classification in humans to understand why the tools fail. Our analysis suggests that the primary cause of these failures is lncRNAs that overlap two coding regions, resembling a chimeric sequence. Furthermore, we emphasize the need to view genes as transcriptional units, since the transcript defines the gene function. These insights underscore the need to further refine these tools to improve their accuracy and reliability in lncRNA prediction and classification, ultimately contributing to a better understanding of the role of lncRNAs in biological processes and their potential therapeutic applications.
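
The overlap pattern named in the abstract, an lncRNA spanning two coding regions, can be pictured as a simple interval check. The following is a minimal sketch and not the authors' method: the names `Interval`, `overlapped_coding_regions`, and `spans_two_coding_regions` are hypothetical, the coordinates are made up, and strand and exon structure are ignored for simplicity.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    """A genomic interval with 1-based, inclusive coordinates (assumed convention)."""
    name: str
    chrom: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if the two intervals share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def overlapped_coding_regions(lncrna: Interval, coding: List[Interval]) -> List[str]:
    """List the names of coding regions that the lncRNA overlaps, in input order."""
    return [c.name for c in coding if overlaps(lncrna, c)]


def spans_two_coding_regions(lncrna: Interval, coding: List[Interval]) -> bool:
    """Flag an lncRNA that overlaps two or more coding regions."""
    return len(overlapped_coding_regions(lncrna, coding)) >= 2


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    lncrna = Interval("lncRNA_example", "chr1", 100, 500)
    coding = [
        Interval("geneA", "chr1", 50, 150),
        Interval("geneB", "chr1", 450, 600),
        Interval("geneC", "chr2", 100, 500),  # different chromosome, so no overlap
    ]
    print(overlapped_coding_regions(lncrna, coding))
    print(spans_two_coding_regions(lncrna, coding))
```

In practice, the intervals would come from an annotation such as GENCODE, and a transcript flagged this way would be a candidate for closer inspection rather than proof of misclassification.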



Corresponding author

Correspondence to Alexandre R. Paschoal.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Chiquitto, A.G., Silva, L.O.L., Oliveira, L.S., Domingues, D.S., Paschoal, A.R. (2023). Make No Mistake! Why Do Tools Make Incorrect Long Non-coding RNA Classification? In: Reis, M.S., de Melo-Minardi, R.C. (eds) Advances in Bioinformatics and Computational Biology. BSB 2023. Lecture Notes in Computer Science, vol 13954. Springer, Cham. https://doi.org/10.1007/978-3-031-42715-2_4

  • DOI: https://doi.org/10.1007/978-3-031-42715-2_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-42714-5

  • Online ISBN: 978-3-031-42715-2

  • eBook Packages: Computer Science, Computer Science (R0)
