Physicochemical Properties for Promoter Classification

  • Conference paper
Intelligent Systems (BRACIS 2023)

Abstract

Accurate identification of promoter regions in DNA sequences is an important problem in bioinformatics. Although it has received substantial attention in the literature, it remains unresolved. Several researchers have achieved notable results by applying diverse machine-learning techniques to promoter prediction. However, few have thoroughly explored features derived from the physicochemical properties of DNA across different types of organisms. This study investigates the advantages of incorporating these features when training machine-learning models. The models are evaluated and compared using multiple performance metrics on diverse datasets covering both prokaryotic and eukaryotic organisms. The state-of-the-art CNNProm method serves as the baseline for our experiments. The models and source code associated with this study are available in the project’s repository: https://anonymous.4open.science/r/bracis-paper-1458/.
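As a rough illustration of what a feature derived from a physicochemical property can look like, the sketch below maps each overlapping dinucleotide of a sequence to a numeric value and summarizes the resulting profile. It is not the authors' pipeline. The property table, the function names, and the choice of summary statistics are all assumptions made for this example, and the numeric values are placeholders rather than measured quantities.

```python
from itertools import product

# All 16 dinucleotides in lexicographic order: AA, AC, AG, AT, CA, ...
DINUCLEOTIDES = ["".join(pair) for pair in product("ACGT", repeat=2)]

# Hypothetical placeholder table: one made-up value per dinucleotide.
# A real study would substitute values from a published property table.
PLACEHOLDER_PROPERTY = {dn: i / 15 for i, dn in enumerate(DINUCLEOTIDES)}


def property_profile(sequence, table=PLACEHOLDER_PROPERTY):
    """Map each overlapping dinucleotide of `sequence` to its table value."""
    seq = sequence.upper()
    return [table[seq[i:i + 2]] for i in range(len(seq) - 1)]


def summary_features(profile):
    """Collapse a profile into a fixed-length vector: mean, min, max."""
    return [sum(profile) / len(profile), min(profile), max(profile)]


profile = property_profile("ACGTAC")
features = summary_features(profile)
```

A sequence of length n yields a profile of length n - 1. The profile can be fed directly to a sequence model, or its summary statistics can be used as inputs to a tabular classifier.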

Notes

  1. https://github.com/solovictor/CNNPromoterData

  2. Official page: https://pycaret.gitbook.io/

Acknowledgment

The authors would like to thank the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brazil (CAPES) - Finance Code 001, Fundação de Amparo à Pesquisa do Estado de Minas Gerais (FAPEMIG, grants APQ-01518-21, APQ-01647-22), Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq, grants 307151/2022-0, 308400/2022-4) and Universidade Federal de Ouro Preto (PROPPI/UFOP) for supporting the development of this study. We also thank the Laboratório Multiusuários de Bioinformática of the Núcleo de Pesquisas em Ciências Biológicas (NUPEB/UFOP) for its collaboration.

Author information

Corresponding author

Correspondence to Lauro Moraes.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Moraes, L., Luz, E., Moreira, G. (2023). Physicochemical Properties for Promoter Classification. In: Naldi, M.C., Bianchi, R.A.C. (eds) Intelligent Systems. BRACIS 2023. Lecture Notes in Computer Science, vol 14196. Springer, Cham. https://doi.org/10.1007/978-3-031-45389-2_25

  • DOI: https://doi.org/10.1007/978-3-031-45389-2_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-45388-5

  • Online ISBN: 978-3-031-45389-2

  • eBook Packages: Computer Science (R0)
