Physicochemical Properties for Promoter Classification

Moraes, Lauro; Luz, Eduardo; Moreira, Gladston

doi:10.1007/978-3-031-45389-2_25

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14196)

Included in the following conference series:

Brazilian Conference on Intelligent Systems

Abstract

Accurately identifying promoter regions in DNA sequences is an important problem in bioinformatics. Although it has received substantial attention in the literature, it remains unresolved. Several researchers have achieved notable results by applying diverse machine-learning techniques to predict promoter regions. However, few have thoroughly explored features derived from the physicochemical properties of DNA across different types of organisms. This study investigates the advantages of incorporating these features when training machine-learning models. It evaluates and compares model performance using multiple metrics on diverse datasets covering both prokaryotic and eukaryotic organisms. The state-of-the-art CNNProm method serves as the baseline for our experiments. The models and source code for this study are available in the project’s repository: https://anonymous.4open.science/r/bracis-paper-1458/.
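As a rough illustration of what a feature derived from physicochemical properties can look like, the sketch below maps a DNA sequence to one numeric value per overlapping dinucleotide and then smooths the result. The abstract does not say which properties or encoding the authors use, so this is an assumption for illustration only: the names `PLACEHOLDER_PROPERTY`, `property_profile`, and `windowed_mean` are hypothetical, and the table values are placeholders, not measured data.

```python
from itertools import product

# Hypothetical placeholder values, NOT measured physicochemical data.
# A real table would assign each of the 16 dinucleotides a published value
# for a property such as stacking energy or duplex stability.
DINUCLEOTIDES = ["".join(pair) for pair in product("ACGT", repeat=2)]
PLACEHOLDER_PROPERTY = {dn: float(i) for i, dn in enumerate(DINUCLEOTIDES)}


def property_profile(sequence, table=PLACEHOLDER_PROPERTY):
    """Return one property value per overlapping dinucleotide.

    A sequence of length n yields n - 1 values. Symbols outside A, C, G, T
    (for example N) are not handled and raise KeyError.
    """
    seq = sequence.upper()
    return [table[seq[i:i + 2]] for i in range(len(seq) - 1)]


def windowed_mean(profile, window=3):
    """Smooth a profile with a sliding-window mean (window size is arbitrary)."""
    if window < 1 or window > len(profile):
        raise ValueError("window must be between 1 and len(profile)")
    return [
        sum(profile[i:i + window]) / window
        for i in range(len(profile) - window + 1)
    ]


if __name__ == "__main__":
    profile = property_profile("ACGTTGCA")
    print(profile)
    print(windowed_mean(profile, window=3))
```

A numeric vector of this kind could be fed to a classifier alongside, or instead of, a one-hot encoding of the sequence. Whether that matches the pipeline in the paper cannot be confirmed from the abstract alone.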

Notes

1. https://github.com/solovictor/CNNPromoterData
2. Official page: https://pycaret.gitbook.io/

Acknowledgment

The authors thank the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brazil (CAPES) - Finance Code 001, Fundação de Amparo à Pesquisa do Estado de Minas Gerais (FAPEMIG, grants APQ-01518-21, APQ-01647-22), Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq, grants 307151/2022-0, 308400/2022-4) and Universidade Federal de Ouro Preto (PROPPI/UFOP) for supporting the development of this study. We are also grateful for the collaboration of the Laboratório Multiusuários de Bioinformática of Núcleo de Pesquisas em Ciências Biológicas (NUPEB/UFOP).

Author information

Authors and Affiliations

Universidade Federal de Ouro Preto, Ouro Preto-MG, Brazil
Lauro Moraes, Eduardo Luz & Gladston Moreira

Authors

Lauro Moraes
Eduardo Luz
Gladston Moreira

Corresponding author

Correspondence to Lauro Moraes.

Editor information

Editors and Affiliations

Federal University of São Carlos, São Carlos, Brazil
Murilo C. Naldi
Centro Universitario da FEI, São Bernardo do Campo, Brazil
Reinaldo A. C. Bianchi

About this paper

Cite this paper

Moraes, L., Luz, E., Moreira, G. (2023). Physicochemical Properties for Promoter Classification. In: Naldi, M.C., Bianchi, R.A.C. (eds) Intelligent Systems. BRACIS 2023. Lecture Notes in Computer Science, vol 14196. Springer, Cham. https://doi.org/10.1007/978-3-031-45389-2_25

DOI: https://doi.org/10.1007/978-3-031-45389-2_25
Published: 12 October 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-45388-5
Online ISBN: 978-3-031-45389-2
eBook Packages: Computer Science, Computer Science (R0)
