Abstract
Constructive Machine Learning (CML) is a research field that uses algorithms to generate new instances that are similar, but not identical, to existing ones. It has been widely used to assist the discovery of new drug-like molecules, a very challenging task given that the search space is discrete, unstructured, and enormous. In this work, we use CML to learn the intrinsic rules of molecule datasets and generate novel molecules. The chosen CML methods fall into two subgroups: text-based and graph-oriented. Considering the different ways to evaluate the methods and the generated molecules, we propose classifying the generated molecules in a taxonomy, using a hierarchical multi-label classifier previously trained on a dataset of molecules with known taxonomy information. In this way, it is possible to predict properties and verify the relevance of the generated molecules to existing taxonomies. We also propose a hierarchical diversity measure to compare groups of molecules based on their taxonomy information. The measure showed coherent results and is faster to calculate than the commonly used external diversity measures.
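For context on the baseline mentioned above, the following is a minimal sketch of one common way to compute an external diversity score between two groups of molecules: the mean Tanimoto distance over all cross-group pairs. It is an illustration under stated assumptions, not the measure proposed in the paper. Fingerprints are assumed to be plain sets of on-bit indices, and the names `tanimoto`, `external_diversity`, `generated`, and `reference` are hypothetical.

```python
from itertools import product


def tanimoto(fp_a: frozenset, fp_b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two fingerprints.

    Assumption: each fingerprint is a set of on-bit indices.
    """
    if not fp_a and not fp_b:
        return 1.0  # convention: two empty fingerprints count as identical
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def external_diversity(group_a, group_b) -> float:
    """Mean Tanimoto distance (1 - similarity) over all cross-group pairs."""
    pairs = list(product(group_a, group_b))
    if not pairs:
        raise ValueError("both groups must be non-empty")
    return sum(1.0 - tanimoto(a, b) for a, b in pairs) / len(pairs)


# Toy fingerprints (hypothetical data, for illustration only).
generated = [frozenset({1, 2, 3}), frozenset({2, 3, 4})]
reference = [frozenset({1, 2, 3}), frozenset({7, 8})]
score = external_diversity(generated, reference)
```

Because every molecule in one group is compared with every molecule in the other, the cost of this kind of score grows with the product of the two group sizes, which is one plausible reason a taxonomy-based measure could be cheaper to compute.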
Acknowledgments
This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001. The authors also thank the Brazilian research agencies FAPESP and CNPq for financial support.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
de Souza Silva, R.R., Cerri, R. (2023). Constructive Machine Learning and Hierarchical Multi-label Classification for Molecules Design. In: Naldi, M.C., Bianchi, R.A.C. (eds) Intelligent Systems. BRACIS 2023. Lecture Notes in Computer Science, vol 14196. Springer, Cham. https://doi.org/10.1007/978-3-031-45389-2_19
DOI: https://doi.org/10.1007/978-3-031-45389-2_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-45388-5
Online ISBN: 978-3-031-45389-2
eBook Packages: Computer Science, Computer Science (R0)