Constructive Machine Learning and Hierarchical Multi-label Classification for Molecules Design

Abstract


Constructive Machine Learning (CML) is a research field that uses algorithms to generate new instances that are similar, but not identical, to existing ones. It has been widely used to assist the discovery of new drug-like molecules, a very challenging task given that the search space is discrete, unstructured, and enormous. In this work, we use CML to learn the intrinsic rules of molecule datasets in order to generate novel molecules. The chosen CML methods can be divided into two subgroups: text-based and graph-oriented. Given the various possible ways to evaluate the methods and the generated molecules, we propose classifying the generated molecules in a taxonomy, using a hierarchical multi-label classifier previously trained on a dataset of molecules with known taxonomy information. This makes it possible to predict properties of the generated molecules and to verify their relevance to existing taxonomies. We also propose a hierarchical diversity measure that compares groups of molecules based on their taxonomy information. The measure showed coherent results and is faster to calculate than the commonly used external diversity measures.
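For context on the speed comparison, the following is a minimal sketch of one common formulation of external diversity: one minus the mean Tanimoto similarity over every cross-set pair of molecules. It is not the authors' hierarchical measure, and the function names are hypothetical. It assumes fingerprints are already computed (e.g., Morgan/ECFP bit vectors from a cheminformatics toolkit) and represented here as Python sets of "on" bit indices. The nested pairing makes the |A|·|B| comparison cost explicit, which is the cost a taxonomy-based measure can avoid.

```python
from itertools import product


def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto similarity between two binary fingerprints given as sets of 'on' bits."""
    if not a and not b:
        return 1.0  # convention: two empty fingerprints are treated as identical
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)


def external_diversity(set_a: list[set[int]], set_b: list[set[int]]) -> float:
    """1 - mean Tanimoto similarity over all |A|*|B| cross-set pairs (one common variant)."""
    if not set_a or not set_b:
        raise ValueError("both molecule sets must be non-empty")
    sims = [tanimoto(x, y) for x, y in product(set_a, set_b)]
    return 1.0 - sum(sims) / len(sims)


# Toy fingerprints (hypothetical data, not from the paper)
generated = [{1, 2, 3}, {4, 5}]
reference = [{1, 2, 3}, {6}]
print(external_diversity(generated, reference))  # 0.75
```

Only one of the four cross pairs is identical (similarity 1) and the rest share no bits, so the mean similarity is 0.25 and the diversity is 0.75.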
Published
25/09/2023
SILVA, Rodney Renato de Souza; CERRI, Ricardo. Constructive Machine Learning and Hierarchical Multi-label Classification for Molecules Design. In: BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), 12., 2023, Belo Horizonte/MG. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 276-290. ISSN 2643-6264.