Revealing Token-Level Importance in Conditional Molecular Design Through Kullback–Leibler Divergence

  • Arthur Cerveira UFPel
  • Ulisses B. Corrêa UFPel

Abstract


Conditional molecular design with transformer-based models can accelerate drug discovery, but the black-box nature of these models limits their interpretability. We propose a method to explain them by quantifying the influence of property conditions on the generative process. Our approach uses the Kullback–Leibler divergence to measure the difference between the conditional and unconditional output distributions at each generation step. This allows us to identify, token by token, which parts of a generated sequence are most influenced by the desired conditions.
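
As a rough illustration of the per-step comparison described in the abstract, the sketch below computes a Kullback–Leibler divergence for each generation step from two sets of logits, one obtained with the property conditions and one without. It is not the authors' implementation. The function names, the (steps, vocabulary) array layout, the toy logits, and the direction KL(conditional || unconditional) are all assumptions made for this example; the abstract does not specify them.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def token_level_kl(cond_logits, uncond_logits):
    """Per-step KL(P_cond || P_uncond) for logits of shape (steps, vocab).

    The divergence direction is an assumption of this sketch.
    """
    log_p = log_softmax(np.asarray(cond_logits, dtype=float))
    log_q = log_softmax(np.asarray(uncond_logits, dtype=float))
    # Sum over the vocabulary: p * (log p - log q) at each generation step.
    return (np.exp(log_p) * (log_p - log_q)).sum(axis=-1)

# Toy logits (hypothetical): 2 generation steps, vocabulary of 3 tokens.
cond = np.array([[2.0, 0.0, 0.0],    # step 0: the condition shifts the distribution
                 [0.0, 0.0, 0.0]])   # step 1: the condition has no effect
uncond = np.array([[0.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0]])

scores = token_level_kl(cond, uncond)
print(scores)  # one non-negative score per step; step 1 is zero here
```

Under these assumptions, a larger score at a given step means the conditions changed the model's next-token distribution more at that position, which is the sense in which the score can be read as token-level importance.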

Published
12/11/2025
CERVEIRA, Arthur; CORRÊA, Ulisses B. Revealing Token-Level Importance in Conditional Molecular Design Through Kullback–Leibler Divergence. In: ESCOLA REGIONAL DE APRENDIZADO DE MÁQUINA E INTELIGÊNCIA ARTIFICIAL DA REGIÃO SUL (ERAMIA-RS), 1., 2025, Porto Alegre/RS. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 144-147. DOI: https://doi.org/10.5753/eramiars.2025.16732.