On the impact of missing value imputation methods for multiple kernel learning on bipartite graphs

  • Victor Vidal Universidade Federal Rural de Pernambuco
  • Tássia Bastos Universidade Federal Rural de Pernambuco
  • Rafael Ferreira Mello Universidade Federal Rural de Pernambuco / Cesar School
  • Péricles Miranda Universidade Federal Rural de Pernambuco
  • André C. A. Nascimento Universidade Federal Rural de Pernambuco / Cesar School


In the last decade, the study of pharmacological networks has received considerable attention because of its relevance to the drug discovery process. Many approaches for predicting biological interactions have been proposed, notably in the area of multiple kernel learning (MKL). These methods integrate heterogeneous data sources in the form of kernels, but they can suffer from missing data. Missing values in the base kernel matrices are usually handled with simple techniques, such as imputing zeros or the mean or median of the kernel matrix. In this work, we evaluated techniques for handling missing values in the context of bipartite networks. Our analyses showed that, depending on the amount of missing data, k-NN and Singular Value Decomposition (SVD) imputation performed much better than the other techniques, while zero-fill showed the worst performance among all evaluated methods.
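To make the imputation strategies named above concrete, the following is a minimal sketch of constant filling (zero, mean, median) and a simple iterative low-rank SVD fill applied to a kernel matrix whose missing entries are marked as NaN. It is an illustration only, not the implementation evaluated in the paper: the function names `simple_fill` and `svd_fill`, the `rank` and `n_iter` parameters, and the toy kernel are all assumptions introduced here.

```python
import numpy as np


def simple_fill(K, strategy="zero"):
    """Replace NaN entries of a kernel matrix with a single constant.

    Hypothetical helper: the constant is 0, or the mean/median of the
    observed (non-NaN) entries of the whole matrix.
    """
    K = np.asarray(K, dtype=float)
    missing = np.isnan(K)
    if strategy == "zero":
        value = 0.0
    elif strategy == "mean":
        value = np.nanmean(K)
    elif strategy == "median":
        value = np.nanmedian(K)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    filled = K.copy()  # leave the input untouched
    filled[missing] = value
    return filled


def svd_fill(K, rank=2, n_iter=50):
    """Iterative low-rank SVD imputation (a sketch, assumed variant).

    Start from a mean fill, then repeatedly overwrite only the missing
    entries with their values in a rank-`rank` SVD reconstruction.
    """
    K = np.asarray(K, dtype=float)
    missing = np.isnan(K)
    filled = simple_fill(K, "mean")
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        filled[missing] = low_rank[missing]  # observed entries stay fixed
    return filled


# Toy example: a linear kernel over 5 random items, with one symmetric
# pair of entries removed to simulate missing similarity values.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))
K_true = X @ X.T
K_missing = K_true.copy()
K_missing[0, 3] = K_missing[3, 0] = np.nan

K_zero = simple_fill(K_missing, "zero")
K_mean = simple_fill(K_missing, "mean")
K_svd = svd_fill(K_missing, rank=2)
```

Both helpers leave observed entries unchanged and differ only in how the missing ones are estimated. A k-NN variant would instead estimate each missing entry from the most similar rows of the matrix.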

Keywords: machine learning, kernel methods, bioinformatics


VIDAL, Victor; BASTOS, Tássia; MELLO, Rafael Ferreira; MIRANDA, Péricles; NASCIMENTO, André C. A. On the impact of missing value imputation methods for multiple kernel learning on bipartite graphs. In: ENCONTRO NACIONAL DE INTELIGÊNCIA ARTIFICIAL E COMPUTACIONAL (ENIAC), 20., 2023, Belo Horizonte/MG. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 199-211. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2023.233884.
