Machine Learning for Incomplete Data

Diego P. P. Mesquita; João P. P. Gomes

doi:10.5753/ctd.2018.3657

Diego P. P. Mesquita (UFC)
João P. P. Gomes (UFC)

Abstract

Methods based on basis functions and similarity measures are widely used in machine learning and related fields. These methods typically assume that data is fully observed and have no built-in way to handle incomplete data. This assumption often fails, since incomplete data is common in domains such as medical diagnosis and sensor analytics. It is therefore useful to be able to estimate the values of these functions when data is only partially observed. In this work, we present methodologies to estimate the Gaussian kernel, the Euclidean distance, the Epanechnikov kernel, and arbitrary basis functions in the presence of possibly incomplete feature vectors.
