Dealing with categorical missing data using CleanerR
Abstract
Missing data is a common problem in data analysis. It appears in datasets for many reasons, from data integration to poor data entry. When faced with this problem, the analyst must decide what to do with the missing values, since it is not always advisable to discard them from the analysis. In this paper we discuss a method that uses information theory and functional dependencies to impute missing values.