Handling missing values in data streams: An overview

Abstract


Missing values are a common problem in streaming scenarios, mainly due to equipment faults, network errors, and data unpredictability. This paper presents an overview of handling missing values in data streams, explaining key concepts and summarizing recent studies that tackle this issue. It highlights limitations of these studies related to data stream requirements, the exploration of concept drift, and assumptions about the missing-data mechanism. Our discussion aims to identify open issues and contribute to new research initiatives in this area.

Keywords: Data stream, Preprocessing, Missing values, Imputation

Published
14/10/2024
S. LIMA, Afonso M.; SOUSA, Elaine P. M. de. Handling missing values in data streams: An overview. In: SIMPÓSIO BRASILEIRO DE BANCO DE DADOS (SBBD), 39., 2024, Florianópolis/SC. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 750-756. ISSN 2763-8979. DOI: https://doi.org/10.5753/sbbd.2024.243102.