Data Cleansing of Multiple Environmental Monitoring Time Series Using Spatio-Temporal Correlation

  • Ranier A. A. Moura UECE
  • Domingos B. S. Santos UECE
  • Daniel G. M. Lira UECE
  • José E. B. Maia UECE

Abstract


Computational applications based on sensor data are now commonplace, but the data collected and transmitted to these applications rarely arrive ready for use because of losses and noise of various kinds. This work develops an approach based on spatio-temporal correlation for cleansing multiple sensor time series of noise, missing data, and outliers. The method was tested on six publicly available real-world datasets, and its performance was compared with a baseline method, a denoising autoencoder, and another published method. The results show that the proposed approach is competitive and requires less training data than its competitors.
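The abstract does not spell out the algorithm, so the following is only a generic sketch of how spatio-temporal correlation can drive this kind of cleansing, not the authors' method. It assumes aligned sensor series in which `None` marks a lost reading. For a target sensor it picks the other sensor with the strongest Pearson correlation (the spatial part), fits a linear model y_t ≈ w0 + w1·x_t + w2·y_{t-1} by least squares (the temporal part is the target's own previous value), fills gaps with the model's prediction, and replaces readings whose residual exceeds k robust standard deviations. The function names, the linear model, the MAD-based threshold with k = 4, and the single trimmed refit are all illustrative assumptions.

```python
import math
from statistics import mean, median


def pearson(a, b):
    """Pearson correlation over the timestamps where both series have a reading."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if len(pairs) < 3:
        return 0.0
    mx = mean([x for x, _ in pairs])
    my = mean([y for _, y in pairs])
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def solve(A, b):
    """Solve the small linear system A w = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    M = [list(row) + [v] for row, v in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for r in reversed(range(n)):
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w


def fit_weights(rows):
    """Least squares for y_t ~ w0 + w1*x_t + w2*y_{t-1}; each row is (1, x_t, y_{t-1}, y_t)."""
    # A tiny ridge term on the diagonal keeps the normal equations solvable.
    XtX = [[sum(r[i] * r[j] for r in rows) + (1e-9 if i == j else 0.0) for j in range(3)]
           for i in range(3)]
    Xty = [sum(r[i] * r[3] for r in rows) for i in range(3)]
    return solve(XtX, Xty)


def residuals(w, rows):
    return [r[3] - (w[0] * r[0] + w[1] * r[1] + w[2] * r[2]) for r in rows]


def robust_scale(res):
    """MAD-based estimate of the residual standard deviation, less sensitive to outliers."""
    med = median(res)
    return max(1.4826 * median([abs(e - med) for e in res]), 1e-9)


def fit(target, neighbor, k):
    rows = [(1.0, neighbor[t], target[t - 1], target[t]) for t in range(1, len(target))
            if None not in (neighbor[t], target[t - 1], target[t])]
    w = fit_weights(rows)
    res = residuals(w, rows)
    scale = robust_scale(res)
    # Second pass: refit without rows whose residual already looks anomalous.
    kept = [r for r, e in zip(rows, res) if abs(e) <= k * scale] or rows
    w = fit_weights(kept)
    return w, robust_scale(residuals(w, kept))


def clean(series, target_idx, k=4.0):
    """Impute gaps and replace outliers in series[target_idx]; assumes its first reading exists."""
    target = series[target_idx]
    # Spatial step: pick the other sensor whose readings correlate most strongly with the target.
    nb = max((j for j in range(len(series)) if j != target_idx),
             key=lambda j: abs(pearson(target, series[j])))
    neighbor = series[nb]
    w, scale = fit(target, neighbor, k)
    cleaned, flags = [target[0]], ["ok"]
    for t in range(1, len(target)):
        prev = cleaned[-1]  # temporal step: previous *cleaned* value of the target
        if neighbor[t] is None:
            pred = prev  # fallback: carry the last cleaned value forward
        else:
            pred = w[0] + w[1] * neighbor[t] + w[2] * prev
        y = target[t]
        if y is None:
            status, value = "imputed", pred
        elif abs(y - pred) > k * scale:
            status, value = "outlier", pred
        else:
            status, value = "ok", y
        cleaned.append(value)
        flags.append(status)
    return cleaned, flags
```

For instance, with three aligned series (hypothetical `temp_a`, `temp_b`, `humidity`), `clean([temp_a, temp_b, humidity], 0)` returns a cleaned copy of `temp_a` together with a per-timestamp flag of `"ok"`, `"imputed"` or `"outlier"`. Feeding the previous *cleaned* value back into the model, rather than the raw one, keeps a single bad reading from contaminating the next prediction.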

Published
29/11/2021
How to Cite

MOURA, Ranier A. A.; SANTOS, Domingos B. S.; LIRA, Daniel G. M.; MAIA, José E. B. Data Cleansing of Multiple Environmental Monitoring Time Series Using Spatio-Temporal Correlation. In: ENCONTRO NACIONAL DE INTELIGÊNCIA ARTIFICIAL E COMPUTACIONAL (ENIAC), 18., 2021, Online Event. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. p. 197-208. DOI: https://doi.org/10.5753/eniac.2021.18253.