Anomaly Detection in Meteorological Data from the Sertão of Pernambuco Using Isolation Forest and DBSCAN
Abstract
Anomalous values are a common problem in meteorological time series; they can arise from equipment defects, poor sensor configuration, or even extreme climate events. Unsupervised machine learning algorithms have become increasingly common for this type of problem. This study evaluates DBSCAN (Density-Based Spatial Clustering of Applications with Noise) and Isolation Forest (IF) for detecting anomalies in air temperature and relative humidity data from Petrolina. With their best hyperparameter settings, both algorithms performed well: IF achieved an accuracy of 98% and an F1 score of 95%, and DBSCAN an accuracy of 97% and an F1 score of 94%. Both also reached a recall of 100%, meaning that neither produced false negatives, that is, no anomaly was classified as normal.
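To make the reported metrics concrete, the following is a minimal sketch, not the study's implementation, of how a density-based detector in the style of DBSCAN flags noise points and how accuracy, recall, and F1 are computed from its output. The readings, the `eps` and `min_samples` values, and the function names `dbscan_noise` and `scores` are all hypothetical and chosen only for illustration. The values are assumed to be already scaled to comparable ranges. Isolation Forest is not shown here.

```python
from math import dist


def dbscan_noise(points, eps, min_samples):
    """Return one boolean per point: True where DBSCAN would label it as noise."""
    n = len(points)
    # Each neighbourhood includes the point itself, the usual DBSCAN convention.
    neighbours = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    core = [len(nb) >= min_samples for nb in neighbours]
    # A point belongs to a cluster if it is a core point or lies within eps
    # of a core point (a border point). Everything else is noise.
    return [
        not (core[i] or any(core[j] for j in neighbours[i]))
        for i in range(n)
    ]


def scores(y_true, y_pred):
    """Accuracy, recall, and F1, treating True (anomaly) as the positive class."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    tn = sum((not t) and (not p) for t, p in zip(y_true, y_pred))
    accuracy = (tp + tn) / len(y_true)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return accuracy, recall, f1


# Hypothetical scaled (air temperature, relative humidity) readings.
readings = [
    (0.0, 0.0), (0.1, -0.1), (-0.1, 0.1), (0.2, 0.1),
    (-0.2, -0.1), (0.1, 0.2), (0.0, -0.2),  # dense cluster of normal readings
    (0.9, 0.8),                             # normal reading, but isolated
    (3.0, -2.5), (-2.8, 3.1),               # true anomalies
]
# Hypothetical ground truth: only the last two readings are anomalies.
y_true = [False] * 8 + [True] * 2

y_pred = dbscan_noise(readings, eps=0.5, min_samples=3)
accuracy, recall, f1 = scores(y_true, y_pred)
print(f"accuracy={accuracy:.2f} recall={recall:.2f} f1={f1:.2f}")
```

On this toy data both true anomalies are flagged, so recall is 1.0 (no false negatives), while the isolated normal reading is also flagged, which lowers accuracy to 0.9 and F1 to 0.8. This is the same pattern as in the abstract: a recall of 100% alongside accuracy and F1 below 100% implies that the remaining errors are false positives. The numbers here are illustrative and unrelated to the study's results.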
