SmarT: Using Machine Learning to Filter and Retrieve Spatial and Temporal Data in Big Data
Abstract
With the tremendous growth of big data time series, efficiently filtering and retrieving large volumes of spatial and temporal data has become one of the biggest challenges in time series processing. Although several big data systems have been proposed to tackle this problem, none is a clear winner across all scenarios. This paper presents the SmarT search engine, a machine-learning-based solution that chooses, on the fly, the best big data system for filtering and retrieving spatial and temporal data. In a detailed experimental evaluation considering the Apache Spark, Elasticsearch, and SciDB big data systems, SmarT reduced response time by up to 22%.
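To make the idea of choosing a system per query concrete, the following is a minimal sketch, not the method used by SmarT. The abstract does not state which learning algorithm or query features are used, so everything here is an assumption for illustration: the k-nearest-neighbour vote, the two features (spatial extent and time-window length, both scaled to [0, 1]), the names HISTORY and choose_system, and the timing values, which are invented rather than measured.

```python
from collections import Counter
from math import dist

# Hypothetical log of past queries: (features, response time in seconds per system).
# Features are (spatial extent, time-window length), both scaled to [0, 1].
# All numbers are made up for illustration; none come from the paper.
HISTORY = [
    ((0.05, 0.10), {"spark": 4.0, "elasticsearch": 0.8, "scidb": 1.5}),
    ((0.10, 0.05), {"spark": 3.8, "elasticsearch": 0.9, "scidb": 1.7}),
    ((0.90, 0.80), {"spark": 6.0, "elasticsearch": 14.0, "scidb": 9.0}),
    ((0.80, 0.95), {"spark": 6.5, "elasticsearch": 15.0, "scidb": 8.5}),
    ((0.20, 0.90), {"spark": 5.5, "elasticsearch": 7.0, "scidb": 2.5}),
    ((0.15, 0.85), {"spark": 5.2, "elasticsearch": 6.5, "scidb": 2.2}),
]


def fastest(timings):
    """Return the name of the system with the lowest response time."""
    return min(timings, key=timings.get)


def choose_system(features, history=HISTORY, k=3):
    """Pick a system by majority vote among the k most similar past queries."""
    # Rank past queries by Euclidean distance to the new query's features.
    nearest = sorted(history, key=lambda record: dist(record[0], features))[:k]
    # Each neighbour votes for the system that answered it fastest.
    votes = Counter(fastest(timings) for _, timings in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # A small region over a short time window.
    print(choose_system((0.07, 0.08)))
    # A large region over a long time window.
    print(choose_system((0.85, 0.90)))
```

A selector of this kind only pays off when the prediction itself is much cheaper than the difference in response time between systems, which is why the sketch uses a handful of inexpensive features rather than inspecting the data.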
