A Data-Driven Model Selection Approach to Spatio-Temporal Prediction

Rocío Zorrilla; Eduardo Ogasawara; Patrick Valduriez; Fábio Porto

doi:10.5753/sbbd.2022.224638

Rocío Zorrilla, Laboratório Nacional de Computação Científica
Eduardo Ogasawara, Centro Federal de Educação Tecnológica Celso Suckow da Fonseca
Patrick Valduriez, INRIA / LIRMM
Fábio Porto, Laboratório Nacional de Computação Científica

Abstract

Spatio-temporal predictive queries comprise three elements: a spatio-temporal constraint defining a region, a target variable, and an evaluation metric. The output of such a query is the set of future values of the target variable, computed by predictive models at each point of the spatio-temporal region. For large spatio-temporal domains with millions of points, however, training a temporal model at every spatial domain point is prohibitively expensive. In this work, we propose a data-driven approach for selecting pre-trained temporal models to be applied at each query point. The approach assigns a model to a point according to the similarity between the time series the model was trained on and the input time series at that point. It avoids training a different model for each domain point, which saves training time, and it provides a technique for deciding which trained model is best applied to a point for prediction. To assess the applicability of the proposed strategy, we evaluate a case study on temperature forecasting using historical data and auto-regressive models. Computational experiments show that, compared to the baseline, the proposed approach achieves equivalent predictive performance using a composition of pre-trained models at a fraction of the total computational cost.
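The selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract only says that models are chosen by time series similarity, so the use of dynamic time warping as the distance, and the names `dtw_distance`, `select_model`, and `training_series_by_model`, are assumptions made for illustration.

```python
import numpy as np


def dtw_distance(a, b):
    """Dynamic time warping distance with absolute-difference cost.

    Assumption: DTW stands in for the (unspecified) similarity measure.
    """
    n, m = len(a), len(b)
    # cost[i, j] = minimal accumulated cost aligning a[:i] with b[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])


def select_model(input_series, training_series_by_model):
    """Return the name of the pre-trained model whose training series
    is closest to the input series at the query point.

    `training_series_by_model` is a hypothetical mapping from a model
    name to the time series that model was trained on.
    """
    return min(
        training_series_by_model,
        key=lambda name: dtw_distance(input_series, training_series_by_model[name]),
    )


# Hypothetical example: two pre-trained models and one query point.
t = np.linspace(0.0, 2.0 * np.pi, 24)
training = {"model_a": np.sin(t), "model_b": np.linspace(0.0, 1.0, 24)}
query = np.sin(t + 0.1)
chosen = select_model(query, training)
```

Under these assumptions, the only per-point cost at query time is one distance computation per candidate model; no model is trained for the query point itself.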

Keywords: Spatio-temporal, Prediction, Model Selection
