A Parallel and Distributed Approach to the Analysis of Time Series on Remote Sensing Big Data
Keywords: Big Data, Parallel Programming, Remote Sensing, Time Series Analysis
The era of Remote Sensing Big Data has arrived. Massive amounts of remotely sensed data have been collected by different countries from a large number of spaceborne and airborne Earth observation sensors. These data allow us to identify meaningful changes in the Earth’s surface that may affect whole ecological systems and threaten biodiversity. Crucial to that end is the time series analysis of remote sensing images, for which the Time-Weighted Dynamic Time Warping (TWDTW) algorithm stands out as one of the most widely used approaches in the literature. However, the computational complexity of TWDTW makes it rather inefficient for Remote Sensing Big Data. Moreover, the huge volume of remote sensing data with high spatial and temporal resolution cannot be handled by a single computing node. To overcome these drawbacks, this work proposes a parallel algorithm, named SP-TWDTW (Spatial Parallel TWDTW), that enables the analysis of large-scale time series on manycore architectures (GPUs). In addition, to process massive time series of remote sensing data on a cluster of computers, this paper introduces an approach for distributing the TWDTW processing.
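To make the cost argument concrete, the following is a minimal Python sketch of a time-weighted DTW alignment between one reference pattern and one pixel's time series. It is an illustration under stated assumptions, not the SP-TWDTW implementation: the function names, the logistic weight parameters `alpha` and `beta`, the use of absolute day differences, and the full (end-to-end) alignment are all choices made here for brevity.

```python
import math


def time_weight(gap_days, alpha=0.1, beta=100.0):
    """Logistic penalty that grows with the time gap between two observations.

    alpha (steepness) and beta (midpoint, in days) are illustrative values.
    """
    return 1.0 / (1.0 + math.exp(-alpha * (gap_days - beta)))


def twdtw_cost(pattern, pattern_days, series, series_days, alpha=0.1, beta=100.0):
    """Accumulated cost of aligning `pattern` to `series` end to end.

    The local cost is the value difference plus a time penalty, so matches
    between observations that are far apart in time are discouraged.
    Runs in O(n * m) time for a pattern of length n and a series of length m.
    """
    n, m = len(pattern), len(series)
    inf = float("inf")
    # acc[i][j]: best cost aligning the first i pattern points
    # with the first j series points (row 0 / column 0 are padding).
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            gap = abs(pattern_days[i - 1] - series_days[j - 1])
            local = abs(pattern[i - 1] - series[j - 1]) + time_weight(gap, alpha, beta)
            acc[i][j] = local + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]


# Hypothetical vegetation-index values observed every 16 days.
pattern = [0.2, 0.6, 0.3]
days = [0, 16, 32]
same_time = twdtw_cost(pattern, days, pattern, days)
shifted = twdtw_cost(pattern, days, pattern, [d + 200 for d in days])
```

In this sketch every pixel is aligned independently of its neighbours, and each alignment fills an n × m table. Repeating that for every pixel and every reference pattern in an image is what makes the sequential algorithm costly, and the independence between pixels is the kind of structure a spatially parallel scheme can exploit.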