Evaluation of file structures for high-performance seismic data processing in the cloud
Abstract
Applications in seismology process up to hundreds of terabytes of data, and their performance may be strongly affected by I/O operations. In this paper, we generalize the main file structures currently used to store seismic data and evaluate their performance. We present a theoretical analysis of data loading operations and a benchmark on the AWS public cloud using three storage technologies (HDD, SSD, and EFS). We show that, for a typical use case, an adequate choice of file structure enables up to a 193-fold reduction in the amount of data read and a 139-fold speedup. Our results also indicate that more expensive cloud instances have a negligible effect on the performance of network storage, despite their enhanced network transmission capacity.
