Beyond Click-and-View: a Comparative Study of Data Management Approaches for Interactive Visualization

Authors

  • Lorenna Christ'na Nascimento, Universidade Federal Fluminense
  • Rodolfo P. Chagas, Universidade Federal Fluminense
  • Marcos Lage, Universidade Federal Fluminense
  • Daniel de Oliveira, Universidade Federal Fluminense

DOI:

https://doi.org/10.5753/jidm.2022.2513

Keywords:

Interactive Vis, Database Evaluation

Abstract

Visual analytics solutions have grown in popularity in recent years, not only for presenting final results but also for supporting interactive analysis and decision-making. Analyzing large amounts of data requires flexible exploration and visualization. However, queries that span geographic regions over time slices are expensive to compute, which makes it challenging to achieve interactive response times on very large data sets. Such systems require efficient data access so that response time does not interfere with the user's ability to observe and analyze. At the same time, research in the database community has produced solutions that can support visualization systems. This article presents a comparative study of data management approaches for supporting interactive visualizations. The selected solutions are (i) Apache Drill (a polystore system), (ii) Apache Spark (a big data framework), (iii) Elasticsearch (a search engine), (iv) MonetDB (a column-oriented DBMS), and (v) PostgreSQL (a relational DBMS). To evaluate the performance of each solution, we selected a set of spatiotemporal queries from those submitted by users of TEMPO, a visual analytics system for rainfall data analysis. The results show that Apache Spark and MonetDB deliver the best performance for the selected queries.
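As a rough illustration of what "a query that spans a geographic region over a time slice" means, the sketch below aggregates rainfall per station inside a latitude/longitude bounding box and a half-open time interval. It is a hypothetical example, not one of the TEMPO queries used in the study: the `rainfall` table, its columns, the helper names `build_demo_db` and `rainfall_in_window`, and the sample rows are all assumptions. SQLite is used only so the example is self-contained; the bounding box is a plain numeric filter rather than a true spatial predicate such as those offered by PostGIS.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical rain-gauge table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE rainfall (station TEXT, lat REAL, lon REAL, ts TEXT, mm REAL)"
    )
    rows = [
        ("A", -22.90, -43.20, "2021-01-01T10:00", 5.0),
        ("A", -22.90, -43.20, "2021-01-02T10:00", 3.0),
        ("B", -22.95, -43.10, "2021-01-01T12:00", 7.5),
        ("C", -23.50, -46.60, "2021-01-01T09:00", 12.0),  # outside the region
        ("B", -22.95, -43.10, "2021-02-01T12:00", 4.0),   # outside the time slice
    ]
    conn.executemany("INSERT INTO rainfall VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def rainfall_in_window(conn, bbox, start, end):
    """Total rainfall per station inside a bounding box and a time slice.

    bbox is (min_lat, min_lon, max_lat, max_lon). start and end are ISO-8601
    strings; comparing them as text works because they share one format.
    The time interval is half-open: start <= ts < end.
    """
    min_lat, min_lon, max_lat, max_lon = bbox
    query = """
        SELECT station, SUM(mm) AS total_mm
        FROM rainfall
        WHERE lat BETWEEN ? AND ?
          AND lon BETWEEN ? AND ?
          AND ts >= ? AND ts < ?
        GROUP BY station
        ORDER BY station
    """
    params = (min_lat, max_lat, min_lon, max_lon, start, end)
    return conn.execute(query, params).fetchall()
```

For example, `rainfall_in_window(build_demo_db(), (-23.1, -43.8, -22.7, -43.0), "2021-01-01", "2021-02-01")` returns `[("A", 8.0), ("B", 7.5)]`. In an interactive tool, every pan, zoom, or time-slider change issues a new query of this shape, which is why the latency of the underlying data management system matters.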

Published

2022-09-21

How to Cite

Nascimento, L. C., Chagas, R. P., Lage, M., & de Oliveira, D. (2022). Beyond Click-and-View: a Comparative Study of Data Management Approaches for Interactive Visualization. Journal of Information and Data Management, 13(3). https://doi.org/10.5753/jidm.2022.2513

Section

SBBD 2021 Short papers - Extended papers