Towards Integration of Workflow Algebra with Relational Query Processing
Abstract
Workflows have emerged as a basic abstraction for structuring data analysis experiments in the current Data-Intensive Scalable Computing (DISC) scenario. In many situations, these workflows are intensive, either computationally or in terms of data management, and require execution in high-performance processing environments. However, parallelizing workflow execution commonly requires laborious, ad hoc programming at a low level of abstraction, which makes it difficult to explore optimization opportunities. Some algebraic approaches have been developed to mitigate this limitation. This work moves toward converging workflow algebra with relational query processing.
Keywords:
Workflow, Algebra, Relational Query, Query Processing, Workflow Algebra
Published
2018-08-25
How to Cite
FERREIRA, João Antonio; SOARES, Jorge; PORTO, Fabio; PACITTI, Esther; COUTINHO, Rafaelli; OGASAWARA, Eduardo. Towards Integration of Workflow Algebra with Relational Query Processing. In: BRAZILIAN SYMPOSIUM ON DATABASES (SBBD), 33., 2018, Rio de Janeiro. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2018. p. 205-210. ISSN 2763-8979. DOI: https://doi.org/10.5753/sbbd.2018.22231.
