Use of Semantic Annotations for Exploring Parallelism in Data-Intensive Workflows
Abstract
Applications that analyze large volumes of data are often modeled as interconnected activities (workflows) and executed on high-performance platforms. Data partitioning and replication can make the activities parallelizable. However, defining a model that uses the platform efficiently is not trivial. This paper proposes semantic annotations that characterize the data processing performed by workflow activities, so that strategies for parallelizing the execution can be created automatically. In experiments with a workflow that handles 5.8 million data objects in a NoSQL system, the parallelism obtained from the annotations reduced the makespan by 88.4% and the financial cost by 10.4%.
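The idea of deriving a parallelization strategy from an annotation can be illustrated with a minimal sketch. It is not the annotation vocabulary or the planner proposed in the paper: the labels `per_item` and `whole_input`, the `Activity` class, and the helper functions are all hypothetical names chosen for illustration, and threads stand in for the high-performance platform.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Activity:
    name: str
    func: Callable[[List[int]], List[int]]
    # Hypothetical annotation labels (assumed, not taken from the paper):
    #   "per_item"    - each data object is processed independently
    #   "whole_input" - the activity needs the complete input at once
    annotation: str


def partition(data: List[int], parts: int) -> List[List[int]]:
    """Split data into at most `parts` contiguous chunks."""
    size = max(1, -(-len(data) // parts))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def run(activity: Activity, data: List[int], workers: int = 4) -> List[int]:
    """Choose an execution strategy from the activity's annotation."""
    if activity.annotation == "per_item" and data:
        chunks = partition(data, workers)
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # map() yields results in the same order as the chunks.
            results = list(pool.map(activity.func, chunks))
        return [item for chunk in results for item in chunk]
    # No safe partitioning is known: run once on the whole input.
    return activity.func(data)


def run_workflow(activities: List[Activity], data: List[int],
                 workers: int = 4) -> List[int]:
    """Run the activities in sequence, feeding each output to the next."""
    for activity in activities:
        data = run(activity, data, workers)
    return data


square = Activity("square", lambda xs: [x * x for x in xs], "per_item")
total = Activity("total", lambda xs: [sum(xs)], "whole_input")
```

Calling `run_workflow([square, total], list(range(10)))` runs `square` on partitions of the input in parallel and `total` once on the merged result. The point of the sketch is only that the annotation, rather than the activity's code, tells the executor whether partitioning is safe.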
Keywords:
Semantic annotations, Parallelism in workflows
Published
2016-10-04
How to Cite
WATANABE, Elaine Naomi; BRAGHETTO, Kelly Rosa. Use of Semantic Annotations for Exploring Parallelism in Data-Intensive Workflows. In: BRAZILIAN SYMPOSIUM ON DATABASES (SBBD), 31., 2016, Salvador/BA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2016. p. 271-276. ISSN 2763-8979. DOI: https://doi.org/10.5753/sbbd.2016.24340.
