User Steering Support in Large-scale Workflows

  • Renan Souza Universidade Federal do Rio de Janeiro (UFRJ) / IBM Research
  • Marta Mattoso Universidade Federal do Rio de Janeiro (UFRJ)
  • Patrick Valduriez University of Montpellier

Abstract


Large-scale workflows that execute on High-Performance Computing (HPC) machines need to be dynamically steered by users. Steering means that users analyze big data files, assess key performance indicators, fine-tune parameters, and evaluate the impact of that tuning, all while the workflows are still generating files, which is challenging. If these interactions (called user steering actions) are not tracked, it may be impossible to understand their consequences or to reproduce the results. This thesis proposes a generic approach that tracks user steering actions by characterizing, capturing, relating, and analyzing them using provenance data management concepts. Experiments with real users show that the approach enabled understanding of the impact of steering actions while incurring negligible overhead.
Keywords: workflows, user steering, high-performance computing
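
To make the idea of tracking steering actions concrete, the following is a minimal Python sketch, not the thesis implementation. It assumes a steering action can be modeled as a parameter change recorded at a given workflow iteration, and that its impact can be approximated by comparing a key performance indicator (KPI) before and after that iteration. All names (`SteeringAction`, `SteeringLog`, `capture_action`, `capture_output`, `impact`) and the record schema are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from statistics import mean
from typing import Any, List, Optional, Tuple


@dataclass
class SteeringAction:
    """One user steering action (hypothetical schema, not the thesis model)."""
    user: str
    parameter: str
    old_value: Any
    new_value: Any
    at_iteration: int  # workflow iteration at which the change takes effect
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


class SteeringLog:
    """In-memory stand-in for a provenance store (assumption for illustration)."""

    def __init__(self) -> None:
        self.actions: List[SteeringAction] = []
        self.outputs: List[Tuple[int, float]] = []  # (iteration, KPI value)

    def capture_action(self, action: SteeringAction) -> None:
        # Capture: record what the user changed, and when.
        self.actions.append(action)

    def capture_output(self, iteration: int, kpi: float) -> None:
        # Capture: record a KPI value produced by the running workflow.
        self.outputs.append((iteration, kpi))

    def impact(
        self, action: SteeringAction
    ) -> Tuple[Optional[float], Optional[float]]:
        # Relate and analyze: split KPI values at the action's iteration and
        # return (mean before, mean after); None where no data exists.
        before = [k for i, k in self.outputs if i < action.at_iteration]
        after = [k for i, k in self.outputs if i >= action.at_iteration]
        return (
            mean(before) if before else None,
            mean(after) if after else None,
        )


log = SteeringLog()
for iteration, kpi in [(1, 10.0), (2, 10.0), (3, 4.0), (4, 2.0)]:
    log.capture_output(iteration, kpi)

tuning = SteeringAction(
    user="alice", parameter="tolerance",
    old_value=1e-3, new_value=1e-5, at_iteration=3,
)
log.capture_action(tuning)
before_kpi, after_kpi = log.impact(tuning)
```

A real deployment would persist these records in a queryable provenance database rather than in Python lists, so that actions and workflow data can be related while the workflow is still running.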

Published
2021-10-04
SOUZA, Renan; MATTOSO, Marta; VALDURIEZ, Patrick. User Steering Support in Large-scale Workflows. In: CONCURSO DE TESES E DISSERTAÇÕES (CTDBD) - SIMPÓSIO BRASILEIRO DE BANCO DE DADOS (SBBD), 36., 2021, Rio de Janeiro. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. p. 195-200. DOI: https://doi.org/10.5753/sbbd_estendido.2021.18185.