A data centered approach for analysis in deep neural networks

  • Débora Pina Federal University of Rio de Janeiro
  • Liliane Kunstmann Federal University of Rio de Janeiro
  • Daniel Oliveira Fluminense Federal University
  • Patrick Valduriez Inria, University of Montpellier, CNRS, LIRMM
  • Marta Mattoso Federal University of Rio de Janeiro

Abstract


The length of the deep neural network life cycle depends on the data configuration decisions that lead to successful models. Analyzing hyperparameters as the network's execution evolves allows the data to be adapted, thus reducing the life cycle time. However, there are challenges not only in collecting hyperparameters but also in modeling the relationships among these data. This work presents a provenance-based approach to address these challenges, proposing a collection mechanism that is flexible in the choice and representation of the data to be analyzed. Experiments with the approach in Keras, using a real application, provide evidence of its flexibility, of the efficiency of data collection, and of its support for analyzing and validating network data.
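The kind of collection mechanism summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the class `ProvenanceCollector`, its methods, and the table layout are hypothetical, and an in-memory SQLite database stands in for a provenance store. Each epoch is recorded as a task that used the tracked hyperparameters and generated metrics, loosely mirroring the used/wasGeneratedBy relations of W3C PROV, so that metrics can later be related back to the hyperparameters that produced them. Only the hyperparameters named when the collector is created are captured, which is one possible way to offer flexibility in choosing what to analyze. In Keras, `record_epoch` could plausibly be called from the `on_epoch_end(epoch, logs)` method of a `keras.callbacks.Callback` subclass, since `logs` carries the epoch's metrics.

```python
import json
import sqlite3
from typing import Any, Dict, Iterable


class ProvenanceCollector:
    """Hypothetical provenance collector for a training run (illustrative only).

    Each epoch is stored as a task that *used* selected hyperparameters and
    *generated* metrics, loosely following W3C PROV's used/wasGeneratedBy.
    """

    def __init__(self, tracked_hyperparams: Iterable[str]) -> None:
        # Flexibility in choice: only the hyperparameter names listed here are captured.
        self.tracked = list(tracked_hyperparams)
        self.db = sqlite3.connect(":memory:")
        self.db.executescript(
            """
            CREATE TABLE epoch (id INTEGER PRIMARY KEY);
            CREATE TABLE used (epoch_id INTEGER REFERENCES epoch(id),
                               name TEXT, value TEXT);
            CREATE TABLE generated (epoch_id INTEGER REFERENCES epoch(id),
                                    name TEXT, value REAL);
            """
        )

    def record_epoch(self, epoch: int, hyperparams: Dict[str, Any],
                     metrics: Dict[str, float]) -> None:
        # Record one epoch with its inputs (used) and outputs (generated).
        self.db.execute("INSERT INTO epoch (id) VALUES (?)", (epoch,))
        for name in self.tracked:
            if name in hyperparams:
                # Values are JSON-encoded so ints, floats and strings round-trip.
                self.db.execute(
                    "INSERT INTO used VALUES (?, ?, ?)",
                    (epoch, name, json.dumps(hyperparams[name])),
                )
        for name, value in metrics.items():
            self.db.execute(
                "INSERT INTO generated VALUES (?, ?, ?)",
                (epoch, name, float(value)),
            )
        self.db.commit()

    def best_epoch(self, metric: str) -> Dict[str, Any]:
        # Relate an output back to its inputs: which hyperparameters gave the lowest metric?
        row = self.db.execute(
            "SELECT epoch_id, value FROM generated WHERE name = ? "
            "ORDER BY value ASC LIMIT 1",
            (metric,),
        ).fetchone()
        if row is None:
            raise KeyError(metric)
        epoch_id, value = row
        used = self.db.execute(
            "SELECT name, value FROM used WHERE epoch_id = ?", (epoch_id,)
        ).fetchall()
        return {
            "epoch": epoch_id,
            metric: value,
            "hyperparams": {name: json.loads(v) for name, v in used},
        }


if __name__ == "__main__":
    collector = ProvenanceCollector(["learning_rate", "batch_size"])
    # Placeholder values for illustration only; these are not experimental results.
    collector.record_epoch(0, {"learning_rate": 0.01, "batch_size": 32, "seed": 7},
                           {"val_loss": 0.9})
    collector.record_epoch(1, {"learning_rate": 0.005, "batch_size": 32, "seed": 7},
                           {"val_loss": 0.6})
    print(collector.best_epoch("val_loss"))
```

Because `seed` is not in the tracked list, it is never stored, while the query in `best_epoch` joins each epoch's generated metric back to the hyperparameters it used. A real provenance system would typically also capture datasets, model versions, and timestamps.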

Keywords: Deep learning, hyperparameters, provenance, data analysis

Published: 2020-09-28
PINA, Débora; KUNSTMANN, Liliane; OLIVEIRA, Daniel; VALDURIEZ, Patrick; MATTOSO, Marta. A data centered approach for analysis in deep neural networks. In: BRAZILIAN SYMPOSIUM ON DATABASES (SBBD), 35., 2020, Online Event. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2020. p. 187-192. ISSN 2763-8979. DOI: https://doi.org/10.5753/sbbd.2020.13639.