A data-centered approach for analysis in deep neural networks
Abstract
The duration of the life cycle of a deep neural network depends on the data configuration decisions that lead to a successful model. Analyzing hyperparameters as the network's execution evolves allows the data to be adapted, thus reducing the life cycle time. However, there are challenges not only in collecting hyperparameters, but also in modeling the relationships among these data. This work presents a provenance-data-based approach to address these challenges, proposing a collection mechanism that is flexible in the choice and representation of the data to be analyzed. Experiments with the approach in Keras, using a real application, provide evidence of its flexibility, of the efficiency of data collection, and of its support for the analysis and validation of network data.
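As a rough illustration of the idea of collecting hyperparameters alongside the metrics they produce, the sketch below records one provenance-style entry per training epoch and then queries the entries. It is a minimal sketch under stated assumptions, not the collection mechanism described in this work: the names `EpochRecord`, `ProvenanceLog`, `capture`, `where`, and `train` are hypothetical, the store is an in-memory list, and the loss values are simulated rather than produced by a Keras model.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class EpochRecord:
    """One provenance-style record: what an epoch used and what it generated."""
    epoch: int
    used: Dict[str, Any]          # hyperparameters in effect during the epoch
    generated: Dict[str, float]   # metrics produced by the epoch


@dataclass
class ProvenanceLog:
    """Hypothetical in-memory store (assumption: a real system would persist this)."""
    records: List[EpochRecord] = field(default_factory=list)

    def capture(self, epoch: int, used: Dict[str, Any],
                generated: Dict[str, float]) -> None:
        # Copy the dicts so later changes by the caller do not alter past records.
        self.records.append(EpochRecord(epoch, dict(used), dict(generated)))

    def where(self, predicate: Callable[[EpochRecord], bool]) -> List[EpochRecord]:
        # Return every record for which the predicate holds.
        return [r for r in self.records if predicate(r)]


def train(log: ProvenanceLog, epochs: int = 6, learning_rate: float = 0.1,
          decay: float = 0.5, step: int = 2) -> ProvenanceLog:
    """Stand-in for a training loop; the loss is simulated, not a real model."""
    loss = 1.0
    for epoch in range(epochs):
        # Step decay: shrink the learning rate every `step` epochs.
        if epoch > 0 and epoch % step == 0:
            learning_rate *= decay
        loss *= (1.0 - learning_rate)  # simulated improvement
        log.capture(epoch,
                    used={"learning_rate": learning_rate, "decay": decay},
                    generated={"loss": loss})
    return log


log = train(ProvenanceLog())

# Example analysis: which epochs ran with a learning rate below the initial one?
low_lr = log.where(lambda r: r.used["learning_rate"] < 0.1)
low_lr_epochs = [r.epoch for r in low_lr]
```

Keeping the hyperparameters and the metrics in the same record is what makes the relationship between them queryable while training is still running. In a Keras setting, one plausible place to call `capture` would be a custom callback's `on_epoch_end` method, though that wiring is an assumption and is not shown here.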
