Modular metadata integrator for machine learning with runtime visualization

Abstract


In recent years, data visualization during the training of a machine learning (ML) model, together with the structured storage of metadata for future analysis, has emerged as a key abstraction to help humans select a model. Existing solutions have two limitations: the first relates to the frameworks used in training, which tend to be tightly coupled, and the second to data governance risks. As a result, human users such as data scientists and analysts face two barriers: i) vendor lock-in and ii) application management at a commercial level. This paper presents a reference architecture for ML environments that can be applied to data visualization. It has three main focuses: modularization, interoperability and data governance. The architecture is based on serverless computing, as it favors loose, simple and interoperable coupling. An instantiation experiment of the architecture demonstrates runtime visualization built from independent components.
Keywords: machine learning metadata, ML experiment tracking, runtime data visualization
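To make the idea of independent, loosely coupled components concrete, the following is a minimal Python sketch. It assumes a simple in-process publish/subscribe scheme standing in for real serverless functions. The training loop only emits framework-agnostic JSON metadata events, and a storage component and a visualization component consume them independently. All names (`MetricEvent`, `publish`, `subscribe`, `store_handler`, `plot_handler`, `train`) are hypothetical and are not taken from the paper's implementation.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, Dict, List


@dataclass
class MetricEvent:
    """Framework-agnostic metadata record (hypothetical schema)."""
    run_id: str
    epoch: int
    name: str
    value: float


# Registry of independent handlers, standing in for serverless functions.
Handler = Callable[[str], None]
_handlers: List[Handler] = []


def subscribe(handler: Handler) -> Handler:
    """Register a handler; it only ever sees serialized JSON payloads."""
    _handlers.append(handler)
    return handler


def publish(event: MetricEvent) -> None:
    # Serialize to JSON so consumers depend only on the message format,
    # not on objects from the training framework.
    payload = json.dumps(asdict(event))
    for handler in _handlers:
        handler(payload)


# Hypothetical storage component: keeps every record for later analysis.
metadata_store: List[Dict] = []


@subscribe
def store_handler(payload: str) -> None:
    metadata_store.append(json.loads(payload))


# Hypothetical visualization component: maintains live series per metric.
live_series: Dict[str, List[float]] = {}


@subscribe
def plot_handler(payload: str) -> None:
    record = json.loads(payload)
    live_series.setdefault(record["name"], []).append(record["value"])


def train(run_id: str, epochs: int) -> None:
    """Toy training loop that knows nothing about storage or plotting."""
    loss = 1.0
    for epoch in range(epochs):
        loss *= 0.5  # placeholder for a real optimization step
        publish(MetricEvent(run_id, epoch, "loss", loss))


train("run-1", 3)
print(live_series)  # {'loss': [0.5, 0.25, 0.125]}
```

Because each consumer depends only on the JSON payload, a storage or visualization component could be swapped out or deployed separately without touching the training code, which is the kind of decoupling the abstract attributes to a serverless design.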

Published: 2023-09-25
SILVA, Filipe; MATTOSO, Marta. Modular metadata integrator for machine learning with runtime visualization. In: BRAZILIAN SYMPOSIUM ON DATABASES (SBBD), 38., 2023, Belo Horizonte/MG. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 445-450. ISSN 2763-8979. DOI: https://doi.org/10.5753/sbbd.2023.233424.