A Comparative Study between Data Engineering Education and the Needs of the Brazilian Market
Abstract
The term Data Engineering (DE) is frequently used in the literature and in current curricular proposals to refer to the processes of acquiring, organizing, and preparing data for consumption in exploratory analyses, as input for systems and applications, or in similar contexts. With the emergence of the Data Science field, the term has come to encompass what was traditionally known as data management. In this study, we explore DE in the Brazilian academic and industrial context, highlighting the growing relevance of this area in today's society and the need for related skills among computing professionals. The study was motivated by the authors' perception that at least a decade of industry advances in DE has not been adequately absorbed by undergraduate education in universities. Through surveys of the courses, bibliographies, and syllabi related to DE in 23 Brazilian universities, and of technology companies in the country, we built one taxonomy of the topics currently taught and another of the topics considered relevant to the industry. A comparison of these taxonomies revealed a gap between DE education and market demands: academic curricula are often outdated with respect to topics that contemporary industry considers relevant. In particular, topics related to high-performance data platforms, cloud data management, and data workflow stand out as significant current industry needs, yet they are little explored in current curricula. Our goal with this study is to support curricular changes that can contribute to training more qualified professionals, aligned with the needs of the modern market.
