Detection of similarity between SQL queries for educational purposes
Abstract
This paper proposes a result-comparison algorithm to support the grading of SQL exercises in academic settings. Such queries are complex enough that subtle variations among student answers strain teachers’ capacity to assess them. The algorithm generates messages indicating how similar the result of the student’s attempt is to the result of the reference query (template). Because many possible answers are only partially correct, for example returning a different number of columns or applying different filtering criteria, the algorithm’s key role is to detect these subtleties. The results highlight the algorithm’s efficiency in simplifying the correction process for educators and in providing immediate, detailed feedback to students, thus promoting more equitable and efficient assessment in distance learning.
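To make the idea of comparing results concrete, the following is a minimal sketch and not the algorithm proposed in this paper. It assumes an in-memory SQLite database, matches columns by name, and compares rows as multisets; the function `compare_results`, the `student` table, the message wording, and the scoring formula are all hypothetical choices made for illustration.

```python
import sqlite3
from collections import Counter


def compare_results(conn, reference_sql, attempt_sql):
    """Run both queries and describe how the attempt's result differs.

    Hypothetical helper: returns (score in [0, 1], list of messages).
    """
    ref_cur = conn.execute(reference_sql)
    ref_cols = [d[0].lower() for d in ref_cur.description]
    ref_rows = ref_cur.fetchall()

    att_cur = conn.execute(attempt_sql)
    att_cols = [d[0].lower() for d in att_cur.description]
    att_rows = att_cur.fetchall()

    messages = []
    missing_cols = [c for c in ref_cols if c not in att_cols]
    extra_cols = [c for c in att_cols if c not in ref_cols]
    if missing_cols:
        messages.append("missing column(s): " + ", ".join(missing_cols))
    if extra_cols:
        messages.append("unexpected column(s): " + ", ".join(extra_cols))

    # Rows are compared only on the columns both results share.
    shared = [c for c in ref_cols if c in att_cols]
    if not shared:
        messages.append("no column in common with the reference result")
        return 0.0, messages

    def project(rows, cols):
        # Keep only the shared columns and count duplicate rows (multiset).
        idx = [cols.index(c) for c in shared]
        return Counter(tuple(row[i] for i in idx) for row in rows)

    ref_bag = project(ref_rows, ref_cols)
    att_bag = project(att_rows, att_cols)

    missing = sum((ref_bag - att_bag).values())
    extra = sum((att_bag - ref_bag).values())
    if missing:
        messages.append(f"{missing} expected row(s) missing")
    if extra:
        messages.append(f"{extra} unexpected row(s) returned")

    # Assumed scoring: multiset Jaccard on rows, scaled by column overlap.
    common = sum((ref_bag & att_bag).values())
    total = sum((ref_bag | att_bag).values())
    row_score = common / total if total else 1.0
    col_score = len(shared) / max(len(ref_cols), len(att_cols))

    if not messages:
        messages.append("result matches the reference")
    return row_score * col_score, messages


# Illustrative data: the attempt drops a column and uses a looser filter.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER, name TEXT, grade INTEGER);
    INSERT INTO student VALUES (1, 'Ana', 9), (2, 'Bruno', 6), (3, 'Carla', 8);
""")
reference = "SELECT id, name FROM student WHERE grade >= 8"
attempt = "SELECT name FROM student WHERE grade >= 6"
score, messages = compare_results(conn, reference, attempt)
for message in messages:
    print(message)
```

In this example the attempt is partially correct: it omits the `id` column and returns one row the reference does not, so the sketch reports both differences rather than a plain pass or fail. Comparing rows as multisets makes the check independent of row order, which is one plausible design choice when the exercise does not require an `ORDER BY`.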
