Predicting difficulty indicators of programming questions using metrics extracted from solution code
Abstract
Online Judges (OJs) are used in programming courses to construct assignments and practical exams. However, it is not easy to design assignments with exercises balanced across difficulty levels. One way to assist the instructor is to present difficulty indicators for each question, such as success rate, average implementation time, and number of submissions. However, for new questions that have yet to be solved by students, it is not possible to calculate such indicators. Thus, this paper uses machine learning to predict nine difficulty indicators from metrics extracted from the solution codes provided by teachers when they create exercises in the OJ. As examples of results, the binary classification of success rate obtained the highest f1-score (0.920), and the binary classification of the number of submissions obtained the lowest (0.560).
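The approach rests on static metrics computed from the instructor's reference solution. A minimal sketch of that extraction step in Python, using only the standard library (the metric set here — lines of code, branch count, loop count, nesting depth — is illustrative, not the paper's actual feature set):

```python
import ast

def solution_metrics(source: str) -> dict:
    """Compute simple static metrics from a reference solution.

    Illustrative features only: non-blank lines of code, number of
    branching nodes, loop count, and maximum AST nesting depth --
    the kind of inputs a difficulty classifier could be trained on.
    """
    tree = ast.parse(source)
    branches = sum(isinstance(n, (ast.If, ast.IfExp)) for n in ast.walk(tree))
    loops = sum(isinstance(n, (ast.For, ast.While)) for n in ast.walk(tree))

    def depth(node, d=0):
        # Longest root-to-leaf path in the AST, as a crude nesting proxy.
        children = list(ast.iter_child_nodes(node))
        return d if not children else max(depth(c, d + 1) for c in children)

    return {
        "loc": len([line for line in source.splitlines() if line.strip()]),
        "branches": branches,
        "loops": loops,
        "max_depth": depth(tree),
    }

# Hypothetical reference solution an instructor might upload to the OJ.
example = """\
def classify(n):
    for i in range(2, n):
        if n % i == 0:
            return "composite"
    return "prime"
"""
metrics = solution_metrics(example)
```

The resulting feature vector would then feed a classifier (e.g., a gradient-boosted tree model such as XGBoost, cited below) trained on questions whose difficulty indicators are already known.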
Keywords:
machine learning, difficulty indicators, online judges, predicting difficulty metrics, introductory programming courses
References
Beckmann, J. F., Birney, D. P., and Goode, N. (2017). Beyond psychometrics: the difference between difficult problem solving and complex problem solving. Frontiers in psychology, 8:1739.
Chen, T. and Guestrin, C. (2016). Xgboost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16, pages 785–794, New York, NY, USA. Association for Computing Machinery.
Denny, P., Cukierman, D., and Bhaskar, J. (2015). Measuring the effect of inventing practice exercises on learning in an introductory programming course. In Proceedings of the 15th Koli Calling Conference on Computing Education Research, pages 13–22.
Effenberger, T., Cechak, J., and Pelanek, R. (2019). Difficulty and complexity of introductory programming problems. In Educational Data Mining in Computer Science Education (CSEDM).
Elnaffar, S. (2016). Using software metrics to predict the difficulty of code writing questions. In IEEE Global Engineering Education Conference (EDUCON), pages 513–518.
Francisco, R. E. and Ambrosio, A. P. (2015). Mining an online judge system to support introductory computer programming teaching. In SMLIR: Workshop on Tools and Technologies in Statistics, Machine Learning and Information Retrieval for Educational Data Mining, pages 93–98.
Lima, M. A., Carvalho, L. S., Oliveira, E. H., Oliveira, D. B., and Pereira, F. D. (2021). Uso de atributos de código para classificar a dificuldade de questões de programação em juízes online. Revista Brasileira de Informática na Educação, 29:1137–1157.
Liu, P. and Li, Z. (2012). Task complexity: A review and conceptualization framework. International Journal of Industrial Ergonomics, 42(6):553–568.
Llana, L., Martin-Martin, E., and Pareja-Flores, C. (2012). Flop, a free laboratory of programming. In Proceedings of the 12th Koli Calling International Conference on Computing Education Research, Koli Calling ’12, pages 93–99, New York, NY, USA.
Lundberg, S., Erion, G., Chen, H., DeGrave, A., Prutkin, J., Nair, B., Katz, R., Himmelfarb, J., Bansal, N., and Lee, S.-I. (2020). From local explanations to global understanding with explainable AI for trees. Nature Machine Intelligence, pages 56–67.
Meisalo, V., Sutinen, E., and Torvinen, S. (2004). Classification of exercises in a virtual programming course. In 34th Annual Frontiers in Education (FIE 2004), pages S3D–1.
Pelanek, R., Effenberger, T., and Čechák, J. (2021). Complexity and difficulty of items in learning systems. International Journal of Artificial Intelligence in Education, pages 1–37.
Santos, P., Carvalho, L. S. G., Oliveira, E. H. T., and Oliveira, D. B. F. (2019). Classificação de dificuldade de questões de programação com base na inteligibilidade do enunciado. Simpósio Brasileiro de Informática na Educação, 30(1):1886–1895.
Sheard, J., Simon, Carbone, A., Chinn, D., Clear, T., Corney, M., D’Souza, D., Fenwick, J., Harland, J., Laakso, M.-J., and Teague, D. (2013). How difficult are exams? a framework for assessing the complexity of introductory programming exams. In Proceedings of the 15th Australasian Computing Education Conference, volume 136 of ACE ’13, pages 145–154, AUS.
Van Der Linden, W. J. (2009). Conceptual issues in response-time modeling. Journal of Educational Measurement, 46(3):247–272.
Whalley, J. and Kasto, N. (2014). How difficult are novice code writing tasks? a software metrics approach. In Proceedings of the Sixteenth Australasian Computing Education Conference, volume 148, pages 105–112.
Published
2022-11-16
How to Cite
SILVA, Élrik Souza; CARVALHO, Leandro S. G.; OLIVEIRA, David B. F. de; OLIVEIRA, Elaine H. T.; LAUSCHNER, Tanara; LIMA, Marcos A. P. de; PEREIRA, Filipe Dwan. Predicting difficulty indicators of programming questions using metrics extracted from solution code. In: BRAZILIAN SYMPOSIUM ON COMPUTERS IN EDUCATION (SBIE), 33., 2022, Manaus. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2022. p. 859-870. DOI: https://doi.org/10.5753/sbie.2022.224724.
