Correlation Between Complexity and Difficulty of Programming Questions in Online Judges
Abstract
Automatic code correction environments are increasingly used in the teaching-learning process of programming disciplines. However, a problem often faced by teachers who use this technology is to determine the difficulty of the questions registered in the environment. This work aims to carry out a correlation analysis between code complexity metrics and the difficulty faced by students, so that it is possible to automatically predict the difficulty level of a question just by knowing its solution model. This study was divided into three stages: i) analysis of Spearman's correlation between complexity metrics (extracted from the question) and difficulty (extracted from the student's interaction with the question), ii) prediction of the difficulty class of questions through models machine learning for classification and iii) prediction of difficulty metrics using regression models. Regarding item i), it was observed that 96% of the correlations were weak or non-existent between individual metrics of code complexity and difficulty, 4% of cases of moderate correlation and no cases of strong correlation. For item ii), the highest f1-score obtained was 88%, considering classification with two levels of difficulty (“easy” and “hard”), and a maximum f1-score of 67%, considering classification with three levels (“ easy”, “medium” and “difficult”). For item iii), the best result obtained was an adjusted correlation coefficient of 63%.
Keywords:
Online Judges, Programming questions difficulty, Difficulty metrics, Complexity metrics, Machine Learning
References
Davide Anguita, Alessandro Ghio, Sandro Ridella, and Dario Sterpi. 2009. K-Fold Cross Validation for Error Rate Estimate in Support Vector Machines. In DMIN. 291–297.
Leandro Silva Galvão Carvalho, David Braga Fernandes Oliveira, and Bruno Gadelha. 2016. Juiz online como ferramenta de apoio a uma metodologia de ensino híbrido em programação. In Brazilian Symposium on Computers in Education (Simpósio Brasileiro de Informática na Educação-SBIE), Vol. 27. SBC, Uberlândia, MG, 140–149.
Nitesh V Chawla, Kevin W Bowyer, Lawrence O Hall, and W Philip Kegelmeyer. 2002. SMOTE: synthetic minority over-sampling technique. Journal of artificial intelligence research 16, 321–357.
Tianqi Chen and Carlos Guestrin. 2016. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (San Francisco, California, USA) (KDD ’16). Association for Computing Machinery, New York, NY, USA, 785–794.
C.P. Dancey and J. Reidy. 2007. Statistics Without Maths for Psychology. Pearson/Prentice Hall.
Instituto Nacional de Estudos e Pesquisas Educacionais Anísio Teixeira. 2017. Relatório Síntese de Área – Ciência da Computação (Bacharelado/Licenciatura). Technical Report. Retrieved August 27, 2022 from [link].
Marcos Avner Pimenta de Lima, Leandro Silva Galvão Carvalho, Elaine Harada Teixeira Oliveira, David Braga Fernandes Oliveira, and Filipe Dwan Pereira. 2021. Uso de atributos de código para classificação da facilidade de questões de codificação. In Anais do Simpósio Brasileiro de Educação em Computação. SBC, SBC, online, 113–122.
Alessandro Di Bucchianico. 2008. Coefficient of determination (R 2). Encyclopedia of statistics in quality and reliability 1.
Tomáš Effenberger, Jaroslav Cechák, and Radek Pelánek. 2019. Difficulty and Complexity of Introductory Programming Problems.
Said Elnaffar. 2016. Using software metrics to predict the difficulty of code writing questions. In 2016 ieee global engineering education conference (educon). IEEE, 513–518.
Rodrigo Elias Francisco, Ana Paula Laboissière Ambrósio, Cleon Xavier Pereira Junior, and Márcia Aparecida Fernandes. 2018. Juiz online no ensino de CS1-lições aprendidas e proposta de uma ferramenta. Revista Brasileira de Informática na Educação 26, 03, 163.
Margherita Grandini, Enrico Bagli, and Giorgio Visani. 2020. Metrics for multi-class classification: An overview. arXiv:2008.05756 [stat.ML]
Peng Liu and Zhizhong Li. 2012. Task complexity: A review and conceptualization framework. International Journal of Industrial Ergonomics 42, 6, 553–568.
Ana Carolina Lorena and André CPLF Carvalho. 2007. Uma introdução às support vector machines. Revista de Informática Teórica e Aplicada 14, 2, 43–67.
Scott M Lundberg, Gabriel Erion, Hugh Chen, Alex DeGrave, Jordan M Prutkin, Bala Nair, Ronit Katz, Jonathan Himmelfarb, Nisha Bansal, and Su-In Lee. 2020. From local explanations to global understanding with explainable AI for trees. Nature machine intelligence 2, 1, 56–67.
Christopher D Manning, Prabhakar Raghavan, and Hinrich Schutze. 2008. Introduction to Information Retrieval. Cambridge University Press, Cambridge, England.
Thomas J McCabe. 1976. A complexity measure. IEEE Transactions on software Engineering 4, 308–320.
V. Meisalo, E. Sutinen, and S. Torvinen. 2004. Classification of exercises in a virtual programming course. In 34th Annual Frontiers in Education, 2004. FIE 2004. S3D–1.
Leann Myers and Maria J. Sirois. 2006. Spearman Correlation Coefficients, Differences between. In Encyclopedia of Statistical Sciences. John Wiley & Sons, Ltd.
M Z Naser and Amir H Alavi. 2021. Error metrics and performance fitness indicators for artificial intelligence and machine learning in engineering and sciences. Archit. Struct. Constr.
Rodrigo Barros Paes, Romero Malaquias, Márcio Guimarães, and Hyggo Almeida. 2013. Ferramenta para a Avaliação de Aprendizado de Alunos em Programação de Computadores. In Anais dos Workshops do Congresso Brasileiro de Informática na Educação, Vol. 2. SBC, Campinas, SP, 203–212.
Fillipi D Pelz, Elieser A Jesus, and André LA Raabe. 2012. Um mecanismo para correção automática de exercícios práticos de programação introdutória. In Brazilian Symposium on Computers in Education (Simpósio Brasileiro de Informática na Educaçao-SBIE), Vol. 23.
Filipe Dwan Pereira, Samuel C Fonseca, Elaine HT Oliveira, Alexandra I Cristea, Henrik Bellhäuser, Luiz Rodrigues, David BF Oliveira, Seiji Isotani, and Leandro SG Carvalho. 2021. Explaining Individual and Collective Programming Students’ Behavior by Interpreting a Black-Box Predictive Model. IEEE Access 9, 117097–117119.
Filipe Dwan Pereira, Hermino BF Junior, Luiz Rodriguez, Armando Toda, Elaine HT Oliveira, Alexandra I Cristea, David BF Oliveira, Leandro SG Carvalho, Samuel C Fonseca, Ahmed Alamri, et al. 2021. A recommender system based on effort: Towards minimising negative affects and maximising achievement in CS1 learning. In International Conference on Intelligent Tutoring Systems. Springer, 466–480.
André Prisco, Rafael dos Santos, Silvia Botelho, Neilor Tonin, and Jean Bez. 2017. Using information technology for personalizing the computer science teaching. In 2017 IEEE Frontiers in Education Conference (FIE). IEEE, 1–7.
Pedro Santos, Leandro Silva Galvão Carvalho, Elaine Oliveira, and David Fernandes. 2019. Classificação de dificuldade de questões de programação com base na inteligibilidade do enunciado. In Simpósio Brasileiro de Informática na Educação (SBIE), Vol. 30. SBC, Brasília, DF, 1886.
Bernhard Schölkopf, Alex J Smola, Robert C Williamson, and Peter L Bartlett. 2000. New support vector algorithms. Neural computation 12, 5, 1207–1245.
Élrik Souza Silva, Leandro S. Galvão Carvalho, David B. F. Oliveira, Elaine H. T. Oliveira, Tanara Lauschner, Lima Marcos A. P., and Filipe Dwan Pereira. 2022. Previsão de indicadores de dificuldade de questões de programação a partir de métricas extraídas do código de solução. In Anais do XXXIII Simpósio Brasileiro de Informática na Educação (SBIE). SBC, Manaus, AM.
Amelec Viloria, Omar Bonerge Pineda Lezama, and Nohora Mercado-Caruzo. 2020. Unbalanced data processing using oversampling: Machine learning. Procedia Computer Science 175, 108–113.
Leandro Silva Galvão Carvalho, David Braga Fernandes Oliveira, and Bruno Gadelha. 2016. Juiz online como ferramenta de apoio a uma metodologia de ensino híbrido em programação. In Brazilian Symposium on Computers in Education (Simpósio Brasileiro de Informática na Educação-SBIE), Vol. 27. SBC, Uberlândia, MG, 140–149.
Nitesh V Chawla, Kevin W Bowyer, Lawrence O Hall, and W Philip Kegelmeyer. 2002. SMOTE: synthetic minority over-sampling technique. Journal of artificial intelligence research 16, 321–357.
Tianqi Chen and Carlos Guestrin. 2016. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (San Francisco, California, USA) (KDD ’16). Association for Computing Machinery, New York, NY, USA, 785–794.
C.P. Dancey and J. Reidy. 2007. Statistics Without Maths for Psychology. Pearson/Prentice Hall.
Instituto Nacional de Estudos e Pesquisas Educacionais Anísio Teixeira. 2017. Relatório Síntese de Área – Ciência da Computação (Bacharelado/Licenciatura). Technical Report. Retrieved August 27, 2022 from [link].
Marcos Avner Pimenta de Lima, Leandro Silva Galvão Carvalho, Elaine Harada Teixeira Oliveira, David Braga Fernandes Oliveira, and Filipe Dwan Pereira. 2021. Uso de atributos de código para classificação da facilidade de questões de codificação. In Anais do Simpósio Brasileiro de Educação em Computação. SBC, SBC, online, 113–122.
Alessandro Di Bucchianico. 2008. Coefficient of determination (R 2). Encyclopedia of statistics in quality and reliability 1.
Tomáš Effenberger, Jaroslav Cechák, and Radek Pelánek. 2019. Difficulty and Complexity of Introductory Programming Problems.
Said Elnaffar. 2016. Using software metrics to predict the difficulty of code writing questions. In 2016 ieee global engineering education conference (educon). IEEE, 513–518.
Rodrigo Elias Francisco, Ana Paula Laboissière Ambrósio, Cleon Xavier Pereira Junior, and Márcia Aparecida Fernandes. 2018. Juiz online no ensino de CS1-lições aprendidas e proposta de uma ferramenta. Revista Brasileira de Informática na Educação 26, 03, 163.
Margherita Grandini, Enrico Bagli, and Giorgio Visani. 2020. Metrics for multi-class classification: An overview. arXiv:2008.05756 [stat.ML]
Peng Liu and Zhizhong Li. 2012. Task complexity: A review and conceptualization framework. International Journal of Industrial Ergonomics 42, 6, 553–568.
Ana Carolina Lorena and André CPLF Carvalho. 2007. Uma introdução às support vector machines. Revista de Informática Teórica e Aplicada 14, 2, 43–67.
Scott M Lundberg, Gabriel Erion, Hugh Chen, Alex DeGrave, Jordan M Prutkin, Bala Nair, Ronit Katz, Jonathan Himmelfarb, Nisha Bansal, and Su-In Lee. 2020. From local explanations to global understanding with explainable AI for trees. Nature machine intelligence 2, 1, 56–67.
Christopher D Manning, Prabhakar Raghavan, and Hinrich Schutze. 2008. Introduction to Information Retrieval. Cambridge University Press, Cambridge, England.
Thomas J McCabe. 1976. A complexity measure. IEEE Transactions on software Engineering 4, 308–320.
V. Meisalo, E. Sutinen, and S. Torvinen. 2004. Classification of exercises in a virtual programming course. In 34th Annual Frontiers in Education, 2004. FIE 2004. S3D–1.
Leann Myers and Maria J. Sirois. 2006. Spearman Correlation Coefficients, Differences between. In Encyclopedia of Statistical Sciences. John Wiley & Sons, Ltd.
M Z Naser and Amir H Alavi. 2021. Error metrics and performance fitness indicators for artificial intelligence and machine learning in engineering and sciences. Archit. Struct. Constr.
Rodrigo Barros Paes, Romero Malaquias, Márcio Guimarães, and Hyggo Almeida. 2013. Ferramenta para a Avaliação de Aprendizado de Alunos em Programação de Computadores. In Anais dos Workshops do Congresso Brasileiro de Informática na Educação, Vol. 2. SBC, Campinas, SP, 203–212.
Fillipi D Pelz, Elieser A Jesus, and André LA Raabe. 2012. Um mecanismo para correção automática de exercícios práticos de programação introdutória. In Brazilian Symposium on Computers in Education (Simpósio Brasileiro de Informática na Educaçao-SBIE), Vol. 23.
Filipe Dwan Pereira, Samuel C Fonseca, Elaine HT Oliveira, Alexandra I Cristea, Henrik Bellhäuser, Luiz Rodrigues, David BF Oliveira, Seiji Isotani, and Leandro SG Carvalho. 2021. Explaining Individual and Collective Programming Students’ Behavior by Interpreting a Black-Box Predictive Model. IEEE Access 9, 117097–117119.
Filipe Dwan Pereira, Hermino BF Junior, Luiz Rodriguez, Armando Toda, Elaine HT Oliveira, Alexandra I Cristea, David BF Oliveira, Leandro SG Carvalho, Samuel C Fonseca, Ahmed Alamri, et al. 2021. A recommender system based on effort: Towards minimising negative affects and maximising achievement in CS1 learning. In International Conference on Intelligent Tutoring Systems. Springer, 466–480.
André Prisco, Rafael dos Santos, Silvia Botelho, Neilor Tonin, and Jean Bez. 2017. Using information technology for personalizing the computer science teaching. In 2017 IEEE Frontiers in Education Conference (FIE). IEEE, 1–7.
Pedro Santos, Leandro Silva Galvão Carvalho, Elaine Oliveira, and David Fernandes. 2019. Classificação de dificuldade de questões de programação com base na inteligibilidade do enunciado. In Simpósio Brasileiro de Informática na Educação (SBIE), Vol. 30. SBC, Brasília, DF, 1886.
Bernhard Schölkopf, Alex J Smola, Robert C Williamson, and Peter L Bartlett. 2000. New support vector algorithms. Neural computation 12, 5, 1207–1245.
Élrik Souza Silva, Leandro S. Galvão Carvalho, David B. F. Oliveira, Elaine H. T. Oliveira, Tanara Lauschner, Lima Marcos A. P., and Filipe Dwan Pereira. 2022. Previsão de indicadores de dificuldade de questões de programação a partir de métricas extraídas do código de solução. In Anais do XXXIII Simpósio Brasileiro de Informática na Educação (SBIE). SBC, Manaus, AM.
Amelec Viloria, Omar Bonerge Pineda Lezama, and Nohora Mercado-Caruzo. 2020. Unbalanced data processing using oversampling: Machine learning. Procedia Computer Science 175, 108–113.
Published
2023-04-24
How to Cite
FERNANDES, Jackson; CARVALHO, Leandro S. G.; OLIVEIRA, David B. F.; OLIVEIRA, Elaine H. T.; PEREIRA, Filipe Dwan; LAUSCHNER, Tanara.
Correlation Between Complexity and Difficulty of Programming Questions in Online Judges. In: BRAZILIAN SYMPOSIUM ON COMPUTING EDUCATION (EDUCOMP), 3. , 2023, Evento Online.
Anais [...].
Porto Alegre: Sociedade Brasileira de Computação,
2023
.
p. 97-107.
ISSN 3086-0733.
DOI: https://doi.org/10.5753/educomp.2023.228217.
