A Proposal for a Scale to Assess Machine Learning at the Create Stage in K-12
Abstract
There is a growing trend to include the teaching of Machine Learning (ML) in K-12, enabling students to create their own intelligent solutions. In this context, it is also important to assess students' learning of ML and of the Design Thinking process at the Create level. Evolving an existing assessment model, this article proposes a scale with a pedagogical interpretation using Item Response Theory (IRT). The results provide a first indication of the suitability of the assessment model in terms of internal consistency and IRT calibration parameters very close to acceptable values. We expect the definition of the scale to support the learning of creating ML solutions by providing feedback to students and teachers.
