Proposal of a Scale for Assessing the Learning of Machine Learning at the Create Level in Basic Education

Marcelo Fernando Rauber; Christiane Gresse von Wangenheim; Adriano F. Borgatto; Ramon Mayor Martins; Deise M. Arndt; Jean Carlo Rossa Hauck

doi:10.5753/sbie.2024.241471

Marcelo Fernando Rauber Universidade Federal de Santa Catarina (UFSC) / Instituto Federal Catarinense (IFC) http://orcid.org/0000-0001-5653-7155
Christiane Gresse von Wangenheim Universidade Federal de Santa Catarina (UFSC) https://orcid.org/0000-0002-6566-1606
Adriano F. Borgatto Universidade Federal de Santa Catarina (UFSC) https://orcid.org/0000-0001-6280-2525
Ramon Mayor Martins Universidade Federal de Santa Catarina (UFSC) https://orcid.org/0000-0002-1952-0909
Deise M. Arndt Universidade Federal de Santa Catarina (UFSC) https://orcid.org/0009-0002-4754-6097
Jean Carlo Rossa Hauck Universidade Federal de Santa Catarina (UFSC) https://orcid.org/0000-0001-6550-9092

Abstract

There is a trend toward including the teaching of Machine Learning (ML) as early as Basic Education, teaching students to create their own intelligent solutions. In this context, it is also important to assess the learning of ML and of the Design Thinking process at the Create level. Evolving an assessment model, this article aims to propose a scale, together with its pedagogical interpretation, using Item Response Theory (IRT). The results provide a first indication of the adequacy of the assessment model with respect to internal consistency, with IRT calibration parameters very close to acceptable values. We hope that the definition of the scale can support learning to create ML solutions by providing feedback to students and teachers.

Keywords: Assessment, Scale, Basic Education, Machine Learning, Image Classification, Item Response Theory
