DOI: 10.1145/3613372.3614197
SBES Conference Proceedings · Research article

Large Language Models for Education: Grading Open-Ended Questions Using ChatGPT

Published: 25 September 2023

ABSTRACT

Software professionals face increasingly sophisticated problems and must continually improve their skills. For that improvement to happen, their study and training must include feedback that is both immediate and accurate. In software companies, where many professionals undergo training but few qualified experts are available to provide corrections, delivering effective feedback becomes even more challenging. To address this challenge, this work explores the use of Large Language Models (LLMs) to support the grading of open-ended questions in technical training.

In this study, we used ChatGPT to grade open-ended questions answered by 42 industry professionals on two topics. Evaluating the corrections and feedback provided by ChatGPT, we observed that it can identify semantic details in responses that other metrics cannot capture. Furthermore, we noticed that, in general, subject-matter experts tended to agree with the corrections and feedback given by ChatGPT.
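To illustrate the limitation the abstract alludes to, purely lexical metrics score a paraphrased correct answer low simply because it shares few words with the reference answer, while an off-topic answer that reuses the reference's vocabulary can score high. The sketch below is not the paper's method; it is a minimal bag-of-words cosine-similarity baseline with hypothetical example answers, showing the failure mode that motivates semantic grading with an LLM.

```python
from collections import Counter
import math

def cosine_similarity(a: str, b: str) -> float:
    """Bag-of-words cosine similarity between two texts (purely lexical)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

# Hypothetical grading scenario (not from the paper):
reference = "a binary search halves the search interval at every step"
paraphrase = "each iteration discards half of the remaining candidates"  # correct, but lexically distant
off_topic = "a binary search tree stores keys at every node"             # wrong, but lexically close

print(cosine_similarity(reference, paraphrase))  # low score for a correct paraphrase
print(cosine_similarity(reference, off_topic))   # higher score for an off-topic answer
```

The lexical metric ranks the off-topic answer above the correct paraphrase, which is exactly the kind of semantic detail the study found ChatGPT-based grading can capture.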


Published in:
SBES '23: Proceedings of the XXXVII Brazilian Symposium on Software Engineering
September 2023, 570 pages
ISBN: 9798400707872
DOI: 10.1145/3613372

      Copyright © 2023 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Overall acceptance rate: 147 of 427 submissions, 34%