Generative Language Models in the Development of Computing Education Tools with a Graphical Interface

  • Mateus Otavio Lisboa (UFV)
  • Hugo Costa (UFV)
  • Pedro Coura (UFV)
  • Isabela Freitas (UFV)
  • Maria Lúcia Bento Villela (UFV)
  • Ricardo Ferreira (UFV)
Abstract


This work presents a quantitative and qualitative analysis of four large language model (LLM) environments for creating educational materials aimed at teaching computer science. The evaluation metrics include compilation, execution, and functionality errors, as well as request size, number of interactions, and number of generated lines of code. The evaluated environments are ChatGPT, Claude, Copilot, and Gemini. We explore the construction of incremental requests with directives for creating data input interfaces, algorithm simulations, and outputs with graphical visualizations and/or animations. The evaluated examples cover a variety of domains. Initial results indicate that LLMs accelerate the creation of interactive and engaging educational tools: more than 240 functional interfaces were generated from more than 350 trials, a success rate of 84% when Gemini, which performed poorly, is excluded. These findings point to promising directions for using LLMs in the design of educational interfaces.
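To illustrate what an incremental request sequence and a resulting interface might look like, the sketch below uses Gradio and matplotlib, in line with the tutorials by Ferreira et al. (2024a). The prompt wording, the bubble-sort example, and all function names are hypothetical illustrations for this summary, not material taken from the paper's experiments.

```python
# Hypothetical sketch of an incremental prompt sequence and the kind of
# Gradio interface an LLM might generate from it. The prompts and the
# bubble-sort example are illustrative assumptions, not the paper's data.
import gradio as gr
import matplotlib.pyplot as plt

# Incremental requests: each directive refines the previous one, covering
# the algorithm, the data input interface, and the graphical output.
PROMPTS = [
    "Create a Python function that runs bubble sort and records each pass.",
    "Add a Gradio interface where the student types a comma-separated list of numbers.",
    "Plot the result as a bar chart with matplotlib so the sorting steps can be visualized.",
]

def simulate_bubble_sort(numbers_text):
    """Parse the student's input, run bubble sort, and plot the sorted values."""
    values = [int(x) for x in numbers_text.split(",") if x.strip()]
    passes = [values[:]]
    for i in range(len(values)):
        for j in range(len(values) - i - 1):
            if values[j] > values[j + 1]:
                values[j], values[j + 1] = values[j + 1], values[j]
        passes.append(values[:])
    fig, ax = plt.subplots()
    ax.bar(range(len(values)), passes[-1])
    ax.set_title(f"Bubble sort: {len(passes) - 1} passes")
    return fig

demo = gr.Interface(
    fn=simulate_bubble_sort,
    inputs=gr.Textbox(label="Numbers (comma-separated)", value="5, 3, 8, 1"),
    outputs=gr.Plot(label="Sorted result"),
    title="Bubble sort simulation",
)

if __name__ == "__main__":
    demo.launch()
```

Running the script launches a web interface in which students enter a list of numbers and see the sorted result plotted, the kind of data input, simulation, and graphical output that the incremental requests described in the abstract aim to produce.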

References

Al-Shetairy, M., Hindy, H., Khattab, D., and Aref, M. M. (2024). Transformers utilization in chart understanding: A review of recent advances & future trends. arXiv preprint arXiv:2410.13883.

Canesche, M., Bragança, L., Neto, O. P. V., Nacif, J. A., and Ferreira, R. (2021). Google colab cad4u: Hands-on cloud laboratories for digital design. In 2021 IEEE International Symposium on Circuits and Systems (ISCAS), pages 1–5. IEEE.

Chen, B., Zhang, Z., Langrené, N., and Zhu, S. (2023). Unleashing the potential of prompt engineering in large language models: a comprehensive review. arXiv preprint arXiv:2310.14735.

de Figueiredo, G. A., de Souza, E. S., Rodrigues, J. H., Nacif, J. A., and Ferreira, R. (2024). Desenvolvendo ferramentas para ensino de risc-v com python, verilog, matplotlib, svg e chatgpt. International Journal of Computer Architecture Education, 13(1):43–52.

Universidade Federal de Viçosa (2024). Material Complementar. [link]. [Online].

Del, M. and Fishel, M. (2022). True detective: a deep abductive reasoning benchmark undoable for gpt-3 and challenging for gpt-4. arXiv preprint arXiv:2212.10114.

Ferreira, R., Canesche, M., Jamieson, P., Neto, O. P. V., and Nacif, J. A. (2024a). Examples and tutorials on using google colab and gradio to create online interactive student-learning modules. Computer Applications in Engineering Education, page e22729.

Ferreira, R., Sabino, C., Canesche, M., Neto, O. P. V., and Nacif, J. A. (2024b). Aiot tool integration for enriching teaching resources and monitoring student engagement. Internet of Things, 26:101045.

Khowaja, S. A., Khuwaja, P., Dev, K., Wang, W., and Nkenyereye, L. (2024). Chatgpt needs spade (sustainability, privacy, digital divide, and ethics) evaluation: A review. Cognitive Computation, pages 1–23.

Kiesler, N. and Schiffner, D. (2023). Large language models in introductory programming education: Chatgpt’s performance and implications for assessments. arXiv preprint arXiv:2308.08572.

Liu, J., Liu, A., Lu, X., Welleck, S., West, P., Bras, R. L., Choi, Y., and Hajishirzi, H. (2021). Generated knowledge prompting for commonsense reasoning. arXiv preprint arXiv:2110.08387.

Logan IV, R. L., Balažević, I., Wallace, E., Petroni, F., Singh, S., and Riedel, S. (2021). Cutting down on prompts and parameters: Simple few-shot learning with language models. arXiv preprint arXiv:2106.13353.

Sato, Y., Suzuki, A., and Mineshima, K. (2024). Building a large dataset of human-generated captions for science diagrams. In International Conference on Theory and Application of Diagrams, pages 393–401. Springer.

Wei, J., Wang, X., Schuurmans, D., Bosma, M., Xia, F., Chi, E., Le, Q. V., Zhou, D., et al. (2022). Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35:24824–24837.

White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., and Schmidt, D. C. (2023). A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382.

Xu, X., Tao, C., Shen, T., Xu, C., Xu, H., Long, G., and Lou, J.-g. (2023). Re-reading improves reasoning in language models. arXiv preprint arXiv:2309.06275.

Yang, Z., Li, L., Wang, J., Lin, K., Azarnasab, E., Ahmed, F., Liu, Z., Liu, C., Zeng, M., and Wang, L. (2023). Mm-react: Prompting chatgpt for multimodal reasoning and action. arXiv preprint arXiv:2303.11381.

Yang, Z. and Zhu, Z. (2024). Heuristic question sequence generation based on retrieval augmentation. Education and Lifelong Development Research.

Yao, S., Yu, D., Zhao, J., Shafran, I., Griffiths, T., Cao, Y., and Narasimhan, K. (2024). Tree of thoughts: Deliberate problem solving with large language models. Advances in Neural Information Processing Systems, 36.

Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., and Cao, Y. (2022). React: Synergizing reasoning and acting in language models. arXiv preprint arXiv:2210.03629.

Zala, A., Lin, H., Cho, J., and Bansal, M. (2023). Diagrammergpt: Generating open-domain, open-platform diagrams via llm planning. arXiv preprint arXiv:2310.12128.

Zhou, D., Schärli, N., Hou, L., Wei, J., Scales, N., Wang, X., Schuurmans, D., Cui, C., Bousquet, O., Le, Q., et al. (2022). Least-to-most prompting enables complex reasoning in large language models. arXiv preprint arXiv:2205.10625.
Published
2025-04-07
LISBOA, Mateus Otavio; COSTA, Hugo; COURA, Pedro; FREITAS, Isabela; VILLELA, Maria Lúcia Bento; FERREIRA, Ricardo. Generative Language Models in the Development of Computing Education Tools with a Graphical Interface. In: BRAZILIAN SYMPOSIUM ON COMPUTING EDUCATION (EDUCOMP), 5., 2025, Juiz de Fora/MG. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 639-650. ISSN 3086-0733. DOI: https://doi.org/10.5753/educomp.2025.4927.