Generative Language Models in the Construction of Computing Education Tools with Graphical Interfaces

Mateus Otavio Lisboa; Hugo Costa; Pedro Coura; Isabela Freitas; Maria Lúcia Bento Villela; Ricardo Ferreira

doi:10.5753/educomp.2025.4927

Mateus Otavio Lisboa, UFV
Hugo Costa, UFV
Pedro Coura, UFV
Isabela Freitas, UFV
Maria Lúcia Bento Villela, UFV
Ricardo Ferreira, UFV

Abstract

This work presents a quantitative and qualitative analysis of four large language model (LLM) environments for creating teaching materials for computing education. The evaluation metrics cover compilation, execution, and functionality errors, along with request size, number of interactions, and number of generated lines of code. The environments evaluated are ChatGPT, Claude, Copilot, and Gemini. We explore building incremental requests with directives for creating data-input interfaces, algorithm simulations, and outputs with graphical visualizations and/or animations. The evaluated examples span several domains, and all are documented and made available as a dataset. Initial results indicate that LLMs speed up the creation of interactive and engaging educational tools: more than 240 functional interfaces were generated from more than 350 trials, that is, a success rate of 84% excluding the Gemini LLM, which performed poorly. These results point to promising directions for using LLMs in the design of teaching interfaces.
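As a rough illustration of how the metrics listed in the abstract could be recorded per trial and turned into a success rate, the sketch below may help. It is not the authors' tooling: the `Trial` record, its field names, and the `success_rate` helper are hypothetical, the sample values are invented for illustration only and are not results from the paper, and counting a trial as successful when it has no compilation, execution, or functionality errors is an assumption about how "functional" might be defined.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One generation attempt (hypothetical record layout)."""
    environment: str        # "ChatGPT", "Claude", "Copilot" or "Gemini"
    compile_errors: int
    runtime_errors: int
    functional_errors: int
    request_chars: int      # size of the request sent to the LLM
    interactions: int       # number of request/response rounds
    generated_lines: int    # lines of code produced

    @property
    def functional(self) -> bool:
        # Assumption: a trial succeeds only if no error of any kind remains.
        total = self.compile_errors + self.runtime_errors + self.functional_errors
        return total == 0


def success_rate(trials, exclude=()):
    """Fraction of functional trials, optionally leaving out some environments."""
    kept = [t for t in trials if t.environment not in exclude]
    if not kept:
        raise ValueError("no trials left after exclusion")
    return sum(t.functional for t in kept) / len(kept)


# Invented sample data, for illustration only (not taken from the paper).
SAMPLE = [
    Trial("ChatGPT", 0, 0, 0, 420, 2, 150),
    Trial("Claude", 0, 0, 0, 420, 1, 180),
    Trial("Copilot", 0, 1, 0, 420, 3, 140),
    Trial("Gemini", 1, 0, 1, 420, 4, 90),
]

overall = success_rate(SAMPLE)
without_gemini = success_rate(SAMPLE, exclude=("Gemini",))
```

Computing the rate with and without an excluded environment, as in the last two lines, mirrors the way the abstract reports its figure "excluding the Gemini LLM".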
