Evaluating ChatGPT's Ability to Carry Out Natural Deduction Proofs in Propositional Logic

Francisco Leonardo Batista Martins (UFC); Augusto César Araújo de Oliveira (UFC); Davi Romero de Vasconcelos (UFC); Maria Viviane de Menezes (UFC)

DOI: https://doi.org/10.5753/sbie.2023.234658

Abstract

The use of conversational agents, also known as chatbots, in education has attracted growing interest from researchers, educators, and educational institutions worldwide. These systems can understand and process large volumes of data and offer individualized support to students. However, they can also produce incorrect answers in tasks that involve logical reasoning. This article evaluates the ability of the conversational agent ChatGPT to solve Natural Deduction exercises in propositional logic, in order to determine whether ChatGPT is a suitable tool for this task. To that end, experiments are carried out on a dataset of natural deduction exercises in propositional logic. The study aims to contribute to the understanding of the capabilities and limitations of conversational agents in logical reasoning skills.
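
For readers unfamiliar with the task, the following is a minimal sketch of what a natural deduction exercise in propositional logic can look like, written in Lean 4. The exercise is a hypothetical illustration and is not taken from the dataset used in the study; the names `h1`, `h2`, and `hp` are chosen only for this example.

```lean
-- Hypothetical exercise: from the premises p → q and q → r, derive p → r.
example (p q r : Prop) (h1 : p → q) (h2 : q → r) : p → r :=
  fun hp =>        -- →-introduction: assume p (labelled hp)
    h2 (h1 hp)     -- →-elimination twice: h1 hp gives q, then h2 gives r
```

Each step of the proof term corresponds to the application of one inference rule, which is the kind of step-by-step justification a correct answer to such an exercise has to provide.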
