Acting Humanly: Identification and Analysis of Logical Reasoning Biases Exhibited by ChatGPT versus Undergraduate Students
Abstract
Definitions of Artificial Intelligence (AI) characterize algorithms along four dimensions: thinking humanly, thinking rationally, acting humanly, and acting rationally. On the one hand, Logic, as a formal framework, enables the creation of algorithms capable of thinking rationally by expressing real-world situations in a language that supports valid and rigorous reasoning. On the other hand, Large Language Models, such as ChatGPT, are algorithms that act humanly, especially in tasks involving understanding and generating natural-language text. However, these models can exhibit logical reasoning biases, that is, tendencies that impair the ability to reason logically. This article identifies and analyzes the logical reasoning biases exhibited by ChatGPT and compares them to those exhibited by undergraduate Information Technology students at the start of a Logic course.
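To make the abstract's contrast concrete (this sketch is illustrative and not taken from the paper itself), propositional logic turns validity into a mechanical check: an exhaustive truth-table test confirms that modus ponens is valid, while affirming the consequent, a classic reasoning bias of the kind studied in work such as Ando et al. (2023), is not. The helper names below (`implies`, `is_valid`) are hypothetical, introduced only for this example.

```python
from itertools import product

def implies(p: bool, q: bool) -> bool:
    """Material implication: p -> q is false only when p is true and q is false."""
    return (not p) or q

def is_valid(argument, n_vars: int) -> bool:
    """An argument is valid iff no truth assignment makes every premise
    true while making the conclusion false (exhaustive truth-table check)."""
    for values in product([True, False], repeat=n_vars):
        premises, conclusion = argument(*values)
        if all(premises) and not conclusion:
            return False  # counterexample: premises hold, conclusion fails
    return True

# Modus ponens: from (p -> q) and p, conclude q.
modus_ponens = lambda p, q: ([implies(p, q), p], q)

# Affirming the consequent: from (p -> q) and q, conclude p.
affirming_consequent = lambda p, q: ([implies(p, q), q], p)

print(is_valid(modus_ponens, 2))          # True  -- logically valid inference
print(is_valid(affirming_consequent, 2))  # False -- a common reasoning bias
```

Accepting the second argument form as valid is precisely the kind of bias the article measures in both ChatGPT and beginning Logic students.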
Keywords:
Logical reasoning biases, ChatGPT, Propositional Logic
References
Ando, R., Morishita, T., Abe, H., Mineshima, K., and Okada, M. (2023). Evaluating large language models with NeuBAROCO: Syllogistic reasoning ability and human-like biases. In Chatzikyriakidis, S. and de Paiva, V., editors, Proceedings of the 4th Natural Logic Meets Machine Learning Workshop, pages 1–11, Nancy, France. Association for Computational Linguistics.
Bellman, R. (1978). An Introduction to Artificial Intelligence: Can Computers Think? Boyd & Fraser Publishing Company.
Bennett, B. (2012). Logically fallacious: The ultimate collection of over 300 logical fallacies (Academic Edition). eBookIt.com.
Chung, H. W., Hou, L., Longpre, S., Zoph, B., Tay, Y., Fedus, W., Li, Y., Wang, X., Dehghani, M., et al. (2024). Scaling instruction-finetuned language models. Journal of Machine Learning Research, 25(70):1–53.
The Coq Development Team (1996). The Coq proof assistant reference manual, version 5. INRIA Rocquencourt and ENS Lyon.
De Moura, L., Kong, S., Avigad, J., Van Doorn, F., and von Raumer, J. (2015). The Lean theorem prover (system description). In Automated Deduction – CADE-25: 25th International Conference on Automated Deduction, Berlin, Germany, August 1–7, 2015, Proceedings, pages 378–388. Springer.
Enderton, H. B. (2001). A mathematical introduction to logic. Elsevier.
Gupta, S., Shrivastava, V., Deshpande, A., Kalyan, A., Clark, P., Sabharwal, A., and Khot, T. (2024). Bias runs deep: Implicit reasoning biases in persona-assigned LLMs. In The Twelfth International Conference on Learning Representations.
Han, S. J., Ransom, K. J., Perfors, A., and Kemp, C. (2024). Inductive reasoning in humans and large language models. Cognitive Systems Research, 83:101155.
Huth, M. and Ryan, M. (2004). Logic in Computer Science: Modelling and Reasoning about Systems (2nd Ed.). Cambridge University Press.
Koubaa, A. (2023). GPT-4 vs. GPT-3.5: A concise showdown.
Lewis, M., Liu, Y., Goyal, N., Ghazvininejad, M., Mohamed, A., Levy, O., Stoyanov, V., and Zettlemoyer, L. (2020). BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In Jurafsky, D., Chai, J., Schluter, N., and Tetreault, J., editors, Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7871–7880, Online. Association for Computational Linguistics.
Liu, H., Ning, R., Teng, Z., Liu, J., Zhou, Q., and Zhang, Y. (2023). Evaluating the logical reasoning ability of ChatGPT and GPT-4. arXiv preprint arXiv:2304.03439.
Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., and Stoyanov, V. (2019). RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.
Martins, F., Oliveira, A., Vasconcelos, D., and Menezes, M. (2023). Avaliando a habilidade do ChatGPT de realizar provas de dedução natural em lógica proposicional. In Anais do XXXIV Simpósio Brasileiro de Informática na Educação, pages 1282–1292, Porto Alegre, RS, Brasil. SBC.
Nipkow, T., Wenzel, M., and Paulson, L. C. (2002). Isabelle/HOL: a proof assistant for higher-order logic. Springer.
OpenAI (2021). ChatGPT. [link]. Accessed on: April 3, 2024.
Rich, E. and Knight, K. (1991). Artificial Intelligence. Artificial Intelligence Series. McGraw-Hill.
Russell, S. J. and Norvig, P. (2016). Artificial intelligence: a modern approach. Pearson.
Saparov, A., Pang, R. Y., Padmakumar, V., Joshi, N., Kazemi, M., Kim, N., and He, H. (2023). Testing the general deductive reasoning capacity of large language models using OOD examples. In Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., and Levine, S., editors, Advances in Neural Information Processing Systems, volume 36, pages 3083–3105. Curran Associates, Inc.
Shikishima, C., Hiraishi, K., Yamagata, S., Sugimoto, Y., Takemura, R., Ozaki, K., Okada, M., Toda, T., and Ando, J. (2009). Is g an entity? A Japanese twin study using syllogisms and intelligence tests. Intelligence, 37(3):256–267.
Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M.-A., Lacroix, T., Rozière, B., Goyal, N., Hambro, E., Azhar, F., Rodriguez, A., Joulin, A., Grave, E., and Lample, G. (2023). LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971.
Tversky, A., Kahneman, D., and Slovic, P., editors (1982). Judgment under Uncertainty: Heuristics and Biases. Cambridge University Press.
Whitehead, A. and Russell, B. (1927). Principia Mathematica, volume 1 of Cambridge Mathematical Library. Cambridge University Press.
Winston, P. H. (1992). Artificial intelligence (3rd ed.). Addison-Wesley Longman Publishing Co., Inc., USA.
Yang, K., Swope, A., Gu, A., Chalamala, R., Song, P., Yu, S., Godil, S., Prenger, R. J., and Anandkumar, A. (2024). LeanDojo: Theorem proving with retrieval-augmented language models. Advances in Neural Information Processing Systems, 36.
Zhang, M. and Li, J. (2021). A commentary of GPT-3 in MIT Technology Review 2021. Fundamental Research, 1(6):831–833.
Published
November 17, 2024
How to Cite
OLIVEIRA, Augusto C. A.; MARTINS, Francisco L. B.; VASCONCELOS, Davi R.; MENEZES, Maria V. Acting Humanly: Identification and Analysis of Logical Reasoning Biases Exhibited by ChatGPT versus Undergraduate Students. In: ENCONTRO NACIONAL DE INTELIGÊNCIA ARTIFICIAL E COMPUTACIONAL (ENIAC), 21., 2024, Belém/PA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 647-658. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2024.245191.