The Winograd Schemas from Hell

Fábio Cozman, Universidade de São Paulo
Hugo Munhoz, Universidade de São Paulo

DOI: https://doi.org/10.5753/eniac.2020.12157

Abstract

The Winograd Challenge has been advocated as a test of computer understanding with respect to commonsense reasoning. The challenge is based on Winograd Schemas: sentences that contain coreferential ambiguities. Most Winograd Schemas are relatively easy for human subjects, and today the best computer systems for the Winograd Challenge perform close to human level. In this paper, we examine the assumptions behind the Winograd Challenge, investigate how far the difficulty of Winograd Schemas can be pushed, and propose various strategies for building truly challenging schemas.
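To make the notion of a schema concrete, here is a minimal Python sketch. It assumes the common two-sentence formulation, where swapping a single special word for an alternate word flips the referent of an ambiguous pronoun, and it uses an adaptation of the well-known trophy/suitcase example. The names `WinogradSchema`, `instances`, `accuracy`, and `always_first` are hypothetical illustrations, not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class WinogradSchema:
    """Hypothetical representation of one schema: a template whose special word
    can be swapped for an alternate word, flipping the pronoun's referent."""
    template: str                # contains "{word}" where the special/alternate word goes
    pronoun: str                 # the ambiguous pronoun
    candidates: Tuple[str, str]  # the two possible antecedents
    words: Tuple[str, str]       # (special word, alternate word)
    answers: Tuple[int, int]     # index of the correct candidate for each word

    def instances(self) -> List[Tuple[str, int]]:
        """Expand the schema into its two sentences, each with its correct answer."""
        return [(self.template.format(word=w), a) for w, a in zip(self.words, self.answers)]


def accuracy(resolver: Callable[[str, str, Tuple[str, str]], int],
             schemas: List[WinogradSchema]) -> float:
    """Fraction of sentences on which the resolver picks the correct antecedent."""
    total = correct = 0
    for s in schemas:
        for sentence, answer in s.instances():
            total += 1
            correct += resolver(sentence, s.pronoun, s.candidates) == answer
    return correct / total if total else 0.0


trophy = WinogradSchema(
    template="The trophy doesn't fit in the brown suitcase because it is too {word}.",
    pronoun="it",
    candidates=("the trophy", "the suitcase"),
    words=("big", "small"),
    answers=(0, 1),
)

# A resolver that ignores the sentence and always picks the first candidate.
always_first = lambda sentence, pronoun, candidates: 0
print(accuracy(always_first, [trophy]))  # 0.5
```

Pairing the two sentences is the design point. Any heuristic that ignores the special word, like `always_first`, gives the same answer to both sentences and so gets exactly one of each pair right, which holds it to chance level.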

Keywords: Winograd Challenge, Commonsense Reasoning, Natural Language Processing
