
Who Killed the Winograd Schema Challenge?

  • Conference paper
Intelligent Systems (BRACIS 2023)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14197)

Abstract

In which we investigate the technical issues surrounding the defeat, or perhaps the sudden assassination, of the Winograd Schema Challenge. We argue that, while the obvious suspect is the WinoGrande-based solution, the real cause of death was the masked language modeling technique for learning large language models. The Winograd Schema Challenge was, in the end, just a test for masked language closure, and as such it was killed by the use of this technique at scale.
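
The substitution reading of the challenge sketched in this abstract can be made concrete. In the spirit of the language-model approach of Trinh and Le [48], each candidate referent is substituted for the pronoun and the resulting sentence is scored by a language model; the higher-scoring substitution wins. The "model" below is a deliberately toy bigram counter over an invented two-sentence corpus, used only to show the mechanics of the substitution scheme, not any real system.

```python
# Toy sketch of cloze-style Winograd resolution via substitution scoring.
# The bigram-count "language model" and the corpus are illustrative only.
from collections import Counter

def train_bigram_counts(corpus):
    # Count adjacent word pairs across the corpus.
    counts = Counter()
    for sentence in corpus:
        tokens = sentence.lower().split()
        counts.update(zip(tokens, tokens[1:]))
    return counts

def sentence_score(counts, sentence):
    # Score a sentence by summing the counts of its bigrams.
    tokens = sentence.lower().split()
    return sum(counts[(a, b)] for a, b in zip(tokens, tokens[1:]))

def resolve(counts, schema, pronoun, candidates):
    # Substitute each candidate for the pronoun; higher score wins.
    scored = {c: sentence_score(counts, schema.replace(pronoun, c))
              for c in candidates}
    return max(scored, key=scored.get)

corpus = [
    "the trophy is too big for the suitcase",
    "a small suitcase",
]
schema = "the trophy does not fit in the suitcase because PRON is too big"
winner = resolve(train_bigram_counts(corpus), schema, "PRON",
                 ["the trophy", "the suitcase"])
```

A real solver replaces the bigram counter with a large pretrained language model; the substitution-and-score scaffolding stays essentially the same, which is the sense in which the challenge reduces to a cloze test.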

Notes

  1. The self-attention mechanism guarantees that long-distance context has “equal opportunity” to show up. When it comes to anaphora resolution, the self-attention mechanism tackles it at its core.
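
The "equal opportunity" point can be seen directly in scaled dot-product attention (Vaswani et al. [50]): every query is compared against all keys in a single step, so a pronoun's representation draws on every position regardless of distance. The sketch below uses toy dimensions and random vectors, not a trained model.

```python
# Minimal scaled dot-product self-attention, to illustrate that every
# position attends to every other position in one step.
import numpy as np

def attention(Q, K, V):
    # Scores compare each query against ALL keys at once; sequence
    # distance imposes no penalty by itself.
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax (stabilized by subtracting the row max).
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
seq_len, d = 12, 8                # e.g., a 12-token sentence, pronoun last
X = rng.normal(size=(seq_len, d))
out, w = attention(X, X, X)       # self-attention: Q, K, V from the same sequence
```

Here `w[-1]` is the attention distribution of the last token (say, the pronoun): it is strictly positive over every position, including candidate antecedents arbitrarily far back.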

  2. As a digression, we note that Quoc V. Le, the second author of this paper, tackled the WSC in one of the first successful approaches using language models [48].

  3. In an interview, Jacob Devlin has commented on the cloze test as being a fairly well-known test in psycholinguistics for assessing levels of reading ability.
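
Taylor's cloze procedure [45] is mechanical enough to sketch: delete every n-th word from a passage and ask the reader (or a model) to restore it; the restoration rate estimates readability. The function below is an illustrative toy; the blank marker is arbitrary, and the example passage is Levesque's canonical schema sentence [24, 27].

```python
# Toy cloze-item generator: blank out every n-th word (n=5 is Taylor's
# classic choice) and record the deleted words for scoring.
def make_cloze(text, n=5, blank="____"):
    tokens = text.split()
    deleted = {}
    for i in range(n - 1, len(tokens), n):
        deleted[i] = tokens[i]   # remember the answer at this position
        tokens[i] = blank
    return " ".join(tokens), deleted

passage = ("the city councilmen refused the demonstrators "
           "a permit because they feared violence")
cloze, answers = make_cloze(passage)
```

In this engineered example the second blank happens to fall on the pronoun "they", which is exactly the word a Winograd schema asks about: masked language modeling turns this fill-in-the-blank task into a training objective at scale.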

  4. At https://commoncrawl.org/2016/10/news-dataset-available/.

  5. At http://Skylion007.github.io/OpenWebTextCorpus.

References

  1. OpenAI: GPT-4 technical report. arXiv:2303.08774 (2023)

  2. Bailey, D., Harrison, A., Lierler, Y., Lifschitz, V., Michael, J.: The Winograd Schema Challenge and reasoning about correlation. In: Working Notes of the Symposium on Logical Formalizations of Commonsense Reasoning (2015)

  3. Bender, D.: Establishing a human baseline for the Winograd schema challenge. In: Modern AI and Cognitive Science Conference, pp. 39–45 (2015)

  4. Bobrow, D.: Precision-focused textual inference. In: Proceedings of the Workshop on Textual Entailment and Paraphrasing, ACL, Prague (2007)

  5. Brown, T.B., et al.: Language models are few-shot learners. arXiv:2005.14165 (2020)

  6. Cozman, F.G., Neri, H.: Some thoughts on knowledge-enhanced machine learning. Int. J. Approximate Reasoning 136, 308–324 (2020)

  7. Dagan, I.: Recognizing textual entailment: Rational, evaluation and approaches. Natural Lang. Eng. 15(4), i-xvii (2009)

  8. Dagan, I., Glickman, O., Magnini, B.: The PASCAL Recognising Textual Entailment Challenge. In: Quiñonero-Candela, J., Dagan, I., Magnini, B., d’Alché-Buc, F. (eds.) MLCW 2005. LNCS (LNAI), vol. 3944, pp. 177–190. Springer, Heidelberg (2006). https://doi.org/10.1007/11736790_9

  9. Dai, A.M., Le, Q.V.: Semi-supervised sequence learning. In: Cortes, C., Lawrence, N., Lee, D., Sugiyama, M., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 28. Curran Associates, Inc. (2015)

  10. Davis, E.: Winograd schemas and machine translation. arXiv:1608.01884 (2016)

  11. Davis, E., Morgenstern, L., Ortiz, C.L.: The first Winograd Schema Challenge at IJCAI-16. AI Mag. 38(3), 97–98 (2017)

  12. Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805v2 (2019)

  13. Elazar, Y., Zhang, H., Goldberg, Y., Roth, D.: Back to square one: artifact detection, training and commonsense disentanglement in the Winograd schema. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp. 10486–10500. Association for Computational Linguistics (2021)

  14. Emami, A., et al.: The KnowRef coreference corpus: removing gender and number cues for difficult pronominal anaphora resolution. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 3952–3961, Florence, Italy. Association for Computational Linguistics (2019)

  15. Emami, A., Trischler, A., Suleman, K., Chi, J., Cheung, K.: A generalized knowledge hunting framework for the Winograd Schema Challenge. In: NAACL-HLT 2018: Student Research Workshop, pp. 25–31 (2018)

  16. Frege, G.: Sense and reference. Philos. Rev. 57 (1948)

  17. Joshi, B., Shah, N., Barbieri, F., Neves, L.: The devil is in the details: evaluating limitations of transformer-based methods for granular tasks. In: Proceedings of the 28th International Conference on Computational Linguistics, pp. 3652–3659, Barcelona, Spain (Online). International Committee on Computational Linguistics (2020)

  18. Jurafsky, D., Martin, J.H.: Speech and Language Processing (3rd ed. draft) (2023)

  19. Kavumba, P., Inoue, N., Heinzerling, B., Singh, K., Reisert, P., Inui, K.: When choosing plausible alternatives, Clever Hans can be clever. In: Proceedings of the First Workshop on Commonsense Inference in Natural Language Processing, pp. 33–42, Hong Kong, China. Association for Computational Linguistics (2019)

  20. Kocijan, V., Cretu, A.-M., Camburu, O.-M., Yordanov, Y., Lukasiewicz, T.: A surprisingly robust trick for the Winograd Schema Challenge. In: Annual Meeting of the Association for Computational Linguistics, pp. 4837–4842 (2019)

  21. Kocijan, V., Davis, E., Lukasiewicz, T., Marcus, G., Morgenstern, L.: The defeat of the Winograd Schema Challenge. arXiv:2201.02387 (2023)

  22. Kocijan, V., Lukasiewicz, T., Davis, E., Marcus, G.: A review of Winograd Schema Challenge datasets and approaches. arXiv:2004.13831v1 (2020)

  23. Korman, D.: Defining textual entailment. J. Assoc. Inf. Sci. Technol. 69 (2018)

  24. Levesque, H.: The Winograd Schema Challenge. In: AAAI (2011)

  25. Levesque, H.: On our best behaviour. In: IJCAI (2013)

  26. Levesque, H.: Common Sense, the Turing Test, and the Quest for Real AI. The MIT Press (2017)

  27. Levesque, H., Davis, E., Morgenstern, L.: The Winograd Schema Challenge. In: Principles of Knowledge Representation and Reasoning (2012)

  28. Liu, Q., Jiang, H., Ling, Z.-H., Zhu, X., Wei, S., Hu, Y.: Commonsense knowledge enhanced embeddings for solving pronoun disambiguation problems in Winograd Schema Challenge. arXiv:1611.04146 (2016)

  29. Liu, Y., et al.: RoBERTa: a robustly optimized BERT pretraining approach. arXiv:1907.11692 (2019)

  30. Marcus, G., Davis, E.: Rebooting AI: Building Artificial Intelligence We Can Trust. Pantheon (2019)

  31. Miller, G.A.: Language and Communication. McGraw-Hill (1951)

  32. Isaak, N., Michael, L.: Tackling the Winograd Schema Challenge through machine logical inferences. In: STAIRS, vol. 75 (2016)

  33. Isaak, N., Michael, L.: How the availability of training material affects performance in the Winograd Schema Challenge (2017)

  34. Isaak, N., Michael, L.: A data-driven metric of hardness for WSC sentences. In: GCAI 2018. EPiC Series in Computing, vol. 55, pp. 107–120 (2018)

  35. Pearl, J.: Causality: Models, Reasoning, and Inference, 2nd edn. Cambridge University Press (2009)

  36. Peters, M.E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., Zettlemoyer, L.: Deep contextualized word representations. In: NAACL-HLT (2018)

  37. Quine, W.V.: Two dogmas of empiricism. Philos. Rev. 60 (1951)

  38. Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I.: Language models are unsupervised multitask learners. Technical Report 8, OpenAI Blog (2019)

  39. Rahman, A., Ng, V.: Resolving complex cases of definite pronouns: The Winograd schema challenge. In: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pp. 777–789, Jeju Island, Korea (2012). Association for Computational Linguistics

  40. Ruan, Y.-P., Zhu, X., Ling, Z.-H., Shi, Z., Liu, Q., Wei, S.: Exploring unsupervised pretraining and sentence structure modelling for Winograd Schema Challenge. arXiv:1904.09705 (2019)

  41. Rus, V.: A study of textual entailment. Int. J. Artif. Intell. Tools 17 (2007)

  42. Sakaguchi, K., Bras, R.L., Bhagavatula, C., Choi, Y.: Winogrande: an adversarial Winograd Schema Challenge at scale. arXiv:1907.10641v2 (2019)

  43. Sakaguchi, K., Bras, R.L., Bhagavatula, C., Choi, Y.: Winogrande: an adversarial Winograd Schema Challenge at scale. AAAI-20 Technical Tracks 34(05) (2020)

  44. Sakaguchi, K., Bras, R.L., Bhagavatula, C., Choi, Y.: Winogrande: an adversarial Winograd Schema Challenge at scale. arXiv:1907.10641v1 (2019)

  45. Taylor, W.L.: Cloze procedure: a new tool for measuring readability. Journalism Quarterly 30, 415–433 (1953)

  46. Touvron, H., et al.: LLaMA: open and efficient foundation language models. Technical report, arXiv:2302.13971 (2023)

  47. Trichelair, P., et al.: On the evaluation of common-sense reasoning in natural language understanding. arXiv preprint arXiv:1811.01778 (2018)

  48. Trinh, T.H., Le, Q.V.: A simple method for commonsense reasoning. arXiv:1806.02847 (2018)

  49. van Aken, B., Winter, B., Löser, A., Gers, F.A.: How does BERT answer questions? A layer-wise analysis of transformer representations. In: Proceedings of the 28th ACM International Conference on Information and Knowledge Management (CIKM '19), pp. 1823–1832 (2019)

  50. Vaswani, A., et al.: Attention is all you need. In: 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA (2017)

  51. Zhang, H., Song, Y.: A distributed solution for Winograd Schema Challenge. In: ICMLC2018 (2018)

  52. Zhu, Y., et al.: Aligning books and movies: towards story-like visual explanations by watching movies and reading books. In: The IEEE International Conference on Computer Vision (ICCV) (2015)

Acknowledgements

The first author was supported by FAPESP through grant 2018/0968-1. The second author was partially supported by CNPq through grant 305753/2022-3. This work was carried out at the Center for Artificial Intelligence (C4AI-USP), with support by FAPESP (grant 2019/07665-4) and by the IBM Corporation.

Author information

Correspondence to Fabio G. Cozman.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Neri, H., Cozman, F.G. (2023). Who Killed the Winograd Schema Challenge? In: Naldi, M.C., Bianchi, R.A.C. (eds.) Intelligent Systems. BRACIS 2023. Lecture Notes in Computer Science, vol. 14197. Springer, Cham. https://doi.org/10.1007/978-3-031-45392-2_14

  • DOI: https://doi.org/10.1007/978-3-031-45392-2_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-45391-5

  • Online ISBN: 978-3-031-45392-2

  • eBook Packages: Computer Science, Computer Science (R0)
