
Who Killed the Winograd Schema Challenge?

  • Conference paper
Intelligent Systems (BRACIS 2023)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14197)

Abstract

In which we investigate the technical issues surrounding the defeat, or perhaps the sudden assassination, of the Winograd Schema Challenge. We argue that, while the obvious suspect is the WinoGrande-based solution, the real cause of death was the masked language modeling technique for learning large language models. The Winograd Schema Challenge was, in the end, just a test for masked language closure, and as such it was killed by the use of this technique at scale.
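
The substitution reading of the challenge sketched in this abstract can be made concrete. In the spirit of the language-model approach of Trinh and Le [48], each candidate referent is substituted for the pronoun and the resulting sentence is scored by a language model; the higher-scoring substitution wins. The "model" below is a deliberately toy bigram counter over an invented two-sentence corpus, used only to show the mechanics of the substitution scheme, not any real system.

```python
# Toy sketch of cloze-style Winograd resolution via substitution scoring.
# The bigram-count "language model" and the corpus are illustrative only.
from collections import Counter

def train_bigram_counts(corpus):
    # Count adjacent word pairs across the corpus.
    counts = Counter()
    for sentence in corpus:
        tokens = sentence.lower().split()
        counts.update(zip(tokens, tokens[1:]))
    return counts

def sentence_score(counts, sentence):
    # Score a sentence by summing the counts of its bigrams.
    tokens = sentence.lower().split()
    return sum(counts[(a, b)] for a, b in zip(tokens, tokens[1:]))

def resolve(counts, schema, pronoun, candidates):
    # Substitute each candidate for the pronoun; higher score wins.
    scored = {c: sentence_score(counts, schema.replace(pronoun, c))
              for c in candidates}
    return max(scored, key=scored.get)

corpus = [
    "the trophy is too big for the suitcase",
    "a small suitcase",
]
schema = "the trophy does not fit in the suitcase because PRON is too big"
winner = resolve(train_bigram_counts(corpus), schema, "PRON",
                 ["the trophy", "the suitcase"])
```

A real solver replaces the bigram counter with a large pretrained language model; the substitution-and-score scaffolding stays essentially the same, which is the sense in which the challenge reduces to a cloze test.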

Notes

  1. The self-attention mechanism guarantees that long-distance context has “equal opportunity” to show up. When it comes to anaphora resolution, the self-attention mechanism tackles it at its core.
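
The "equal opportunity" point can be seen directly in scaled dot-product attention (Vaswani et al. [50]): every query is compared against all keys in a single step, so a pronoun's representation draws on every position regardless of distance. The sketch below uses toy dimensions and random vectors, not a trained model.

```python
# Minimal scaled dot-product self-attention, to illustrate that every
# position attends to every other position in one step.
import numpy as np

def attention(Q, K, V):
    # Scores compare each query against ALL keys at once; sequence
    # distance imposes no penalty by itself.
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax (stabilized by subtracting the row max).
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
seq_len, d = 12, 8                # e.g., a 12-token sentence, pronoun last
X = rng.normal(size=(seq_len, d))
out, w = attention(X, X, X)       # self-attention: Q, K, V from the same sequence
```

Here `w[-1]` is the attention distribution of the last token (say, the pronoun): it is strictly positive over every position, including candidate antecedents arbitrarily far back.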

  2. As a digression, we note that Quoc V. Le, the second author of this paper, tackled the WSC in one of the first successful approaches using language models [48].

  3. In an interview, Jacob Devlin has commented on the cloze test as being a fairly well-known test in psycholinguistics for assessing levels of reading ability.
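
Taylor's cloze procedure [45] is mechanical enough to sketch: delete every n-th word from a passage and ask the reader (or a model) to restore it; the restoration rate estimates readability. The function below is an illustrative toy; the blank marker is arbitrary, and the example passage is Levesque's canonical schema sentence [24, 27].

```python
# Toy cloze-item generator: blank out every n-th word (n=5 is Taylor's
# classic choice) and record the deleted words for scoring.
def make_cloze(text, n=5, blank="____"):
    tokens = text.split()
    deleted = {}
    for i in range(n - 1, len(tokens), n):
        deleted[i] = tokens[i]   # remember the answer at this position
        tokens[i] = blank
    return " ".join(tokens), deleted

passage = ("the city councilmen refused the demonstrators "
           "a permit because they feared violence")
cloze, answers = make_cloze(passage)
```

In this engineered example the second blank happens to fall on the pronoun "they", which is exactly the word a Winograd schema asks about: masked language modeling turns this fill-in-the-blank task into a training objective at scale.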

  4. At https://commoncrawl.org/2016/10/news-dataset-available/.

  5. At http://Skylion007.github.io/OpenWebTextCorpus.

References

  1. OpenAI: GPT-4 technical report. arXiv:2303.08774 (2023)

  2. Bailey, D., Harrison, A., Lierler, Y., Lifschitz, V., Michael, J.: The Winograd Schema Challenge and reasoning about correlation. In: Working Notes of the Symposium on Logical Formalizations of Commonsense Reasoning (2015)

  3. Bender, D.: Establishing a human baseline for the Winograd schema challenge. In: Modern AI and Cognitive Science Conference, pp. 39–45 (2015)

  4. Bobrow, D.: Precision-focused textual inference. In: Proceedings of the Workshop on Textual Entailment and Paraphrasing, ACL, Prague (2007)

  5. Brown, T.B., et al.: Language models are few-shot learners. arXiv:2005.14165 (2020)

  6. Cozman, F.G., Neri, H.: Some thoughts on knowledge-enhanced machine learning. Int. J. Approximate Reasoning 136, 308–324 (2020)

  7. Dagan, I.: Recognizing textual entailment: Rational, evaluation and approaches. Natural Lang. Eng. 15(4), i-xvii (2009)

  8. Dagan, I., Glickman, O., Magnini, B.: The PASCAL Recognising Textual Entailment Challenge. In: Quiñonero-Candela, J., Dagan, I., Magnini, B., d’Alché-Buc, F. (eds.) MLCW 2005. LNCS (LNAI), vol. 3944, pp. 177–190. Springer, Heidelberg (2006). https://doi.org/10.1007/11736790_9

  9. Dai, A.M., Le, Q.V.: Semi-supervised sequence learning. In: Cortes, C., Lawrence, N., Lee, D., Sugiyama, M., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 28. Curran Associates, Inc. (2015)

  10. Davis, E.: Winograd schemas and machine translation. arXiv:1608.01884 (2016)

  11. Davis, E., Morgenstern, L., Ortiz, C.L.: The first Winograd Schema Challenge at IJCAI-16. AI Mag. 38(3), 97–98 (2017)

  12. Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805v2 (2019)

  13. Elazar, Y., Zhang, H., Goldberg, Y., Roth, D.: Back to square one: artifact detection, training and commonsense disentanglement in the Winograd schema. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp. 10486–10500. Association for Computational Linguistics (2021)

  14. Emami, A., et al.: The KnowRef coreference corpus: removing gender and number cues for difficult pronominal anaphora resolution. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 3952–3961, Florence, Italy. Association for Computational Linguistics (2019)

  15. Emami, A., Trischler, A., Suleman, K., Chi, J., Cheung, K.: A generalized knowledge hunting framework for the Winograd Schema Challenge. In: NAACL-HLT 2018: Student Research Workshop, pp. 25–31 (2018)

  16. Frege, G.: Sense and reference. Philos. Rev. 57 (1948)

  17. Joshi, B., Shah, N., Barbieri, F., Neves, L.: The devil is in the details: evaluating limitations of transformer-based methods for granular tasks. In: Proceedings of the 28th International Conference on Computational Linguistics, pp. 3652–3659, Barcelona, Spain (Online). International Committee on Computational Linguistics (2020)

  18. Jurafsky, D., Martin, J.H.: Speech and Language Processing (3rd ed. draft) (2023)

  19. Kavumba, P., Inoue, N., Heinzerling, B., Singh, K., Reisert, P., Inui, K.: When choosing plausible alternatives, Clever Hans can be clever. In: Proceedings of the First Workshop on Commonsense Inference in Natural Language Processing, pp. 33–42, Hong Kong, China. Association for Computational Linguistics (2019)

  20. Kocijan, V., Cretu, A.-M., Camburu, O.-M., Yordanov, Y., Lukasiewicz, T.: A surprisingly robust trick for the Winograd Schema Challenge. In: Annual Meeting of the Association for Computational Linguistics, pp. 4837–4842 (2019)

  21. Kocijan, V., Davis, E., Lukasiewicz, T., Marcus, G., Morgenstern, L.: The defeat of the Winograd Schema Challenge. arXiv:2201.02387 (2023)

  22. Kocijan, V., Lukasiewicz, T., Davis, E., Marcus, G.: A review of Winograd Schema Challenge datasets and approaches. arXiv:2004.13831v1 (2020)

  23. Korman, D.: Defining textual entailment. J. Assoc. Inf. Sci. Technol. 69 (2018)

  24. Levesque, H.: The Winograd Schema Challenge. In: AAAI (2011)

  25. Levesque, H.: On our best behaviour. In: IJCAI (2013)

  26. Levesque, H.: Common Sense, the Turing Test, and the Quest for Real AI. The MIT Press (2017)

  27. Levesque, H., Davis, E., Morgenstern, L.: The Winograd Schema Challenge. In: Principles of Knowledge Representation and Reasoning (2012)

  28. Liu, Q., Jiang, H., Ling, Z.-H., Zhu, X., Wei, S., Hu, Y.: Commonsense knowledge enhanced embeddings for solving pronoun disambiguation problems in Winograd Schema Challenge. arXiv:1611.04146 (2016)

  29. Liu, Y., et al.: RoBERTa: a robustly optimized BERT pretraining approach. arXiv:1907.11692 (2019)

  30. Marcus, G., Davis, E.: Rebooting AI: Building Artificial Intelligence We Can Trust. Pantheon (2019)

  31. Miller, G.A.: Language and Communication. McGraw-Hill (1951)

  32. Isaak, N., Michael, L.: Tackling the Winograd Schema Challenge through machine logical inferences. In: STAIRS, vol. 75 (2016)

  33. Isaak, N., Michael, L.: How the availability of training material affects performance in the Winograd Schema Challenge (2017)

  34. Isaak, N., Michael, L.: A data-driven metric of hardness for WSC sentences. In: GCAI 2018. EPiC Series in Computing, vol. 55, pp. 107–120 (2018)

  35. Pearl, J.: Causality: Models, Reasoning, and Inference, 2nd edn. Cambridge University Press (2009)

  36. Peters, M.E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., Zettlemoyer, L.: Deep contextualized word representations. In: NAACL-HLT (2018)

  37. Quine, W.V.: Two dogmas of empiricism. Philos. Rev. 60 (1951)

  38. Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I.: Language models are unsupervised multitask learners. Technical Report 8, OpenAI Blog (2019)

  39. Rahman, A., Ng, V.: Resolving complex cases of definite pronouns: The Winograd schema challenge. In: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pp. 777–789, Jeju Island, Korea (2012). Association for Computational Linguistics

  40. Ruan, Y.-P., Zhu, X., Ling, Z.-H., Shi, Z., Liu, Q., Wei, S.: Exploring unsupervised pretraining and sentence structure modelling for Winograd Schema Challenge. arXiv:1904.09705 (2019)

  41. Rus, V.: A study of textual entailment. Int. J. Artif. Intell. Tools 17 (2007)

  42. Sakaguchi, K., Bras, R.L., Bhagavatula, C., Choi, Y.: Winogrande: an adversarial Winograd Schema Challenge at scale. arXiv:1907.10641v2 (2019)

  43. Sakaguchi, K., Bras, R.L., Bhagavatula, C., Choi, Y.: Winogrande: an adversarial Winograd Schema Challenge at scale. AAAI-20 Technical Tracks 34(05) (2020)

  44. Sakaguchi, K., Bras, R.L., Bhagavatula, C., Choi, Y.: Winogrande: an adversarial Winograd Schema Challenge at scale. arXiv:1907.10641v1 (2019)

  45. Taylor, W.L.: Cloze procedure: a new tool for measuring readability. Journalism Quarterly 30, 415–433 (1953)

  46. Touvron, H., et al.: LLaMA: open and efficient foundation language models. Technical report, arXiv:2302.13971 (2023)

  47. Trichelair, P., et al.: On the evaluation of common-sense reasoning in natural language understanding. arXiv preprint arXiv:1811.01778 (2018)

  48. Trinh, T.H., Le, Q.V.: A simple method for commonsense reasoning. arXiv:1806.02847 (2018)

  49. van Aken, B., Winter, B., Löser, A., Gers, F.A.: How does BERT answer questions? A layer-wise analysis of transformer representations. In: Proceedings of the 28th ACM International Conference on Information and Knowledge Management (CIKM '19), pp. 1823–1832 (2019)

  50. Vaswani, A., et al.: Attention is all you need. In: 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA (2017)

  51. Zhang, H., Song, Y.: A distributed solution for Winograd Schema Challenge. In: ICMLC2018 (2018)

  52. Zhu, Y., et al.: Aligning books and movies: towards story-like visual explanations by watching movies and reading books. In: The IEEE International Conference on Computer Vision (ICCV) (2015)

Acknowledgements

The first author was supported by FAPESP through grant 2018/0968-1. The second author was partially supported by CNPq through grant 305753/2022-3. This work was carried out at the Center for Artificial Intelligence (C4AI-USP), with support by FAPESP (grant 2019/07665-4) and by the IBM Corporation.

Author information

Correspondence to Fabio G. Cozman.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Neri, H., Cozman, F.G. (2023). Who Killed the Winograd Schema Challenge? In: Naldi, M.C., Bianchi, R.A.C. (eds.) Intelligent Systems. BRACIS 2023. Lecture Notes in Computer Science, vol. 14197. Springer, Cham. https://doi.org/10.1007/978-3-031-45392-2_14

  • DOI: https://doi.org/10.1007/978-3-031-45392-2_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-45391-5

  • Online ISBN: 978-3-031-45392-2

  • eBook Packages: Computer Science, Computer Science (R0)
