Syntactic Analysis in Transformers through Attention Heads

  • Ricardo Gomes de Oliveira (UFBA)
  • Daniela Barreiro Claro (UFBA)
  • Rerisson Cavalcante (UFBA)

Abstract


Advances in Natural Language Processing (NLP) have led to the development of Transformer architectures such as BERT, whose most prominent feature is the attention mechanism. However, the behavior of attention mechanisms in languages other than English, such as Brazilian Portuguese, remains underexplored. This work analyzes the attention heads of a Transformer architecture with respect to the syntactic relations within a sentence. We examine how attention patterns align with syntactic dependencies involving phenomena such as transitive verbs, reflexive pronouns, and subordinate clauses, as realized in Brazilian Portuguese. The results reveal heads specialized in particular dependency arcs, such as subject–verb and verb–object. These findings open new research opportunities for evaluating syntactic sensitivity in Transformer models and contribute to the development of more linguistically informed models for Brazilian Portuguese.
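
To make the analysis concrete, the following minimal sketch shows how per-head attention can be inspected for a single, hand-annotated dependency arc using the Hugging Face transformers library. It is an illustration only, not the authors' exact pipeline: the BERTimbau checkpoint (neuralmind/bert-base-portuguese-cased), the example sentence, and the subject–verb arc are assumptions made for this example.

# Minimal sketch (illustrative, not the paper's exact pipeline): check which
# attention heads make the dependent of a hand-annotated subject-verb arc
# attend most strongly to its syntactic head.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "neuralmind/bert-base-portuguese-cased"  # assumption: BERTimbau base
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME, output_attentions=True)
model.eval()

sentence = "O menino comprou um livro."        # "The boy bought a book."
dep_word, head_word = "menino", "comprou"      # subject -> verb arc (hand-annotated)

inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Assumes both words survive tokenization as single WordPieces.
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
dep_idx, head_idx = tokens.index(dep_word), tokens.index(head_word)

# outputs.attentions: one tensor per layer, shaped (batch, heads, seq_len, seq_len).
for layer, attn in enumerate(outputs.attentions):
    for head in range(attn.size(1)):
        row = attn[0, head, dep_idx].clone()
        row[0] = row[-1] = 0.0                 # ignore [CLS] and [SEP]
        if row.argmax().item() == head_idx:
            print(f"layer {layer}, head {head}: '{dep_word}' attends most to '{head_word}'")

Aggregating such matches over a dependency-annotated corpus would give each head a per-relation accuracy, which is the kind of evidence that can single out heads specialized in arcs such as subject–verb or verb–object.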

Published
2025-09-29
OLIVEIRA, Ricardo Gomes de; CLARO, Daniela Barreiro; CAVALCANTE, Rerisson. Syntactic Analysis in Transformers through Attention Heads. In: BRAZILIAN SYMPOSIUM IN INFORMATION AND HUMAN LANGUAGE TECHNOLOGY (STIL), 16., 2025, Fortaleza/CE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 295-306. DOI: https://doi.org/10.5753/stil.2025.37833.