The Role of Aggregation Functions on Transformers and ViTs Self-Attention for Classification

  • Joelson Sartori FURG
  • Rodrigo de Bem FURG
  • Graçaliz Dimuro FURG
  • Giancarlo Lucca UCPel


Aggregation functions are mathematical operations that combine or summarize a set of values into a single representative value. They play a crucial role in the attention mechanisms of Transformer neural networks. However, the default aggregation in Transformers, which is based on matrix multiplication, may have limitations in certain classification scenarios. It may struggle with the complexity of the information present in the input data, resulting in lower accuracy and efficiency. To address this issue, the present work replaces the traditional matrix multiplication operation used in the classical attention mechanism with alternative, more general aggregation functions. To validate the new aggregation methods in the attention mechanism, we conducted experiments on two datasets, the recently proposed Google American Sign Language (ASL) Fingerspelling Recognition and the well-known CIFAR-10, performing time series and image classification, respectively. The results shed light on the role of aggregation functions for classification with Transformers, demonstrating promising outcomes and potential for further improvements.
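To make the idea concrete, the following is a minimal NumPy sketch of single-head self-attention in which the final matrix multiplication is written out as "weight each value, then aggregate". The abstract does not state which aggregation functions were evaluated or at which exact step they are applied, so the `aggregate` parameter, the function names, and the choice of `np.max` as an alternative are assumptions made purely for illustration, not the authors' method.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v, aggregate=None):
    """Single-head attention over (n, d) arrays.

    aggregate: hypothetical hook, a callable that reduces axis=1 of an
    (n, n, d) array. None falls back to the usual matrix multiplication.
    """
    d = q.shape[-1]
    weights = softmax(q @ k.T / np.sqrt(d), axis=-1)   # (n, n)
    if aggregate is None:
        return weights @ v                             # classical attention
    # Matrix multiplication is "elementwise product, then sum over keys".
    # Keep the product explicit so the sum can be swapped for another function.
    weighted = weights[:, :, None] * v[None, :, :]     # (n, n, d)
    return aggregate(weighted, axis=1)                 # (n, d)

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(4, 8)) for _ in range(3))

baseline = attention(q, k, v)                    # standard matmul
sum_agg = attention(q, k, v, aggregate=np.sum)   # same result, written as an aggregation
max_agg = attention(q, k, v, aggregate=np.max)   # illustrative alternative (assumption)
```

With `np.sum` the sketch reproduces classical attention, which shows that the matrix product is itself one particular aggregation; any other reduction over the key axis yields a different attention variant with the same output shape.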
SARTORI, Joelson; BEM, Rodrigo de; DIMURO, Graçaliz; LUCCA, Giancarlo. The Role of Aggregation Functions on Transformers and ViTs Self-Attention for Classification. In: CONFERENCE ON GRAPHICS, PATTERNS AND IMAGES (SIBGRAPI), 36., 2023, Rio Grande/RS. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 97-102.