Gaze estimation via self-attention augmented convolutions

  • Gabriel Lefundes Vieira (UFBA)
  • Luciano Oliveira (UFBA)

Abstract

Although deep learning methods have recently boosted the accuracy of appearance-based gaze estimation, there is still room for improvement in the network architectures for this particular task. We therefore propose a novel network architecture grounded in self-attention augmented convolutions to improve the quality of the features learned during the training of a shallower residual network. The rationale is that the self-attention mechanism can help a shallower network outperform deeper architectures by learning dependencies between distant regions in full-face images. This mechanism can also create better and more spatially aware feature representations derived from the face and eye images before gaze regression. We dubbed our framework ARes-gaze; it explores our Attention-augmented ResNet (ARes-14) as twin convolutional backbones. In our experiments, results showed a decrease of the average angular error by 2.38% compared to state-of-the-art methods on the MPIIFaceGaze data set, while achieving second place on the EyeDiap data set. Notably, our proposed framework was the only one to reach high accuracy on both data sets simultaneously.
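The building block behind ARes-14 is the attention-augmented convolution, in which part of a layer's output channels are produced by multi-head self-attention over all spatial positions rather than by the convolution itself, so every output location can attend to distant regions of the face. The PyTorch sketch below is a minimal illustration of this idea under stated assumptions: the module name AugmentedConv2d and its hyperparameters are hypothetical, an odd kernel size is assumed, and the relative positional encodings used in full attention-augmented convolutions are omitted; this is not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AugmentedConv2d(nn.Module):
    """Illustrative attention-augmented convolution: out_ch - dv channels
    come from a regular convolution, dv channels from multi-head
    self-attention over all H*W spatial positions."""

    def __init__(self, in_ch, out_ch, kernel_size, dk, dv, num_heads):
        super().__init__()
        assert dk % num_heads == 0 and dv % num_heads == 0
        self.num_heads, self.dk, self.dv = num_heads, dk, dv
        # Regular convolution fills the channels not covered by attention
        # (assumes an odd kernel size so padding preserves H and W).
        self.conv = nn.Conv2d(in_ch, out_ch - dv, kernel_size,
                              padding=kernel_size // 2)
        # A single 1x1 convolution computes queries, keys, and values jointly.
        self.qkv = nn.Conv2d(in_ch, 2 * dk + dv, kernel_size=1)
        self.attn_out = nn.Conv2d(dv, dv, kernel_size=1)

    def forward(self, x):
        b, _, h, w = x.shape
        q, k, v = self.qkv(x).split([self.dk, self.dk, self.dv], dim=1)

        def heads(t, d):
            # (B, d, H, W) -> (B, heads, H*W, d per head)
            return t.reshape(b, self.num_heads, d // self.num_heads,
                             h * w).transpose(2, 3)

        q = heads(q, self.dk) * (self.dk // self.num_heads) ** -0.5
        k, v = heads(k, self.dk), heads(v, self.dv)

        # Attention over all spatial positions captures the long-range
        # dependencies that a local convolution kernel cannot.
        attn = F.softmax(q @ k.transpose(2, 3), dim=-1)
        o = (attn @ v).transpose(2, 3).reshape(b, self.dv, h, w)

        # Concatenate convolutional and attention feature maps channel-wise.
        return torch.cat([self.conv(x), self.attn_out(o)], dim=1)
```

In a ResNet-style block, such a layer could serve as a drop-in replacement for a standard 3x3 convolution, e.g. AugmentedConv2d(64, 64, kernel_size=3, dk=32, dv=16, num_heads=4), which keeps 64 output channels while reserving 16 of them for attention-derived features.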
Keywords: Training, Graphics, Deep learning, Estimation, Network architecture, Task analysis, Faces
Published
18/10/2021
VIEIRA, Gabriel Lefundes; OLIVEIRA, Luciano. Gaze estimation via self-attention augmented convolutions. In: CONFERENCE ON GRAPHICS, PATTERNS AND IMAGES (SIBGRAPI), 34., 2021, Online. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021.