Fast Spatial-Temporal Transformer Network

  • Rafael Molossi Escher FURG
  • Rodrigo Andrade de Bem FURG
  • Paulo Lilles Jorge Drews FURG


In computer vision, the restoration of missing regions in an image can be tackled with image inpainting techniques. Neural networks that perform inpainting in videos must extract information from neighboring frames to obtain a temporally coherent result. The state-of-the-art methods for video inpainting are mainly based on Transformer Networks, which rely on attention mechanisms to handle temporal input data. However, such networks are costly, requiring considerable computational power for training and testing, which hinders their use on modest computing platforms. In this context, our goal is to reduce the computational complexity of state-of-the-art video inpainting methods, improving performance and facilitating their use on low-end GPUs. To that end, we introduce the Fast Spatio-Temporal Transformer Network (FastSTTN), an extension of the Spatio-Temporal Transformer Network (STTN) in which the adoption of Reversible Layers reduces memory usage by up to 7 times and execution time by approximately 2.2 times, while maintaining state-of-the-art video inpainting accuracy.
Keywords: Training, Graphics, Computer vision, Neural networks, Transformers, Image restoration, Data mining, Deep Learning, Video Inpainting, Reformer Networks, Transformer Networks
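
The abstract attributes the memory savings to Reversible Layers. As a rough illustration of the general idea (not the FastSTTN implementation), the sketch below shows a generic reversible residual coupling in plain Python: the block's inputs can be recomputed from its outputs, so intermediate activations need not be kept in memory for the backward pass. The functions `f` and `g` are hypothetical stand-ins for the attention and feed-forward sub-layers, and all names here are assumptions made for the example.

```python
import math

def f(v):
    # Hypothetical stand-in for an attention sub-layer.
    return [math.tanh(a) for a in v]

def g(v):
    # Hypothetical stand-in for a feed-forward sub-layer.
    return [0.5 * a for a in v]

def reversible_forward(x1, x2):
    # Coupling: y1 = x1 + f(x2), y2 = x2 + g(y1).
    y1 = [a + b for a, b in zip(x1, f(x2))]
    y2 = [a + b for a, b in zip(x2, g(y1))]
    return y1, y2

def reversible_inverse(y1, y2):
    # Recover the inputs from the outputs alone:
    # x2 = y2 - g(y1), then x1 = y1 - f(x2).
    x2 = [a - b for a, b in zip(y2, g(y1))]
    x1 = [a - b for a, b in zip(y1, f(x2))]
    return x1, x2

# The input is split into two halves, as reversible blocks require.
x1, x2 = [0.1, -0.4, 2.0], [1.5, 0.0, -0.7]
y1, y2 = reversible_forward(x1, x2)
r1, r2 = reversible_inverse(y1, y2)
```

Here `r1` and `r2` match `x1` and `x2` up to floating-point rounding. A network built from such blocks can discard each block's inputs after the forward pass and rebuild them during backpropagation, trading some extra computation for lower memory use.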
How to Cite

ESCHER, Rafael Molossi; BEM, Rodrigo Andrade de; DREWS, Paulo Lilles Jorge. Fast Spatial-Temporal Transformer Network. In: CONFERENCE ON GRAPHICS, PATTERNS AND IMAGES (SIBGRAPI), 34., 2021, Online. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021.