Applying CUDA Architecture to Accelerate Full Search Block Matching Algorithm for High Performance Motion Estimation in Video Encoding
Abstract
This work presents a parallel GPU-based solution for the Motion Estimation (ME) process in a video encoding system. We propose a way to partition the steps of the Full Search block matching algorithm on the CUDA architecture. We also compare the performance of this solution against a theoretical model and against two other implementations: a sequential one and a parallel one using the OpenMP library. We obtained an O(n^2/log^2 n) speed-up, which fits the proposed theoretical model across different search areas. This represents a gain of up to 600x over the serial implementation and up to 66x over the parallel OpenMP implementation.
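For readers unfamiliar with Full Search block matching, the following is a minimal sequential sketch in Python, not the authors' CUDA implementation. It assumes the sum of absolute differences (SAD) as the matching cost, which the abstract does not specify. The names `sad`, `full_search`, and `search_range` are hypothetical. Each candidate displacement is evaluated independently of the others, which is the property that parallel implementations can exploit.

```python
def sad(cur, ref, bx, by, rx, ry, n):
    """SAD between the n x n block of `cur` at (bx, by) and of `ref` at (rx, ry)."""
    total = 0
    for dy in range(n):
        for dx in range(n):
            total += abs(cur[by + dy][bx + dx] - ref[ry + dy][rx + dx])
    return total


def full_search(cur, ref, bx, by, n, search_range):
    """Exhaustively test every displacement within +/- search_range pixels.

    Returns (best_cost, (mvx, mvy)); ties keep the first candidate found.
    Assumes the current block itself lies fully inside the frame.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for mvy in range(-search_range, search_range + 1):
        for mvx in range(-search_range, search_range + 1):
            rx, ry = bx + mvx, by + mvy
            # Skip candidates that fall partly outside the reference frame.
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue
            cost = sad(cur, ref, bx, by, rx, ry, n)
            if best is None or cost < best[0]:
                best = (cost, (mvx, mvy))
    return best


# Toy example: a 6x6 reference frame with a distinct value at every pixel.
ref = [[10 * y + x for x in range(6)] for y in range(6)]
# The current frame holds, at (2, 2), a copy of the reference block at (3, 1).
cur = [[0] * 6 for _ in range(6)]
for dy in range(2):
    for dx in range(2):
        cur[2 + dy][2 + dx] = ref[1 + dy][3 + dx]

print(full_search(cur, ref, bx=2, by=2, n=2, search_range=2))  # (0, (1, -1))
```

The cost of this exhaustive loop grows with the square of the search range, which is why the size of the search area matters for the reported speed-up.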
Keywords:
Graphics processing unit, Complexity theory, Motion estimation, Computer architecture, Algorithm design and analysis, Encoding, Accuracy, H.264/AVC, GPU, CUDA
Published
26/10/2011
How to Cite
MONTEIRO, Eduarda; VIZZOTTO, Bruno; DINIZ, Cláudio; ZATT, Bruno; BAMPI, Sergio. Applying CUDA Architecture to Accelerate Full Search Block Matching Algorithm for High Performance Motion Estimation in Video Encoding. In: INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD), 23., 2011, Vitória/ES. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2011. p. 128-135.
