Multi-Codec Model Based on Spatio-Temporal Deformable Fusion for Quality Enhancement of Compressed Videos

Gilberto Kreisler; Garibaldi da Silveira Junior; Bruno Zatt; Daniel Palomino; Guilherme Correa

doi:10.5753/semish.2023.230044

Affiliation (all authors): UFPel

Abstract

Compressed videos often suffer from visual artifacts that degrade the quality perceived by the user. Various deep learning architectures have recently proven effective for video quality enhancement. However, most of them are trained and validated on videos produced by a single video coding standard. This paper proposes a new model based on the Spatio-Temporal Deformable Fusion (STDF) architecture that provides quality gains for videos compressed with different standards. The results show that training the model on multiple video coding standards and configurations yields a significant improvement in quality enhancement, with an average PSNR increase of up to 0.382 dB.
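The abstract reports its gain as an average PSNR increment (ΔPSNR). As an illustration only, the sketch below computes ΔPSNR for a sequence as the mean, over frames, of PSNR(original, enhanced) minus PSNR(original, compressed), with PSNR = 10·log10(peak² / MSE). It assumes 8-bit samples (peak = 255) stored as nested Python lists, and the function names `psnr` and `delta_psnr` are hypothetical. Compressed-video enhancement work commonly evaluates on the luma (Y) channel, but the abstract does not state the exact evaluation protocol the authors used.

```python
import math
from typing import List, Sequence

# Hypothetical frame layout (assumption): a 2-D grid of 8-bit luma samples.
Frame = Sequence[Sequence[int]]


def psnr(reference: Frame, test: Frame, peak: float = 255.0) -> float:
    """PSNR in dB: 10 * log10(peak^2 / MSE) between two equally sized frames."""
    if len(reference) != len(test) or any(
        len(r_row) != len(t_row) for r_row, t_row in zip(reference, test)
    ):
        raise ValueError("frames must have the same dimensions")
    squared_error = 0.0
    count = 0
    for r_row, t_row in zip(reference, test):
        for r, t in zip(r_row, t_row):
            squared_error += (r - t) ** 2
            count += 1
    if count == 0:
        raise ValueError("frames must not be empty")
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical frames: PSNR is unbounded
    return 10.0 * math.log10(peak * peak / mse)


def delta_psnr(
    originals: Sequence[Frame],
    compressed: Sequence[Frame],
    enhanced: Sequence[Frame],
    peak: float = 255.0,
) -> float:
    """Mean per-frame gain: PSNR(original, enhanced) - PSNR(original, compressed)."""
    if not (len(originals) == len(compressed) == len(enhanced)) or not originals:
        raise ValueError("need the same, non-zero number of frames in each sequence")
    gains: List[float] = [
        psnr(o, e, peak) - psnr(o, c, peak)
        for o, c, e in zip(originals, compressed, enhanced)
    ]
    return sum(gains) / len(gains)


if __name__ == "__main__":
    # Toy 2x2 frames: compression MSE = 8, enhanced MSE = 2.
    original = [[100, 100], [100, 100]]
    decoded = [[104, 96], [100, 100]]
    restored = [[102, 98], [100, 100]]
    print(round(delta_psnr([original], [decoded], [restored]), 2))  # 6.02
```

Because both PSNR terms share the same peak value, the per-frame ΔPSNR reduces to 10·log10(MSE_compressed / MSE_enhanced); in the toy example the enhancement quarters the MSE, giving about 6.02 dB.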
