Multi-Codec Model Based on Spatio-Temporal Deformable Fusion for Quality Enhancement of Compressed Videos

  • Gilberto Kreisler (UFPel)
  • Garibaldi da Silveira Junior (UFPel)
  • Bruno Zatt (UFPel)
  • Daniel Palomino (UFPel)
  • Guilherme Correa (UFPel)

Abstract


Compressed videos often suffer from visual artifacts that reduce the quality perceived by the user. Several deep learning architectures have proven effective at enhancing the quality of such videos. However, most of them are trained and validated on videos produced by a single video coding standard. This paper proposes a new model based on the Spatio-Temporal Deformable Fusion (STDF) architecture that provides quality gains for videos compressed with different standards. The results show that including multiple standards and encoder configurations in model training significantly increases the quality improvement, with an average PSNR gain of up to 0.382 dB.
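The gain reported above is a difference in PSNR, measured against the original frame, between the enhanced frame and the compressed frame. The following is a minimal sketch of that arithmetic, not the authors' evaluation code: the helper names `psnr` and `mean_delta_psnr`, the 8-bit peak value of 255, and the toy pixel values are all assumptions made for illustration.

```python
import math


def psnr(reference, distorted, peak=255.0):
    """PSNR in dB between two equally sized sequences of pixel values."""
    if len(reference) != len(distorted) or not reference:
        raise ValueError("inputs must be non-empty and equally sized")
    # Mean squared error over all pixels.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak ** 2 / mse)


def mean_delta_psnr(frames):
    """Average PSNR gain over (original, compressed, enhanced) frame triples."""
    gains = [psnr(orig, enh) - psnr(orig, comp) for orig, comp, enh in frames]
    return sum(gains) / len(gains)


# Toy 4-pixel "frames" (hypothetical values, not data from the paper).
original = [100, 120, 140, 160]
compressed = [104, 116, 144, 156]  # error of 4 on every pixel, MSE = 16
enhanced = [102, 118, 142, 158]    # error of 2 on every pixel, MSE = 4

gain = mean_delta_psnr([(original, compressed, enhanced)])
print(f"Average PSNR gain: {gain:.3f} dB")
```

In this toy case the enhancement halves the per-pixel error, so the mean squared error drops by a factor of four and the gain is 10·log10(4) dB. Averaging per-frame gains is only one possible aggregation; the paper may aggregate per sequence or per configuration.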

Published
2023-08-06
KREISLER, Gilberto; SILVEIRA JUNIOR, Garibaldi da; ZATT, Bruno; PALOMINO, Daniel; CORREA, Guilherme. Modelo Multi-Codec Baseado em Spatio-Temporal Deformable Fusion para Melhoria de Qualidade de Vídeos Comprimidos. In: INTEGRATED SOFTWARE AND HARDWARE SEMINAR (SEMISH), 50., 2023, João Pessoa/PB. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 143-154. ISSN 2595-6205. DOI: https://doi.org/10.5753/semish.2023.230044.