Multi-Domain Spatio-Temporal Deformable Fusion model for video quality enhancement

  • Garibaldi da Silveira Júnior UFPel
  • Gilberto Kreisler UFPel
  • Bruno Zatt UFPel
  • Daniel Palomino UFPel
  • Guilherme Correa UFPel

Abstract


Lossy video compression introduces artifacts that can degrade the perceived visual quality of the video. Improving the quality of compressed videos involves mitigating these artifacts through filtering techniques. Deep neural network (DNN) models have emerged as powerful tools for this task and have proven effective at reducing artifacts. However, such models are typically evaluated on videos compressed by a single coding standard, which limits their applicability across diverse codecs. To address this limitation, this study proposes a novel multi-domain architecture built upon the Spatio-Temporal Deformable Fusion (STDF) technique. The approach yields models that can enhance videos compressed by different codecs while maintaining consistent performance across standards. Experimental results show average Peak Signal-to-Noise Ratio (PSNR) gains of 0.764 dB, 0.448 dB, 0.736 dB, and 0.228 dB for videos compressed with HEVC, VVC, VP9, and AV1, respectively. The code of our Multi-Domain STDF (MD-STDF) approach is available at https://github.com/Espeto/md-stdf

Keywords: Deep neural networks, Video quality enhancement, Video coding, Multi-domain learning
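The abstract reports gains in average PSNR. The sketch below shows how such a gain (often written ΔPSNR) is commonly computed in quality-enhancement work. PSNR is defined as 10·log10(MAX²/MSE). It is measured once for the compressed frame and once for the enhanced frame, each against the uncompressed original, and the difference is averaged over frames. This example rests on several assumptions that the abstract does not confirm: 8-bit samples (MAX = 255), frames given as flat lists of pixel values, and a plain per-frame average. The abstract does not specify which channel, bit depth, or averaging the authors actually use, and the function names `psnr` and `delta_psnr` are hypothetical.

```python
import math


def psnr(reference, distorted, max_value=255.0):
    """PSNR in dB between two equally sized frames (flat sequences of pixel values).

    Assumes 8-bit samples by default (max_value=255); this is an illustrative choice.
    """
    if not reference or len(reference) != len(distorted):
        raise ValueError("frames must be non-empty and have the same number of pixels")
    # Mean squared error over all pixels of the frame.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


def delta_psnr(raw_frames, compressed_frames, enhanced_frames, max_value=255.0):
    """Hypothetical helper: average PSNR gain (dB) of enhanced over compressed frames.

    Both PSNR values are measured against the same raw (uncompressed) frame,
    and the per-frame gains are averaged over the sequence.
    """
    gains = [
        psnr(raw, enh, max_value) - psnr(raw, comp, max_value)
        for raw, comp, enh in zip(raw_frames, compressed_frames, enhanced_frames)
    ]
    return sum(gains) / len(gains)
```

As a quick check, suppose enhancement halves the error amplitude of every pixel, for example from ±10 to ±5. The MSE then falls by a factor of 4, so the gain is 10·log10(4) ≈ 6.02 dB regardless of the absolute error level. Because the mean is linear, averaging per-frame differences gives the same result as subtracting the two averaged PSNR values.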

Published
14/10/2024
SILVEIRA JÚNIOR, Garibaldi da; KREISLER, Gilberto; ZATT, Bruno; PALOMINO, Daniel; CORREA, Guilherme. Multi-Domain Spatio-Temporal Deformable Fusion model for video quality enhancement. In: BRAZILIAN SYMPOSIUM ON MULTIMEDIA AND THE WEB (WEBMEDIA), 30., 2024, Juiz de Fora/MG. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 223-230. DOI: https://doi.org/10.5753/webmedia.2024.241618.