3D-STDF: Compressed Video Quality Enhancement with 3D Spatio-Temporal Fusion and Deformable Convolution

  • Garibaldi da Silveira Júnior (UFPel)
  • Bruno Zatt (UFPel)
  • Daniel Palomino (UFPel)
  • Guilherme Correa (UFPel)

Abstract

Compressed videos often exhibit artifacts that degrade visual quality, and deep learning models have proven effective at reducing such distortions. In this study, we introduce 3D-STDF, an architecture that builds on Spatio-Temporal Deformable Fusion (STDF) and adds 3D convolutions to model temporal dependencies across video frames more effectively. We also refine the Quality Enhancement (QE) module by integrating residual blocks, which allows it to extract and represent more intricate spatial features. Experimental results indicate that models based on the 3D-STDF architecture achieve an overall average PSNR improvement of up to 0.607 dB, outperforming previous STDF-based solutions.
Keywords: Video Quality Enhancement, Video Coding, Deep Learning, Spatio-Temporal Deformable Fusion
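
The abstract reports its gain as a PSNR improvement in dB. As a minimal sketch of how such a gain is commonly computed, the snippet below measures the PSNR of a compressed frame and of an enhanced frame against the uncompressed reference and takes the difference. The function names, the 8-bit peak value of 255, and the toy 2x2 frames are illustrative assumptions, not the authors' evaluation code or data.

```python
import math


def mse(reference, distorted):
    """Mean squared error between two equally sized frames (lists of rows)."""
    total = 0.0
    count = 0
    for ref_row, dis_row in zip(reference, distorted):
        for r, d in zip(ref_row, dis_row):
            total += (r - d) ** 2
            count += 1
    return total / count


def psnr(reference, distorted, peak=255.0):
    """PSNR in dB; returns infinity when the frames are identical."""
    err = mse(reference, distorted)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / err)


def delta_psnr(reference, compressed, enhanced, peak=255.0):
    """PSNR gain (dB) of the enhanced frame over the compressed frame."""
    return psnr(reference, enhanced, peak) - psnr(reference, compressed, peak)


# Toy 2x2 luma frames (hypothetical values, not data from the paper).
raw = [[100, 110], [120, 130]]
compressed = [[104, 106], [124, 126]]  # every sample is off by 4, MSE = 16
enhanced = [[102, 108], [122, 128]]    # every sample is off by 2, MSE = 4

gain = delta_psnr(raw, compressed, enhanced)
print(f"PSNR gain: {gain:.3f} dB")  # 10 * log10(16 / 4), about 6.021 dB
```

In a benchmark setting, this per-frame difference would typically be averaged over all frames and test sequences to obtain an overall figure comparable to the one quoted in the abstract.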

Published
10/11/2025
SILVEIRA JÚNIOR, Garibaldi da; ZATT, Bruno; PALOMINO, Daniel; CORREA, Guilherme. 3D-STDF: Compressed Video Quality Enhancement with 3D Spatio-Temporal Fusion and Deformable Convolution. In: BRAZILIAN SYMPOSIUM ON MULTIMEDIA AND THE WEB (WEBMEDIA), 31., 2025, Rio de Janeiro/RJ. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 491-495. DOI: https://doi.org/10.5753/webmedia.2025.15022.
