FLEXA: A Modular and Flexible Framework for AI-Assisted Video Communication
Abstract
Semantic communication offers a promising paradigm for efficient video transmission. However, existing implementations often rely on specialized, tightly coupled architectures that hinder the integration and comparative evaluation of new Artificial Intelligence (AI) models. In this paper, we present FLEXA, a modular framework for AI-assisted video communication that decouples AI processing from transport logic, so that new models can be integrated into a standard Web Real-Time Communication (WebRTC) environment with little effort. The architecture introduces policy-driven semantic orchestrators that dynamically manage and chain AI models during transmission. To demonstrate its capabilities, we integrated a Super-Resolution Generative Adversarial Network (SRGAN) and evaluated seven upscaling configurations under different operating policies. Results indicate that one configuration achieves the best balance between perceptual quality and efficiency, with a Learned Perceptual Image Patch Similarity (LPIPS) below 0.1 and a Structural Similarity Index (SSIM) above 0.7 at moderate processing cost. These findings show that FLEXA effectively manages multiple AI-driven policies, reveals an optimal operating point for resolution enhancement, and quantifies the performance trade-offs between operating policies.
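The idea of a policy-driven orchestrator that chains models can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from FLEXA itself: the names `Policy`, `Orchestrator`, and `select_policy`, the time-budget thresholds, and the use of nearest-neighbour upscaling as a cheap stand-in for a real model such as SRGAN. It only shows how processing stages can be kept separate from the logic that decides which stages run.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Frame = List[List[int]]           # toy grayscale frame: rows of pixel values
Stage = Callable[[Frame], Frame]  # any per-frame processing step


def upscale_2x(frame: Frame) -> Frame:
    """Nearest-neighbour 2x upscaling; a stand-in for a model such as SRGAN."""
    out: Frame = []
    for row in frame:
        wide = [px for px in row for _ in range(2)]  # duplicate each column
        out.append(wide)
        out.append(list(wide))                       # duplicate each row
    return out


@dataclass
class Policy:
    """Hypothetical operating policy: an ordered list of stage names."""
    name: str
    chain: List[str]


class Orchestrator:
    """Looks up the stages named by the active policy and applies them in order."""

    def __init__(self, stages: Dict[str, Stage]) -> None:
        self.stages = stages

    def process(self, frame: Frame, policy: Policy) -> Frame:
        for stage_name in policy.chain:
            frame = self.stages[stage_name](frame)
        return frame


def select_policy(budget_ms: float) -> Policy:
    """Assumed rule: run more upscaling passes when the time budget allows."""
    if budget_ms < 10:
        return Policy("passthrough", [])
    if budget_ms < 30:
        return Policy("x2", ["upscale_2x"])
    return Policy("x4", ["upscale_2x", "upscale_2x"])


orchestrator = Orchestrator({"upscale_2x": upscale_2x})
frame = [[10, 20], [30, 40]]
for budget in (5.0, 20.0, 50.0):
    policy = select_policy(budget)
    result = orchestrator.process(frame, policy)
    print(policy.name, len(result), "x", len(result[0]))
```

Because stages are registered by name and policies only list names, a new model could be added by registering one more callable, without touching the code that moves frames.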
