Enhancing Robustness in Audio Deepfake Detection for VR Applications using data augmentation and Mixup
Abstract
The rapid advancement of virtual reality (VR) technology has heightened the need for robust and reliable deepfake audio detection to ensure the authenticity and integrity of virtual interactions. Although current state-of-the-art models show promising results, they are often overconfident, which can lead to poor generalization and reduced effectiveness against novel or slightly altered deepfake attacks. In this work, we investigate data augmentation and Mixup as ways to increase the diversity of training data and improve the generalization of deepfake audio detection models. Mixup creates new training examples by combining pairs of existing examples, promoting smoother and more robust decision boundaries, while data augmentation creates new training examples by altering a sample with a given probability. Our results demonstrate that applying these techniques to the Wav2vec 2.0 model significantly improves its generalization ability, leading to more reliable deepfake detection in VR environments.
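As an illustration of the two techniques the abstract describes, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the commonly used Mixup formulation, in which one mixing coefficient is drawn from a Beta(alpha, alpha) distribution per batch, and it uses additive Gaussian noise as a stand-in augmentation. The function names, the values of `alpha`, `p`, and `noise_std`, and the toy batch are all hypothetical; the abstract does not state the settings actually used.

```python
import numpy as np


def mixup_batch(x, y, alpha=0.2, rng=None):
    """Mix each example with a randomly chosen partner from the same batch.

    x: float array of shape (batch, samples), raw waveforms.
    y: float array of shape (batch, classes), one-hot labels.
    alpha: Beta distribution parameter (assumed value, not from the paper).
    """
    rng = rng if rng is not None else np.random.default_rng()
    lam = rng.beta(alpha, alpha)        # mixing coefficient in [0, 1]
    perm = rng.permutation(len(x))      # partner index for each example
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y + (1.0 - lam) * y[perm]
    return x_mix, y_mix, lam


def augment_with_probability(x, p=0.5, noise_std=0.005, rng=None):
    """Add Gaussian noise to each waveform independently with probability p.

    Gaussian noise is only a placeholder for whichever augmentations are used.
    """
    rng = rng if rng is not None else np.random.default_rng()
    applied = rng.random(len(x)) < p    # one Bernoulli draw per example
    noise = rng.normal(0.0, noise_std, size=x.shape)
    return np.where(applied[:, None], x + noise, x), applied


# Toy batch: 4 one-second waveforms at 16 kHz, labels 0 = bona fide, 1 = spoof.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16000))
y = np.eye(2)[[0, 1, 1, 0]]

x_aug, applied = augment_with_probability(x, rng=rng)
x_mix, y_mix, lam = mixup_batch(x_aug, y, rng=rng)
```

Because the labels are mixed with the same coefficient as the inputs, each row of `y_mix` remains a valid probability distribution. Training against these soft targets is what discourages overconfident predictions.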
Keywords:
Deepfake Detection, Audio Classification, Machine Learning, Feature Abstraction, Mixup
References
ASVspoof 2019: The Automatic Speaker Verification Spoofing and Countermeasures Challenge Evaluation Plan. [link]. [Online].
Fatih Arslan. 2023. Deepfake Technology: A Criminological Literature Review. The Sakarya Journal of Law (The SJL) 11, 1 (2023), 701–720.
Rebecca A. Delfino. 2023. Deepfakes on Trial: A Call to Expand the Trial Judge's Gatekeeping Role to Protect Legal Proceedings from Technological Fakery. Hastings Law Journal 74 (2023), 293. [link]
Yinlin Guo, Haofan Huang, Xi Chen, He Zhao, and Yuehai Wang. 2023. Audio Deepfake Detection with Self-Supervised WavLM and Multi-Fusion Attentive Classifier. arXiv preprint arXiv:2312.08089 (2023). DOI: 10.48550/arXiv.2312.08089
Jee-weon Jung, Hee-Soo Heo, Hemlata Tak, Hye-jin Shim, Joon Son Chung, Bong-Jin Lee, Ha-Jin Yu, and Nicholas Evans. 2021. AASIST: Audio Anti-Spoofing using Integrated Spectro-Temporal Graph Attention Networks. arXiv preprint arXiv:2110.01200 (2021). DOI: 10.48550/arXiv.2110.01200
Taein Kang, Soyul Han, Sunmook Choi, Jaejin Seo, Sanghyeok Chung, Seungeun Lee, Seungsang Oh, and Il-Youp Kwak. 2024. Experimental Study: Enhancing Voice Spoofing Detection Models with wav2vec 2.0. arXiv preprint arXiv:2402.17127 (2024). DOI: 10.48550/arXiv.2402.17127
Gan Pei, Jiangning Zhang, Menghan Hu, Zhenyu Zhang, Chengjie Wang, Yunsheng Wu, Guangtao Zhai, Jian Yang, Chunhua Shen, and Dacheng Tao. 2024. Deepfake Generation and Detection: A Benchmark and Survey. arXiv preprint arXiv:2403.17881 (2024). DOI: 10.48550/arXiv.2403.17881
Tomasz Walczyna and Zbigniew Piotrowski. 2023. Overview of voice conversion methods based on deep learning. Applied Sciences 13, 5 (2023), 3100.
X. Wang, J. Yamagishi, et al. 2020. ASVspoof 2019: A large-scale public database of synthesized, converted and replayed speech. Computer Speech & Language (CSL) 64 (2020), 101114.
Zhiyong Wang, Ruibo Fu, Zhengqi Wen, Yuankun Xie, Yukun Liu, Xiaopeng Wang, Xuefei Liu, Yongwei Li, Jianhua Tao, Yi Lu, Xin Qi, and Shuchen Shi. 2024. Generalized Fake Audio Detection via Deep Stable Learning. arXiv preprint arXiv:2406.03237 (2024). DOI: 10.48550/arXiv.2406.03237
Junichi Yamagishi, Xuechen Wang, Massimiliano Todisco, Md Sahidullah, Jose Patino, Andreas Nautsch, Xuenan Liu, Kong Aik Lee, Tomi Kinnunen, Nicholas Evans, et al. 2021. ASVspoof 2021: accelerating progress in spoofed and deep-fake speech detection. In ASVspoof 2021 Workshop - Automatic Speaker Verification and Spoofing Countermeasures Challenge.
Published
30/09/2024
How to Cite
OLIVEIRA, Gustavo dos Reis; VIRGILLI, Rafaello; SOUZA, Lucas Alcântara; GRIS, Lucas Stefanel; ROSA, Evellyn Nicole Machado; REMIGIO MESQUITA, Isadora Stéfany Rezende; TUNNERMANN, Daniel; GALVÃO FILHO, Arlindo Rodrigues.
Enhancing Robustness in Audio Deepfake Detection for VR Applications using data augmentation and Mixup. In: SIMPÓSIO DE REALIDADE VIRTUAL E AUMENTADA (SVR), 26., 2024, Manaus/AM. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 266-269.