Evaluation of models for detecting replay attacks using different databases
Abstract
A replay attack is a speech forgery in which recorded speech is played back in an attempt to authenticate as a legitimate speaker. Deep neural networks have been proposed for detecting such fraudulent audio. Because these models are intended for real applications, they are expected not only to perform well on the database used for training but also to give good results on other databases. In this work, two approaches were evaluated on three public databases, and the results indicate that the models generalize poorly.
