Evaluation of models for detecting replay attacks using different databases

  • Giovana Y. Nakashima CPQD
  • Higor D. C. Santos CPQD
  • Jone W. M. Soares CPQD
  • Mário Uliani Neto CPQD
  • Fernando O. Runstein CPQD
  • Ricardo P. V. Violato CPQD
  • Marcus Lima PUC-Campinas https://orcid.org/0009-0008-7254-285X

Abstract


A replay attack is a speech forgery in which a recording of a genuine speaker's voice is played back in an attempt to authenticate as that speaker. Deep neural networks have been proposed as methods for detecting such fraudulent audio. For use in real applications, these models are expected not only to learn well but also to perform well on databases other than the one used for training. In this work, two approaches were evaluated on three public databases, and the results indicate that the models have low generalization capacity.

Keywords: voice biometrics, replay attacks, anti-spoofing

Published
2024-11-17
NAKASHIMA, Giovana Y.; SANTOS, Higor D. C.; SOARES, Jone W. M.; ULIANI NETO, Mário; RUNSTEIN, Fernando O.; VIOLATO, Ricardo P. V.; LIMA, Marcus. Evaluation of models for detecting replay attacks using different databases. In: BRAZILIAN SYMPOSIUM IN INFORMATION AND HUMAN LANGUAGE TECHNOLOGY (STIL), 15., 2024, Belém/PA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 6-11. DOI: https://doi.org/10.5753/stil.2024.245163.