Evaluation of models for replay attack detection using different databases

Giovana Y. Nakashima; Higor D. C. Santos; Jone W. M. Soares; Mário Uliani Neto; Fernando O. Runstein; Ricardo P. V. Violato; Marcus Lima

doi:10.5753/stil.2024.245163

Giovana Y. Nakashima CPQD
Higor D. C. Santos CPQD
Jone W. M. Soares CPQD
Mário Uliani Neto CPQD
Fernando O. Runstein CPQD
Ricardo P. V. Violato CPQD
Marcus Lima PUC-Campinas https://orcid.org/0009-0008-7254-285X

Abstract

A replay attack is a form of speech spoofing used in an attempt to pass speaker authentication. Deep neural networks have been proposed as methods for detecting fraudulent audio. For these models to be used in real applications, good performance during training is not enough: the resulting model is also expected to perform well on databases other than the one used for training. In this work, two approaches were evaluated on three public databases, and the results indicate that the models generalize poorly.
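The abstract frames generalization as performance on databases other than the training one. The following is a minimal sketch of such a cross-database protocol, not the authors' code. It assumes the equal error rate (EER) as the metric (the abstract does not name one, although EER is customary in replay-detection evaluations) and assumes that higher scores mean "bona fide". Each trained model scores every evaluation set, and an EER is computed for each (training set, evaluation set) pair. The function names `compute_eer`, `cross_database_table` and `generalization_gap`, as well as the database labels, are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, str]  # (training database, evaluation database); hypothetical labels


def compute_eer(bonafide: Sequence[float], spoof: Sequence[float]) -> Tuple[float, float]:
    """Approximate EER and its threshold, assuming higher score = bona fide."""
    if not bonafide or not spoof:
        raise ValueError("both score lists must be non-empty")
    best = None  # (|FRR - FAR|, EER estimate, threshold)
    # Every observed score is a candidate decision threshold.
    for t in sorted(set(bonafide) | set(spoof)):
        frr = sum(s < t for s in bonafide) / len(bonafide)  # genuine audio rejected
        far = sum(s >= t for s in spoof) / len(spoof)       # replayed audio accepted
        gap = abs(frr - far)
        if best is None or gap < best[0]:
            # EER approximated as the mean of FRR and FAR where they are closest.
            best = (gap, (frr + far) / 2, t)
    return best[1], best[2]


def cross_database_table(
    scores: Dict[Pair, Tuple[List[float], List[float]]]
) -> Dict[Pair, float]:
    """Map each (train, eval) pair to the EER of that model on that database."""
    return {pair: compute_eer(bon, spf)[0] for pair, (bon, spf) in scores.items()}


def generalization_gap(table: Dict[Pair, float], train_db: str) -> float:
    """Mean cross-database EER minus in-domain EER for one training database."""
    in_domain = table[(train_db, train_db)]
    cross = [eer for (tr, ev), eer in table.items() if tr == train_db and ev != train_db]
    return sum(cross) / len(cross) - in_domain


if __name__ == "__main__":
    # Toy scores for illustration only; these are NOT results from the paper.
    toy = {
        ("db_a", "db_a"): ([0.9, 0.8, 0.7], [0.1, 0.2, 0.3]),
        ("db_a", "db_b"): ([0.9, 0.4], [0.6, 0.1]),
    }
    table = cross_database_table(toy)
    for (train_db, eval_db), eer in table.items():
        print(f"train={train_db} eval={eval_db} EER={eer:.2%}")
    print("gap for db_a:", generalization_gap(table, "db_a"))
```

A low EER on the in-domain pairs combined with clearly higher EERs on the cross-database pairs (a large positive gap) is the kind of pattern that would indicate poor generalization. The threshold sweep is quadratic in the number of scores, which is acceptable for a sketch; a ROC-based computation would scale better.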

Keywords: voice biometrics, replay attacks, anti-spoofing