Synthetic data invasion detection for voice-based biometric applications

  • Wilson Neto (UEA)
  • Carlos Figueiredo (UEA)

Abstract


Voice-based biometric systems are now very common, especially with the popularization of voice command systems and digital assistants such as Google Assistant and Alexa. An important feature of these systems is identifying the user who gives a command, since this controls access to personal or sensitive information in that user's profile. As in face-based biometrics, voice biometrics can be attacked with synthetic data, where fabricated recordings are presented as genuine speech. This work presents a model based on deep neural networks that is capable of detecting this invasion technique. For training, we used real recordings together with synthetic data generated from those original recordings. We obtained satisfactory results, notably a low false acceptance rate and a high F1-score, even across different environments and noise conditions.

Keywords: anti-spoofing, synthetic data, synthetic audio, biometric system
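The abstract reports results in terms of false acceptance rate and F1-score. The sketch below shows one way to compute both metrics from a detector's binary decisions. It assumes a hypothetical label convention (1 = genuine recording, 0 = synthetic recording), treats "acceptance" as the detector predicting genuine, and computes F1 with the synthetic class as the positive class; the paper does not state its exact conventions, so these are illustrative choices only.

```python
from typing import Sequence

# Hypothetical label convention (an assumption, not taken from the paper).
GENUINE = 1    # bona fide recording of the user
SYNTHETIC = 0  # synthetic (spoofed) recording


def false_acceptance_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of synthetic samples that the detector accepted as genuine."""
    spoof_preds = [p for t, p in zip(y_true, y_pred) if t == SYNTHETIC]
    if not spoof_preds:
        return 0.0  # no attack samples, so nothing could be falsely accepted
    accepted = sum(1 for p in spoof_preds if p == GENUINE)
    return accepted / len(spoof_preds)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = SYNTHETIC) -> float:
    """Harmonic mean of precision and recall for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision and recall are both zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy labels: 3 genuine and 5 synthetic recordings (made-up data).
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
    print("FAR:", false_acceptance_rate(y_true, y_pred))
    print("F1 :", f1_score(y_true, y_pred))
```

In this toy example, one of the five synthetic recordings is accepted, giving a FAR of 0.2, and precision and recall for the synthetic class are both 0.8, giving an F1-score of about 0.8. Reporting FAR separately is useful because, for an access-control system, accepting an attack is usually costlier than rejecting a legitimate user.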

Published: 2019-07-12
NETO, Wilson; FIGUEIREDO, Carlos. Synthetic data invasion detection for voice-based biometric applications. In: PROCEEDINGS OF THE BRAZILIAN SYMPOSIUM ON UBIQUITOUS AND PERVASIVE COMPUTING (SBCUP), 11., 2019, Belém. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2019. ISSN 2595-6183. DOI: https://doi.org/10.5753/sbcup.2019.6587.