Detecting intrusion attempts using synthetic data in voice biometrics applications
Abstract
Voice-based biometric systems are now widespread, driven largely by the popularity of voice command systems and digital assistants such as Google Assistant and Alexa. An important function of these systems is identifying the user who issues a command, since this controls access to personal or sensitive information tied to that user's profile. As with face-based biometrics, voice biometrics can be attacked with synthetic data, in which recordings are passed off as genuine speech. This work presents a model based on deep neural networks that is able to detect this intrusion technique. The model was trained on real recordings and on synthetic data generated from those original recordings. The results are consistent with the literature, notably a low false acceptance rate and a high F1-score, even across different environments and noise conditions.