SpeakerID: A Toolkit for Processing and Modeling High-Level Features for Automatic Speaker Recognition

Cristian Keil de Abreu (UCS); André Gustavo Adami (UCS)

Abstract

This paper describes a toolkit for extracting and modeling high-level features for speaker recognition systems. The toolkit also provides tools for evaluating features and models under the NIST evaluation paradigm, which is commonly used as the reference for evaluating such systems. The toolkit is implemented in Perl and C and relies on several free software packages for process scheduling and feature extraction. The paper also presents some results obtained with the toolkit in the NIST evaluation held in 2001.
