SpeakerID – A Toolkit for Processing and Modeling High-Level Features for Automatic Speaker Recognition
Abstract
This paper describes a toolkit for extracting and modeling high-level features for speaker recognition systems. The toolkit also provides tools for evaluating the features and models under the NIST evaluation paradigm, which is commonly used as a benchmark for such systems. It is implemented in Perl and C and relies on several free software packages for process scheduling and feature extraction. Some results obtained with the toolkit in the NIST competition held in 2001 are presented.
References
Adami, A. (2004) Modeling Prosodic Differences for Speaker and Language Recognition. Ph.D. Thesis, OGI School of Science & Engineering at OHSU, Portland, OR, 152 pp.
Andrews, W.D., Kohler, M.A., Campbell, J.P. and Godfrey, J.J. (2001) "Phonetic, Idiolectal and Acoustic Speaker Recognition", 2001: A Speaker Odyssey, Crete, Greece, pp. 55-63.
Bimbot, F. et al. (2004) "A Tutorial on Text-Independent Speaker Verification". EURASIP Journal on Applied Signal Processing, 4: 430-451.
Farahani, F., Georgiou, P.G. and Narayanan, S.S. (2004) "Speaker identification using supra-segmental pitch pattern dynamics", ICASSP, Montreal, Canada, pp. 89-92.
Furui, S. (2005) "50 years of progress in speech and speaker recognition", 10th International Conference on Speech and Computer - SPECOM, Patras, Greece, pp. 1-9.
Gillick, L. and Cox, S.J. (1989) "Some Statistical Issues in the Comparison of Speech Recognition Algorithms", ICASSP. IEEE, Glasgow, Scotland, pp. 532-535.
Hannani, A.E. and Petrovska-Delacrétaz, D. (2005) "Exploiting High-Level Information Provided by ALISP in Speaker Recognition", Non Linear Speech Processing Workshop (NOLISP05), Barcelona, Spain, pp. 19-24.
Lavner, Y., Gath, I. and Rosenhouse, J. (2000) "The Effects of Acoustic Modifications on the Identification of Familiar Voices Speaking Isolated Vowels". Speech Communication, 30: 9-26.
Martin, A. (2001), NIST 2001 Speaker Recognition Evaluation Plan, [link].
Reynolds, D.A., Quatieri, T.F. and Dunn, R.B. (2000) "Speaker Verification Using Adapted Mixture Models". Digital Signal Processing, 10: 19-41.
Schmidt-Nielsen, A. and Crystal, T.H. (1998) "Human vs. Machine Speaker Identification with Telephone Speech", ICSLP, Sydney, Australia, pp. 221-224.
Published
30/06/2007
How to Cite
ABREU, Cristian Keil de; ADAMI, André Gustavo. SpeakerID – Toolkit para Processamento e Modelagem de Características de Alto Nível para Reconhecimento Automático de Locutor. In: SIMPÓSIO BRASILEIRO DE TECNOLOGIA DA INFORMAÇÃO E DA LINGUAGEM HUMANA (STIL), 5., 2007, Rio de Janeiro/RJ. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2007. p. 1709-1712.
