Audiovisual Voice Activity Detection and Localization of Simultaneous Speech Sources

Vicente Minotto; Claudio Jung

Vicente Minotto UFRGS
Claudio Jung UFRGS

Abstract

This dissertation has no abstract.

References

Bins, J., Jung, C. R., Dihl, L., and Said, A. (2009). Feature-based face tracking for videoconferencing applications. In Multimedia, 2009. ISM ’09. 11th IEEE International Symposium on, pages 227–234.

Blauth, D. A., Minotto, V. P., Jung, C. R., Lee, B., and Kalker, T. (2012). Voice activity detection and speaker localization using audiovisual cues. Pattern Recognition Letters, 33(4):373–380.

Brandstein, M. and Ward, D. (2001). Microphone arrays: signal processing techniques and applications. Digital signal processing. Springer.

da Silveira, L. G., Minotto, V. P., Jung, C. R., and Lee, B. (2010). A GPU Implementation of the SRP-PHAT Sound Source Localization Algorithm. The 12th International Workshop on Acoustic Echo and Noise Control.

Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., and Witten, I. H. (2009). The WEKA data mining software: an update. SIGKDD Explor. Newsl., 11(1):10–18.

ITU-T (1996). A silence compression scheme for G.729 optimized for terminals conforming to Recommendation V.70, Annex B.

Jaimes, A. and Sebe, N. (2007). Multimodal human-computer interaction: A survey. Computer Vision and Image Understanding, 108(1-2):116–134.

Lopes, C., Goncalves, A., Scharcanski, J., and Jung, C. (2011). Color-based lips extraction applied to voice activity detection. In Image Processing (ICIP), 2011 18th IEEE International Conference on, pages 1057–1060.

Minotto, V., Jung, C., da Silveira, L., and Lee, B. (2012). GPU-based Approaches for Real-Time Sound Source Localization using the SRP-PHAT Algorithm. International Journal of High Performance Computing Applications.

Minotto, V., Jung, C., and Lee, B. (2014). Simultaneous-speaker voice activity detection and localization using mid-fusion of SVM and HMMs. In IEEE Transactions on Multimedia. Accepted for publication. Available at IEEE Early Access Articles.

Minotto, V., Lopes, C., Scharcanski, J., Jung, C., and Lee, B. (2013). Audiovisual voice activity detection based on microphone arrays and color information. Selected Topics in Signal Processing, IEEE Journal of, 7(1):147–156.

Quinlan, J. R. (1993). C4.5: programs for machine learning. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.

Sohn, J., Kim, N. S., and Sung, W. (1999). A statistical model-based voice activity detection. IEEE Signal Process. Lett., 6:1–3.

Thiran, J.-P., Marqués, F., and Bourlard, H. (2010). Multimodal Signal Processing, Theory and Applications for Human-Computer Interaction. Academic Press.