Vocaloid Audio Identification with Support Vector Machines: A Case Study of Hatsune Miku

  • Felipe V. de Almeida, Universidade de São Paulo
  • Victor T. Hayashi, Universidade de São Paulo

Abstract

Audio signal processing combined with machine learning models has applications in several areas: music, forensic analysis, and the analysis of human speech and ambient noise. Analyzing music from different genres can encourage interesting investigations by the scientific community, such as studies of cultural trends. This work presents an initiative for processing vocaloid songs that have gained great popularity on social networks. Support Vector Machine (SVM) classifiers were trained in two experiments to distinguish songs by the vocaloid Hatsune Miku from instrumental songs and from songs by other vocaloids, yielding promising results with precision above 80%, which validate the initiative.
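
As a rough illustration of the kind of binary SVM classification described above, the following minimal sketch trains a linear SVM by stochastic subgradient descent on the regularized hinge loss. It is not the authors' pipeline: the two-dimensional feature vectors are synthetic stand-ins for real audio features, and every name and hyperparameter here (`make_toy_features`, `train_linear_svm`, `lam`, `eta`, `epochs`) is a hypothetical choice made only for this example.

```python
import random


def make_toy_features(n_per_class, seed=0):
    """Generate synthetic 2-D feature vectors (illustrative only, not real audio data)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_per_class):
        # Hypothetical class +1 (e.g. "target vocaloid"), clustered around (2, 2)
        X.append([rng.gauss(2.0, 0.5), rng.gauss(2.0, 0.5)])
        y.append(1)
        # Hypothetical class -1 (e.g. "other audio"), clustered around (-2, -2)
        X.append([rng.gauss(-2.0, 0.5), rng.gauss(-2.0, 0.5)])
        y.append(-1)
    return X, y


def decision(w, b, x):
    """Signed score w.x + b of the linear classifier."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, lam=0.01, eta=0.01, epochs=20, seed=0):
    """Minimize lam/2 * ||w||^2 + hinge loss with per-sample subgradient steps."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * decision(w, b, X[i])
            # Subgradient of the regularizer: shrink w slightly on every step
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:
                # Sample is inside the margin: move the hyperplane toward it
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b


def accuracy(w, b, X, y):
    """Fraction of samples whose predicted sign matches the label."""
    hits = sum(1 for xi, yi in zip(X, y) if (1 if decision(w, b, xi) >= 0 else -1) == yi)
    return hits / len(y)


X_train, y_train = make_toy_features(100, seed=1)
X_test, y_test = make_toy_features(50, seed=2)
w, b = train_linear_svm(X_train, y_train)
acc = accuracy(w, b, X_test, y_test)
print(f"held-out accuracy on synthetic data: {acc:.2f}")
```

In practice the input vectors would come from an audio feature extractor and the classifier from an established library; the toy clusters here are well separated on purpose, so the resulting score says nothing about the results reported in the paper.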

Keywords: Artificial Intelligence, A-Life and Evolutionary Music Systems, Digital Sound Processing, Music, Society, and Technology

Published
24 October 2021
ALMEIDA, Felipe V. de; HAYASHI, Victor T. Identificação de Áudio Vocaloid com Support Vector Machines: um Estudo de Caso da Hatsune Miku. In: SIMPÓSIO BRASILEIRO DE COMPUTAÇÃO MUSICAL (SBCM), 18., 2021, Recife. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. p. 217-220. DOI: https://doi.org/10.5753/sbcm.2021.19451.