Vocaloid Audio Identification with Support Vector Machines: A Case Study of Hatsune Miku

Felipe V. de Almeida; Victor T. Hayashi

doi:10.5753/sbcm.2021.19451

Felipe V. de Almeida, Universidade de São Paulo
Victor T. Hayashi, Universidade de São Paulo


Abstract

Audio signal processing combined with machine learning models has applications in several areas: music, forensic analysis, and the analysis of human speech and ambient noise. Analyzing music from different genres can prompt interesting research by the scientific community, such as the study of cultural trends. This work presents an initiative for processing Vocaloid songs, which have gained great popularity on social networks. Support Vector Machine (SVM) classifiers were trained in two experiments to distinguish songs by the Vocaloid Hatsune Miku from instrumental songs and from songs by other Vocaloids, yielding promising results with precision above 80%, which validate the initiative.

Keywords: Artificial Intelligence, A-Life and Evolutionary Music Systems, Digital Sound Processing, Music, Society, and Technology
