On Generalist and Domain-Specific Music Classification Models and Their Impacts on Brazilian Music Genre Recognition
Abstract
Deep learning has become the standard approach to Music Information Retrieval problems. This category of machine learning algorithms has achieved state-of-the-art results in several tasks, such as classification and auto-tagging. However, obtaining a well-performing model requires a significant amount of data, and most available music datasets lack cultural diversity. Therefore, the performance of the most widely used pre-trained models on underrepresented music genres is unknown. If music models follow the same trend as language models in Natural Language Processing, they should perform worse on music styles that are not present in their training data. To verify this assumption, we apply a well-known auto-tagging music model to the task of genre recognition. We train this model from scratch using a large general-domain dataset and two subsets restricted to specific domains. We empirically show that models trained on domain-specific data outperform generalist models at classifying music in the same domain, even when trained on a smaller dataset. This outcome is most evident in the subset consisting mainly of Brazilian music, which includes several typically underrepresented genres.
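To make the experimental protocol described above concrete, the sketch below shows one way to train the same CNN classifier from scratch on a generalist dataset and on a domain-specific subset, then compare the two. This is a minimal, hypothetical PyTorch illustration: the GenreCNN architecture, its hyperparameters, and the synthetic stand-in data are our assumptions, not the authors' exact musicnn-style configuration or datasets.

```python
import torch
import torch.nn as nn

class GenreCNN(nn.Module):
    """Hypothetical stand-in for a CNN-based auto-tagger repurposed for
    genre recognition: a small CNN over log-mel spectrogram patches."""
    def __init__(self, n_mels=96, n_genres=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # pool time/frequency to a single vector
        )
        self.classifier = nn.Linear(64, n_genres)

    def forward(self, x):  # x: (batch, 1, n_mels, n_frames)
        return self.classifier(self.features(x).flatten(1))

def train_from_scratch(loader, n_genres=10, epochs=5):
    """Train one model per dataset (generalist or domain-specific),
    so the two resulting models differ only in their training data."""
    model = GenreCNN(n_genres=n_genres)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for spec, label in loader:
            opt.zero_grad()
            loss = loss_fn(model(spec), label)
            loss.backward()
            opt.step()
    return model

# Synthetic spectrogram batches stand in for the real data; in practice the
# general-domain dataset and the Brazilian-music subset would each be a
# DataLoader over audio features, and the two models would be evaluated on
# the same domain-specific test set.
fake_batch = [(torch.randn(8, 1, 96, 187), torch.randint(0, 10, (8,)))]
generalist = train_from_scratch(fake_batch)       # large, culturally broad data
domain_specific = train_from_scratch(fake_batch)  # e.g., Brazilian genres only
```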