On the Fusion of Multiple Audio Representations for Music Genre Classification

Diego Furtado Silva; Micael Valterlânio da Silva; Ricardo Szram Filho; Angelo Cesar Mendes da Silva

doi:10.5753/sbcm.2021.19423

Diego Furtado Silva, Universidade Federal de São Carlos
Micael Valterlânio da Silva, Universidade Federal de São Carlos
Ricardo Szram Filho, Universidade Federal de São Carlos
Angelo Cesar Mendes da Silva, Universidade de São Paulo

Abstract

Music classification is one of the most studied tasks in music information retrieval, and music genre is among its most popular targets. Deep neural networks currently achieve the state-of-the-art results in this scenario. Research in this domain typically represents the audio with a single feature at the input of the classification model. Given the nature of the task, researchers usually rely on time-frequency features, especially those designed to make timbre more explicit. However, the audio processing literature offers many strategies for building representations that reveal other characteristics of music, such as key and tempo, which may contribute relevant information for genre classification. We present an exploratory study of different neural network model fusion techniques for music genre classification with multiple features as input. Our results show that Multi-Feature Fusion Networks consistently improve classification accuracy for suitable choices of input representations.

Keywords: Music Information Retrieval
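
The idea of combining several audio representations, as described in the abstract, can be illustrated with a minimal pure-Python sketch. It is not the Multi-Feature Fusion Network studied in the paper: it only contrasts two generic fusion points, early fusion (concatenating feature vectors before a single model) and late fusion (averaging the class probabilities of one model per representation). The genre labels, the three per-representation score vectors, and all function names are hypothetical values chosen for illustration.

```python
from math import exp

# Hypothetical genre labels, used only for this illustration.
GENRES = ["rock", "jazz", "classical"]


def softmax(scores):
    """Turn raw model scores into class probabilities."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def early_fusion(feature_vectors):
    """Early fusion: concatenate per-representation features into one input."""
    fused = []
    for vec in feature_vectors:
        fused.extend(vec)
    return fused


def late_fusion(prob_vectors, weights=None):
    """Late fusion: weighted average of per-model class probabilities."""
    n_models = len(prob_vectors)
    if weights is None:
        weights = [1.0 / n_models] * n_models  # plain average by default
    n_classes = len(prob_vectors[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, prob_vectors))
        for c in range(n_classes)
    ]


def predict(probs, labels):
    """Return the label with the highest probability."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best]


# Made-up output scores of three models, each assumed to be trained on a
# different representation (timbre-, tempo-, and key-oriented features).
timbre_probs = softmax([2.0, 1.0, 0.1])
tempo_probs = softmax([0.5, 2.5, 0.3])
key_probs = softmax([0.2, 2.0, 1.0])

fused_probs = late_fusion([timbre_probs, tempo_probs, key_probs])

print("timbre only:", predict(timbre_probs, GENRES))
print("late fusion:", predict(fused_probs, GENRES))
```

In this toy case the timbre-oriented model alone favors one genre, while the averaged prediction follows the two models that agree with each other. A learned fusion network would replace the fixed average with trainable layers, which this sketch does not attempt.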

18th Brazilian Symposium on Computer Music - SBCM 2021