Tempo estimation via neural networks - a comparative analysis

  • Mila Soares de Oliveira de Souza Universidade Federal do Estado do Rio de Janeiro
  • Pedro Nuno de Souza Moura Universidade Federal do Estado do Rio de Janeiro
  • Jean-Pierre Briot Universidade Federal do Estado do Rio de Janeiro / Sorbonne Université

Abstract

This paper presents a comparative analysis of two artificial neural networks with different architectures for the task of tempo estimation. To this end, it proposes the modeling, training, and evaluation of a B-RNN (Bidirectional Recurrent Neural Network) model capable of estimating the tempo of musical pieces in bpm (beats per minute) without using external auxiliary modules. An extensive database (12,333 pieces in total) was curated to support a quantitative and qualitative analysis of the experiments. Percussion-only tracks were also included in the dataset. The performance of the B-RNN is compared to that of state-of-the-art models. For further comparison, a state-of-the-art CNN was also retrained on the same datasets used to train the B-RNN. Evaluation results for each model and dataset are presented and discussed, along with observations and ideas for future research. Tempo estimation was more accurate on the percussion-only dataset, which suggests that estimation may be more accurate for percussion-only tracks in general, although further experiments with more such datasets are needed to gather stronger evidence.

Keywords: Music Information Retrieval

Published
24/10/2021
SOUZA, Mila Soares de Oliveira de; MOURA, Pedro Nuno de Souza; BRIOT, Jean-Pierre. Tempo estimation via neural networks - a comparative analysis. In: SIMPÓSIO BRASILEIRO DE COMPUTAÇÃO MUSICAL (SBCM), 18., 2021, Recife. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. p. 17-24. DOI: https://doi.org/10.5753/sbcm.2021.19420.