Audio Segmentation to Build Bird Training Datasets

Diego T. Terasaka; Luiz E. Martins; Virginia A. dos Santos; Thiago M. Ventura; Allan G. de Oliveira; Gabriel de S. G. Pedroso

doi:10.5753/wcama.2024.2055

Diego T. Terasaka UFMT
Luiz E. Martins UFMT
Virginia A. dos Santos UFMT
Thiago M. Ventura UFMT
Allan G. de Oliveira UFMT
Gabriel de S. G. Pedroso UFMT

Abstract

Training a bird classification model requires datasets with thousands of samples. Building such datasets can be automated, but the first step is to segment soundscapes by identifying bird vocalizations. In this study, we address this issue by testing four methods for audio segmentation: the Librosa library, a Few-Shot Learning technique, the BirdNET framework, and a bird classification model called Perch. The results show that BirdNET was the best method for this purpose, achieving the highest precision, accuracy, and F1-score.
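For readers unfamiliar with the three reported metrics, the following is a minimal sketch of how precision, accuracy, and F1-score are conventionally computed from confusion counts. The function name `segmentation_metrics` and the example counts are hypothetical and are not taken from the study; the sketch assumes each audio segment is labeled simply as containing a bird vocalization or not.

```python
def segmentation_metrics(tp, fp, tn, fn):
    """Return precision, accuracy and F1-score from confusion counts.

    tp: segments correctly flagged as bird vocalization
    fp: segments flagged as vocalization that contain none
    tn: segments correctly left unflagged
    fn: vocalization segments that were missed
    """
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"precision": precision, "accuracy": accuracy, "f1": f1}


# Hypothetical counts, for illustration only (not results from the paper).
scores = segmentation_metrics(tp=8, fp=2, tn=6, fn=4)
print(scores)
```

Precision penalizes false detections, while F1 also accounts for missed vocalizations through recall, which is why the two can rank methods differently.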

References

Chen, H. L., Chuang, K. T., & Chen, M. S. (2008). On Data Labeling for Clustering Categorical Data. IEEE Transactions on Knowledge and Data Engineering, 20(11), 1458-1472. DOI: 10.1109/TKDE.2008.81.

García-Ordás, M. T., Rubio-Martín, S., Benítez-Andrades, J. A., et al. (2023). Multispecies bird sound recognition using a fully convolutional neural network. Applied Intelligence, 53, 23287–23300.

Google Research (2023). Google Bird Vocalization Classifier: A global bird embedding and classification model. [link].

Han, X., & Peng, J. (2023). Bird sound classification based on ECOC-SVM. Applied Acoustics, 204, 109245.

Kahl, S., Wood, C. M., Eibl, M., & Klinck, H. (2021). BirdNET: A deep learning solution for avian diversity monitoring. Ecological Informatics, 61, 101236.

McFee, B., Raffel, C., Liang, D., Ellis, D. P. W., McVicar, M., Battenberg, E., & Nieto, O. (2015). Librosa: Audio and music signal analysis in Python. Proceedings of the 14th Python in Science Conference, pp. 18-25.

Narasimhan, R., Fern, X. Z., & Raich, R. (2017). Simultaneous segmentation and classification of bird song using CNN. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA, pp. 146-150. DOI: 10.1109/ICASSP.2017.7952135.

Nolasco, I., Singh, S., Morfi, V., Lostanlen, V., Strandburg-Peshkin, A., Vidaña-Vila, E., Gill, L., Pamuła, H., Whitehead, H., Kiskin, I., Jensen, F. H., Morford, J., Emmerson, M. G., Versace, E., Grout, E., Liu, H., Ghani, B., & Stowell, D. (2023). Learning to Detect an Animal Sound from Five Examples. Ecological Informatics, 77.

Tan, M., & Le, Q. (2019). EfficientNet: Rethinking model scaling for convolutional neural networks. International conference on machine learning (pp. 6105-6114).

Ventura, T. M., Ganchev, T. D., Granados, C. P., Oliveira, A. G., Pedroso, G. S. G., Marques, M. I., & Schuchmann, K. L. (2024). The importance of acoustic background modelling in CNN-based detection of the neotropical White-lored Spinetail (Aves, Passeriformes, Furnaridae). Bioacoustics. DOI: 10.1080/09524622.2024.2309362.

Wang, H., Xu, Y., Yu, Y., Lin, Y., & Ran, J. (2022). An Efficient Model for a Vast Number of Bird Species Identification Based on Acoustic Features. Animals, 12(18), 2434.

Xeno-Canto (2022). Sharing wildlife sounds from around the world. [link]. Accessed 6 March 2024.