Embeddable architecture for sound events detection using artificial intelligence
Abstract
Machine learning techniques have revolutionized monitoring applications in intelligent environments, regardless of what is being monitored. This article proposes an event detection pipeline based on audio signals. The work is motivated by the fact that recordings from real environments contain both sounds of interest and unknown sounds, in highly variable sequences, so different learning techniques must be combined in a single model. In particular, we propose combining anomaly detection techniques, followed by audio segment classifiers and then a final classifier for sequences of events of interest. We also evaluate the performance of the model on an embedded platform. Results show that the model is feasible, with an overall accuracy of 93.75% on a test dataset and a prediction time of 0.45 s on a popular embedded platform.
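To make the three-stage structure concrete, the following is a minimal sketch of how such a pipeline could be chained. It is not the authors' implementation: the function names, the thresholds, and the simple rules standing in for each learned model (an RMS energy gate for anomaly detection, a peak-amplitude rule for segment classification, and a run-length rule for sequence classification) are all assumptions made for illustration. The sketch also assumes that only segments flagged as anomalous are passed on to the segment classifier.

```python
import math
from typing import List, Optional, Sequence

Segment = Sequence[float]  # one fixed-length window of audio samples


def rms(segment: Segment) -> float:
    """Root-mean-square energy of a segment."""
    return math.sqrt(sum(x * x for x in segment) / len(segment))


def is_anomalous(segment: Segment, threshold: float = 0.1) -> bool:
    # Stage 1 (hypothetical stand-in): a learned anomaly detector would go
    # here; this toy version flags any segment whose energy exceeds a threshold.
    return rms(segment) > threshold


def classify_segment(segment: Segment) -> str:
    # Stage 2 (hypothetical stand-in): a learned segment classifier would go
    # here; this toy rule labels loud peaks as a sound of interest.
    peak = max(abs(x) for x in segment)
    return "interest" if peak > 0.8 else "unknown"


def classify_sequence(labels: List[str], min_run: int = 2) -> Optional[str]:
    # Stage 3 (hypothetical stand-in): a learned sequence classifier would go
    # here; this toy rule reports an event when enough "interest" labels
    # occur in a row.
    run = 0
    for label in labels:
        run = run + 1 if label == "interest" else 0
        if run >= min_run:
            return "event_of_interest"
    return None


def detect_event(segments: Sequence[Segment]) -> Optional[str]:
    """Chain the stages: anomaly gate, per-segment labels, sequence decision."""
    labels = [classify_segment(s) for s in segments if is_anomalous(s)]
    return classify_sequence(labels)


if __name__ == "__main__":
    recording = [
        [0.0, 0.01, -0.01],   # background, filtered out by stage 1
        [0.9, -0.9, 0.85],    # loud segment
        [0.95, -0.9, 0.9],    # loud segment
        [0.0, 0.0, 0.0],      # silence
    ]
    print(detect_event(recording))
```

In a real deployment each stand-in would be replaced by a trained model, and the cheap first stage would keep the more expensive classifiers from running on every background segment, which matters on an embedded platform.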
