Monitoring and sound classification in the Neonatal ICU using neural networks
Abstract
Neonatal Intensive Care Units (NICUs) are specialized hospital units that treat newborns with health complications. Many factors can influence the course of treatment, including noise levels and the sound sources present in the unit. To provide a tool that enables acoustic monitoring and feedback for medical staff, we classify NICU sounds using Convolutional and Long Short-Term Memory (LSTM) neural networks. We focus on three audio classes: infant cries, human speech, and alerts from hospital machines (beep sounds). The results include the extraction of relevant sound features and a comparison between the classifiers. State-of-the-art models for environmental sound classification achieve, on average, 74.4% classification performance. With the proposed models, we achieved up to 84% on the evaluation metrics considered.
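As a concrete illustration of the pipeline the abstract describes, short-time feature extraction followed by a neural classifier over the three sound classes, the sketch below extracts MFCC features with librosa and feeds them to a small Keras CNN. All shapes, hyperparameters, and the file layout are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch, assuming MFCC features and a small 2-D CNN classifier.
# Hyperparameters and shapes are illustrative, not the paper's exact setup.
import numpy as np
import librosa
import tensorflow as tf

CLASSES = ["cry", "speech", "beep"]  # the three NICU sound classes

def extract_mfcc(path, sr=22050, n_mfcc=40, max_frames=128):
    """Load an audio clip and return a fixed-size (n_mfcc, max_frames) MFCC matrix."""
    y, _ = librosa.load(path, sr=sr, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    # Pad or truncate along the time axis so every clip has the same shape.
    if mfcc.shape[1] < max_frames:
        mfcc = np.pad(mfcc, ((0, 0), (0, max_frames - mfcc.shape[1])))
    return mfcc[:, :max_frames]

def build_cnn(n_mfcc=40, max_frames=128, n_classes=len(CLASSES)):
    """A small 2-D CNN over the MFCC 'image'. An LSTM variant would instead
    treat the time axis as a sequence of MFCC frame vectors."""
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu",
                               input_shape=(n_mfcc, max_frames, 1)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])

model = build_cnn()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",  # integer class labels
              metrics=["accuracy"])

# Inference on one clip (hypothetical path): add channel and batch axes.
# x = extract_mfcc("clip.wav")[..., np.newaxis][np.newaxis]  # (1, 40, 128, 1)
# probs = model.predict(x)  # per-class probabilities over CLASSES
```

The padding step matters because NICU recordings vary in length, while the convolutional stack expects a fixed input shape; a recurrent variant could instead consume variable-length MFCC sequences directly.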
