Sound Monitoring and Classification in a Neonatal ICU Using Neural Networks
Abstract
Neonatal Intensive Care Units (NICUs) are units specialized in the treatment of newborns with health complications. Many factors can influence the stages of treatment, including noise levels and sound sources. To provide a useful tool for adequate monitoring and feedback to the medical staff, we perform sound classification in NICUs using convolutional and Long Short-Term Memory neural networks. We focus on three audio classes: crying, human conversation, and hospital machine alerts (beeping sounds). The results include the extraction of relevant audio features and comparisons between classifiers. State-of-the-art models for environmental sounds achieve, on average, 74.4% classification performance, while the proposed models reach up to 84% on the evaluation metrics.
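To make the described pipeline concrete, the sketch below shows one plausible realization: log-mel feature extraction followed by a small CNN+LSTM classifier over the three target classes. This is an illustrative assumption, not the paper's exact architecture; librosa for feature extraction, the Keras layer layout, window lengths, and all hyperparameters are choices of this sketch.

```python
# Minimal sketch, assuming librosa for features and Keras for the model;
# the architecture and hyperparameters are illustrative, not the paper's.
import librosa
import numpy as np
from tensorflow.keras import layers, models

CLASSES = ["cry", "speech", "beep"]  # the three target audio classes

def extract_features(path, sr=22050, n_mels=64, duration=4.0):
    """Load a clip and compute a fixed-size log-mel spectrogram."""
    y, _ = librosa.load(path, sr=sr, duration=duration)
    y = librosa.util.fix_length(y, size=int(sr * duration))  # pad/trim
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)  # shape: (n_mels, frames)

def build_model(input_shape, n_classes=len(CLASSES)):
    """Small CNN front end followed by an LSTM over the time axis."""
    m = models.Sequential([
        layers.Input(shape=(*input_shape, 1)),
        layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        # Reorder (mel, time, channels) -> (time, mel, channels), then
        # flatten each time step into a feature vector for the LSTM.
        layers.Permute((2, 1, 3)),
        layers.Reshape((-1, 64 * (input_shape[0] // 4))),
        layers.LSTM(64),
        layers.Dense(n_classes, activation="softmax"),
    ])
    m.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
    return m

# Hypothetical usage: stack features, add a channel axis, and train.
# X = np.array([extract_features(p) for p in paths])[..., None]
# model = build_model(X.shape[1:3])
# model.fit(X, labels, epochs=30, validation_split=0.2)
```

The split into a convolutional front end and a recurrent back end mirrors the abstract's pairing of CNNs with Long Short-Term Memory: the convolutions summarize local time-frequency patterns, and the LSTM aggregates them over the duration of the clip.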