A Study of Techniques Used for Sound Recognition with Artificial Intelligence and Python

  • Jéfer Benedett Dörr UEM
  • Linnyer Beatrys Ruiz Aylon UEM

Abstract

This article introduces the concepts and practices needed to get started with sound recognition. It presents the required theoretical background, together with the concepts and tools for carrying out a hands-on audio pattern recognition project using Artificial Intelligence and the Python programming language.

Keywords: Audio Recognition, Audio Identification, Artificial Intelligence, Machine Learning, Machine Listening

Published
November 2, 2022
DÖRR, Jéfer Benedett; AYLON, Linnyer Beatrys Ruiz. Um Estudo sobre Técnicas utilizadas para o Reconhecimento de Sons com o uso de Inteligência Artificial e Python. In: CONGRESSO LATINO-AMERICANO DE SOFTWARE LIVRE E TECNOLOGIAS ABERTAS (LATINOWARE), 19., 2022, Evento Híbrido. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2022. p. 103-112. DOI: https://doi.org/10.5753/latinoware.2022.227845.