Optimizing CleanUNet Architecture Parameters for Enhancing Speech Denoising
Abstract
Speech enhancement refers to a set of techniques that aim to recover clean speech from a corrupted signal. One common source of corruption is additive noise, which takes many forms. Suboptimal acoustic conditions can introduce background noise and echo, which reduce speech clarity and make denoising necessary. In this work, we optimized CleanUNet, a convolutional neural network (CNN) architecture proposed specifically for causal speech denoising. We explored alternatives to the transformer bottleneck, such as the Mamba architecture, which can process encoder outputs more efficiently, with complexity linear in sequence length. We also reduced the number of hidden channels in the convolutional layers. These changes decrease the model's parameter count and improve training and inference speed on a single GPU. To our knowledge, this is the first attempt to use Mamba as a replacement for the vanilla transformer in the CleanUNet architecture.
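The two effects named in the abstract can be illustrated with simple arithmetic. The sketch below is not the authors' implementation: the channel counts, kernel size, sequence lengths, and state size are hypothetical values chosen for illustration, and the operation counts are rough leading-order estimates only. It shows that halving the hidden channels of a 1-D convolution cuts its weights by roughly a factor of four, and that the cost of forming self-attention scores grows quadratically with sequence length while a linear-time scan (the regime Mamba targets) grows linearly.

```python
def conv1d_params(c_in, c_out, kernel_size, bias=True):
    """Learnable parameters in a standard 1-D convolution layer."""
    return c_in * c_out * kernel_size + (c_out if bias else 0)


def attention_score_ops(seq_len, dim):
    """Leading-order multiply-adds to form the seq_len x seq_len score matrix."""
    return seq_len * seq_len * dim


def linear_scan_ops(seq_len, dim, state_size):
    """Leading-order multiply-adds for a recurrent scan over the sequence.

    Rough estimate for a state-space style layer; state_size is hypothetical.
    """
    return seq_len * dim * state_size


# Hypothetical layer sizes, not taken from the paper.
wide = conv1d_params(c_in=64, c_out=64, kernel_size=4)
narrow = conv1d_params(c_in=32, c_out=32, kernel_size=4)
print(f"conv params: wide={wide}, narrow={narrow}, ratio={wide / narrow:.2f}")

# Doubling the sequence length: attention cost x4, scan cost x2.
for seq_len in (256, 512):
    attn = attention_score_ops(seq_len, dim=64)
    scan = linear_scan_ops(seq_len, dim=64, state_size=16)
    print(f"T={seq_len}: attention~{attn}, scan~{scan}")
```

These counts ignore constant factors and implementation details, so they indicate scaling trends rather than measured speedups.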
Published
17/11/2024
How to Cite
SILVA, Matheus Vieira da; MARI, João Fernando; BACKES, André Ricardo. Optimizing CleanUNet Architecture Parameters for Enhancing Speech Denoising. In: BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), 13., 2024, Belém/PA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 254-264. ISSN 2643-6264.