A Generative Approach for Face Mask Removal Using Audio and Appearance

Luiz E. L. Coelho; Raphael Prates; William Robson Schwartz

Luiz E. L. Coelho UFMG
Raphael Prates UFMG
William Robson Schwartz UFMG

Abstract

Since the COVID-19 pandemic, wearing face masks in public spaces and at gatherings has become common. Journalists, reporters, and interviewees therefore frequently wear masks, following the public health measures adopted to contain the pandemic. However, a speaker or presenter wearing a mask can be uncomfortable for viewers to watch. Furthermore, a mask prevents lip reading, which can impair speech comprehension for people with hearing impairment. This work therefore aims to artificially remove masks in videos while recovering the lip movements, using the audio and the features of the uncovered part of the face. We use the audio to infer lip movements that match the uttered phrase: from the audio, we estimate landmarks representing the mouth structure. Finally, both sets of landmarks (those detected on the uncovered face and those estimated from the audio) are fed to a generative adversarial network (GAN) that reconstructs the full face image with the mouth in the correct shape. We present quantitative results in the form of evaluation metrics and qualitative results in the form of visual examples.

Keywords: Measurement, Visualization, Pandemics, Shape, Lips, Mouth, Generative adversarial networks
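
The data flow summarized in the abstract (audio to mouth landmarks, merged with the landmarks of the uncovered face, then passed on as conditioning for a generator) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the 68-point landmark layout with indices 48 to 67 as the mouth, the 13-dimensional audio feature vector, the linear stand-in `estimate_mouth_landmarks`, and the binary-map rasterization. The GAN itself is omitted.

```python
import numpy as np

# Assumption: a 68-point face layout in which indices 48-67 are the mouth.
N_LANDMARKS = 68
MOUTH = slice(48, 68)


def estimate_mouth_landmarks(audio_features, weights):
    """Hypothetical stand-in for an audio-to-landmark model: a linear map
    from one audio feature vector to 20 (x, y) mouth offsets."""
    return (audio_features @ weights).reshape(20, 2)


def merge_landmarks(visible, mouth):
    """Combine landmarks detected on the uncovered face with mouth
    landmarks estimated from audio into one full-face set."""
    full = visible.copy()  # leave the caller's array untouched
    full[MOUTH] = mouth
    return full


def rasterize(landmarks, size=64):
    """Draw normalized landmarks in [0, 1] as a binary map, one possible
    way to condition an image generator."""
    canvas = np.zeros((size, size), dtype=np.float32)
    px = np.clip((landmarks * (size - 1)).round().astype(int), 0, size - 1)
    canvas[px[:, 1], px[:, 0]] = 1.0  # row = y, column = x
    return canvas


rng = np.random.default_rng(0)

# Synthetic inputs: the mouth region is unknown because the mask hides it.
visible = rng.uniform(0.0, 1.0, size=(N_LANDMARKS, 2))
visible[MOUTH] = np.nan

# Assumption: 13 audio features per frame and random placeholder weights.
audio_features = rng.normal(size=13)
weights = rng.normal(scale=0.01, size=(13, 40))

# Offsets are centered at 0.5 and clipped to the normalized image range.
mouth = np.clip(0.5 + estimate_mouth_landmarks(audio_features, weights), 0.0, 1.0)
full = merge_landmarks(visible, mouth)
condition = rasterize(full)  # this map would be the generator's input
```

In a real system the linear map would be replaced by a trained audio model and `condition` would be consumed by the generator network; the sketch only shows how the two landmark sources fit together.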