SynLibras: A Disentangled Deep Generative Model for Brazilian Sign Language Synthesis
Abstract: Recent advances in deep generative models have strengthened a family of approaches in which discriminative and generative tasks are tackled jointly in an analysis-by-synthesis manner. In this category, variational autoencoders (VAEs) and generative adversarial networks (GANs) aim to learn latent data representations from which synthetic images may be sampled. However, sampling in such models typically does not allow independent control over distinct factors of variation. Despite general efforts to overcome this issue, deep generative models tailored for sign language with disentangled factors of variation remain largely unexplored in the literature. In this work, we introduce SynLibras, a novel model that disentangles appearance from gestural communication (i.e., body, hand, and face poses) in image synthesis. Our model is capable of performing cross-language pose transfer while maintaining the appearance of the source signer. We perform experiments on the RWTH-PHOENIX-Weather dataset and evaluate results using the PSNR and SSIM metrics. To our knowledge, SynLibras is the first method for Brazilian sign language (Libras) synthesis in images. We compare our model with EDN, a well-known general pose-transfer method, achieving better results on Libras synthesis. Finally, we also introduce SynLibras-Pose, a dataset with annotated poses of Libras signers performing single words.
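As context for the evaluation metrics named in the abstract, the sketch below illustrates how PSNR and SSIM compare a synthesized frame against a reference frame. This is a minimal NumPy illustration, not the paper's evaluation code; the SSIM shown here is a simplified global variant (one window over the whole image), whereas the standard metric averages SSIM over local Gaussian windows.

```python
import numpy as np

def psnr(ref, test, max_val=255.0):
    # Peak Signal-to-Noise Ratio: higher means the synthesized
    # image is closer to the reference; infinite for identical images.
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

def ssim_global(ref, test, max_val=255.0):
    # Simplified global SSIM: compares luminance, contrast, and
    # structure statistics computed over the entire image at once.
    x = ref.astype(np.float64)
    y = test.astype(np.float64)
    c1 = (0.01 * max_val) ** 2  # standard stabilizing constants
    c2 = (0.03 * max_val) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )
```

Identical images yield infinite PSNR and an SSIM of 1.0; both scores drop as the synthesized image diverges from the reference.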
Keywords: Measurement, Graphics, Analytical models, Image synthesis, Gesture recognition, Assistive technologies, Generative adversarial networks
SILVEIRA, Wellington; ALANIZ, Andrew; HURTADO, Marina; SILVA, Bernardo Castello Da; BEM, Rodrigo De. SynLibras: A Disentangled Deep Generative Model for Brazilian Sign Language Synthesis. In: CONFERENCE ON GRAPHICS, PATTERNS AND IMAGES (SIBGRAPI), 35., 2022, Natal/RN. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2022.