Face Reconstruction with Variational Autoencoder and Face Masks

Rafael S. Toledo; Eric A. Antonelo

doi:10.5753/eniac.2021.18282

Rafael S. Toledo UFSC
Eric A. Antonelo UFSC

Abstract

Variational Autoencoders (VAEs) employ deep learning models to learn a continuous latent z-space underlying a high-dimensional observed dataset. This latent space enables many tasks, including face reconstruction and face synthesis. In this work, we investigate how face masks can help the training of VAEs for face reconstruction by restricting learning to the pixels selected by the face mask. An evaluation of the proposal on the CelebA dataset shows that the face masks improve the reconstructed images, especially when the SSIM loss is combined with either the l1 or the l2 loss. We observed that including a decoder for face mask prediction in the architecture affected performance for the l1 and l2 losses, whereas this was not the case for the SSIM loss. In addition, the SSIM perceptual loss yielded the crispest samples among all configurations tested, although it shifts the original colors of the image; combining it with the l1 or l2 loss helps to correct this shift.
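The idea of restricting the reconstruction loss to the pixels selected by a face mask can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `masked_pixel_loss`, the binary (H, W) mask layout, and the normalization by the number of selected values are all assumptions made for illustration, and the SSIM term is left out.

```python
import numpy as np


def masked_pixel_loss(x, x_hat, mask, kind="l1"):
    """Mean l1 or l2 error computed only over pixels where mask == 1.

    Hypothetical helper: x and x_hat are (H, W, C) images, mask is a
    binary (H, W) or (H, W, C) array. Normalizing by the number of
    selected values is an assumption, not taken from the paper.
    """
    x = np.asarray(x, dtype=np.float64)
    x_hat = np.asarray(x_hat, dtype=np.float64)
    mask = np.asarray(mask, dtype=np.float64)

    diff = x - x_hat
    if kind == "l1":
        err = np.abs(diff)
    elif kind == "l2":
        err = diff ** 2
    else:
        raise ValueError("kind must be 'l1' or 'l2'")

    # Give an (H, W) mask a trailing channel axis so it broadcasts over C.
    if mask.ndim == err.ndim - 1:
        mask = mask[..., None]
    weights = np.broadcast_to(mask, err.shape)

    selected = weights.sum()
    if selected == 0:
        # No pixel selected: define the loss as zero to avoid dividing by 0.
        return 0.0
    return float((err * weights).sum() / selected)
```

With such a loss, errors outside the mask (for example in the background) contribute nothing to the gradient, so the model's capacity is spent on the face region only.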
