Evaluating Image Synthesis: A Modest Review of Techniques and Metrics
Abstract
This paper reviews image synthesis methods, highlighting key techniques such as Convolutional Neural Networks (CNNs), Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and Diffusion Models. We analyze commonly used datasets and evaluation metrics, including SSIM, MS-SSIM, FID, IS, and LPIPS. Our findings show a preference for SSIM in structural quality assessment, while FID and IS are favored for overall quality and diversity. The growing use of LPIPS indicates a shift toward learned perceptual metrics. This review emphasizes the necessity of combining multiple metrics for a comprehensive evaluation of image synthesis models, aiding future research in the field.
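As an illustrative aside (not part of the reviewed paper), the sketch below shows how these metrics might be computed side by side for a batch of real and generated images, assuming the torchmetrics Python library; the random tensors, batch size, and resolution are placeholder assumptions, not values from the paper.

```python
# A minimal multi-metric evaluation sketch, assuming the torchmetrics
# library (pip install "torchmetrics[image]"); the FID, IS, and LPIPS
# backbones download pretrained weights on first use.
import torch
from torchmetrics.image import (
    MultiScaleStructuralSimilarityIndexMeasure,
    StructuralSimilarityIndexMeasure,
)
from torchmetrics.image.fid import FrechetInceptionDistance
from torchmetrics.image.inception import InceptionScore
from torchmetrics.image.lpip import LearnedPerceptualImagePatchSimilarity

# Random tensors stand in for real and generated batches:
# float images in [0, 1], shape (N, 3, H, W).
real = torch.rand(16, 3, 256, 256)
fake = torch.rand(16, 3, 256, 256)

# Structural quality: SSIM and MS-SSIM compare paired images
# (higher is better).
ssim = StructuralSimilarityIndexMeasure(data_range=1.0)
ms_ssim = MultiScaleStructuralSimilarityIndexMeasure(data_range=1.0)
print("SSIM:   ", ssim(fake, real).item())
print("MS-SSIM:", ms_ssim(fake, real).item())

# Perceptual similarity: LPIPS compares deep features of paired images
# (lower is better); normalize=True declares inputs in [0, 1].
lpips = LearnedPerceptualImagePatchSimilarity(net_type="alex", normalize=True)
print("LPIPS:  ", lpips(fake, real).item())

# Distribution-level quality and diversity: FID between the real and
# generated sets (lower is better), IS on generated images alone
# (higher is better). Real evaluations need far more than 16 samples.
fid = FrechetInceptionDistance(feature=2048, normalize=True)
fid.update(real, real=True)
fid.update(fake, real=False)
print("FID:    ", fid.compute().item())

inception = InceptionScore(normalize=True)
inception.update(fake)
is_mean, is_std = inception.compute()
print("IS:     ", is_mean.item(), "+/-", is_std.item())
```

Note how the paired metrics (SSIM, MS-SSIM, LPIPS) and the distributional metrics (FID, IS) answer different questions, which is why combining them, as the paper argues, gives a more complete picture.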
Published
30/09/2024
How to Cite
SOUSA, Roney Nogueira de; OLIVEIRA, Saulo Anderson Freitas. Evaluating Image Synthesis: A Modest Review of Techniques and Metrics. In: WORKSHOP DE TRABALHOS EM ANDAMENTO - CONFERENCE ON GRAPHICS, PATTERNS AND IMAGES (SIBGRAPI), 37., 2024, Manaus/AM. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 82-87. DOI: https://doi.org/10.5753/sibgrapi.est.2024.31649.