Multi-Label Classification Applied to Omnidirectional Images

Manuel Veras; Thiago L. T. da Silveira

doi:10.5753/sibgrapi.est.2023.27458

Manuel Veras (UFRGS)
Thiago L. T. da Silveira (UFRGS)


Abstract

Convolutional neural networks (CNNs) have been widely used in computer vision problems, especially in applications involving conventional images captured under the pinhole model. However, there is growing demand for solutions that can handle spherical images, and successfully adapting methods designed for planar images to omnidirectional images is not straightforward. In this work, we present a comparative analysis of two neural network architectures for multi-label classification of spherical images. The first network uses conventional convolutions, while the second incorporates spherical convolutions. Both were trained on a subset of the Structured3D dataset. We ran two experiments on this dataset: the first used non-rotated equirectangular projection (ERP) images, and the second used rotated ERP images, simulating tilted captures. In both experiments, the spherical CNN performed better on all three metrics analyzed: Hamming Loss (HL), Exact Match Ratio (EMR), and F1-score.
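
To make the three evaluation metrics concrete, the following is a minimal NumPy sketch of how they are commonly computed for binary multi-label predictions. It is not the authors' evaluation code: the function names and the toy label matrices are hypothetical, and the F1-score is micro-averaged here as an assumption, since the abstract does not state which averaging was used.

```python
import numpy as np


def hamming_loss(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Fraction of individual label slots that are predicted incorrectly."""
    return float(np.mean(y_true != y_pred))


def exact_match_ratio(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Fraction of samples whose full label vector is predicted exactly."""
    return float(np.mean(np.all(y_true == y_pred, axis=1)))


def micro_f1(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Micro-averaged F1 (assumed averaging): counts pooled over all labels."""
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0


# Hypothetical toy data: 3 images, 3 candidate labels each (rows = images).
y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0]])
y_pred = np.array([[1, 0, 1],
                   [0, 1, 1],
                   [1, 0, 0]])

hl = hamming_loss(y_true, y_pred)        # 2 wrong slots out of 9
emr = exact_match_ratio(y_true, y_pred)  # only the first image matches fully
f1 = micro_f1(y_true, y_pred)            # tp=4, fp=1, fn=1
print(f"HL={hl:.4f}  EMR={emr:.4f}  F1={f1:.4f}")
```

Lower is better for HL, while higher is better for EMR and F1. EMR is the strictest of the three, since a single wrong label makes the whole sample count as a miss.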

