Scene classification using a combination of aerial and ground images

Gabriel Machado; Keiller Nogueira; Jefersson Alex dos Santos

doi:10.5753/sibgrapi.est.2021.20015

Gabriel Machado, UFMG
Keiller Nogueira, University of Stirling
Jefersson Alex dos Santos, UFMG

Abstract

It is undeniable that aerial images can provide useful information for a wide variety of tasks, such as disaster relief and urban planning. However, because these images observe the Earth from a single point of view, some applications may benefit from complementary information provided by other perspectives of the scene, such as ground-level images. Although there are many public repositories of georeferenced photos and aerial images (such as Google Maps and Street View), there is a lack of public datasets that support studies exploiting the complementarity of aerial and ground imagery. To address this gap, we present two new publicly available datasets, AiRound and CV-BrCT. Using both, we tackle the scene classification task in two scenarios: the first has a fully paired image set, while the second has missing samples. In both scenarios, we use deep learning and feature fusion algorithms. To handle missing samples, we propose a content-based image retrieval framework.
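As a rough illustration of the two scenarios, the sketch below fuses an aerial and a ground feature vector by concatenation and, when the ground view is missing, substitutes a retrieved one. It is a simplified sketch under stated assumptions, not the framework proposed in the paper: the feature vectors are toy values standing in for deep network outputs, the `gallery` of paired samples is hypothetical, and retrieval is approximated by cosine similarity between aerial features rather than by a learned cross-view embedding.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_ground(aerial_feat, gallery):
    """Return the ground feature whose paired aerial feature is most similar.

    `gallery` is a hypothetical list of (aerial_feat, ground_feat) pairs
    taken from samples where both views are available.
    """
    best = max(gallery, key=lambda pair: cosine(aerial_feat, pair[0]))
    return best[1]


def fuse(aerial_feat, ground_feat):
    """Fuse the two view descriptors by simple concatenation."""
    return list(aerial_feat) + list(ground_feat)


def fused_descriptor(aerial_feat, ground_feat, gallery):
    """Fuse both views, retrieving a ground feature when it is missing."""
    if ground_feat is None:
        ground_feat = retrieve_ground(aerial_feat, gallery)
    return fuse(aerial_feat, ground_feat)


# Toy paired samples (assumed values, for illustration only).
gallery = [
    ([1.0, 0.0], [0.9, 0.1, 0.0]),
    ([0.0, 1.0], [0.0, 0.2, 0.8]),
]

# Scenario 1: both views are available.
paired = fused_descriptor([1.0, 0.1], [0.7, 0.3, 0.0], gallery)

# Scenario 2: the ground view is missing and is filled in by retrieval.
missing = fused_descriptor([0.1, 1.0], None, gallery)
```

In either case the fused descriptor would then be passed to a classifier; concatenation is only one of several possible fusion strategies.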

Keywords: deep learning, machine learning, remote sensing, image classification, multi-modal machine learning, metric learning, cross-view matching
