ML-Based Road Asset Geolocation Using Object Detection and Camera Displacement
Abstract
Object geolocation methods detect objects in images and estimate their geospatial locations. Existing methods face several challenges: high hardware costs, limited object class coverage, difficulty with repeated occurrences of objects, and degraded performance in dynamic environments. This paper introduces a machine learning approach that geolocates objects in low-frame-rate video using a single camera and image metadata, aiming to reduce cost and complexity compared with traditional methods. The method combines camera displacement data with object bounding boxes produced by an object detection model to estimate geospatial locations. The approach was evaluated on diverse datasets covering a variety of driving environments and object types, demonstrating that it can handle multiple scenarios.
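As a rough illustration of the two inputs named above, the sketch below derives a camera displacement (distance and bearing) from the GPS fixes of two consecutive frames and joins it with normalized bounding boxes into one feature vector. It is not the paper's implementation: the `Frame` type, the function names, the feature layout, and the assumption that the image metadata carries a latitude/longitude fix are all hypothetical, and the learned model that would consume the vector is omitted.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical approximation)


@dataclass
class Frame:
    """Hypothetical record: camera GPS fix plus one detected box in pixels."""
    lat: float
    lon: float
    box: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def displacement(a: Frame, b: Frame) -> Tuple[float, float]:
    """Return (distance in metres, initial bearing in degrees) from a to b."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    # Haversine great-circle distance between the two fixes.
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    dist = 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))
    # Initial bearing, measured clockwise from north.
    y = math.sin(dlmb) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb))
    bearing = (math.degrees(math.atan2(y, x)) + 360.0) % 360.0
    return dist, bearing


def normalize_box(box: Tuple[float, float, float, float],
                  width: int, height: int) -> List[float]:
    """Scale pixel coordinates to [0, 1] so features are resolution-independent."""
    x0, y0, x1, y1 = box
    return [x0 / width, y0 / height, x1 / width, y1 / height]


def features(a: Frame, b: Frame, width: int, height: int) -> List[float]:
    """Assumed layout: [distance, bearing, box in frame a, box in frame b]."""
    dist, bearing = displacement(a, b)
    return [dist, bearing,
            *normalize_box(a.box, width, height),
            *normalize_box(b.box, width, height)]
```

For two frames taken 0.0001 degrees of latitude apart on the same meridian, `displacement` returns roughly 11.1 m at a bearing of 0 degrees. A regression model trained on such vectors could then predict the object's offset from the camera, but that step depends on details the abstract does not give.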
Keywords:
Geolocation, Road Management, Object Detection, Highway
Published
06/11/2024
How to Cite
MEDEIROS, Victor Israel Anchieta de et al. ML-Based Road Asset Geolocation Using Object Detection and Camera Displacement. In: WORKSHOP DE SISTEMAS DE INFORMAÇÃO (WSIS), 15., 2024, Rio Paranaíba/MG. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 8-14. DOI: https://doi.org/10.5753/wsis.2024.33665.
