Simultaneous Localization and Mapping in Dynamic Environments Using People Detection

João Carlos Virgolino Soares; Marcelo Gattass; Marco Antonio Meggiolaro

doi:10.5753/wtdr_ctdr.2022.227367

João Carlos Virgolino Soares PUC-Rio
Marcelo Gattass PUC-Rio
Marco Antonio Meggiolaro PUC-Rio


Abstract

Simultaneous Localization and Mapping (SLAM) is a fundamental problem in mobile robotics. However, most visual SLAM algorithms assume a static scene, which limits their applicability in real-world environments. Handling dynamic content in visual SLAM remains an open problem. This work presents the first visual SLAM method designed for crowded human environments, using people detection.
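The abstract does not specify how the detections are used. A common strategy in detection-based dynamic SLAM is to discard feature points that fall inside person bounding boxes before pose estimation, so that moving people do not corrupt tracking. The sketch below illustrates that idea under this assumption only; it is not the authors' implementation, and the types, the function names `inside` and `filter_dynamic_keypoints`, and the `min_score` threshold are hypothetical.

```python
from typing import List, Tuple

# Hypothetical types: a keypoint is a pixel coordinate (x, y); a person
# detection is an axis-aligned box (x_min, y_min, x_max, y_max, score).
Keypoint = Tuple[float, float]
Detection = Tuple[float, float, float, float, float]


def inside(kp: Keypoint, det: Detection) -> bool:
    """Return True if the keypoint lies within the detection's bounding box."""
    x, y = kp
    x_min, y_min, x_max, y_max, _ = det
    return x_min <= x <= x_max and y_min <= y <= y_max


def filter_dynamic_keypoints(
    keypoints: List[Keypoint],
    detections: List[Detection],
    min_score: float = 0.5,  # hypothetical detector confidence threshold
) -> List[Keypoint]:
    """Keep only keypoints that lie outside every confident person box.

    Points on detected people are treated as potentially dynamic and are
    excluded before tracking and pose estimation.
    """
    people = [d for d in detections if d[4] >= min_score]
    return [kp for kp in keypoints if not any(inside(kp, d) for d in people)]


if __name__ == "__main__":
    kps = [(10.0, 10.0), (50.0, 60.0), (200.0, 120.0)]
    dets = [(40.0, 40.0, 100.0, 180.0, 0.9),    # confident person box
            (190.0, 100.0, 220.0, 150.0, 0.3)]  # low confidence, ignored
    print(filter_dynamic_keypoints(kps, dets))  # [(10.0, 10.0), (200.0, 120.0)]
```

In a full pipeline the keypoints would come from a feature extractor such as ORB and the boxes from a person detector. Box-level masking is coarse: it also discards static background that happens to fall inside each box, which is the price paid for not needing pixel-level segmentation.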
