Simultaneous Localization and Mapping in Dynamic Environments Using People Detection (Mapeamento e Localização Simultâneos em Ambientes Dinâmicos usando Detecção de Pessoas)
Abstract
Simultaneous Localization and Mapping (SLAM) is a fundamental problem in mobile robotics. However, most visual SLAM algorithms assume a static scene, which limits their applicability in real-world environments. Handling dynamic content in visual SLAM remains an open problem. This work presents the first visual SLAM method for crowded human environments using people detection.
References
Bescos, B., Fácil, J. M., Civera, J., and Neira, J. (2018). Dynaslam: Tracking, mapping, and inpainting in dynamic scenes. IEEE Robotics and Automation Letters, 3(4):4076-4083.
Cui, L. and Ma, C. (2019). Sof-slam: A semantic visual slam for dynamic environments. IEEE Access, 7:166528-166539.
Dendorfer, P., Rezatofighi, H., Milan, A., Shi, J., Cremers, D., Reid, I. D., Roth, S., Schindler, K., and Leal-Taixé, L. (2020). Mot20: A benchmark for multi object tracking in crowded scenes. ArXiv, abs/2003.09003.
Endres, F., Hess, J., Sturm, J., Cremers, D., and Burgard, W. (2014). 3-d mapping with an rgb-d camera. IEEE Transactions on Robotics, 30:177-187.
Engel, J., Schöps, T., and Cremers, D. (2014). Lsd-slam: Large-scale direct monocular slam. In Computer Vision - ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part II, pages 834-849.
Gálvez-López, D. and Tardós, J. D. (2012). Bags of binary words for fast place recognition in image sequences. IEEE Transactions on Robotics, 28(5):1188-1197.
Girshick, R. (2015). Fast r-cnn. In Proceedings of the IEEE international conference on computer vision, pages 1440-1448.
He, K., Gkioxari, G., Dollár, P., and Girshick, R. (2017). Mask r-cnn. In Proceedings of the IEEE international conference on computer vision, pages 2961-2969.
Kümmerle, R., Grisetti, G., Strasdat, H., Konolige, K., and Burgard, W. (2011). g2o: A general framework for graph optimization. In Proceedings of IEEE International Conference on Robotics and Automation (ICRA).
Labbé, M. and Michaud, F. (2019). Rtab-map as an open-source lidar and visual simultaneous localization and mapping library for large-scale and long-term online operation. Journal of Field Robotics, 36(2):416-446.
Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., and Zitnick, C. L. (2014). Microsoft coco: Common objects in context. In European conference on computer vision, pages 740-755. Springer.
Liu, H., Liu, G., Tian, G., Xin, S., and Ji, Z. (2019). Visual slam based on dynamic object removal. In 2019 IEEE International Conference on Robotics and Biomimetics (ROBIO), pages 596-601.
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., and Berg, A. (2016). Ssd: single shot multibox detector. In Proceedings of the European conference on computer vision, pages 21-37.
Milan, A., Leal-Taixé, L., Reid, I. D., Roth, S., and Schindler, K. (2016). Mot16: A benchmark for multi-object tracking. ArXiv, abs/1603.00831.
Mur-Artal, R., Montiel, J. M. M., and Tardós, J. D. (2015). Orb-slam: A versatile and accurate monocular slam system. IEEE Transactions on Robotics, 31(5):1147-1163.
Mur-Artal, R. and Tardós, J. D. (2017). Orb-slam2: An open-source slam system for monocular, stereo, and rgb-d cameras. IEEE Transactions on Robotics, 33(5):1255-1262.
Redmon, J. (2016). Darknet: Open source neural networks in c.
Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016). You only look once: Unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 779-788.
Redmon, J. and Farhadi, A. (2018). Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767.
Ren, S., He, K., Girshick, R., and Sun, J. (2015). Faster r-cnn: Towards real-time object detection with region proposal networks. In Advances in neural information processing systems, pages 91-99.
Rublee, E., Rabaud, V., Konolige, K., and Bradski, G. (2011). Orb: An efficient alternative to sift or surf. In 2011 International Conference on Computer Vision, pages 2564-2571.
Shao, S., Zhao, Z., Li, B., Xiao, T., Yu, G., Zhang, X., and Sun, J. (2018). Crowdhuman: A benchmark for detecting human in a crowd. arXiv preprint arXiv:1805.00123.
Sturm, J., Engelhard, N., Endres, F., Burgard, W., and Cremers, D. (2012). A benchmark for the evaluation of rgb-d slam systems. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 573-580.
Xiao, L., Wang, J., Qiu, X., Rong, Z., and Zou, X. (2019). Dynamic-slam: Semantic monocular visual localization and mapping based on deep learning in dynamic environment. Robotics and Autonomous Systems, 117:1-16.
Yu, C., Liu, Z., Liu, X.-J., Xie, F., Yang, Y., Wei, Q., and Fei, Q. (2018). Ds-slam: A semantic visual slam towards dynamic environments. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1168-1174. IEEE.
Zhong, F., Wang, S., Zhang, Z., Chen, C., and Wang, Y. (2018). Detect-slam: Making object detection and slam mutually beneficial. In 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pages 1001-1010.
Published
2022-10-18
How to Cite
SOARES, João Carlos Virgolino; GATTASS, Marcelo; MEGGIOLARO, Marco Antonio. Mapeamento e Localização Simultâneos em Ambientes Dinâmicos usando Detecção de Pessoas. In: GRADUATE WORKS CONTEST IN ROBOTICS - CTDR (PHD) - BRAZILIAN SYMPOSIUM OF ROBOTICS & LATIN AMERICAN ROBOTICS SYMPOSIUM (SBR/LARS), 14., 2022, São Bernardo do Campo/SP. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2022. p. 109-120. DOI: https://doi.org/10.5753/wtdr_ctdr.2022.227367.
