Multi-object triangulation and 3D footprint tracking for multi-camera systems
Abstract
This paper presents a method for 3D footprint triangulation and tracking of multiple objects from different classes using a multi-camera system with overlapping views. The proposed method is designed to operate in complex environments such as Intelligent Spaces, where localization of heterogeneous entities (e.g., humans, robots, and furniture) is essential. Our approach integrates an object detection model (a YOLO neural network) and a tracking algorithm (BoT-SORT). To establish cross-view correspondences, we exploit epipolar geometry by computing the normalized cross-distance between detection pairs, and we introduce a temporal consistency mechanism that favors previously matched pairs. A greedy matching algorithm with filtering heuristics then provides robust association across camera views. From the valid matches, a graph is constructed, and its connected components are triangulated to estimate the 3D footprint position of each object. Our system requires only camera calibration and supports a flexible number of views, making it suitable for deployment in real-world multi-camera setups. The method is first evaluated by comparing reconstruction results against ArUco marker position estimates; a set of experiments then demonstrates the effectiveness of the algorithm in tracking multiple entities of the same class as well as of different classes. Thanks to its modular and generalizable structure, the framework supports any detector and tracker combination and is applicable to a wide range of scenarios.
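To make the association and triangulation steps concrete, the sketch below shows a minimal version of the pipeline described in the abstract: a symmetric point-to-epipolar-line distance is used as a stand-in for the paper's normalized cross-distance, pairs matched in the previous frame have their cost scaled down to favor temporal consistency, a greedy matcher associates detections across two views, and a linear (DLT) triangulation recovers a 3D footprint from two or more calibrated views. All function names, thresholds (max_dist, bonus), and the exact distance formula are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def epipolar_cross_distance(x1, x2, F):
    """Symmetric point-to-epipolar-line distance between two image points.

    x1, x2: pixel coordinates (u, v) in view 1 and view 2; F: fundamental
    matrix mapping view-1 points to epipolar lines in view 2. This is one
    plausible form of the paper's normalized cross-distance (assumption).
    """
    p1 = np.array([x1[0], x1[1], 1.0])
    p2 = np.array([x2[0], x2[1], 1.0])
    l2 = F @ p1           # epipolar line of x1 in view 2
    l1 = F.T @ p2         # epipolar line of x2 in view 1
    d2 = abs(p2 @ l2) / np.hypot(l2[0], l2[1])
    d1 = abs(p1 @ l1) / np.hypot(l1[0], l1[1])
    return 0.5 * (d1 + d2)

def greedy_match(dets1, dets2, F, prev_pairs=frozenset(),
                 max_dist=10.0, bonus=0.5):
    """Greedy cross-view association of footprint detections.

    dets1/dets2: lists of (track_id, (u, v)) per view. Pairs already matched
    in the previous frame (prev_pairs, set of (id1, id2)) have their cost
    multiplied by 'bonus' so temporally consistent matches win ties.
    Thresholds are illustrative, not the paper's values.
    """
    costs = []
    for i, (id1, pt1) in enumerate(dets1):
        for j, (id2, pt2) in enumerate(dets2):
            c = epipolar_cross_distance(pt1, pt2, F)
            if (id1, id2) in prev_pairs:
                c *= bonus
            if c <= max_dist:           # filtering heuristic
                costs.append((c, i, j))
    costs.sort()
    used1, used2, matches = set(), set(), []
    for c, i, j in costs:               # cheapest valid pairs first
        if i in used1 or j in used2:
            continue
        used1.add(i)
        used2.add(j)
        matches.append((dets1[i][0], dets2[j][0]))
    return matches

def triangulate_dlt(pts, Ps):
    """Linear (DLT) triangulation of one 3D point from two or more views.

    pts: list of (u, v) pixel points; Ps: matching 3x4 projection matrices.
    """
    A = []
    for (u, v), P in zip(pts, Ps):
        A.append(u * P[2] - P[0])
        A.append(v * P[2] - P[1])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    X = Vt[-1]
    return X[:3] / X[3]
```

In the full method, the matched ID pairs from all camera pairs would be accumulated into a cross-view graph whose connected components are triangulated jointly, as the abstract describes; the two-view routines above only illustrate the per-pair building blocks.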
References
J.-H. Lee and H. Hashimoto, “Intelligent space - concept and contents,” Advanced Robotics, vol. 16, no. 3, pp. 265–280, 2002.
A. S. Olagoke, H. Ibrahim, and S. S. Teoh, “Literature survey on multi-camera system and its application,” IEEE Access, vol. 8, pp. 172892–172922, 2020.
D. Almonfrey, A. P. do Carmo, F. M. de Queiroz, R. Picoreti, R. F. Vassallo, and E. O. T. Salles, “A flexible human detection service suitable for intelligent spaces based on a multi-camera network,” International Journal of Distributed Sensor Networks, vol. 14, no. 3, 2018. [Online]. DOI: 10.1177/1550147718763550
Y.-J. Lee, M.-W. Park, and I. Brilakis, “Entity matching across stereo cameras for tracking construction workers,” in Proceedings of the 33rd International Symposium on Automation and Robotics in Construction (ISARC), A. Sattineni, S. Azhar, and D. Castro, Eds. Auburn, USA: International Association for Automation and Robotics in Construction (IAARC), July 2016, pp. 669–677.
F. Yang, S. Odashima, S. Yamao, H. Fujimoto, S. Masui, and S. Jiang, “A unified multi-view multi-person tracking framework,” Computational Visual Media, vol. 10, no. 1, pp. 137–160, 2024. [Online]. Available: [link]
P. Kohl, A. Specker, A. Schumann, and J. Beyerer, “The MTA dataset for multi-target multi-camera pedestrian tracking by weighted distance aggregation,” in The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, June 2020.
G. Jocher and J. Qiu, “Ultralytics YOLO11,” 2024. [Online]. Available: [link]
N. Aharon, R. Orfaig, and B.-Z. Bobrovsky, “BoT-SORT: Robust associations multi-pedestrian tracking,” arXiv, vol. abs/2206.14651, 2022. [Online]. Available: [link]
T.-Y. Lin, M. Maire, S. Belongie, L. Bourdev, R. Girshick, J. Hays, P. Perona, D. Ramanan, C. L. Zitnick, and P. Dollár, “Microsoft COCO: Common objects in context,” 2015.
S. Shao, Z. Li, T. Zhang, C. Peng, G. Yu, J. Li, X. Zhang, and J. Sun, “Objects365: A large-scale, high-quality dataset for object detection,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 8425–8434.
A. Kuznetsova, H. Rom, N. Alldrin, J. Uijlings, I. Krasin, J. Pont-Tuset, S. Kamali, S. Popov, M. Malloci, A. Kolesnikov, T. Duerig, and V. Ferrari, “The Open Images Dataset V4: Unified image classification, object detection, and visual relationship detection at scale,” International Journal of Computer Vision, 2020.
P. Zhu, L. Wen, D. Du, X. Bian, H. Fan, Q. Hu, and H. Ling, “Detection and tracking meet drones challenge,” IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1–1, 2021.
R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision. Cambridge University Press, 2003.
R. E. Kalman, “A new approach to linear filtering and prediction problems,” Journal of Basic Engineering, vol. 82, no. 1, pp. 35–45, 1960.
H. W. Kuhn, “The Hungarian method for the assignment problem,” Naval Research Logistics Quarterly, vol. 2, no. 1-2, pp. 83–97, 1955.
Published
September 30, 2025
How to Cite
ALTOÉ, Gabriel Donna; LUCAS, Miquelly Nicolini; COSMI FILHO, Luiz Carlos; VASSALLO, Raquel Frizera. Multi-object triangulation and 3D footprint tracking for multi-camera systems. In: WORKSHOP DE TRABALHOS DA GRADUAÇÃO - CONFERENCE ON GRAPHICS, PATTERNS AND IMAGES (SIBGRAPI), 38., 2025, Salvador/BA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 243-246.
