Evaluating Simulation Platforms for Visual Affordance Understanding in Computer Vision
Abstract
Affordance understanding in computer vision goes beyond object recognition: it involves interpreting scenes in terms of potential agent-object interactions. This work examines the role of simulated environments in supporting such reasoning about visual scenes. Three simulation tools (Gymnasium, SUMO, and CARLA) are evaluated for their suitability for affordance research, with particular attention to video-based data generation, multimodal perception, and interaction modeling. Within the Gymnasium library, selected environments are compared against criteria such as element countability, scene diversity, and logging capabilities. The analysis identifies key limitations of existing tools and underscores the need for scalable, user-configurable platforms designed to support perception, learning, and generalization in affordance-centric applications.
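As a rough illustration of the video-based data generation mentioned above, the sketch below collects rendered frames from a Gymnasium environment through its public API; the environment name ("CartPole-v1"), the random placeholder policy, and the episode length are illustrative assumptions, not the configuration evaluated in the paper.

import gymnasium as gym
import numpy as np

# Create an environment that renders frames as RGB arrays
# ("CartPole-v1" is an illustrative choice, not the paper's setup).
env = gym.make("CartPole-v1", render_mode="rgb_array")
obs, info = env.reset(seed=0)

frames = []
for _ in range(100):
    action = env.action_space.sample()            # placeholder random policy
    obs, reward, terminated, truncated, info = env.step(action)
    frames.append(env.render())                   # (H, W, 3) uint8 frame for this step
    if terminated or truncated:
        obs, info = env.reset()

env.close()
video = np.stack(frames)                          # (T, H, W, 3) clip, ready to write or log
print(video.shape)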
Published
30/09/2025
How to Cite
OLIVEIRA, Maria Gabriela Lustosa; COSTA, Paula Dornhofer Paro. Evaluating Simulation Platforms for Visual Affordance Understanding in Computer Vision. In: WORKSHOP DE TRABALHOS DA GRADUAÇÃO - CONFERENCE ON GRAPHICS, PATTERNS AND IMAGES (SIBGRAPI), 38., 2025, Salvador/BA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 263-266.
