Cost-Efficient Visual Perception for Autonomous Vehicles: Leveraging Attention-Based Sensor Fusion to Maintain Performance
Abstract
Autonomous vehicles rely on sophisticated perception systems to ensure safe navigation and decision-making. However, state-of-the-art sensor fusion models often demand extensive computational resources, hindering their deployment on cost-effective hardware. In this work, we address this challenge by modifying the BEVFusion framework to significantly reduce computational costs while maintaining high performance in 3D object detection and segmentation. Specifically, we replace the resource-intensive SwinTransformer backbone with the more efficient ResNet50 and integrate attention-based sensor fusion—leveraging channel and spatial attention mechanisms—to dynamically focus on the most relevant features. This approach reduces VRAM usage from 80 GB to approximately 20 GB, cuts training time from 20 days to 6 days, and boosts inference speed by up to 17.3% on lower-power GPUs. Experimental results on the nuScenes dataset demonstrate a 0.732% improvement in mean Average Precision (mAP) for 3D object detection, along with a 14.12% increase in mean Intersection over Union (mIoU) for semantic segmentation compared to the original BEVFusion model. These improvements underscore the feasibility of deploying advanced visual perception systems on more accessible hardware for real-world autonomous driving and mobile robotics applications.
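To make the attention-based fusion step concrete, below is a minimal PyTorch sketch of a CBAM-style module that fuses camera and LiDAR bird's-eye-view (BEV) feature maps, first reweighting channels and then masking spatial locations. The class name, channel sizes, and reduction ratio are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn


class AttentionFusion(nn.Module):
    """Hypothetical channel + spatial attention over concatenated BEV features."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        # Channel attention: global average pool -> bottleneck MLP -> sigmoid gate
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial attention: channel-wise avg/max maps -> 7x7 conv -> sigmoid mask
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, cam_bev: torch.Tensor, lidar_bev: torch.Tensor) -> torch.Tensor:
        x = torch.cat([cam_bev, lidar_bev], dim=1)   # fuse along channel axis
        x = x * self.channel_gate(x)                 # emphasize informative channels
        avg_map = x.mean(dim=1, keepdim=True)        # (B, 1, H, W)
        max_map, _ = x.max(dim=1, keepdim=True)      # (B, 1, H, W)
        mask = self.spatial_gate(torch.cat([avg_map, max_map], dim=1))
        return x * mask                              # emphasize informative BEV cells


# Fuse 80-channel camera and 80-channel LiDAR BEV maps on a 180x180 grid
fusion = AttentionFusion(channels=160)
out = fusion(torch.randn(1, 80, 180, 180), torch.randn(1, 80, 180, 180))
print(out.shape)  # torch.Size([1, 160, 180, 180])
```

The fused tensor keeps the same spatial resolution as the inputs, so a sketch like this could drop into a BEV pipeline in place of a plain concatenation-plus-convolution fusion block.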
Published
29/09/2025
How to Cite
HONORATO, Eduardo Sperle; BONATO, Vanderlei; WOLF, Denis Fernando. Cost-Efficient Visual Perception for Autonomous Vehicles: Leveraging Attention-Based Sensor Fusion to Maintain Performance. In: BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), 35., 2025, Fortaleza/CE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 285-299. ISSN 2643-6264.
