Unified Multi-Modal Multi-Agent Cooperative Perception Framework for Intelligent Transportation Systems
2024-01-7028
12/13/2024
- Content
- Cooperative perception has attracted wide attention for its capability to leverage shared information across connected automated vehicles (CAVs) and smart infrastructure to address occlusion and limited sensing range. To date, existing research has mainly focused on prototyping cooperative perception with a single sensor type, such as LiDAR or camera. In such cases, the performance of cooperative perception is constrained by the limitations of that individual sensor. To exploit sensor multi-modality and further improve distant object detection accuracy, in this paper we propose a unified multi-modal multi-agent cooperative perception framework that integrates camera and LiDAR data to enhance perception performance in intelligent transportation systems. By leveraging the complementary strengths of LiDAR and camera sensors, our framework combines the geometric information from LiDAR with the semantic information from cameras to build an accurate cooperative perception system. To fuse the multi-agent and multi-modal features, we use bird’s-eye view (BEV) space as a consistent, unified feature representation and employ a transformer-based network for effective multi-agent multi-modal BEV feature fusion (see the illustrative sketch after the citation below). We validate our method on the OPV2V and V2XSim benchmarks, achieving state-of-the-art performance in 3D cooperative perception tasks. The proposed framework significantly improves object detection accuracy and robustness, especially in complex traffic scenarios with occlusions such as dense intersections.
- Pages
- 8
- Citation
- Meng, Z., Xia, X., Zheng, Z., Gao, L. et al., "Unified Multi-Modal Multi-Agent Cooperative Perception Framework for Intelligent Transportation Systems," SAE Technical Paper 2024-01-7028, 2024, https://doi.org/10.4271/2024-01-7028.
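
The abstract describes fusing per-agent camera and LiDAR features in a shared BEV space with a transformer-based network. The snippet below is a minimal, hypothetical sketch of that idea, not the authors' implementation: it assumes each agent's camera and LiDAR BEV feature maps have already been projected into the ego vehicle's BEV grid, and it applies self-attention across the agent/modality tokens at every BEV cell. The class name `BEVFusionTransformer`, the shapes, and the mean-pooling aggregation are illustrative assumptions.

```python
# Illustrative sketch only: transformer-based multi-agent multi-modal BEV fusion,
# assuming all per-agent camera/LiDAR BEV features are already warped into the
# ego BEV grid. Not the paper's released code.
import torch
import torch.nn as nn


class BEVFusionTransformer(nn.Module):
    def __init__(self, channels: int = 64, num_heads: int = 4, num_layers: int = 2):
        super().__init__()
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=channels, nhead=num_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)

    def forward(self, bev_feats: torch.Tensor) -> torch.Tensor:
        # bev_feats: (num_agents * num_modalities, C, H, W), all in the ego BEV frame.
        n, c, h, w = bev_feats.shape
        # Treat each BEV cell as a batch element and the agent/modality maps as a
        # token sequence, so attention mixes information across agents and
        # modalities at every spatial location.
        tokens = bev_feats.permute(2, 3, 0, 1).reshape(h * w, n, c)   # (H*W, N, C)
        fused = self.encoder(tokens)                                  # (H*W, N, C)
        # Aggregate over agent/modality tokens to obtain one fused BEV map.
        fused = fused.mean(dim=1).reshape(h, w, c).permute(2, 0, 1)   # (C, H, W)
        return fused


if __name__ == "__main__":
    # Two agents, each contributing a camera and a LiDAR BEV map -> 4 feature maps.
    feats = torch.randn(4, 64, 32, 32)
    model = BEVFusionTransformer()
    print(model(feats).shape)  # torch.Size([64, 32, 32])
```

The fused BEV map would then feed a standard 3D detection head; per-cell attention is just one plausible fusion choice under these assumptions.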