Deep 4D Automotive Radar-Camera Fusion Odometry with Cross-Modal Transformer Fusion

Event
SAE 2023 Intelligent and Connected Vehicles Symposium
Abstract
Many learning-based methods estimate ego-motion using visual sensors. However, visual sensors degrade under intense lighting variations and in textureless scenarios. 4D radar, an emerging automotive sensor, complements visual sensors effectively due to its robustness in adverse weather and lighting conditions. This paper presents an end-to-end 4D radar-visual odometry (4DRVO) approach that combines sparse point cloud data from 4D radar with image information from cameras. Using the feature pyramid, pose warping, and cost volume (PWC) network architecture, we extract 4D radar point features and image features at multiple scales. We then employ a hierarchical iterative refinement approach to supervise the estimated pose. We propose a novel Cross-Modal Transformer (CMT) module to fuse the 4D radar point modality, the image modality, and the 4D radar point-image connection modality at multiple scales, achieving cross-modal feature interaction and multi-modal feature fusion. Additionally, we design a point confidence estimation module to mitigate the impact of dynamic objects on odometry estimation. Extensive experiments on the View-of-Delft (VoD) dataset demonstrate the accuracy and effectiveness of the proposed 4D radar-visual odometry method.
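To make the fusion idea concrete, the sketch below shows one way a cross-modal transformer stage could exchange information between the two streams: radar point features query image features via cross-attention and vice versa, with residual connections and feed-forward refinement. This is a minimal illustration assuming PyTorch; the layer sizes, the symmetric two-way attention, and all names (CrossModalTransformerBlock, pts_to_img, etc.) are assumptions for exposition, not the paper's exact CMT design.

```python
import torch
import torch.nn as nn

class CrossModalTransformerBlock(nn.Module):
    """Hypothetical sketch of one cross-modal fusion stage: radar point
    features attend to image features (and the reverse), then each stream
    is refined with a residual feed-forward network. Dimensions and layer
    choices are illustrative assumptions."""

    def __init__(self, dim: int = 128, num_heads: int = 4):
        super().__init__()
        # Two cross-attention directions: points query images, images query points.
        self.pts_to_img = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.img_to_pts = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_pts = nn.LayerNorm(dim)
        self.norm_img = nn.LayerNorm(dim)
        self.ffn_pts = nn.Sequential(nn.Linear(dim, 2 * dim), nn.ReLU(), nn.Linear(2 * dim, dim))
        self.ffn_img = nn.Sequential(nn.Linear(dim, 2 * dim), nn.ReLU(), nn.Linear(2 * dim, dim))

    def forward(self, pts_feat: torch.Tensor, img_feat: torch.Tensor):
        # pts_feat: (B, N_points, dim) per-point features from the 4D radar branch
        # img_feat: (B, H*W, dim)     flattened image features at the same pyramid scale
        fused_pts, _ = self.pts_to_img(pts_feat, img_feat, img_feat)
        fused_img, _ = self.img_to_pts(img_feat, pts_feat, pts_feat)
        pts = self.norm_pts(pts_feat + fused_pts)   # residual + normalization
        img = self.norm_img(img_feat + fused_img)
        pts = pts + self.ffn_pts(pts)               # per-token refinement
        img = img + self.ffn_img(img)
        return pts, img

# Toy usage: fuse 256 radar points with a 32x32 image feature map at one scale.
block = CrossModalTransformerBlock(dim=128, num_heads=4)
pts = torch.randn(2, 256, 128)
img = torch.randn(2, 32 * 32, 128)
fused_pts, fused_img = block(pts, img)
```

In a multi-scale pipeline such as the PWC-style architecture described above, a block like this would run once per pyramid level, with the fused features feeding the cost volume and pose refinement at that scale.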
Details
DOI
https://doi.org/10.4271/2023-01-7040
Pages
10
Citation
Lu, S., Zhuo, G., Xiong, L., Zhou, M. et al., "Deep 4D Automotive Radar-Camera Fusion Odometry with Cross-Modal Transformer Fusion," SAE Int. J. Adv. & Curr. Prac. in Mobility 6(5):2649-2658, 2024, https://doi.org/10.4271/2023-01-7040.
Additional Details
Publisher
SAE International
Published
Dec 20, 2023
Product Code
2023-01-7040
Content Type
Journal Article
Language
English