MFFormer3D: A Multi-frame Fusion Transformer Utilizing Object Queries for Enhanced 3D Detection
2024-01-7043
12/13/2024
- Abstract: 3D object detection from point clouds is a critical technology for environmental perception in autonomous driving systems. However, the sparsity and irregularity of point cloud data and the complexity of driving scenarios have long posed a dual challenge of accuracy and efficiency. This paper proposes MFFormer3D, a novel multi-frame fusion 3D object detection algorithm that embeds object queries. The method first removes ground points from the raw point cloud to reduce subsequent computational load, then employs a Spatially Grouped Sparse Convolution Feature Extraction Layer (SGSC-FEL) to optimize the backbone network for feature extraction. It further introduces a Cross-frame Object Query Filter (COQF) for efficient multi-frame fusion and a Motion Correction Module (MCM) to compensate for ego-motion effects. Experiments on the nuScenes dataset demonstrate that MFFormer3D outperforms the baseline method by 2.25% in NDS and 7.73% in mAP while maintaining an inference speed of 11.1 FPS. Ablation studies further validate the effectiveness of each module.
- Pages: 8
- Citation: Cheng, S., Zhang, Y., Gao, D., Li, J. et al., "MFFormer3D: A Multi-frame Fusion Transformer Utilizing Object Queries for Enhanced 3D Detection," SAE Technical Paper 2024-01-7043, 2024, https://doi.org/10.4271/2024-01-7043.
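The abstract does not detail how the Motion Correction Module compensates for ego-motion, but the standard approach in multi-frame point-cloud fusion is to re-express a previous frame's points in the current ego frame using the vehicle's 4x4 poses. The sketch below illustrates that generic technique only; the function name and the assumption that ego-to-world poses are available (as in nuScenes) are mine, not the paper's.

```python
import numpy as np

def compensate_ego_motion(prev_points, pose_prev, pose_curr):
    """Transform a previous-frame point cloud into the current ego frame.

    prev_points: (N, 3) xyz coordinates in the previous ego frame.
    pose_prev, pose_curr: 4x4 ego-to-world transforms for each frame.
    """
    # Relative transform: previous ego frame -> current ego frame.
    rel = np.linalg.inv(pose_curr) @ pose_prev
    # Homogeneous coordinates so rotation and translation apply in one step.
    homo = np.hstack([prev_points, np.ones((prev_points.shape[0], 1))])
    return (homo @ rel.T)[:, :3]

# Toy example: the ego vehicle moved 2 m forward along x between frames,
# so a static point 5 m ahead in the previous frame should appear 3 m
# ahead in the current frame.
pose_prev = np.eye(4)
pose_curr = np.eye(4)
pose_curr[0, 3] = 2.0
pts = np.array([[5.0, 0.0, 0.0]])
print(compensate_ego_motion(pts, pose_prev, pose_curr))  # [[3. 0. 0.]]
```

Without such a correction, stacked frames smear static structures along the ego trajectory, which degrades both feature quality and box regression; applying it lets cross-frame queries attend to spatially aligned evidence.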