MFFormer3D: A Multi-frame Fusion Transformer Utilizing Object Queries for Enhanced 3D Detection
SAE Technical Paper 2024-01-7043

Event
SAE 2024 Intelligent and Connected Vehicles Symposium

Abstract
Point cloud-based 3D object detection is a critical technology for environmental perception in autonomous driving systems. However, it has long faced the dual challenges of accuracy and efficiency due to the sparsity and irregularity of point cloud data and the complexity of driving scenarios. This paper proposes MFFormer3D, a novel multi-frame fusion 3D object detection algorithm that embeds object queries. The method first removes ground points from the raw point cloud to reduce subsequent computational load. It then employs a Spatially Grouped Sparse Convolution Feature Extraction Layer (SGSC-FEL) to optimize the backbone network for feature extraction. Subsequently, we introduce a Cross-frame Object Query Filter (COQF) mechanism for efficient multi-frame fusion and a Motion Correction Module (MCM) to compensate for ego-motion effects. Experiments on the nuScenes dataset demonstrate that MFFormer3D outperforms the baseline method by 2.25% in NDS and 7.73% in mAP while maintaining an inference speed of 11.1 FPS. Ablation studies further validate the effectiveness of each module.
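The abstract does not specify how ground points are removed before feature extraction; a common baseline for this step is RANSAC plane fitting, which treats the largest planar surface in the cloud as the ground. The sketch below illustrates that idea only — the function name, iteration count, and distance threshold are illustrative assumptions, not the paper's method.

```python
import numpy as np

def remove_ground(points, n_iters=100, dist_thresh=0.2, seed=0):
    """Drop ground points from an (N, 3) point cloud via RANSAC plane fitting.

    The plane supported by the most inliers within dist_thresh is
    treated as the ground; all other points are returned.
    NOTE: illustrative baseline, not the paper's actual module.
    """
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(n_iters):
        # Sample three points and fit a candidate plane through them.
        idx = rng.choice(len(points), size=3, replace=False)
        p0, p1, p2 = points[idx]
        normal = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(normal)
        if norm < 1e-8:  # degenerate (collinear) sample, skip
            continue
        normal /= norm
        # Point-to-plane distance for every point in the cloud.
        dists = np.abs((points - p0) @ normal)
        inliers = dists < dist_thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return points[~best_inliers]
```

Removing the (typically dominant) ground plane in this way shrinks the point count fed to the backbone, which is the computational-load motivation the abstract cites.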
Details
DOI
https://doi.org/10.4271/2024-01-7043
Pages
8
Citation
Cheng, S., Zhang, Y., Gao, D., Li, J. et al., "MFFormer3D: A Multi-frame Fusion Transformer Utilizing Object Queries for Enhanced 3D Detection," SAE Technical Paper 2024-01-7043, 2024, https://doi.org/10.4271/2024-01-7043.
Additional Details
Publisher
SAE International
Published
December 13, 2024
Product Code
2024-01-7043
Content Type
Technical Paper
Language
English