LDFA: Lightweight Dynamic Feature Aggregation for Multi-Modal Fusion

2024-01-7008

11/15/2024

Event
SAE 2024 Intelligent Urban Air Mobility Symposium
Abstract
Fusing 3D point clouds with camera images is essential for flying car detection systems, improving both safety and operational efficiency. Accurate environmental mapping and obstacle detection enable flying cars to optimize flight paths, mitigate collision risks, and operate reliably in diverse and challenging conditions. The AutoAlignV2 paradigm recently introduced a learnable schema that unifies these two data formats for 3D object detection. However, the computational cost of its dynamic attention alignment mechanism remains a significant challenge. To address this, we propose a Lightweight Cross-modal Feature Dynamic Aggregation Module, which uses a model-driven feature alignment strategy. This module dynamically realigns heterogeneous features and selectively emphasizes salient regions in both the point cloud and image data, sharpening the distinction between objects and background and improving detection accuracy. Additionally, we introduce the Lightweight Spatial-Reduction Attention (LSRA) layer to improve on the original attention mechanism. By employing spatial reduction and positional offset techniques, LSRA lowers computational complexity and accelerates the aggregation of cross-modal features. Furthermore, we apply a novel dropout scheme before extracting features from 2D images, which improves the model's generalization and reduces computational cost. We present a new lightweight framework, Lightweight Dynamic Feature Aggregation for Multi-modal Fusion (LDFA), designed specifically for fusing 3D point cloud data with 2D image-derived information. The LDFA framework balances computational efficiency against enhanced perceptual capability.
Extensive experimental evaluations on the nuScenes benchmark dataset confirm the efficacy and efficiency of the LDFA fusion strategy, demonstrating its potential to advance the state of the art in multi-modal 3D object detection. Code will be available at https://github.com/zishenjiucai/LDFA.
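The abstract says that LSRA uses spatial reduction to lower the cost of cross-modal attention. The sketch below illustrates the general idea only and is not the authors' implementation. It assumes point-derived queries attend to an image token grid whose keys and values are first downsampled by a factor `r` along each spatial axis. Average pooling stands in for whatever reduction operator the paper uses, and the learned projections and the positional offset mechanism are omitted. All function and parameter names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_reduce(tokens, h, w, r):
    """Average-pool a row-major (h*w, c) token grid by r along each axis.

    Assumes h and w are divisible by r. Pooling is a stand-in (assumption)
    for the reduction operator; the paper's choice may differ.
    """
    c = tokens.shape[-1]
    grid = tokens.reshape(h // r, r, w // r, r, c)
    return grid.mean(axis=(1, 3)).reshape(-1, c)

def sr_cross_attention(queries, image_tokens, h, w, r):
    """Cross-attention from (n, c) queries to spatially reduced image tokens."""
    kv = spatial_reduce(image_tokens, h, w, r)   # (h*w / r**2, c)
    scale = 1.0 / np.sqrt(queries.shape[-1])
    weights = softmax(queries @ kv.T * scale)    # (n, h*w / r**2)
    return weights @ kv, weights

# Illustrative sizes: 5 queries, a 16x16 image feature grid, 8 channels.
rng = np.random.default_rng(0)
queries = rng.normal(size=(5, 8))
image_tokens = rng.normal(size=(16 * 16, 8))
out, weights = sr_cross_attention(queries, image_tokens, h=16, w=16, r=4)
print(out.shape, weights.shape)  # (5, 8) (5, 16)
```

With `r = 4`, each query scores 16 reduced tokens instead of 256, so the attention matrix shrinks by a factor of `r**2`. That reduction in keys and values is the source of the saving in this sketch.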
Details
DOI
https://doi.org/10.4271/2024-01-7008
Pages
10
Citation
Feng, X., Zhang, R., Chu, Z., Wei, L. et al., "LDFA: Lightweight Dynamic Feature Aggregation for Multi-Modal Fusion," SAE Technical Paper 2024-01-7008, 2024, https://doi.org/10.4271/2024-01-7008.
Additional Details
Publisher
Published
Nov 15, 2024
Product Code
2024-01-7008
Content Type
Technical Paper
Language
English