Long-Term Temporal Hierarchical Fusion Bird’s-Eye View Perception Method Based on Multiple Position Encodings
2025-01-7307
12/31/2025
- Content
- Environmental perception is a core module of autonomous driving and has attracted considerable attention as the technology develops. Within this field, pure-vision bird's-eye-view (BEV) 3D detection has become a research hotspot because of its high spatial resolution and strong semantic recognition ability in specific scenarios. Existing methods mainly use a Transformer encoder to perform position encoding in the BEV domain and thereby achieve the 3D perspective transformation, but they often fail to fully exploit multi-perspective image information. To address this, this paper proposes an improved Transformer-based visual BEV vehicle perception method that enhances perception performance by deeply fusing BEV-domain and image-domain information. First, a multi-perspective position encoding mechanism decouples camera parameters to learn the mapping from images to 3D space more efficiently. Second, a cyclic interaction attention mechanism strengthens the fine-grained association and fusion of pixel-level features, improving feature discriminability. In addition, to handle challenges such as target occlusion in dynamic scenes, the method introduces a long-term temporal perception framework that fuses multi-frame temporal information, together with a cross-time guidance module that improves the robustness of target localization by injecting historical geometric constraints. Experiments on the nuScenes dataset verify the effectiveness of the method: the results show excellent performance in both spatial perception accuracy and temporal modeling capability, providing a practical solution for environmental perception in autonomous driving.
- Pages
- 14
- Citation
- Chen, Pengyu, Xiaoxu Wei, and Zhenwei Chen, "Long-Term Temporal Hierarchical Fusion Bird’s-Eye View Perception Method Based on Multiple Position Encodings," SAE Technical Paper 2025-01-7307, 2025.
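
The abstract describes a position encoding that uses camera parameters to relate image pixels to 3D space. As a rough illustration of the geometry such encodings typically rest on, the sketch below lifts a coarse pixel grid to 3D points at several candidate depths using the standard pinhole model. It is not the paper's method: the function name `frustum_points_to_3d`, the `stride` parameter, and the `cam_to_ego` extrinsic convention are assumptions made for this example, and the learned network that would turn these coordinates into embeddings is omitted.

```python
import numpy as np

def frustum_points_to_3d(K, cam_to_ego, height, width, depths, stride=16):
    """Lift a coarse pixel grid to 3D ego-frame points (hypothetical helper).

    K          : (3, 3) pinhole intrinsics with last row [0, 0, 1]
    cam_to_ego : (4, 4) homogeneous camera-to-ego transform (assumed convention)
    depths     : (D,) candidate depths along the camera z-axis
    returns    : (H', W', D, 3) array of 3D points in the ego frame
    """
    depths = np.asarray(depths, dtype=float)
    # Pixel centres of a grid subsampled by `stride` (feature-map resolution).
    us = np.arange(stride / 2, width, stride)
    vs = np.arange(stride / 2, height, stride)
    u, v = np.meshgrid(us, vs)                          # each (H', W')
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)    # homogeneous pixels

    # Back-project: K^-1 [u, v, 1]^T is a camera-frame ray with z = 1.
    rays = pix @ np.linalg.inv(K).T                     # (H', W', 3)

    # Scale each ray by every candidate depth to get camera-frame points.
    pts_cam = rays[:, :, None, :] * depths[None, None, :, None]

    # Apply the extrinsics in homogeneous coordinates.
    ones = np.ones(pts_cam.shape[:-1] + (1,))
    pts_h = np.concatenate([pts_cam, ones], axis=-1)    # (H', W', D, 4)
    pts_ego = pts_h @ cam_to_ego.T
    return pts_ego[..., :3]

# Example with made-up camera parameters.
K = np.array([[100.0, 0.0, 32.0],
              [0.0, 100.0, 16.0],
              [0.0, 0.0, 1.0]])
cam_to_ego = np.eye(4)
cam_to_ego[0, 3] = 1.0  # camera mounted 1 m along the ego x-axis (illustrative)
points = frustum_points_to_3d(K, cam_to_ego, height=32, width=64, depths=[1.0, 2.0])
```

In this formulation the intrinsics and extrinsics enter as separate matrix products, which is one simple way to keep the two sets of camera parameters apart. Whether the paper decouples them in this manner is not stated in the abstract.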