Monocular 3D Perception and Lane-Aware Bird’s-Eye-View Mapping for Autonomous Driving
2026-01-0012
To be published on 04/07/2026
- Content
- Accurate perception of the surrounding environment is fundamental to safe and reliable autonomous driving. This work presents an integrated vision-based framework that combines object detection, 3D spatial localization, and lane segmentation to construct a unified bird’s-eye-view (BEV) representation of the driving scene. The pipeline provides geometric information on object position and orientation by employing Omni3D to infer 3D bounding boxes of objects from monocular camera frames. Detections are subsequently projected onto a 2D BEV canvas, where object instances are represented with respect to the ground plane for enhanced interpretability. To complement the object-level perception, we employ YOLOPv2 for lane segmentation, producing both lane masks and lane line masks in the image domain for subsequent coordinate transformation. A pinhole camera model is then used to transform these masks from the perspective image plane onto the BEV canvas (a minimal geometric sketch of this mapping is given below). Fusing the 3D object detections with the geometrically transformed lane representations yields a coherent, structured spatial map of the vehicle’s surroundings. This unified environment model enables explicit reasoning about drivable space and surrounding obstacles, facilitating integration with downstream modules such as path planning and trajectory prediction. The framework demonstrates the feasibility of leveraging recent advances in monocular 3D perception and deep learning-based lane segmentation to construct a computationally efficient and semantically rich BEV representation, making it a candidate core perception component for real-time autonomous driving systems.
- Citation
- Tan, Lin et al., "Monocular 3D Perception and Lane-Aware Bird’s-Eye-View Mapping for Autonomous Driving," SAE Technical Paper 2026-01-0012, 2026.
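
The abstract names a pinhole camera model for mapping lane and lane-line masks from the perspective image plane onto the BEV canvas but does not give its exact form. The sketch below illustrates one common flat-ground realization (inverse perspective mapping): back-project each mask pixel through the camera intrinsics, rotate the ray into the vehicle frame, and intersect it with the ground plane before rasterizing onto a metric BEV grid. The function name, the frame conventions (x forward, y left, z up, ground plane at z = 0), and all parameters are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def lane_mask_to_bev(mask, K, R_cw, t_cw, bev_size=(400, 400), scale=0.1):
    """Project binary lane-mask pixels onto the ground plane under a pinhole model,
    then rasterize them on a BEV grid (all conventions here are assumptions).

    mask       : HxW binary lane / lane-line mask from the segmentation network
    K          : 3x3 camera intrinsic matrix
    R_cw, t_cw : rotation / translation mapping camera coords into the vehicle frame
    scale      : metres per BEV pixel
    """
    K_inv = np.linalg.inv(K)
    vs, us = np.nonzero(mask)                                        # image coordinates of lane pixels
    rays_cam = K_inv @ np.stack([us, vs, np.ones_like(us)], axis=0)  # 3xN back-projected rays
    rays_veh = R_cw @ rays_cam                                       # ray directions in the vehicle frame
    cam_center = t_cw.reshape(3, 1)                                  # camera centre in the vehicle frame

    # Intersect each ray with the ground plane z = 0:
    #   p = cam_center + lam * ray,  p_z = 0  ->  lam = -cam_center_z / ray_z
    lam = -cam_center[2] / rays_veh[2]
    valid = lam > 0                                                  # discard rays that never hit the ground ahead
    ground = cam_center + lam * rays_veh                             # 3xN ground-plane points (z ~ 0)

    # Rasterize x (forward) / y (lateral) onto a BEV canvas with the ego vehicle at the bottom centre
    bev = np.zeros(bev_size, dtype=np.uint8)
    gx = (ground[0, valid] / scale).astype(int)                      # rows: forward distance in pixels
    gy = (ground[1, valid] / scale + bev_size[1] // 2).astype(int)   # cols: lateral offset in pixels
    keep = (gx >= 0) & (gx < bev_size[0]) & (gy >= 0) & (gy < bev_size[1])
    bev[bev_size[0] - 1 - gx[keep], gy[keep]] = 255                  # flip rows so the ego vehicle sits at the bottom
    return bev
```

The same metric-grid rasterization can place the ground-plane footprints of the 3D bounding boxes on the canvas, which is one plausible way the object detections and lane masks end up in a single BEV representation; the paper itself does not specify the implementation.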