3D Scene Reconstruction with Sparse LiDAR Data and Monocular Image in Single Frame

Journal Article
07-11-01-0005
ISSN: 1946-4614, e-ISSN: 1946-4622
Published September 23, 2017 by SAE International in United States
Citation: Zhong, Y., Wang, S., Xie, S., Cao, Z. et al., "3D Scene Reconstruction with Sparse LiDAR Data and Monocular Image in Single Frame," SAE Int. J. Passeng. Cars – Electron. Electr. Syst. 11(1):48-56, 2018, https://doi.org/10.4271/07-11-01-0005.
Language: English

Abstract:

Real-time reconstruction of a 3D environment attributed with semantic information is significant for a variety of applications, such as obstacle detection, traffic scene comprehension, and autonomous navigation. Current approaches mainly use stereo vision, Structure from Motion (SfM), or mobile LiDAR sensors, and each has its own limitations: stereo vision has high computational cost, SfM needs accurate calibration across a sequence of images, and an onboard LiDAR sensor can provide only sparse points without color information. This paper describes a novel method for traffic scene semantic segmentation that combines a sparse LiDAR point cloud (e.g., from Velodyne scans) with a monocular color image. The key novelty of the method is the semantic coupling of the stereoscopic point cloud with the color lattice of the camera image, labelled through a Convolutional Neural Network (CNN). The presented method comprises three main processes: (I) perform semantic segmentation on the color image from the monocular camera using a CNN, (II) extract ideal surfaces and other structural information from the point cloud, and (III) improve the image segmentation with these extracts and label the point cloud with the image segments. The whole process is done in a single frame, and the output of the system is a labelled point cloud that can be used for the construction of semantic object convex hulls and for alignment between frames. We demonstrate the effectiveness of our system on the KITTI dataset, which provides sufficient camera and LiDAR data, and present qualitative and quantitative results indicating improvements in segmentation compared with methods using only image or only LiDAR data.
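
As a rough illustration of the fusion step (III), the sketch below projects sparse LiDAR points into the camera image and copies each point's semantic label from the CNN segmentation map. This is a minimal sketch under assumed KITTI-style calibration inputs (a 3x4 camera projection matrix and a 4x4 LiDAR-to-camera transform), not the authors' implementation; the function name and all parameter names are hypothetical.

```python
import numpy as np

def label_point_cloud(points, seg_map, P, Tr_velo_to_cam):
    """Assign each LiDAR point the semantic label of the pixel it projects to.

    points:          (N, 3) LiDAR points in the sensor (Velodyne) frame.
    seg_map:         (H, W) per-pixel class labels from the image CNN.
    P:               (3, 4) camera projection matrix (KITTI-style, e.g. P2).
    Tr_velo_to_cam:  (4, 4) rigid transform from LiDAR frame to camera frame.
    Returns a length-N label array; -1 marks points with no valid projection.
    """
    # Homogeneous coordinates, transformed into the camera frame.
    pts_h = np.hstack([points, np.ones((points.shape[0], 1))])
    cam = (Tr_velo_to_cam @ pts_h.T).T              # (N, 4)

    # Keep only points in front of the camera.
    front = cam[:, 2] > 0.1
    cam = cam[front]

    # Project to pixel coordinates and dehomogenize.
    uvw = (P @ cam.T).T                             # (N_front, 3)
    u = (uvw[:, 0] / uvw[:, 2]).astype(int)
    v = (uvw[:, 1] / uvw[:, 2]).astype(int)

    # Keep only points that land inside the image bounds.
    h, w = seg_map.shape
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)

    labels = np.full(points.shape[0], -1, dtype=int)
    idx = np.flatnonzero(front)[inside]             # indices back into `points`
    labels[idx] = seg_map[v[inside], u[inside]]
    return labels
```

In practice, `P` and `Tr_velo_to_cam` would be read from the KITTI calibration files for each sequence; the paper's full pipeline additionally uses the surfaces extracted from the point cloud to refine the image segmentation before this labelling step.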