Multi-task Learning of Semantics, Geometry and Motion for Vision-based End-to-End Self-Driving

Journal Article
2021-01-0194
ISSN: 2641-9645, e-ISSN: 2641-9645
Published April 06, 2021 by SAE International in United States
Citation: Ni, H., Wu, J., Zhang, D., Wang, G. et al., "Multi-task Learning of Semantics, Geometry and Motion for Vision-based End-to-End Self-Driving," SAE Int. J. Adv. & Curr. Prac. in Mobility 3(4):1945-1954, 2021, https://doi.org/10.4271/2021-01-0194.
Language: English

Abstract:

It is hard to achieve full self-driving with hand-crafted, generalized decision-making rules. An end-to-end self-driving system, by contrast, is low in complexity, requires no hand-crafted rules, and can handle complex situations. Modular self-driving systems require multi-task fusion and high-precision maps, which raise system complexity and cost. In end-to-end self-driving, a camera is usually the only sensor used to obtain scene state information, so image processing is critical. Numerous deep learning applications benefit from multi-task learning: by combining all tasks into one model, it accelerates training, improves accuracy, and reduces computation, allowing such systems to run in real time. The approach of obtaining rich scene state information through multi-task learning is therefore very attractive. In this paper, we propose a multi-task learning approach for semantics, geometry, and motion. It comprises four tasks: semantic segmentation, instance segmentation, depth regression, and optical flow estimation. Optical flow is currently an important method for moving-image analysis; it captures not only the motion of the observed objects but also rich information about the three-dimensional structure of the scene. Through these tasks we obtain compressed information about semantics, distance estimation, and motion. Self-driving based on deep learning requires large amounts of data to train neural network models, yet the nature of the end-to-end system means it cannot be trained in the real world, where learning would involve collision testing and traffic accidents during reinforcement learning. Integrating existing autonomous driving datasets for model training and generalizing to virtual environments for application is therefore critical. Finally, we use virtual scenes constructed in CARLA to train and evaluate the end-to-end self-driving system.
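
The abstract describes a single network that predicts semantics, instances, depth, and optical flow from camera images. The sketch below is a minimal, hypothetical PyTorch layout of such a shared-encoder, multi-head model; it is not the authors' architecture. The backbone choice (ResNet-18), head design, channel counts, and the single-frame input to the flow head are all assumptions made for illustration only.

```python
import torch
import torch.nn as nn
import torchvision.models as models


class MultiTaskNet(nn.Module):
    """Hypothetical shared encoder with one lightweight decoder head per task."""

    def __init__(self, num_classes=19, num_instance_channels=2):
        super().__init__()
        # Shared feature extractor; ResNet-18 is an assumed backbone.
        backbone = models.resnet18(weights=None)
        self.encoder = nn.Sequential(*list(backbone.children())[:-2])  # B x 512 x H/32 x W/32

        def head(out_channels):
            # Simple upsampling decoder, identical in structure for every task.
            return nn.Sequential(
                nn.Conv2d(512, 128, 3, padding=1),
                nn.ReLU(inplace=True),
                nn.Upsample(scale_factor=32, mode="bilinear", align_corners=False),
                nn.Conv2d(128, out_channels, 1),
            )

        self.semantic_head = head(num_classes)             # per-pixel class logits
        self.instance_head = head(num_instance_channels)   # e.g. center-offset embedding
        self.depth_head = head(1)                          # per-pixel depth regression
        # Real flow estimators take two consecutive frames; a single frame is a
        # simplification to keep this sketch self-contained.
        self.flow_head = head(2)                           # per-pixel (u, v) flow

    def forward(self, x):
        feats = self.encoder(x)
        return {
            "semantics": self.semantic_head(feats),
            "instance": self.instance_head(feats),
            "depth": self.depth_head(feats),
            "flow": self.flow_head(feats),
        }


# Usage example with a dummy camera frame.
model = MultiTaskNet()
x = torch.randn(1, 3, 256, 512)
outputs = model(x)
print({k: tuple(v.shape) for k, v in outputs.items()})
```

In practice the four task losses (cross-entropy for segmentation, L1 or similar for depth and flow) would be combined into one training objective, typically as a weighted sum or with learned task weights; the paper's actual loss formulation is not given in the abstract, so any specific weighting here would be an assumption.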