A Sparse Spatiotemporal Transformer for Detecting Driver Distracted Behaviors
Technical Paper
2023-01-0835
ISSN: 0148-7191, e-ISSN: 2688-3627
Language: English
Abstract
Autonomous driving technology is still immature, and fully driverless vehicles remain a long way off. The state of the driver therefore remains an important factor in traffic safety, and detecting distracted driver behavior is of great significance. In this task, certain characteristics of driver behavior in the cockpit can be exploited to improve detection performance: compared with general human behaviors, driving behaviors are confined to an enclosed space and are far less diverse. With this in mind, we propose a sparse spatiotemporal transformer. At the low level of the model, it segments the video to extract local spatiotemporal features. In the middle layer, it uses the attention map to retain only the key local spatiotemporal information associated with larger attention values, which in turn enhances the high-level global semantic features. Experiments are conducted on a public driver behavior detection dataset (Drive&Act), and the generalization ability of the proposed method is evaluated on a separately collected dataset. Results show that the sparse spatiotemporal transformer obtains robust global semantic features by retaining key local spatiotemporal information while reducing the computational burden, and therefore achieves high accuracy in detecting distracted driver behavior.
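The attention-based filtering step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `select_key_tokens`, the use of raw token vectors as both queries and keys (no learned projections), and the scoring rule (mean attention each token receives across all queries) are assumptions made purely for illustration. The abstract states only that information associated with larger attention values is retained.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def select_key_tokens(tokens, keep):
    """Keep the `keep` tokens that receive the most attention.

    tokens: list of n feature vectors (each a list of d floats), standing in
            for local spatiotemporal features from video segments.
    keep:   number of tokens passed on to the later layers.
    Returns (kept_tokens, kept_indices), with indices in original order.
    """
    n, d = len(tokens), len(tokens[0])
    # Scaled dot-product attention map (assumption: queries = keys = tokens).
    attn = [
        softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens])
        for q in tokens
    ]
    # Assumed scoring rule: average attention each token receives over all queries.
    received = [sum(attn[i][j] for i in range(n)) / n for j in range(n)]
    top = sorted(range(n), key=lambda j: received[j], reverse=True)[:keep]
    idx = sorted(top)  # restore original spatiotemporal order
    return [tokens[j] for j in idx], idx


random.seed(0)
tokens = [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(16)]
kept, idx = select_key_tokens(tokens, keep=4)
print(len(kept), idx)
```

Because self-attention cost grows quadratically with the number of tokens, passing only `keep` of the `n` tokens to later layers reduces the size of their attention maps from n × n to keep × keep, which is one way such a sparse design can lower the computational burden.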
Citation
Wang, P., Yin, Z., Nie, L., and Zhai, X., "A Sparse Spatiotemporal Transformer for Detecting Driver Distracted Behaviors," SAE Technical Paper 2023-01-0835, 2023, https://doi.org/10.4271/2023-01-0835.