Prediction of Human Actions in Assembly Process by a Spatial-Temporal End-to-End Learning Model
ISSN: 0148-7191, e-ISSN: 2688-3627
Published April 02, 2019 by SAE International in the United States
Predicting human actions in industrial assembly processes is important: foreseeing future actions before they happen is essential for flexible human-robot collaboration and crucial for safety. Vision-based human action prediction from videos provides intuitive and adequate knowledge for many complex applications. The problem can be interpreted as deducing a person's next action from a short video clip. Historical information must be considered to learn the relations among time steps and to predict future ones, yet traditional methods have difficulty extracting this historical information and using it to infer future situations. In this scenario, a model is needed that handles the spatial and temporal details stored in past human motions and constructs the future action from the limited human demonstrations available. In this paper, we apply an autoencoder-based deep learning framework for human action construction, merged into an RNN pipeline for human action prediction. This contrasts with traditional approaches, which use hand-crafted features and outputs from different domains. We implement the proposed framework on a model vehicle seat assembly task. Our experimental results indicate that the proposed model effectively captures the historical details necessary for future human action prediction. In addition, the proposed model successfully synthesizes prior information from human demonstrations and generates the corresponding future action from those spatial-temporal features.
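To make the described architecture concrete, the following PyTorch sketch pairs a convolutional autoencoder (the spatial component) with an LSTM over the latent codes (the temporal component), trained end-to-end to map a short history clip to the next frame. This is a minimal illustration under stated assumptions, not the authors' implementation: the 64x64 grayscale input, layer sizes, latent dimension, and class name are all illustrative choices.

```python
# Minimal sketch (not the paper's code): a convolutional autoencoder whose
# latent codes are rolled forward in time by an LSTM; the LSTM's final state
# predicts the latent code of the next frame, which the decoder maps back to
# image space. Input resolution and layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

class SpatialTemporalPredictor(nn.Module):
    def __init__(self, latent_dim=128):
        super().__init__()
        # Spatial encoder: one 64x64x1 frame -> latent vector
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 4, stride=2, padding=1), nn.ReLU(),   # -> 32x32
            nn.Conv2d(16, 32, 4, stride=2, padding=1), nn.ReLU(),  # -> 16x16
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # -> 8x8
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, latent_dim),
        )
        # Temporal model: LSTM over the sequence of latent codes
        self.rnn = nn.LSTM(latent_dim, latent_dim, batch_first=True)
        # Spatial decoder: predicted latent -> next frame
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (64, 8, 8)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, clip):
        # clip: (batch, time, 1, 64, 64) -- a short history of frames
        b, t = clip.shape[:2]
        z = self.encoder(clip.reshape(b * t, *clip.shape[2:])).reshape(b, t, -1)
        out, _ = self.rnn(z)            # latent trajectory over time
        z_next = out[:, -1]             # last hidden state encodes the history
        return self.decoder(z_next)     # decoded prediction of the next frame

# End-to-end training on (history clip, next frame) pairs:
model = SpatialTemporalPredictor()
clip = torch.rand(2, 8, 1, 64, 64)      # 8-frame history, batch of 2
target = torch.rand(2, 1, 64, 64)       # ground-truth next frame
loss = nn.functional.mse_loss(model(clip), target)
loss.backward()
```

Because the encoder, LSTM, and decoder are trained jointly against a pixel-space reconstruction loss, the latent features are learned rather than hand-crafted, which is the contrast with traditional approaches that the abstract draws.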
Citation: Zhang, Z., Zhang, Z., Wang, W., Chen, Y. et al., "Prediction of Human Actions in Assembly Process by a Spatial-Temporal End-to-End Learning Model," SAE Technical Paper 2019-01-0509, 2019, https://doi.org/10.4271/2019-01-0509.