Challenges and Approaches in Connected Vehicles Data Wrangling
Technical Paper
2017-01-0069
ISSN: 0148-7191, e-ISSN: 2688-3627
This content contains downloadable datasets
Language:
English
Abstract
This manuscript compares window-based data imputation approaches for data collected from connected vehicles during real-world driving using on-board data acquisition devices. Three distinct window-based approaches were used to cleanse and impute missing values in different CAN-bus (Controller Area Network) signals. The window lengths used for imputation were: 1) the entire time course for each vehicle ID, 2) a day, and 3) a trip (defined as the duration between the vehicle's ignition turning ON and turning OFF). An algorithm for identifying ignition ON and OFF events is also presented, since this signal was not explicitly captured during the data acquisition phase. As a case study, these imputation techniques were applied to data from a driver behavior classification experiment. Forty-four connected vehicles provided data on various signals, namely engine speed, vehicle speed, engine torque, brake, clutch, accelerator pedal, and gear. Distribution plots for all variables looked similar across the three methods; mainly, the shapes of the histograms were the same for all methods. However, the dataset was around 37% larger for both the vehicle ID-wise and day-wise imputation approaches than for the trip-wise approach. K-means clustering did not show significant differences between the vehicle ID-wise and day-wise imputed datasets, but around 16% of vehicles were assigned to different clusters when trip-wise imputed data was used. The trip window was judged superior to the other two window sizes because it provides a means to remove noisy records from connected vehicle data, thus increasing the robustness of any analytical model built on top of it, in line with the garbage-in, garbage-out principle. Given the scale of the data, big data tools such as Hive and Spark were used on the Hadoop platform to process and impute the dataset.
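The three window choices can be illustrated with a minimal Python sketch. The abstract does not specify the ignition-detection algorithm or the fill rule, so everything here is an assumption made for illustration: a trip boundary is inferred from a silence longer than a hypothetical `GAP_SECONDS` threshold, missing values are forward-filled within the window, and records that cannot be filled are dropped. The field names (`vehicle_id`, `ts`, `speed`) and the sample rows are likewise hypothetical.

```python
GAP_SECONDS = 300        # assumed threshold: a longer silence ends a trip
SECONDS_PER_DAY = 86_400


def label_trips(records, gap_seconds=GAP_SECONDS):
    """Sort records by vehicle and time and add a per-vehicle 'trip' index.

    Assumption: ignition OFF/ON is inferred from a gap between consecutive
    records of the same vehicle that exceeds gap_seconds.
    """
    ordered = sorted(records, key=lambda r: (r["vehicle_id"], r["ts"]))
    labelled, prev, trip = [], None, 0
    for rec in ordered:
        if prev is not None:
            if rec["vehicle_id"] != prev["vehicle_id"]:
                trip = 0                      # new vehicle: restart numbering
            elif rec["ts"] - prev["ts"] > gap_seconds:
                trip += 1                     # long silence: new trip
        labelled.append({**rec, "trip": trip})
        prev = rec
    return labelled


# One grouping key per window length named in the abstract.
WINDOW_KEYS = {
    "vehicle": lambda r: (r["vehicle_id"],),
    "day": lambda r: (r["vehicle_id"], r["ts"] // SECONDS_PER_DAY),
    "trip": lambda r: (r["vehicle_id"], r["trip"]),
}


def impute(records, signal, window):
    """Forward-fill `signal` inside each window; drop rows that stay missing."""
    key = WINDOW_KEYS[window]
    last_seen = {}                            # last observed value per window
    out = []
    for rec in label_trips(records):
        k = key(rec)
        value = rec[signal] if rec[signal] is not None else last_seen.get(k)
        if value is None:
            continue                          # nothing to fill from: discard
        last_seen[k] = value
        out.append({**rec, signal: value})
    return out


# Hypothetical sample: one vehicle, two trips, missing speed readings.
rows = [
    {"vehicle_id": "A", "ts": 0, "speed": 10.0},
    {"vehicle_id": "A", "ts": 1, "speed": None},
    {"vehicle_id": "A", "ts": 1000, "speed": None},   # first record of trip 2
    {"vehicle_id": "A", "ts": 1001, "speed": 20.0},
]

sizes = {w: len(impute(rows, "speed", w)) for w in WINDOW_KEYS}
```

With these sample rows, the vehicle-wise and day-wise windows keep all four records, because the missing value at the start of the second trip is filled from the previous trip, while the trip-wise window drops that record. Under these assumptions, this is one way a trip window can yield a smaller but less noisy dataset than the wider windows.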
Citation
Raman, V., Narsude, M., and Padmanaban, D., "Challenges and Approaches in Connected Vehicles Data Wrangling," SAE Technical Paper 2017-01-0069, 2017, https://doi.org/10.4271/2017-01-0069.
Data Sets - Support Documents
Title | Description | Download |
---|---|---|
Unnamed Dataset 1 | | |
Unnamed Dataset 2 | | |
Unnamed Dataset 3 | | |