On Collecting High Quality Labeled Data for Automatic Transportation Mode Detection
To be published on April 2, 2019, by SAE International in the United States
With recent advances in the sensing and processing capabilities of consumer mobile devices (e.g., smartphones and tablets), these devices are becoming attractive platforms for pervasive computing applications. Always-on monitoring of human movement patterns is one such application, and it has gained considerable importance in mobility and transportation research. Automatically detecting a consumer's current transportation mode (e.g., walking, biking, or riding a shuttle) from their smartphone sensor data enables a range of customized services for multi-modal journey planning. The most accurate models for automatic mode detection are trained with supervised learning algorithms. To achieve high accuracy, the training datasets must be sufficiently large, diverse, and correctly labeled. Specifically, data for each transportation mode must be collected for a minimum duration that is necessary and sufficient for building high-accuracy models. Collecting such data efficiently is challenging because test subjects' multi-modal journey patterns vary; for example, some subjects commute mostly by private vehicle or rarely use rideshares. In this paper, we describe a Design of Experiment (DoE) for efficiently collecting a supervised training dataset from users' smartphones in a controlled environment. In this DoE, we asked the subjects to use a data logger app during a multi-modal trip around the Ford Dearborn campus that was designed to have the required trip characteristics. The app persistently logged GPS and motion sensor data in the background and sent it to a remote Hadoop server. Location data was used to detect movement and to improve the quality of the data collection in real time; for example, the app paused data logging when the user was detected to be waiting between two transit modes.
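How the app decided that a user was waiting is not detailed here. A minimal sketch, assuming movement is inferred from the speed between consecutive GPS fixes, could gate logging as shown below. The names `Fix`, `MovementGate`, and `haversine_m`, the 0.5 m/s speed threshold, and the 60 s dwell time are all hypothetical illustrations, not the app's actual design.

```python
import math
from dataclasses import dataclass
from typing import Optional

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


@dataclass
class Fix:
    """One GPS fix (hypothetical record type)."""
    t: float    # timestamp, seconds
    lat: float  # latitude, degrees
    lon: float  # longitude, degrees


def haversine_m(a: Fix, b: Fix) -> float:
    """Great-circle distance between two fixes, in meters."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


class MovementGate:
    """Hypothetical gate: pause sensor logging after the user has been nearly
    stationary for `dwell_s` seconds, and resume as soon as movement reappears.
    Threshold and dwell defaults are assumptions, not values from the paper."""

    def __init__(self, speed_threshold_mps: float = 0.5, dwell_s: float = 60.0):
        self.speed_threshold_mps = speed_threshold_mps
        self.dwell_s = dwell_s
        self.prev: Optional[Fix] = None
        self.still_since: Optional[float] = None  # start of current stationary run
        self.logging = True

    def update(self, fix: Fix) -> bool:
        """Consume one GPS fix and return whether sensor logging should be active."""
        if self.prev is not None and fix.t > self.prev.t:
            speed = haversine_m(self.prev, fix) / (fix.t - self.prev.t)
            if speed >= self.speed_threshold_mps:
                # Moving: resume logging and reset the stationary timer.
                self.still_since = None
                self.logging = True
            else:
                # Stationary: start or continue the timer; pause once it exceeds the dwell time.
                if self.still_since is None:
                    self.still_since = self.prev.t
                if fix.t - self.still_since >= self.dwell_s:
                    self.logging = False
        self.prev = fix
        return self.logging
```

Feeding the gate one fix every 10 s from a user standing at a stop keeps logging on for the first minute, pauses it once 60 s of stillness have accumulated, and turns it back on at the first fix that shows walking or vehicle speed. The dwell time is meant to keep brief stops, such as a traffic light, from pausing logging; a real implementation would also have to tolerate GPS jitter, which can produce small nonzero speeds while the user is stationary.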