Analysis of Automatic Speech Recognition Failures in the Car



This paper explores an approach to analyzing voice recognition data to understand how customers use in-car voice recognition systems. The analysis helps identify automatic speech recognition (ASR) failures and usability-related issues that customers encounter while using the system. The paper also examines the impact of these failures on the individual speech domains (media control, phone, navigation, etc.). Such information can be used to improve the current voice recognition system and to guide the design of future systems. Infotainment system logs, audio recordings of the voice interactions, their transcriptions, and CAN bus data were identified as rich sources of data for analyzing voice recognition usage. Infotainment logs show how the system interpreted or responded to customer commands, and at what confidence level. The audio recordings of the voice interactions and their transcriptions show what command the customer issued and whether it adheres to the grammar of the voice recognition system. Comparing the system’s interpretation in the logs with the command actually issued reveals whether the system recognized the command correctly. CAN bus data can help determine whether voice recognition failures are caused by noise sources in the car, such as HVAC blower noise or engine noise. These data sources can also be tied together to detect commands that the system recognized incorrectly. When the causes of failures were studied by domain, the navigation domain was found to be the most prone to errors. Natural language understanding and single-command navigation would improve the success of the navigation domain. The media control and phone domains were significantly less error-prone. Errors that did occur were largely due to core speech recognition, and a majority of those errors could be handled by examining the participant’s habits.
Apr 2, 2019
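
The comparison described in the abstract, between the system's interpretation recorded in the infotainment logs and the transcribed command, could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' method: the record layout (`utterance_id`, `domain`, `recognized`, `confidence`) and the exact-match rule after text normalization are hypothetical choices made for the example.

```python
import string
from collections import defaultdict


def normalize(text):
    """Lowercase, drop punctuation, and collapse whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(text.lower().translate(table).split())


def find_misrecognitions(log_entries, transcripts):
    """Return log entries whose recognized text differs from the transcript.

    log_entries: list of dicts with the hypothetical keys
        'utterance_id', 'domain', 'recognized', 'confidence'.
    transcripts: dict mapping utterance_id to the human transcription.
    """
    errors = []
    for entry in log_entries:
        spoken = transcripts.get(entry["utterance_id"])
        if spoken is None:
            continue  # no transcription available for this utterance
        if normalize(entry["recognized"]) != normalize(spoken):
            errors.append({**entry, "spoken": spoken})
    return errors


def error_rate_by_domain(log_entries, transcripts):
    """Fraction of transcribed utterances per domain that were misrecognized."""
    totals = defaultdict(int)
    wrong = defaultdict(int)
    for entry in log_entries:
        if entry["utterance_id"] in transcripts:
            totals[entry["domain"]] += 1
    for err in find_misrecognitions(log_entries, transcripts):
        wrong[err["domain"]] += 1
    return {domain: wrong[domain] / totals[domain] for domain in totals}
```

Exact matching after normalization is the simplest possible criterion. A real analysis would likely also need to account for grammar variants of the same command and could join CAN bus signals (for example, blower speed) on the same utterance identifier to relate failures to in-car noise.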
