Honda Next Generation Speech User Interface
ISSN: 0148-7191, e-ISSN: 2688-3627
Published April 20, 2009 by SAE International in United States
Honda, working closely with IBM, will release the next step in the evolution of in-vehicle speech user interfaces for its model year 2009 introduction. The new Honda vehicles will include the leading-edge Free Form Command (FFC) technology developed by IBM as part of its IBM Embedded ViaVoice (EVV) product line.
At its core, Free Form Commands improve the end-user experience by making the system easier to use out of the box. Free Form Commands employ an innovative approach that allows Statistical Language Model technology to be deployed within the constraints of an embedded computing environment.
This technology allows users to speak commands that are not part of a predefined, fixed set of phrases, as today’s vehicles require; instead, users speak command phrases that match the application functional area they are targeting. The FFC technology goes beyond recognizing which phrase is spoken: the system also uses statistical models to determine the meaning of the phrase via semantic interpretation. For example, in today’s systems, the system designer chooses which commands can be spoken to change the radio station, such as “radio tune 97.9”. When a system is designed using Free Form Commands, the system designer specifies only that the user should be able to change the radio station; statistical modeling techniques are then used to develop models that enable this task.
As a result, the end-user can now speak a variety of natural phrases to complete that task, such as “tune the radio to 97.9”, “I want to listen to 97.9”, “switch to 97.9”, or “change to 97.9”. This leads to an easier-to-use system that provides greater end-user satisfaction. The improvement in usability is a direct result of reducing the amount of information the user has to memorize. With previous releases of the system, the user had to remember the specific format and phrasing of each command; with FFC technology, the user only needs to know the types of commands that the system will understand and can speak a phrase that is natural to them in that context.
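The contrast between a fixed command set and statistically interpreted free-form phrases can be illustrated with a small sketch. It is not the IBM Embedded ViaVoice implementation, which this abstract does not describe; it substitutes a toy naive Bayes bag-of-words classifier for the statistical models. The training phrases, the task names (`tune_radio`, `set_temperature`, `phone_call`), and all function names are assumptions made for illustration only.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical example phrases per task; invented for this sketch.
TRAINING = [
    ("tune the radio to 97.9", "tune_radio"),
    ("i want to listen to 97.9", "tune_radio"),
    ("switch to 97.9", "tune_radio"),
    ("change the station to 97.9", "tune_radio"),
    ("set the temperature to seventy degrees", "set_temperature"),
    ("make it warmer", "set_temperature"),
    ("i am cold turn up the heat", "set_temperature"),
    ("call home", "phone_call"),
    ("dial the office", "phone_call"),
    ("i want to call my wife", "phone_call"),
]

# A radio frequency such as 97.9 or 101.1 (assumed format).
FREQ = re.compile(r"\b\d{2,3}\.\d\b")

def normalize(text):
    """Lowercase, replace any frequency with a <freq> token, split into words."""
    text = FREQ.sub("<freq>", text.lower())
    return re.findall(r"<freq>|[a-z]+", text)

# Fixed-command approach: only exact, predefined phrasings are accepted.
FIXED_COMMANDS = {"radio tune <freq>"}

def fixed_grammar_accepts(text):
    return " ".join(normalize(text)) in FIXED_COMMANDS

def train(examples):
    """Collect per-task word counts for a naive Bayes classifier."""
    word_counts = defaultdict(Counter)
    task_counts = Counter()
    vocab = set()
    for text, task in examples:
        tokens = normalize(text)
        word_counts[task].update(tokens)
        task_counts[task] += 1
        vocab.update(tokens)
    return word_counts, task_counts, vocab

def classify(text, model):
    """Return the most probable task, using add-one smoothing."""
    word_counts, task_counts, vocab = model
    tokens = [t for t in normalize(text) if t in vocab]  # ignore unseen words
    total = sum(task_counts.values())
    best_task, best_score = None, -math.inf
    for task, count in task_counts.items():
        score = math.log(count / total)
        denom = sum(word_counts[task].values()) + len(vocab)
        for token in tokens:
            score += math.log((word_counts[task][token] + 1) / denom)
        if score > best_score:
            best_task, best_score = task, score
    return best_task

def interpret(text, model):
    """Semantic interpretation: the task plus the frequency slot, if any."""
    match = FREQ.search(text)
    return {"task": classify(text, model),
            "frequency": match.group() if match else None}

MODEL = train(TRAINING)
print(fixed_grammar_accepts("switch to 97.9"))
print(interpret("could you switch the radio over to 97.9", MODEL))
```

In this sketch the fixed grammar rejects “switch to 97.9” because it is not one of the predefined phrasings, while the classifier maps the longer, unseen phrasing to the radio-tuning task and extracts the frequency. A deployable system would need far more training data and a model suited to embedded constraints.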
In this paper, we will discuss how Honda engaged IBM at the early stages to produce research-level prototypes that set the direction for the technology. We will then describe Honda’s and IBM’s refinement of the technology to target a system that would be deployable in a vehicle, the testing and methodology used to validate that the solution was valuable to the end-user, and an overview of the Free Form Command technology.
- Masashi Satomura - Honda R&D Co., Ltd
- Hisayuki Nagashima - Honda R&D Co., Ltd
- Keisuke Kondo - Honda R&D Co., Ltd
- Kenneth White - IBM Corporation
- Harvey Ruback - IBM Corporation
- Roberto Sicconi - IBM Corporation
- Mahesh Viswanathan - IBM Corporation
- John Eckhart - IBM Corporation
- Dan Badt - IBM Corporation
- Masashi Morita - IBM Corporation
Citation: White, K., Ruback, H., Sicconi, R., Viswanathan, M. et al., "Honda Next Generation Speech User Interface," SAE Technical Paper 2009-01-0518, 2009, https://doi.org/10.4271/2009-01-0518.