Honda Next Generation Speech User Interface
ISSN: 0148-7191, e-ISSN: 2688-3627
Published April 20, 2009 by SAE International in United States
Working closely with IBM, Honda will release the next step in the evolution of in-vehicle speech user interfaces with its model year 2009 vehicles. The new Honda vehicles will include the leading-edge Free Form Command (FFC) technology developed by IBM as part of its IBM Embedded ViaVoice (EVV) product line.
At its core, Free Form Commands improve the end-user experience by making the system easier to use out of the box. Free Form Commands employ an innovative approach that allows Statistical Language Model technology to be deployed within the constraints of an embedded computing environment.
This technology allows users to speak commands that are not drawn from a predefined, fixed set of phrases, as today’s vehicles require; instead, users speak command phrases that match the application functional area they are targeting. The FFC technology goes beyond recognizing which phrase was spoken: the system also uses statistical models to determine the meaning of the phrase via semantic interpretation. For example, in today’s systems, the system designer chooses which commands can be spoken to change the radio station, such as “radio tune 97.9”. When a system is designed using Free Form Commands, the system designer specifies that the user should be able to change the radio station, and statistical modeling techniques are then used to develop models that enable this task.
As a result, the end-user can now speak a variety of natural phrases to complete that task, such as “tune the radio to 97.9”, “I want to listen to 97.9”, “switch to 97.9”, “change to 97.9”, and so on. This leads to an easier-to-use system that provides greater end-user satisfaction. The improvement in usability is a direct result of reducing the amount of information the user has to memorize. With previous releases of the system, the user had to remember the specific format and phrasing of each command; with FFC technology, the user only needs to know the types of commands the system will understand and can then speak a phrase that is natural to them in that context.
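The idea of mapping many natural phrasings to one task can be illustrated with a small sketch. The abstract does not describe how FFC's statistical models are built, so the code below is not IBM's method: it swaps in a plain bag-of-words naive Bayes classifier with add-one smoothing, plus a regular expression to pull out the station frequency. The intent names (`tune_radio`, `set_temperature`), the training phrases beyond those quoted above, and the helper functions are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical training phrases per task; "<freq>" stands in for a spoken
# station frequency. The temperature task is invented purely for contrast.
TRAINING = {
    "tune_radio": [
        "tune the radio to <freq>",
        "i want to listen to <freq>",
        "switch to <freq>",
        "change to <freq>",
        "radio tune <freq>",
    ],
    "set_temperature": [
        "set the temperature to seventy degrees",
        "make it warmer",
        "i am cold",
        "turn up the heat",
    ],
}

# Matches frequencies written like 97.9 or 101.1 (an assumption of this sketch).
FREQ = re.compile(r"\b\d{2,3}\.\d\b")


def tokenize(text):
    """Lowercase, replace any frequency with the <freq> placeholder, split into words."""
    text = FREQ.sub("<freq>", text.lower())
    return re.findall(r"<freq>|[a-z']+|\d+", text)


def train(examples):
    """Count words per intent and phrases per intent; collect the vocabulary."""
    word_counts = defaultdict(Counter)
    phrase_counts = Counter()
    vocab = set()
    for intent, phrases in examples.items():
        for phrase in phrases:
            tokens = tokenize(phrase)
            word_counts[intent].update(tokens)
            phrase_counts[intent] += 1
            vocab.update(tokens)
    return word_counts, phrase_counts, vocab


def interpret(utterance, model):
    """Return (most likely intent, extracted slots) for a free-form utterance."""
    word_counts, phrase_counts, vocab = model
    tokens = tokenize(utterance)
    total_phrases = sum(phrase_counts.values())
    best_intent, best_score = None, -math.inf
    for intent in phrase_counts:
        total_words = sum(word_counts[intent].values())
        # Log prior plus add-one-smoothed log likelihood of each token.
        score = math.log(phrase_counts[intent] / total_phrases)
        for tok in tokens:
            score += math.log(
                (word_counts[intent][tok] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_intent, best_score = intent, score
    match = FREQ.search(utterance)
    slots = {}
    if best_intent == "tune_radio" and match:
        slots["frequency"] = match.group()
    return best_intent, slots


model = train(TRAINING)
print(interpret("please switch over to 97.9", model))
```

The designer here lists tasks and example phrasings rather than an exhaustive grammar, so an utterance that appears in no training phrase, such as “please switch over to 97.9”, still resolves to the radio-tuning task with the frequency attached. A production embedded system would need far more data and a stronger model than this toy classifier.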
In this paper, we discuss how Honda engaged IBM at an early stage to produce research-level prototypes that set the direction for the technology. We then describe Honda’s and IBM’s refinement of the technology toward a system deployable in a vehicle, the testing and methodology used to validate that the solution was valuable to the end-user, and an overview of the Free Form Command technology.
- Kenneth White - IBM Corporation
- Harvey Ruback - IBM Corporation
- Roberto Sicconi - IBM Corporation
- Mahesh Viswanathan - IBM Corporation
- John Eckhart - IBM Corporation
- Dan Badt - IBM Corporation
- Masashi Morita - IBM Corporation
- Masashi Satomura - Honda R&D Co., Ltd
- Hisayuki Nagashima - Honda R&D Co., Ltd
- Keisuke Kondo - Honda R&D Co., Ltd
Citation: White, K., Ruback, H., Sicconi, R., Viswanathan, M. et al., "Honda Next Generation Speech User Interface," SAE Technical Paper 2009-01-0518, 2009, https://doi.org/10.4271/2009-01-0518.