Browse Topic: Voice / speech

Items (151)
ABSTRACT This paper describes work to develop a hands-free, heads-up control system for Unmanned Ground Vehicles (UGVs) under an SBIR Phase I contract. Industry is building upon pioneering work that it has done in creating a speech recognition system that works well in noisy environments, by developing a robust key word spotting algorithm enabling UGV Operators to give speech commands to the UGV completely hands-free. Industry will also research and develop two sub-vocal control modes: whisper speech and teeth clicks. Industry is also developing a system that will enable the Operator to drive a UGV, with a high level of fidelity, to a location selected by the Operator using hands-free commands in conjunction with image segmentation and video overlays. This Phase I effort will culminate in a proof-of-concept demonstration of a hands-free, heads-up system, implemented on a small UGV, that will enable the Operator to have a high level of fidelity for control of the system.
Brown, Jonathan; Gray, Jeremy P.; Blanco, Chris; Juneja, Amit; Alberts, Joel; Reinerman, Lauren
ABSTRACT The confluence of intra-vehicle networks, Vehicular Integration for C4ISR/EW Interoperability (VICTORY) standards (C4ISR/EW: Command, Control, Communications, Computers, Intelligence, Surveillance, Reconnaissance / Electronic Warfare), and onboard general-purpose processors creates an opportunity to implement Army combat ground vehicle intercommunications (intercom) capability in software. The benefits of such an implementation include 1) SWAP savings, 2) cost savings, 3) a simplified path to future upgrades, and 4) enabling of potential new capabilities such as voice-activated mission command. The VICTORY Standards Support Office (VSSO), working at the direction of its Executive Steering Group (ESG) members (Program Executive Office (PEO) Ground Combat Systems (GCS), PEO Combat Support and Combat Service Support (CS&CSS), PEO Command Control Communications-Tactical (C3T) and PEO Intelligence, Electronic Warfare and Sensors (IEW&S)), has developed and demonstrated a software intercom
Kelsch, Geoffrey; Serafinko, Robert; Frissora, Anthony
Game-like navigation visuals. Conversational-style voice commands. Contactless biometric sensing. A tidal wave of software code and sensing technologies is being prepped to alter in-vehicle activities. Two supplier companies, TomTom and Mitsubishi Electric Automotive America (MEAA), recently presented their concept cockpit demonstrators to media at TomTom's North American corporate offices in Farmington Hills, Michigan. A few highlights
Buchholz, Kami
Speech enhancement can extract clean speech from noise interference, enhancing its perceptual quality and intelligibility. This technology has significant applications in in-car intelligent voice interaction. However, the complex noise environment inside the vehicle, in which human voice interference is especially prominent, poses great challenges to in-vehicle speech interaction systems. In this paper, we propose a speech enhancement method based on target speech features, which can better extract clean speech and improve the perceptual quality and intelligibility of enhanced speech in environments with human noise interference. To this end, we propose a design method for the middle layer of the U-Net architecture based on Long Short-Term Memory (LSTM), which can automatically extract target speech features that are highly distinguishable from the noise signal and human voice interference features in noisy speech, and realize the targeted extraction of clean speech. Then
Pei, Kaikun; Zhang, Lijun; Meng, Dejian; He, Yinzhi
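The targeted-extraction idea in the abstract above can be illustrated with a much-simplified, numpy-only sketch of time-frequency mask-based speech enhancement. This is not the paper's LSTM U-Net: here an oracle ideal ratio mask is computed from known clean and noise signals purely for illustration, whereas a trained network would predict such a mask from the noisy mixture alone. All signal parameters are invented toy values.

```python
import numpy as np

def stft_mag(x, n_fft=256, hop=128):
    """Magnitude spectrogram via a simple framed FFT (Hann window)."""
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win
              for i in range(0, len(x) - n_fft + 1, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1))

def ideal_ratio_mask(clean, noise, eps=1e-8):
    """Oracle time-frequency mask: fraction of each bin's energy due to speech.
    A trained enhancement network would *predict* this mask from the mixture."""
    S, N = stft_mag(clean), stft_mag(noise)
    return S / (S + N + eps)

# Toy example: a 440 Hz "speech" tone plus white "interference" noise.
rng = np.random.default_rng(0)
t = np.arange(8000) / 8000.0
clean = np.sin(2 * np.pi * 440 * t)
noise = 0.5 * rng.standard_normal(len(t))
mask = ideal_ratio_mask(clean, noise)

# Applying the mask to the noisy spectrogram suppresses noise-dominated bins.
noisy_mag = stft_mag(clean + noise)
enhanced_mag = mask * noisy_mag
```

The mask takes values in [0, 1]: bins dominated by speech pass through nearly unchanged, while noise-dominated bins are attenuated.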
ChatGPT has entered the car. At CES 2024, Volkswagen and technology partner Cerence introduced an update to IDA, VW's in-car voice assistant, so it can now use ChatGPT to expand what's possible using voice commands in vehicles. VW said the ChatGPT bot will be available in Europe in current MEB and MQB evo models from VW Group brands that currently use the IDA voice assistant. That includes some members of the ID family - the ID.7, ID.4, ID.5 and ID.3 - as well as the new Tiguan, Passat and Golf models. VW brands Seat, Škoda, Cupra and VW Commercial Vehicles also will get IDA integration. VW hopes to bring IDA to other markets, including North America, but did not make any timing announcements
Blanco, Sebastian
In this study, a novel assessment approach for in-vehicle speech intelligibility is presented using psychometric curves. Speech recognition performance scores were modeled at an individual listener level for a set of speech recognition data previously collected under a variety of in-vehicle listening scenarios. The model coupled an objective metric of binaural speech intelligibility (i.e., the acoustic factors) with a psychometric curve indicating the listener’s speech recognition efficiency (i.e., the listener factors). In separate analyses, two objective metrics were used, one designed to capture spatial release from masking and the other designed to capture binaural loudness. The proposed approach is in contrast to the traditional approach of relying on the speech recognition threshold, the speech level at 50% recognition performance averaged across listeners, as the metric for in-vehicle speech intelligibility. Results from the presented analyses suggest the importance of
Samardzic, Nikolina; Lavandier, Mathieu; Shen, Yi
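The listener-level modeling described above can be sketched as a logistic psychometric function linking an objective intelligibility metric to an individual listener's recognition probability. This is an illustrative stand-in, not the paper's fitted model; the parameter names (`threshold`, `slope`) and example values are assumptions.

```python
import numpy as np

def psychometric_curve(metric, threshold, slope):
    """Logistic psychometric function: probability of correct word recognition
    as a function of an objective intelligibility metric.
    `threshold` is the metric value at 50% performance (listener-specific);
    `slope` reflects the listener's speech recognition efficiency."""
    return 1.0 / (1.0 + np.exp(-slope * (metric - threshold)))

# Example: two listeners with the same 50% point but different efficiency.
metric = np.linspace(-10, 10, 5)   # e.g. a predicted binaural SNR advantage, dB
efficient = psychometric_curve(metric, threshold=0.0, slope=1.0)
inefficient = psychometric_curve(metric, threshold=0.0, slope=0.3)
```

Averaging only the 50% threshold across listeners, as in the traditional approach, would treat these two listeners as identical even though their performance away from the threshold differs substantially.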
I know nothing more about artificial intelligence (AI) than what I read and what learned people tell me. I know it's supposed to bring new sophistication to all manner of processes and technologies, including automated driving. So, when a driverless robotaxi operated by GM's Cruise plowed into a road section of freshly poured cement in San Francisco, it raised questions about recently beleaguered Cruise. My mind wandered to AI, which many AV compute “stacks” are touted to leverage in abundance. Driving into wet cement isn't intelligent. Did somebody need to train the vehicle's AV stack specifically to recognize wet cement? If that's how it works, I'd prefer not to bet my life on whether some fairly oddball happenstance (is the term ‘edge case’ not cool anymore?) had been accounted for in that particular version of the AD system's algorithm running that particular day
Visnic, Bill
Although SAE level 5 autonomous vehicles are not yet commercially available, they will need to be the most intelligent, secure, and safe autonomous vehicles with the highest level of automation. The vehicle will be able to drive itself in all lighting and weather conditions, at all times of the day, on all types of roads and in any traffic scenario. The human intervention in level 5 vehicles will be limited to passenger voice commands, which means level 5 autonomous vehicles need to be safe and capable of fail-operational recovery with no intervention from the driver to guarantee the maximum safety for the passengers. In this paper a LiDAR-based fail-safe emergency maneuver system is proposed to be implemented in the level 5 autonomous vehicle. This system is composed of an external redundant 360° spinning LiDAR sensor and a redundant ECU that runs a single task to steer and fully stop the vehicle in emergency situations (e.g., vehicle crash, system failure, sensor failures
Alrousan, Qusay; Alzu'bi, Hamzeh; Tasky, Tom; Varasquim, Juliano
The smart cockpit has become an irreplaceable element for many new automobile brands, particularly New Energy Vehicles (NEV) of “new forces”. Since the cockpit is a direct interface for the interactions between users and the intelligent and connected functions of the vehicle, any improvements would be easily perceived by users and thus would directly affect user experiences. It is therefore most important to capture, collect, and understand what users need from a smart cockpit. Users’ online comments on existing smart cockpits contain information on users’ requirements. However, the current user comment text data is too massive, tangled, and sparse to process, and efficiently mining valuable information from these data is non-trivial. This paper focuses on applying Natural Language Processing (NLP) technology for the design, development, improvement, and update of a vehicle company’s smart cockpit. By obtaining user comment data from various sources such as eco-system Applications (APP
Lin, Shenhe; Zou, Jingkai; Zhang, Chaokai; Lai, Xinjun; Mao, Ning; Fu, Hui
As the world is moving from a manual workforce to a robot-based workforce, there is a huge scope for improved methods to make production lines more efficient. In this work, an effort is made to implement human-robot collaboration into an industrial process and is demonstrated with a flange assembly-line model. This paper explains how the Yolov4 algorithm was improved and fine-tuned to meet the requirements. A customized workspace was designed and manufactured to make the components more accessible. Different types of grippers were compared and the simplest and most efficient was then selected. Camera selection and calibration were done to get the RGB coordinates and the depth values, which were finally converted into the robot's coordinate frame. The coordinates are then fed as the end-goal position for the end effector, to which the robot plans its motion and then executes it. The paper also explains how the model responds to voice commands using the Google API to convert audio messages to
Seby, Harrison; Saju George, Albin; Sadique, Anwar; P P, Lalu
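The coordinate conversion step described above, from a detected pixel plus depth value into the robot's frame, can be sketched with the standard pinhole back-projection followed by a homogeneous hand-eye transform. The intrinsics and the camera-to-robot transform below are made-up placeholder values; real numbers come from the calibration the abstract mentions.

```python
import numpy as np

def pixel_to_camera(u, v, depth, fx, fy, cx, cy):
    """Back-project a pixel (u, v) with depth (meters) into 3-D camera
    coordinates using the pinhole camera model."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])

def camera_to_robot(p_cam, T_cam_to_robot):
    """Apply a 4x4 homogeneous hand-eye transform to a camera-frame point."""
    p_h = np.append(p_cam, 1.0)
    return (T_cam_to_robot @ p_h)[:3]

# Placeholder intrinsics and extrinsics (real values come from calibration).
fx = fy = 600.0
cx, cy = 320.0, 240.0
T = np.eye(4)
T[:3, 3] = [0.2, 0.0, 0.1]   # e.g. camera mounted 20 cm forward, 10 cm up

p_cam = pixel_to_camera(380, 240, depth=0.5, fx=fx, fy=fy, cx=cx, cy=cy)
p_robot = camera_to_robot(p_cam, T)
```

The resulting `p_robot` is what would be handed to the motion planner as the end-effector goal.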
General English speech recognition is based on n-gram techniques, in which the surrounding words are used to predict the utterance. At the same time, a significantly lengthier n-gram has its own impact on training and accuracy, while shorter n-grams require utterances to be split and predicted piecewise rather than using the complete utterance. This article discusses specific techniques to address the specific problems of Air Traffic speech, a medium-length-utterance domain. Moving from adapted language models (LMs) to a rescored LM, a combined technique of syntax analysis along with a deep learning model is proposed, which improves the overall accuracy. It is explained that this technique can help adapt the proposed method to different contexts within the same domain and can be successful
Srinivasan, Narayanan; Balasundaram, S. R.
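The n-gram prediction idea referenced above can be sketched with a maximum-likelihood bigram model. The corpus below is a handful of invented air-traffic-flavoured phrases for illustration only; it is not data from the article.

```python
from collections import Counter

def train_bigram_lm(sentences):
    """Maximum-likelihood bigram model: P(w2 | w1) = c(w1 w2) / c(w1)."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        unigrams.update(tokens[:-1])          # every token that can start a bigram
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    return lambda w1, w2: bigrams[(w1, w2)] / unigrams[w1] if unigrams[w1] else 0.0

# Toy corpus of invented phrases (illustrative only).
corpus = [
    "cleared for takeoff runway two seven",
    "cleared to land runway two seven",
    "hold short of runway two seven",
]
p = train_bigram_lm(corpus)
```

In a constrained domain the conditional probabilities become very peaked, e.g. "runway" is always followed by "two" in this toy corpus, which is why domain-adapted and rescored LMs can outperform a general English model.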
Today, commercially available drones have limited use-cases in the rapidly evolving community. However, with advances in drone and software technology, it is possible to utilize these aerial machines to solve problems in a variety of industries such as mining, medical, construction, and law enforcement. For example, in order to reduce time of investigation, Indiana State Police are currently utilizing ad-hoc commercial drones to reconstruct crash scenes for insurance and legal purposes. In this paper, we illustrate how to effectively integrate drones for in-vehicle services and real-time prediction for automotive applications. In order to accomplish this, we first integrate simpler controls such as voice-commands to control the drone from the vehicle. Next, we build smart prediction software that monitors vehicle behavior and reacts in real-time to collisions. Furthermore, we employ object recognition techniques through In-Vehicle Infotainment (IVI) systems to identify the surroundings
Nithiyanantham, Mayunthan; Sinnapolu, Giribabu
The objective of this ARP is to provide a set of user-centered design guidelines for the implementation of data driven electronic aeronautical charts, which dynamically create charts from a database of individual elements. The data driven chart is intended to provide information required to navigate, but it is not intended to supplant the aircraft’s primary navigation display. These guidelines seek to provide a balance between standardization of equipment with similar intended functions and individual manufacturer innovation. This ARP provides guidelines for the display of an electronic chart that can replace existing paper. This document addresses what information is required, when it is required, and how it should be displayed and controlled. This document does not include all the detailed specifications required to generate an electronic aeronautical chart. This document primarily addresses the human factors aspects of electronic chart display, and does not address the software
G-10EAB Executive Advisory Group
The performance of a vehicle’s Automatic Speech Recognition (ASR) system is dependent on the signal to noise ratio (SNR) in the cabin at the time a user voices their command. HVAC noise and environmental noise in particular (like road and wind noise) provide high amplitudes of broadband frequency content that lower the SNR within the vehicle cabin and work to mask the user’s speech. Managing this noise is a vital key to building a vehicle that meets the customer’s expectations for ASR performance. However, a speech recognition engineer is not likely to be the same person responsible for designing the tires, suspension, air ducts and vents, sound package and exterior body shape that define the amount of noise present in the cabin. If objective relationships are drawn between the vehicle level performance of the ASR system, and the vehicle or system level performance of the individual noise, vibration and harshness (NVH) attributes, a partnership between the groups is brokered
Wheeler, Joshua
Voice Recognition (VR) systems have become an integral part of the infotainment systems in the current automotive industry. However, its recognition rate is impacted by external factors such as vehicle cabin noise, road noise, and internal factors which are a function of the voice engine in the system itself. This paper analyzes the VR performance under the effect of two external factors, vehicle cabin noise and the speakers’ speech patterns based on gender. It also compares performance of mid-level sedans from different manufacturers
Khan, Rasheed; Ali, Mahdi; Frank, Eric C.
The design and operation of a vehicle’s heating, ventilation, and air conditioning (HVAC) system has great impact on the performance of the vehicle’s Automatic Speech Recognition (ASR) and Hands-Free Communication (HFC) system. HVAC noise provides high amplitudes of broadband frequency content that affects the signal to noise ratio (SNR) within the vehicle cabin and works to mask the user’s speech. But what’s less obvious is that when the airflow from the panel vents or defroster openings is directed toward the vehicle microphone, a mechanical “buffeting” phenomenon occurs on the microphone’s diaphragm that distresses the ASR system beyond its ability to interpret the user’s voice. The airflow velocity can be strong enough that a simple windscreen on the microphone is not enough to eliminate the problem. Minimizing this buffeting effect is a vital key to building a vehicle that meets the customer’s expectations for ASR and HFC performance. Systems design principles must be applied
Wheeler, Joshua
This paper describes a method to validate in-vehicle speech recognition by combining synthetically mixed speech and noise samples with batch speech recognition. Vehicle cabin noises are prerecorded along with the impulse response from the driver's mouth location to the cabin microphone location. These signals are combined with a catalog of speech utterances to generate a noisy speech corpus. Several factors were examined to measure their relative importance on speech recognition robustness. These include road surface and vehicle speed, climate control blower noise, and driver's seat position. A summary of the main effects from these experiments are provided with the most significant factors coming from climate control noise. Additionally, a Signal to Noise Ratio (SNR) experiment was conducted highlighting the inverse relationship with speech recognition performance
Huber, John; Rangarajan, Ranjani; Ji, An; Charette, Francois; Amman, Scott; Wheeler, Joshua; Richardson, Brigitte
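The corpus-generation method described above, convolving clean utterances with the mouth-to-microphone impulse response and adding prerecorded cabin noise at a controlled level, can be sketched as follows. The signals and impulse response here are toy stand-ins, and the SNR-scaling convention is one common choice, not necessarily the paper's exact procedure.

```python
import numpy as np

def mix_at_snr(clean, impulse_response, noise, target_snr_db):
    """Simulate an in-cabin recording: convolve clean speech with the
    mouth-to-microphone impulse response, then add cabin noise scaled so
    the mixture hits the requested signal-to-noise ratio (in dB)."""
    speech = np.convolve(clean, impulse_response)[: len(clean)]
    noise = noise[: len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_speech / (p_noise * 10 ** (target_snr_db / 10)))
    return speech + scale * noise

rng = np.random.default_rng(1)
clean = np.sin(2 * np.pi * 300 * np.arange(16000) / 16000)  # stand-in utterance
ir = np.array([1.0, 0.4, 0.1])                              # toy impulse response
noise = rng.standard_normal(16000)                          # stand-in cabin noise
noisy = mix_at_snr(clean, ir, noise, target_snr_db=10.0)
```

Sweeping `target_snr_db` over a batch of utterances is what makes the inverse relationship between SNR and recognition rate measurable without re-recording in the vehicle.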
In this paper, a systems engineering approach is explored to evaluate the effect of design parameters that contribute to the performance of the embedded Automatic Speech Recognition (ASR) engine in a vehicle. This includes vehicle designs that influence the presence of environmental and HVAC noise, microphone placement strategy, seat position, and cabin material and geometry. Interactions can be analyzed between these factors and dominant influencers identified. Relationships can then be established between ASR engine performance and attribute performance metrics that quantify the link between the two. This helps aid proper target setting and hardware selection to meet the customer satisfaction goals for both teams
Wheeler, Joshua; Richardson, Brigitte; Amman, Scott; Ji, An; Huber, John; Rangarajan, Ranjani
This paper describes two case studies in which multiple microphone processing (beamforming) and microphone location were evaluated to determine their impact on improving embedded automatic speech recognition (ASR) in a vehicle hands-free environment. While each of these case studies was performed using slightly different evaluation set-ups, some specific and general conclusions can be drawn to help guide engineers in selecting the proper microphone location and configuration in a vehicle for the improvement of ASR. There were some outcomes that were common to both dual microphone solutions. When considering both solutions, neither was equally effective across all background noise sources. Both systems appear to be far more effective for noise conditions in which higher frequency energy is present, such as that due to high levels of wind noise and/or HVAC (heating, ventilation and air conditioning) blower noise. Microphone location was also shown to have a substantial effect on the
Amman, Scott; Huber, John; Charette, Francois; Richardson, Brigitte; Wheeler, Joshua
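A minimal sketch of the dual-microphone beamforming idea evaluated above is delay-and-sum: align the channels toward the talker, then average so that coherent speech adds in phase while diffuse noise adds incoherently. This is the textbook technique, not the specific commercial processing the case studies tested, and the delay, signals, and noise levels below are invented toy values.

```python
import numpy as np

def delay_and_sum(mic1, mic2, delay_samples):
    """Two-microphone delay-and-sum beamformer: align the second channel to
    the talker's direction by an integer sample delay, then average.
    Signals from other directions add incoherently and are attenuated."""
    aligned = np.roll(mic2, -delay_samples)
    return 0.5 * (mic1 + aligned)

# Toy example: the talker's signal reaches mic2 three samples after mic1,
# and each channel carries independent noise.
rng = np.random.default_rng(2)
talker = np.sin(2 * np.pi * 200 * np.arange(4000) / 8000)
n1, n2 = rng.standard_normal(4000), rng.standard_normal(4000)
mic1 = talker + 0.5 * n1
mic2 = np.roll(talker, 3) + 0.5 * n2
out = delay_and_sum(mic1, mic2, delay_samples=3)
```

Because the two noise components are independent, averaging halves the residual noise power relative to a single microphone while leaving the aligned speech intact, a rough picture of why the dual-mic systems helped most against broadband wind and HVAC noise.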
With the development of automotive HMI and the mobile internet, many interaction modes are available for drivers to perform in-vehicle secondary tasks, e.g. dialing, volume adjustment, and music playing. For driving safety and drivers’ high expectations of the HMI, it is urgent to effectively evaluate, for each secondary task, which interaction mode (e.g. steering wheel buttons, voice control) offers good efficiency, safety, and user experience. This study uses a static driving simulation cockpit to provide the driving environment, and sets up a high-fidelity driving cockpit based on OKTAL SCANeR Studio and three-dimensional modeling technology. The secondary tasks supported by the HMI platform are designed from customer-demand research. The secondary task test is carried out based on usability test theory, and the influence of different interaction modes on driving safety is analyzed. By the F-ANP fuzzy network analysis method, the different influence factors of secondary task interaction modes are taken into
Ma, Jun; Gong, Zaiyan; Dong, Yiwei
Sub-audible speech is a new form of human communication that uses tiny neural impulses (EMG signals) in the human vocal tract instead of audible sounds. These EMG signals arise from commands sent by the brain’s speech center to tongue and larynx muscles that enable production of audible sounds. Sub-audible speech arises from EMG signals intercepted before an audible sound is produced and, in many instances, allows inference of the corresponding word or sound. Where sub-audible speech is received and appropriately processed, production of recognizable sounds is no longer important. Further, the presence of noise and of intelligibility barriers, such as accents associated with the audible speech, no longer hinder communication
Hands-free phone use is the most utilized use case for vehicles equipped with infotainment systems with external microphones that support connection to phones and implement speech recognition. Critically, then, achieving hands-free phone call quality in a vehicle is problematic due to the extremely noisy nature of the vehicle environment. Noise generated by wind, mechanical and structural sources, tire-to-road contact, passengers, engine/exhaust, and HVAC air pressure and flow are all significant contributors. Other factors influencing the quality of the phone call include microphone placement, cabin acoustics, seat position of the talker, noise reduction of the hands-free system, etc. This paper describes the work done to develop procedures and metrics to quantify the effects that influence hands-free phone call quality. It will be shown that a listening study using 49 evaluators indicated that the ETSI EG 202 396-3 (VoIP standard) for SMOS (Speech Mean Opinion Score) and
Amman, Scott; Charette, Francois; Nicastri, Paul; Huber, John; Richardson, Brigitte; Puskorius, Gint; Gur, Yuksel; Cooprider, Anthony
The scope of this document is a technology-neutral approach to speech input and audible output system guidelines applicable for OEM and aftermarket systems in light vehicles. These may be stand-alone interfaces or the speech aspects of multi-modal interfaces. This document does not apply to speech input and audible output systems used to interact with automation or automated driving systems in vehicles that are equipped with such systems while they are in use (ref. J3016:JAN2014)
Driver Vehicle Interface (DVI) Committee
The need for voice recognition systems in the automotive industry is growing day by day. The current voice recognition systems, Hyundai's ‘Blue Link’ and KIA's ‘UVO’, were developed with Microsoft, a global software company, and were recently launched in the domestic market. Since usage of voice recognition systems is increasing, their research and development is also advancing rapidly, mostly focused on improving speech recognition rates. However, there has been no research on interior layout that considers voice recognition usability. In this research, we therefore identify interior design factors for maximizing voice recognition usability
Choi, Mingyu