Avoiding the CRASH: A Vision Language Model Evaluation in Critical Traffic Scenarios

2025-01-8213

To be published on April 1, 2025

Event: WCX SAE World Congress Experience
Abstract
Autonomous Vehicles (AVs) have transformed transportation by reducing human error and enhancing traffic efficiency, driven by deep neural network (DNN) models that power image classification and object detection. However, these models require periodic retraining to maintain optimal performance; without it, they can malfunction in ways that may lead to accidents. Vision-Language Models (VLMs) such as LLaVA offer a way to address this issue: they effectively correlate visual and textual information, and their robustness to variability lets them generalize across diverse environments, making them well suited to analyzing vehicle crash situations. To evaluate the decision-making capabilities of these models across common crash scenarios, we collected a set of real-world crash incident videos. We decompose these videos into frame-by-frame images and task the VLMs with determining the appropriate driving action at each frame: accelerate, brake, turn left, turn right, or maintain the current course. For each frame, we analyze three sets of outputs: the action actually executed in the video, the action a human driver would likely take to avoid a crash, and the action the VLM predicts as optimal to avoid a crash. We use performance metrics, including accuracy and F1 scores, to assess and compare the models’ effectiveness. Our findings show that VLMs achieve a high level of consistency and accuracy in decision-making, underscoring their potential role in autonomous driving systems (ADS), both in supporting real-time decision-making for human drivers and in fully autonomous operation. The results highlight the adaptability and robustness of VLMs, making them promising tools for advancing future AV technologies.
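The abstract scores each frame's predicted action against reference actions using accuracy and F1 scores, but it does not say how F1 is averaged over the five action classes. The sketch below is a minimal, hypothetical illustration that assumes one-vs-rest F1 per action, macro-averaged over the actions that occur in either label sequence (to our understanding, this is also how scikit-learn's `f1_score(..., average="macro")` chooses labels by default). The label strings and the example sequences are invented for illustration and are not the paper's data.

```python
# Minimal sketch (assumption): frame-level accuracy and macro-F1 over five driving actions.
# Label strings and example sequences are hypothetical, not taken from the paper.

ACTIONS = ("accelerate", "brake", "turn left", "turn right", "maintain")


def _check(reference, predicted):
    """Validate that both sequences are non-empty, aligned per frame, and use known actions."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have one label per frame")
    if not reference:
        raise ValueError("at least one frame is required")
    unknown = (set(reference) | set(predicted)) - set(ACTIONS)
    if unknown:
        raise ValueError(f"unknown action labels: {sorted(unknown)}")


def accuracy(reference, predicted):
    """Fraction of frames whose predicted action equals the reference action."""
    _check(reference, predicted)
    hits = sum(r == p for r, p in zip(reference, predicted))
    return hits / len(reference)


def per_class_f1(reference, predicted):
    """One-vs-rest F1 = 2TP / (2TP + FP + FN) for each action present in either sequence."""
    _check(reference, predicted)
    pairs = list(zip(reference, predicted))
    scores = {}
    for label in ACTIONS:
        tp = sum(r == label and p == label for r, p in pairs)
        fp = sum(r != label and p == label for r, p in pairs)
        fn = sum(r == label and p != label for r, p in pairs)
        if tp + fp + fn == 0:
            continue  # action never appears in either sequence; exclude it from the average
        scores[label] = 2 * tp / (2 * tp + fp + fn)
    return scores


def macro_f1(reference, predicted):
    """Unweighted mean of the per-action F1 scores."""
    scores = per_class_f1(reference, predicted)
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    # Hypothetical per-frame labels for one clip: the action a human driver would
    # likely take to avoid the crash, and the action the VLM predicted.
    human = ["maintain", "maintain", "brake", "brake", "turn left"]
    vlm = ["maintain", "brake", "brake", "brake", "turn right"]
    print(f"accuracy = {accuracy(human, vlm):.2f}")  # prints "accuracy = 0.60"
    print(f"macro-F1 = {macro_f1(human, vlm):.3f}")  # prints "macro-F1 = 0.367"
```

The same two functions can score the VLM's predictions against the action actually executed in the video by passing that label sequence as `reference`. Macro-averaging weights every action equally, so a rare but safety-critical action such as turning away from an obstacle counts as much as the frequent "maintain" frames. Whether this matches the paper's averaging choice is an assumption.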
Citation
Fernandez, D., MohajerAnsari, P., Salarpour, A., and Pesé, M., "Avoiding the CRASH: A Vision Language Model Evaluation in Critical Traffic Scenarios," SAE Technical Paper 2025-01-8213, 2025, .
Additional Details
Published: To be published on Apr 1, 2025
Product Code: 2025-01-8213
Content Type: Technical Paper
Language: English