Benchmarking Large Language Models for Motorway Driving Scenario Understanding
2025-01-7146
02/21/2025
- Content
- Systematic testing of Automated Driving Systems (ADS) requires finding relevant test cases. Extracting critical cases, also called edge or corner cases, from naturalistic driving data is a complex and error-prone task. Large Language Models (LLMs) have been employed for virtual testing of ADS in recent years; however, quantitative benchmarking of LLMs’ performance on this task has barely been investigated. In this paper, six LLMs were selected, based on their characteristics, to benchmark their ability to understand ADS functional scenarios on motorways. A novel scenario classification model was introduced to enhance the granularity of data categorization for motorway driving scenarios. Different driving scenarios, described in natural language, were defined to test the capability of these LLMs to understand various scenarios and convert them into standardized structured data. To perform the benchmarking in a standardized manner, the same prompt engineering and the same dataset were used to interact with each selected LLM and to explore the LLMs’ sensitivity to variation in language style. For each group of classified driving scenarios, two different formats of natural-language descriptions were fed to the LLMs, splitting the testing data. The test results indicate that the “gpt-4-1106-preview” model achieves the highest accuracy, followed by “gpt-3.5-turbo” and “llama3-70b-instruct”, while the other LLMs show error consistency between 40% and 60%. The models “gpt-4-1106-preview” and “llama3-70b-instruct” feature lower error consistency in their outputs under the two formats of natural language, indicating greater robustness in handling varying textual inputs. The outcome of this work contributes to the application of LLMs to scenario extraction for ADS testing.
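The evaluation protocol described in the abstract — the same prompt and dataset for every model, structured outputs scored against ground truth, and an error-consistency check across two natural-language formats — can be illustrated with a minimal sketch. This is not the authors' code: the scenario schema (a maneuver/lane tuple), the toy predictions, and the exact metric definitions are illustrative assumptions.

```python
# Minimal sketch (hypothetical, not the paper's implementation) of scoring
# structured scenario outputs from an LLM benchmark run.

def accuracy(predictions, ground_truth):
    """Fraction of scenarios whose structured fields exactly match labels."""
    correct = sum(1 for p, g in zip(predictions, ground_truth) if p == g)
    return correct / len(ground_truth)

def error_consistency(errors_fmt_a, errors_fmt_b, n_scenarios):
    """Share of scenarios misclassified under BOTH description formats;
    a lower value suggests robustness to language-style variation."""
    return len(errors_fmt_a & errors_fmt_b) / n_scenarios

# Toy ground truth: each scenario reduced to a (maneuver, lane) tuple.
truth = [("cut-in", "left"), ("lane-keep", "ego"), ("cut-out", "right")]

# Hypothetical model outputs under two natural-language formats.
preds_fmt_a = [("cut-in", "left"), ("lane-keep", "ego"), ("cut-in", "right")]
preds_fmt_b = [("cut-in", "left"), ("cut-out", "ego"), ("cut-out", "right")]

# Index sets of misclassified scenarios per format.
errs_a = {i for i, (p, g) in enumerate(zip(preds_fmt_a, truth)) if p != g}
errs_b = {i for i, (p, g) in enumerate(zip(preds_fmt_b, truth)) if p != g}

print(round(accuracy(preds_fmt_a, truth), 2))                 # → 0.67
print(round(error_consistency(errs_a, errs_b, len(truth)), 2))  # → 0.0
```

In this toy run the model errs on different scenarios under the two formats, so its error consistency is zero — the kind of per-model, cross-format comparison the benchmark reports.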
- Pages
- 10
- Citation
- Zhou, J., Zhao, Y., Yang, A., and Eichberger, A., "Benchmarking Large Language Models for Motorway Driving Scenario Understanding," SAE Technical Paper 2025-01-7146, 2025, https://doi.org/10.4271/2025-01-7146.