Traffic collision reconstruction traditionally relies on human expertise and,
when performed properly, can be highly accurate. Pre-crash reconstruction, i.e.,
reconstructing the driver and vehicle behaviors that preceded the actual crash,
is considerably more challenging.
This study develops a multi-agent artificial intelligence (AI) framework that
reconstructs pre-crash scenarios and infers vehicle behaviors from fragmented
collision data. The framework operates in two collaborative phases,
reconstruction followed by reasoning. The system processes 277 rear-end lead
vehicle deceleration (LVD) collisions from the Crash Investigation Sampling
System (CISS; 2017–2022), integrating textual crash reports, structured tabular
data, and visual scene diagrams. Phase I generates natural language crash
reconstructions from multimodal inputs. Phase II performs in-depth crash
reasoning by combining these reconstructions with time-series event data
recorder (EDR) records. This pairing enables precise identification of striking
and struck vehicles while isolating the EDR records most relevant to the
collision moment,
thereby revealing crucial pre-crash driving behaviors. For validation, we
applied the framework to all 277 LVD cases, focusing on a subset of 39
complicated EDR cases in which multiple EDR records per collision introduced
possible ambiguity (e.g., due to missing or conflicting data). Ground truth was
established by consensus between the manual annotations of two independent
researchers, with a separate large language model (LLM) used only to flag
possible conflicts for re-checking.
In the full end-to-end evaluation, the framework achieved 100% accuracy across
all 4155 trials (277 cases × 5 runs × 3 models), and the three reasoning models
produced identical outputs, suggesting that performance derives from the
structured prompt design rather than model-specific characteristics. By
comparison, research analysts without specialized reconstruction training
achieved 92.31% accuracy on the 39 complicated EDR cases. In separate ablation
experiments
on the 39 complicated EDR cases, one randomly selected Phase I output from
the full end-to-end evaluation was fixed as the common input for Phase II, and
each model was run 10 independent times. Removing the structured
reasoning anchors reduced case-level accuracy from 99.7% to 96.5%, and errors
spread from a single output type to multiple analytical dimensions. The
system maintained robust performance even when processing incomplete data. This
zero-shot evaluation, conducted without any domain-specific training or
fine-tuning, demonstrates that the framework’s effectiveness stems from its
multi-agent architecture and prompt engineering, offering a scalable approach
for AI-assisted pre-crash analysis.
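
The evaluation sizes quoted above can be cross-checked with a few lines of
arithmetic. The following is a minimal sketch, not part of the framework itself:
the function and variable names are hypothetical, and the ablation trial count
assumes that the same three models were each run 10 times per condition. The
number of cases the analysts classified correctly is not stated in the text; it
is inferred here as the only integer count out of 39 that rounds to 92.31%.

```python
# Evaluation sizes as reported in the abstract.
N_CASES = 277          # rear-end LVD collisions from CISS (2017-2022)
N_RUNS = 5             # runs per case in the end-to-end evaluation
N_MODELS = 3           # reasoning models compared
N_COMPLICATED = 39     # complicated EDR cases
N_ABLATION_RUNS = 10   # runs per model in the ablation (per condition: assumption)


def total_trials(cases: int, runs: int, models: int) -> int:
    """Number of trials when every model runs every case `runs` times."""
    return cases * runs * models


def correct_counts_matching(pct: float, total: int) -> list[int]:
    """Integer correct counts whose accuracy rounds to `pct` at two decimals."""
    return [k for k in range(total + 1) if abs(100.0 * k / total - pct) < 0.005]


end_to_end_trials = total_trials(N_CASES, N_RUNS, N_MODELS)
ablation_trials = total_trials(N_COMPLICATED, N_ABLATION_RUNS, N_MODELS)
analyst_correct = correct_counts_matching(92.31, N_COMPLICATED)

print("end-to-end trials:", end_to_end_trials)
print("ablation trials per condition (assumed):", ablation_trials)
print("analyst correct counts consistent with 92.31%:", analyst_correct)
```

Under these assumptions, the end-to-end product reproduces the reported 4155
trials, and the analyst figure is consistent with 36 of 39 cases.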