This study investigates the precursors of crashes under varying traffic states
through an in-depth analysis of freeway traffic data. This approach
addresses the limitations associated with using surrogate measures in traffic
safety research. We used the k-means clustering method to categorize traffic
states into three types: free flow, transitional state, and congested flow.
Adopting a case-control design, we set matching rules so that control group data
agree with each crash case in time, location, and traffic state. Traffic flow
feature variables were first
constructed based on multiple dimensions, including time window width, spatial
location, traffic flow parameters, and statistical characteristics. To reduce
multicollinearity among features, we screened them using correlation matrices
and variance inflation factors (VIF). We then applied Recursive Feature
Elimination (RFE) combined with
the XGBoost model to select key features, and interpreted the impact of these
features on crash occurrence using the SHapley Additive exPlanations (SHAP)
values. Finally, we fitted a logistic regression model to the selected features
to characterize the overall relationship between key features and crashes. The
results indicate significant differences in the
main factors affecting crashes across traffic states. In the free flow state,
crash occurrence is more strongly related to the variability of flow and speed.
In the transitional state, differences in vehicle distribution and speed across
lanes significantly affect crashes. In the congested flow state, the standard
deviation of speeds among upstream lanes and the average downstream flow have a
greater impact on crashes. This
study not only enhances the interpretability of traffic crash analysis methods
but also provides a basis for traffic management departments to formulate
corresponding traffic safety strategies for different scenarios.
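
As an illustration of the first two steps, the following minimal Python sketch
labels traffic states with a one-dimensional k-means on mean speed and then
selects matched controls for a crash case. The clustering variable, the record
fields (detector, weekday, minute), the 30-minute tolerance, the sample values,
and all function names are assumptions made for this example; it is not the
study's implementation, whose actual variables and matching rules may differ.

```python
from dataclasses import dataclass
from statistics import mean

# Assumed naming: clusters are ordered by rising centroid speed.
STATE_NAMES = ("congested", "transitional", "free_flow")


def kmeans_1d(values, k=3, max_iter=100):
    """Plain Lloyd's k-means on scalars; returns the centroids in ascending order."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids across the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            groups[nearest].append(v)
        # Keep the previous centroid if a cluster ends up empty.
        updated = [mean(g) if g else c for g, c in zip(groups, centroids)]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)


def traffic_state(speed, centroids):
    """Name the state whose centroid is closest to the observed speed."""
    idx = min(range(len(centroids)), key=lambda j: abs(speed - centroids[j]))
    return STATE_NAMES[idx]


@dataclass(frozen=True)
class Record:
    detector: str  # hypothetical location identifier
    weekday: int   # 0 = Monday
    minute: int    # minute of day at the start of the observation window
    state: str     # traffic state label from traffic_state()
    crash: bool    # True for a crash case, False for a candidate control


def matched_controls(case, pool, tolerance_min=30):
    """Return non-crash records that match the case on location, time, and state."""
    return [
        r for r in pool
        if not r.crash
        and r.detector == case.detector
        and r.weekday == case.weekday
        and abs(r.minute - case.minute) <= tolerance_min
        and r.state == case.state
    ]


# Made-up mean speeds in km/h, used only to fit the three centroids.
speeds = [20, 22, 25, 55, 58, 60, 100, 105, 110]
centroids = kmeans_1d(speeds)

case = Record("D12", 2, 510, traffic_state(23, centroids), True)
pool = [
    Record("D12", 2, 510, traffic_state(21, centroids), False),   # matches on all rules
    Record("D12", 2, 510, traffic_state(104, centroids), False),  # different traffic state
    Record("D07", 2, 510, traffic_state(21, centroids), False),   # different location
    Record("D12", 2, 900, traffic_state(24, centroids), False),   # different time of day
]
controls = matched_controls(case, pool)
```

Only the first candidate in `pool` satisfies all three matching rules. In
practice the clustering would likely use several traffic flow parameters rather
than speed alone, and a library implementation of k-means would replace the
hand-written loop.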