Traffic Crash Prediction and Causal Disentanglement by Machine Learning
2026-99-0552
To be published on 07/10/2026
- Content
- To address the insufficient modeling of spatio-temporal heterogeneity in road traffic accident prediction, this paper proposes a dual-task machine learning framework that integrates geographical environment, location attributes, and temporal periodicity. The dataset consists of traffic accident records from Nanchang for 2019–2023. First, geographical identifiers are generated by rounding and aggregating latitude and longitude coordinates, and location type is one-hot encoded, which together support spatial clustering analysis of accident hotspots. The contribution of geographical features shows a stronger east-west than north-south trend, and the kernel density heatmap identifies Zone A and Zone B as dual-core high-risk areas. Second, time features are encoded cyclically with sine and cosine functions, which effectively captures the daily cycle of accident occurrence. Quantitative analysis with a random forest regression model shows that temporal features account for 89.2% of the explained variance in accident frequency, significantly exceeding the contributions of geographical factors (10.2%) and location attributes (0.6%). After hyperparameter optimization, the XGBoost classifier predicts serious accidents with an accuracy of 75.97% and an AUC of 0.8412, indicating robust performance and providing reliable support for dynamic risk assessment in traffic management systems.
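The two feature-engineering steps named in the abstract, coordinate rounding and cyclical time encoding, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `geo_id`, `encode_hour`, and `cyclic_distance`, the two-decimal rounding precision, and the sample coordinates are all assumptions made for illustration.

```python
import math

def geo_id(lat, lon, decimals=2):
    """Build a grid-cell identifier by rounding coordinates.

    The precision (decimals=2) is an assumed value; the abstract does not
    state which precision was used.
    """
    return f"{round(lat, decimals):.{decimals}f}_{round(lon, decimals):.{decimals}f}"

def encode_hour(hour, period=24):
    """Map an hour of day onto the unit circle as (sin, cos)."""
    angle = 2 * math.pi * (hour % period) / period
    return math.sin(angle), math.cos(angle)

def cyclic_distance(h1, h2):
    """Euclidean distance between two hours in the encoded (sin, cos) space."""
    s1, c1 = encode_hour(h1)
    s2, c2 = encode_hour(h2)
    return math.hypot(s1 - s2, c1 - c2)

# Hypothetical point in the Nanchang area, used only to show the identifier format.
cell = geo_id(28.6829, 115.8582)

# 23:00 and 00:00 are one hour apart, and the encoding keeps them close,
# whereas the raw hour values 23 and 0 would appear far apart.
near_midnight = cyclic_distance(23, 0)
noon_vs_midnight = cyclic_distance(12, 0)
```

Encoding the hour as a (sin, cos) pair means records just before and just after midnight fall close together in feature space, which a raw 0 to 23 hour value would not achieve.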
- Citation
- Luo, J., Zhang, Y., Li, X., and Wu, R., "Traffic Crash Prediction and Causal Disentanglement by Machine Learning," The 1st International Academic Conference on Intelligent Transportation and Low-Altitude Transport (ITLAT2025), Nantong, China, June 20, 2025.