In the testing and validation of autonomous driving systems, scenario-based
simulation is crucial to address the high costs and insufficient scene coverage
of real-road testing. However, existing simulators generate traffic scenarios from
handcrafted rules and therefore fail to capture the complex multi-agent
interactions and physical plausibility of real traffic. This paper proposes
STGT-Gen, a data-driven Spatio-Temporal Graph Transformer framework that generates
realistic and diverse multi-vehicle traffic scenarios by integrating
spatio-temporal interaction modeling, physical constraints, and high-definition
(HD) map information. STGT-Gen adopts an encoder-decoder architecture. The
encoder captures the temporal dependencies of vehicle trajectories with a
Temporal Transformer and the spatial interactions among vehicles with a Spatial
Graph Transformer, while a hierarchical map encoding module fuses lane topologies
and traffic rules. The decoder enforces physical feasibility during long-horizon
trajectory generation through collision detection based on the Separating Axis
Theorem (SAT) and dynamic constraints (acceleration and steering angle limits).
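As an illustration of the two decoder-side checks, the sketch below shows an SAT overlap test between two oriented vehicle rectangles and a single rollout step that clamps acceleration and steering before integrating. It is a minimal sketch under stated assumptions, not the STGT-Gen implementation: the footprint is assumed to be a rectangle, the motion model is assumed to be a kinematic bicycle model, and the names `Box`, `boxes_collide`, `constrained_step` and the limits (`wheelbase=2.7`, `max_accel=3.0`, `max_decel=-6.0`, `max_steer` of 30 degrees) are hypothetical placeholders.

```python
import math
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical oriented vehicle footprint: center, heading (rad), length, width."""
    x: float
    y: float
    heading: float
    length: float
    width: float

    def corners(self):
        # Rotate the four local corner offsets by the heading, then translate.
        c, s = math.cos(self.heading), math.sin(self.heading)
        hl, hw = self.length / 2, self.width / 2
        return [(self.x + dx * c - dy * s, self.y + dx * s + dy * c)
                for dx, dy in ((hl, hw), (hl, -hw), (-hl, -hw), (-hl, hw))]

    def axes(self):
        # A rectangle has only two distinct edge normals: heading and its perpendicular.
        c, s = math.cos(self.heading), math.sin(self.heading)
        return [(c, s), (-s, c)]


def _project(corners, axis):
    """Project polygon corners onto an axis and return the covered interval."""
    dots = [px * axis[0] + py * axis[1] for px, py in corners]
    return min(dots), max(dots)


def boxes_collide(a: Box, b: Box) -> bool:
    """SAT: two convex polygons are disjoint iff some edge normal separates them."""
    ca, cb = a.corners(), b.corners()
    for axis in a.axes() + b.axes():
        amin, amax = _project(ca, axis)
        bmin, bmax = _project(cb, axis)
        if amax < bmin or bmax < amin:
            return False  # separating axis found, no collision
    return True  # projections overlap on every candidate axis


def constrained_step(state, accel_cmd, steer_cmd, dt=0.1, wheelbase=2.7,
                     max_accel=3.0, max_decel=-6.0, max_steer=math.radians(30)):
    """One kinematic-bicycle step (assumed model) with clamped controls.

    state = (x, y, heading, speed); all limits are illustrative, not from the paper.
    """
    x, y, heading, v = state
    a = min(max(accel_cmd, max_decel), max_accel)        # acceleration limit
    delta = min(max(steer_cmd, -max_steer), max_steer)   # steering angle limit
    x += v * math.cos(heading) * dt
    y += v * math.sin(heading) * dt
    heading += v / wheelbase * math.tan(delta) * dt
    v = max(0.0, v + a * dt)  # assume vehicles do not reverse
    return (x, y, heading, v)
```

For example, `boxes_collide(Box(0, 0, 0, 4.5, 1.8), Box(3.0, 0, 0, 4.5, 1.8))` reports an overlap, while moving the second box to `x=10` does not. Testing only the four edge normals is sufficient for rectangles, which keeps the per-pair cost constant inside a rollout loop.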
Experiments on the real-world rounD and highD traffic datasets show that,
compared with an LSTM baseline and recent Transformer-based methods, STGT-Gen
improves all three evaluation dimensions: the Average Displacement Error (ADE) is
reduced by 34.6%–40.7% relative to LSTM and by 12.3%–18.5% relative to the
Transformer baselines, the collision rate decreases by 62%, and the lane
deviation rate drops by 81%. These results indicate that STGT-Gen substantially
improves the trajectory accuracy, physical safety, and map compliance of
generated scenarios, providing an efficient solution for high-fidelity scenario
testing of autonomous driving systems.
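To make the reported metric concrete, the sketch below computes ADE as the mean Euclidean distance between predicted and ground-truth positions over all agents and time steps, together with the relative reduction used to compare against a baseline. It assumes the standard ADE definition and a hypothetical nested-list trajectory layout; the function names `ade` and `relative_reduction` are illustrative, and the toy numbers are not taken from the paper's experiments.

```python
import math


def ade(pred, gt):
    """Average Displacement Error (standard definition, assumed here).

    pred, gt: nested lists indexed as [agent][timestep] -> (x, y).
    Returns the mean Euclidean distance over all agents and time steps.
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must contain the same number of agents")
    total, count = 0.0, 0
    for p_traj, g_traj in zip(pred, gt):
        if len(p_traj) != len(g_traj):
            raise ValueError("trajectories must have the same number of steps")
        for (px, py), (gx, gy) in zip(p_traj, g_traj):
            total += math.hypot(px - gx, py - gy)
            count += 1
    return total / count


def relative_reduction(baseline, ours):
    """Percentage by which `ours` improves on `baseline` (lower is better)."""
    return 100.0 * (baseline - ours) / baseline


# Toy usage with made-up values.
print(ade([[(0.0, 0.0), (1.0, 0.0)]], [[(0.0, 0.0), (1.0, 1.0)]]))  # 0.5
print(relative_reduction(2.0, 1.5))  # 25.0
```

A percentage such as "reduced by 34.6%" corresponds to `relative_reduction(ade_baseline, ade_model)` under this definition.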