Deep Learning-Based Improvement and Validation of Vehicles’ Speed Extraction from Aerial Videos

2025-01-7349

12/31/2025

Abstract
Unmanned Aerial Vehicles (UAVs) offer high efficiency, low cost, and strong mobility, making them well-suited for traffic vehicle detection. However, dense targets, rapid scene changes, and small object sizes in aerial videos reduce detection accuracy, which in turn affects the precision of speed extraction algorithms. To address these issues, this paper proposes a speed extraction method that integrates an improved You Only Look Once Version 11 (YOLOv11) with the Deep Simple Online and Realtime Tracking (DeepSORT) algorithm. On the detection side, several architectural enhancements are introduced. A Haar wavelet-based HWD downsampling module preserves fine-grained details, a CSK2_m multi-scale convolution block with a CCFM feature fusion structure strengthens cross-scale representation, and an additional detection head at the P2 layer improves the recall of tiny objects in complex scenes. Extensive experiments on a hybrid dataset constructed from VisDrone2019 and a custom UAV dataset show that the proposed model consistently improves detection across categories. Notably, for two-wheel vehicles, precision increases by 9.0% and recall improves by 4.6%, demonstrating clear advantages in small-object detection. Comparisons with YOLOv5–YOLOv12 further confirm that the improved YOLOv11 achieves the best overall accuracy, reaching an mAP@0.5:0.95 of 54.3% while maintaining real-time inference capability at 65.9 FPS despite a moderate increase in parameters. In addition, ablation studies verify that each module contributes to performance gains, with HWD producing the largest improvement and the combined design achieving the best results. Moreover, integration with DeepSORT significantly enhances tracking, improving MOTA by 9.13% and reducing ID switches by nearly two-thirds. Finally, three real-vehicle experiments using speed data from an Integrated Inertial Navigation System (IINS) validate the method’s accuracy, achieving a minimum mean squared error of 0.324. 
These results validate the proposed method’s effectiveness and practicality in real-world UAV-based traffic monitoring and speed estimation tasks.
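The abstract does not spell out how pixel tracks become speeds or how the error against the IINS reference is computed. The following is a minimal sketch of one common approach, not the authors' implementation. It assumes a hovering UAV with a nadir view and a constant ground sampling distance. The function names and the parameters `fps` and `metres_per_pixel` are hypothetical and chosen for illustration only.

```python
import math


def track_speeds_kmh(centroids_px, fps, metres_per_pixel):
    """Estimate speed (km/h) between consecutive frames of one tracked vehicle.

    centroids_px: list of (x, y) bounding-box centres in pixels, one per frame.
    fps: video frame rate (assumed constant).
    metres_per_pixel: ground sampling distance (assumed constant over the image).
    """
    speeds = []
    for (x0, y0), (x1, y1) in zip(centroids_px, centroids_px[1:]):
        # Pixel displacement converted to metres on the ground plane.
        dist_m = math.hypot(x1 - x0, y1 - y0) * metres_per_pixel
        # Metres per frame -> metres per second -> kilometres per hour.
        speeds.append(dist_m * fps * 3.6)
    return speeds


def mean_squared_error(estimated, reference):
    """Mean squared error between estimated and reference speed samples."""
    if not estimated or len(estimated) != len(reference):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((e - r) ** 2 for e, r in zip(estimated, reference)) / len(estimated)


if __name__ == "__main__":
    # Illustrative values only; these are not data from the paper.
    track = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
    est = track_speeds_kmh(track, fps=30, metres_per_pixel=0.1)
    ref = [53.0, 55.0]  # stand-in for time-aligned reference speeds
    print(est, mean_squared_error(est, ref))
```

In practice the per-frame estimates would likely be smoothed over a time window and time-aligned with the reference signal before the error is computed. Both steps are left out here.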
Pages
12
Citation
Ye, Xin, Xiaoxuan Cheng, and Xiangdong Li, "Deep Learning-Based Improvement and Validation of Vehicles’ Speed Extraction from Aerial Videos," SAE Technical Paper 2025-01-7349, 2025.
Additional Details
Product Code
2025-01-7349
Content Type
Technical Paper
Language
English