Temporal Knowledge Distillation for Sparse Stream DETR 3D Object Detection

2026-01-0023

To be published on 04/07/2026

Abstract
Sparse stream DETR-based 3D object detection has become pivotal in autonomous driving. Previous methods achieve remarkable performance by aggregating temporal information, but they face a trade-off between precision and efficiency. Knowledge distillation offers a promising way to enhance a small model without incurring additional computational overhead at inference; however, prior work has not explored distilling temporal knowledge for DETR detectors. This paper proposes a novel Temporal DETR Query Guidance paradigm that transfers temporal relation knowledge from a powerful teacher model, enabling the student to associate object states across time and leverage historical context. The teacher's queries capture temporal knowledge through self-attention, and its backbone is the large-scale EVA-02 image model. The student feeds its own learnable queries through the teacher's self-attention layer to compute a guidance signal, and mimics the teacher's feature-interaction pattern via a temporal loss. Specifically, the student uses its queries to aggregate information from the teacher's key-value features, producing ideal queries, and the student's parameters are optimized by minimizing the L1 distance between these ideal queries and the student's own attention output. Experiments on the nuScenes dataset validate the efficacy of distilling temporal knowledge.
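The distillation objective described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the function names, tensor shapes, and single-head scaled dot-product attention are assumptions for clarity. The "ideal" queries are produced by running the student's queries against the teacher's key-value features, and the temporal loss is the mean L1 distance between those ideal queries and the student's own attention output.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Single-head scaled dot-product attention (an assumed simplification).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def temporal_distill_loss(student_q, teacher_k, teacher_v, student_out):
    """L1 distance between the 'ideal' queries (student queries attending to
    the teacher's key-value features) and the student's attention output."""
    ideal_q = attention(student_q, teacher_k, teacher_v)
    return np.abs(ideal_q - student_out).mean()

# Toy shapes: 8 student queries, 16 teacher key-value tokens, 32-dim features.
rng = np.random.default_rng(0)
d, n_q, n_kv = 32, 8, 16
student_q = rng.normal(size=(n_q, d))
teacher_k = rng.normal(size=(n_kv, d))
teacher_v = rng.normal(size=(n_kv, d))

# If the student's output already matches the ideal queries, the loss is zero.
student_out = attention(student_q, teacher_k, teacher_v)
print(temporal_distill_loss(student_q, teacher_k, teacher_v, student_out))  # → 0.0
```

Minimizing this loss with respect to the student's parameters pushes the student's attention output toward the feature-interaction pattern induced by the teacher's key-value features.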
Citation
Yan, Y., "Temporal Knowledge Distillation for Sparse Stream DETR 3D Object Detection," WCX SAE World Congress Experience, Detroit, Michigan, United States, April 14, 2026.
Additional Details
Content Type: Technical Paper
Language: English