Temporal Knowledge Distillation for Sparse Stream DETR 3D Object Detection

2026-01-0023

To be published on 04/07/2026

Abstract
Sparse stream DETR-based 3D object detection has become pivotal in autonomous driving. Previous methods achieve remarkable performance by aggregating temporal information, but they face a trade-off between precision and efficiency. Knowledge distillation offers a promising way to enhance a small model without incurring additional computational overhead at inference; however, prior work has not explored distilling temporal knowledge for DETR detectors. This paper proposes a novel Temporal DETR Query Guidance paradigm that transfers temporal relation knowledge from a powerful teacher model, enabling the student to associate object states across time and leverage historical context. The teacher's queries capture temporal knowledge through self-attention, and its backbone is the large-scale EVA-02 image model. The student feeds its own learnable queries through the teacher's self-attention layer to compute a guidance signal, and mimics the teacher's feature-interaction pattern via a temporal loss. Specifically, the student uses its queries to aggregate information from the teacher's key-value features, producing ideal queries, and the student's parameters are optimized by minimizing the L1 distance between these ideal queries and the student's own attention output. Experiments on the nuScenes dataset validate the efficacy of distilling temporal knowledge.
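The distillation objective described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the function names, tensor shapes, and single-head scaled dot-product attention are assumptions for clarity. The "ideal" queries are produced by running the student's queries against the teacher's key-value features, and the temporal loss is the mean L1 distance between those ideal queries and the student's own attention output.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Single-head scaled dot-product attention (an assumed simplification).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def temporal_distill_loss(student_q, teacher_k, teacher_v, student_out):
    """L1 distance between the 'ideal' queries (student queries attending to
    the teacher's key-value features) and the student's attention output."""
    ideal_q = attention(student_q, teacher_k, teacher_v)
    return np.abs(ideal_q - student_out).mean()

# Toy shapes: 8 student queries, 16 teacher key-value tokens, 32-dim features.
rng = np.random.default_rng(0)
d, n_q, n_kv = 32, 8, 16
student_q = rng.normal(size=(n_q, d))
teacher_k = rng.normal(size=(n_kv, d))
teacher_v = rng.normal(size=(n_kv, d))

# If the student's output already matches the ideal queries, the loss is zero.
student_out = attention(student_q, teacher_k, teacher_v)
print(temporal_distill_loss(student_q, teacher_k, teacher_v, student_out))  # → 0.0
```

Minimizing this loss with respect to the student's parameters pushes the student's attention output toward the feature-interaction pattern induced by the teacher's key-value features.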
Citation
Yan, Y., "Temporal Knowledge Distillation for Sparse Stream DETR 3D Object Detection," WCX SAE World Congress Experience, Detroit, Michigan, United States, April 14, 2026.
Additional Details
Content Type: Technical Paper
Language: English