Jialian Wu¹, Jiale Cao², Liangchen Song¹, Yu Wang³, Ming Yang³, Junsong Yuan¹
*Most online multi-object trackers perform object detection stand-alone in a neural net without any input from tracking. In this paper, we present a new online joint detection and tracking model, TraDeS (TRAck to DEtect and Segment), exploiting tracking clues to assist detection end-to-end. TraDeS infers object tracking offset by a cost volume, which is used to propagate previous object features for improving current object detection and segmentation. Effectiveness and superiority of TraDeS are shown on 4 datasets, including MOT (2D tracking), nuScenes (3D tracking), MOTS and Youtube-VIS (instance segmentation tracking). Project page: https://jialianwu.com/projects/TraDeS.html.*
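Problems with JDT (joint detection and tracking):
The re-ID tracking loss and the detection loss are incompatible.
Re-ID focuses on intra-class variance, since it must tell apart instances of the same class.
Detection focuses on inter-class difference and reduces intra-class variance.
Solution: tightly integrate tracking into detection, together with a specially designed re-ID learning scheme.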
Overall pipeline:
The Cost Volume based Association (CVA) module extracts point-wise re-ID embedding features by the backbone to construct a cost volume that stores matching similarities between the embedding pairs in two frames. Then, we infer the tracking offsets from the cost volume, which are the spatio-temporal displacements of all the points, i.e., potential object centers, in two frames. The tracking offsets together with the embeddings are utilized to conduct a simple two-round long-term data association. Afterwards, the Motion-guided Feature Warper (MFW) takes the tracking offsets as motion cues to propagate object features from the previous frames to the current one. Finally, the propagated feature and the current feature are aggregated to derive detection and segmentation.
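The first two steps of this pipeline (building the cost volume and reading tracking offsets from it) can be sketched as follows. This is only a minimal illustration, not the authors' implementation. It assumes a dot-product similarity between embeddings and a soft-argmax readout (max-pool over each axis, softmax, expected displacement). The function names `cost_volume` and `tracking_offsets` and the `stride` parameter are hypothetical, and plain Python lists are used instead of tensors to keep the example dependency-free.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cost_volume(emb_cur, emb_prev):
    """C[i][j][k][l] = dot(emb_cur[i][j], emb_prev[k][l]).

    Both inputs are H x W grids of D-dimensional embeddings (assumed same size).
    """
    H, W = len(emb_cur), len(emb_cur[0])

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    return [[[[dot(emb_cur[i][j], emb_prev[k][l]) for l in range(W)]
              for k in range(H)]
             for j in range(W)]
            for i in range(H)]


def tracking_offsets(cost, stride=1):
    """Soft-argmax displacement (dx, dy) from each current point to its
    best match in the previous frame (assumed readout, see lead-in)."""
    H, W = len(cost), len(cost[0])
    offsets = [[None] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            c = cost[i][j]  # H x W similarity map over the previous frame
            # Max-pool over rows / columns, then normalize to probabilities.
            p_col = softmax([max(c[k][l] for k in range(H)) for l in range(W)])
            p_row = softmax([max(c[k][l] for l in range(W)) for k in range(H)])
            # Expected horizontal and vertical displacement.
            dx = sum(p * (l - j) * stride for l, p in enumerate(p_col))
            dy = sum(p * (k - i) * stride for k, p in enumerate(p_row))
            offsets[i][j] = (dx, dy)
    return offsets


# Toy example: every previous-frame location gets a distinct, strongly
# scaled one-hot embedding; the current frame is the previous frame
# shifted one cell to the right (with wrap-around).
H, W = 3, 3
prev = [[[20.0 if d == k * W + l else 0.0 for d in range(H * W)]
         for l in range(W)] for k in range(H)]
cur = [[prev[i][(j - 1) % W] for j in range(W)] for i in range(H)]

cost = cost_volume(cur, prev)
offsets = tracking_offsets(cost)
print(offsets[1][2])  # approximately (-1.0, 0.0): the match lies one cell to the left
```

In this sketch the offset at a current-frame point, added to that point's coordinates, gives the estimated location of the same point in the previous frame. Such offsets are the motion cues that the later propagation step would consume.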