Jialian Wu1, Jiale Cao2, Liangchen Song1, Yu Wang3, Ming Yang3, Junsong Yuan1

1. Abstract

*Most online multi-object trackers perform object detection stand-alone in a neural net without any input from tracking. In this paper, we present a new online joint detection and tracking model, TraDeS (TRAck to DEtect and Segment), exploiting tracking clues to assist detection end-to-end. TraDeS infers object tracking offset by a cost volume, which is used to propagate previous object features for improving current object detection and segmentation. Effectiveness and superiority of TraDeS are shown on 4 datasets, including MOT (2D tracking), nuScenes (3D tracking), MOTS and Youtube-VIS (instance segmentation tracking). Project page: https://jialianwu.com/projects/TraDeS.html.*

2. Introduction


Problems with JDT (joint detection and tracking):

The re-ID (tracking) loss and the detection loss are incompatible:

re-ID focuses on intra-class variance, i.e., distinguishing different instances of the same class;

detection focuses on inter-class differences and suppresses intra-class variance.

Solution: tightly integrate tracking into detection, together with a specially designed re-ID learning scheme.

Overall pipeline:

The CVA extracts point-wise re-ID embedding features by the backbone to construct a cost volume that stores matching similarities between the embedding pairs in two frames. Then, we infer the tracking offsets from the cost volume, which are the spatio-temporal displacements of all the points, i.e., potential object centers, in two frames. The tracking offsets together with the embeddings are utilized to conduct a simple two-round long-term data association. Afterwards, the MFW takes the tracking offsets as motion cues to propagate object features from the previous frames to the current one. Finally, the propagated feature and the current feature are aggregated to derive the detection and segmentation results.

  1. The CVA extracts point-wise re-ID embedding features from the backbone and builds a cost volume that stores the matching similarities between embedding pairs across the two frames.
  2. Tracking offsets, i.e., the spatio-temporal displacements of all points (potential object centers) between the two frames, are inferred from the cost volume (see the CVA sketch below).
  3. The tracking offsets, together with the embeddings, drive a simple two-round long-term data association.
  4. The MFW takes the tracking offsets as motion cues to propagate object features from previous frames to the current one; the propagated and current features are then aggregated for detection and segmentation (see the MFW sketch below).
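
As a concrete reading of steps 1–2, below is a minimal PyTorch sketch of the CVA head: it builds the 4D cost volume from L2-normalized point-wise embeddings of two frames, collapses it per axis by max pooling as described in the paper, and reads out each point's tracking offset as a softmax-weighted expectation over candidate previous positions. All names here (`cost_volume_offsets`, `emb_cur`, `emb_prev`, `stride`) are hypothetical, not from the official TraDeS code, and details such as the extra downsampling of the embeddings are omitted.

```python
import torch
import torch.nn.functional as F

def cost_volume_offsets(emb_cur, emb_prev, stride=4):
    """Sketch of the Cost Volume based Association (CVA) head.

    emb_cur, emb_prev: (C, H, W) point-wise re-ID embeddings of the
    current and previous frames. Returns per-point tracking offsets
    (2, H, W), pointing from each current point toward its previous
    position, in input-image pixels (hence the stride factor).
    """
    C, H, W = emb_cur.shape
    cur = F.normalize(emb_cur.reshape(C, -1), dim=0)    # (C, HW)
    prev = F.normalize(emb_prev.reshape(C, -1), dim=0)  # (C, HW)

    # 4D cost volume: similarity of every current point (i, j)
    # to every previous point (k, l).
    cost = (cur.t() @ prev).reshape(H, W, H, W)

    # Max-pool along the other axis, then softmax over candidate
    # previous columns / rows, as in the paper.
    cost_w = cost.max(dim=2).values.softmax(dim=-1)  # (H, W, W) over l
    cost_h = cost.max(dim=3).values.softmax(dim=-1)  # (H, W, H) over k

    # Expected displacement current -> previous along each axis.
    xs = torch.arange(W, dtype=torch.float32)
    ys = torch.arange(H, dtype=torch.float32)
    off_x = (cost_w * (xs[None, None, :] - xs[None, :, None])).sum(-1) * stride
    off_y = (cost_h * (ys[None, None, :] - ys[:, None, None])).sum(-1) * stride
    return torch.stack([off_x, off_y])  # (2, H, W)
```

In the paper these offsets are supervised with ground-truth center displacements, and the same embeddings and offsets drive the two-round association of step 3: detections are first matched to tracklets whose offset-predicted positions are nearby, and the remainder fall back to embedding cosine similarity against unmatched (including long-term lost) tracklets.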
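
Step 4, the MFW, warps previous-frame features along the predicted motion before aggregating them with the current features. The paper implements this with deformable convolution conditioned on the tracking offsets; the sketch below substitutes a plain bilinear warp via `grid_sample` purely to show the data flow, so treat it as a simplification rather than the paper's actual operator.

```python
import torch
import torch.nn.functional as F

def propagate_features(feat_prev, offsets):
    """Warp previous-frame features to the current frame (simplified MFW).

    feat_prev: (1, C, H, W) feature map of frame t - tau.
    offsets:   (2, H, W) tracking offsets in feature-map pixels,
               pointing from each current point to its previous position.
    """
    _, _, H, W = feat_prev.shape
    ys, xs = torch.meshgrid(
        torch.arange(H, dtype=torch.float32),
        torch.arange(W, dtype=torch.float32),
        indexing="ij",
    )
    # Each current-frame location samples the previous feature map at
    # the position its tracking offset points to.
    sx = xs + offsets[0]
    sy = ys + offsets[1]
    # grid_sample expects sampling coordinates normalized to [-1, 1].
    grid = torch.stack(
        [2 * sx / (W - 1) - 1, 2 * sy / (H - 1) - 1], dim=-1
    ).unsqueeze(0)
    warped = F.grid_sample(feat_prev, grid, align_corners=True)
    # In TraDeS the warped and current features are then combined by a
    # learned per-pixel weighting before the detection/segmentation heads.
    return warped
```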