1. Abstract

Multi-object tracking (MOT) aims to associate target objects across video frames in order to obtain entire moving trajectories. With the advancement of deep neural networks and the increasing demand for intelligent video analysis, MOT has gained significantly increased interest in the computer vision community. Embedding methods play an essential role in object location estimation and temporal identity association in MOT. Unlike other computer vision tasks, such as image classification, object detection, re-identification, and segmentation, embedding methods in MOT have large variations, and they have never been systematically analyzed and summarized. In this survey, we first conduct a comprehensive overview with in-depth analysis for embedding methods in MOT from seven different perspectives, including patch-level embedding, single-frame embedding, cross-frame joint embedding, correlation embedding, sequential embedding, tracklet embedding, and cross-track relational embedding. We further summarize the existing widely used MOT datasets and analyze the advantages of existing state-of-the-art methods according to their embedding strategies. Finally, some critical yet under-investigated areas and future research directions are discussed.

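In MOT, embedding methods play an important role in **object location estimation and temporal identity association**.
Embedding methods in MOT vary widely and have never been systematically analyzed and summarized.
The survey gives a comprehensive overview and in-depth analysis of embedding methods in MOT from seven perspectives: patch-level embedding, single-frame embedding, cross-frame joint embedding, correlation embedding, sequential embedding, tracklet embedding, and cross-track relational embedding.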

2. Background

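A variety of tracking algorithms have emerged in recent years:
- From graph clustering methods to graph neural networks
- From the tracking-by-detection paradigm, to the joint detection and tracking paradigm, to using multiple frames to improve detection performance
- From the Kalman filter, to RNNs, to LSTMs

MOT algorithms can be broadly divided into two main parts:
- the embedding model
- the association algorithm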
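
The split between an embedding model and an association algorithm can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and does not come from the survey: `embed` is a hypothetical stand-in for a learned network (it only L2-normalizes a small feature vector), `associate` uses a simple greedy matching on cosine similarity, and `min_similarity` is an arbitrary threshold.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def embed(features):
    """Hypothetical embedding model: a placeholder for a learned network.

    It only L2-normalizes the input feature vector.
    """
    norm = math.sqrt(sum(x * x for x in features))
    return [x / norm for x in features] if norm > 0 else list(features)


def associate(track_embeddings, detection_embeddings, min_similarity=0.5):
    """Greedy association (an illustrative choice, not the survey's method).

    track_embeddings: dict mapping track id -> embedding vector
    detection_embeddings: list of embedding vectors for the current frame
    Returns (matches, unmatched), where matches maps track id -> detection
    index and unmatched lists detection indices left without a track.
    """
    pairs = []
    for track_id, track_emb in track_embeddings.items():
        for det_idx, det_emb in enumerate(detection_embeddings):
            pairs.append((cosine_similarity(track_emb, det_emb), track_id, det_idx))
    # Visit candidate pairs from most to least similar.
    pairs.sort(key=lambda p: p[0], reverse=True)

    matches = {}
    used_tracks, used_dets = set(), set()
    for sim, track_id, det_idx in pairs:
        if sim < min_similarity:
            break  # all remaining pairs are even less similar
        if track_id in used_tracks or det_idx in used_dets:
            continue
        matches[track_id] = det_idx
        used_tracks.add(track_id)
        used_dets.add(det_idx)

    unmatched = [i for i in range(len(detection_embeddings)) if i not in used_dets]
    return matches, unmatched


# Two existing tracks and three detections in the new frame (toy data).
tracks = {1: embed([1.0, 0.0, 0.1]), 2: embed([0.0, 1.0, 0.1])}
detections = [embed([0.1, 0.9, 0.0]), embed([0.9, 0.1, 0.0]), embed([0.0, 0.0, 1.0])]
matches, unmatched = associate(tracks, detections)
print(matches, unmatched)  # matches == {1: 1, 2: 0}, unmatched == [2]
```

The unmatched detection would typically start a new track. Practical trackers often replace the greedy step with an optimal assignment solver such as `scipy.optimize.linear_sum_assignment`, which is not used here so that the sketch depends only on the standard library.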

3. Embedding Methods in MOT

3.1 Overview and Notation

Explanation of the notation used in the paper.

3.2 Patch-Level Box Image Embedding