Personal Understanding

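Contributions:
1. Proposes a new triplet loss variant, the batch hard triplet loss, and systematically evaluates it against other triplet loss variants.
2. The authors find that the non-squared Euclidean distance is a more stable metric. It also makes the margin parameter easier to interpret.
3. The authors find that, for person re-identification, it is worthwhile to keep pulling samples of the same identity closer together, and they propose a soft-margin formulation for this purpose.
4. Experimentally compares training with and without a pre-trained model.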
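**Why: the prevailing view in the person re-identification community is that the triplet loss is inferior to surrogate losses (classification loss, verification loss) followed by metric learning.** The authors instead argue that a CNN trained with the triplet loss outperforms the current state-of-the-art methods on person re-identification, so they verified this experimentally and published this paper.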
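How: to demonstrate the importance of the triplet loss for person re-identification, the authors propose the batch hard triplet loss and run an extensive series of experiments. The paper can be condensed into a single sentence: "A well designed triplet loss has a significant impact on the result."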
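
Contributions 2 and 3 can be written down compactly. The following is a sketch in assumed notation (the symbols $D_{a,p}$, $D_{a,n}$ and $m$ are shorthand chosen here for illustration): $D_{a,p}$ is the non-squared Euclidean distance between the embeddings of an anchor and a positive, $D_{a,n}$ is the distance between the anchor and a negative, and $m$ is the margin.

$$\mathcal{L}_{\text{margin}} = \max\bigl(0,\; m + D_{a,p} - D_{a,n}\bigr)$$

$$\mathcal{L}_{\text{soft}} = \ln\bigl(1 + \exp(D_{a,p} - D_{a,n})\bigr)$$

The hinge version stops contributing as soon as $D_{a,n} - D_{a,p} \ge m$, whereas the softplus version decays smoothly and never reaches exactly zero, so samples of the same identity keep being pulled together.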

I. Abstract

In the past few years, the field of computer vision has gone through a revolution fueled mainly by the advent of large datasets and the adoption of deep convolutional neural networks for end-to-end learning. The person re-identification subfield is no exception to this. Unfortunately, a prevailing belief in the community seems to be that the triplet loss is inferior to using surrogate losses (classification, verification) followed by a separate metric learning step. We show that, for models trained from scratch as well as pretrained ones, using a variant of the triplet loss to perform end-to-end deep metric learning outperforms most other published methods by a large margin.

In person re-identification, the prevailing view in the community is that the triplet loss is inferior to using surrogate losses (classification loss, verification loss) followed by a separate metric learning step.
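- Deep Metric Learning: also known as similarity learning. The goal is to learn a mapping function $f_\theta$ that maps data points from a high-dimensional space $\mathbb{R}^F$ to a lower-dimensional space $\mathbb{R}^D$. In the new space $\mathbb{R}^D$ we want to preserve the characteristics the data points had in the original space $\mathbb{R}^F$, while making points with the same label lie closer together and points with different labels lie farther apart.
- Classification Loss: treats Re-ID training as an image classification problem, where different images of the same person form one class. A common choice is Softmax followed by a cross-entropy loss.
  - When the number of identities is large, the number of network parameters grows considerably, and many of those parameters are discarded once training is finished.
- Verification Loss: treats Re-ID training as an image matching problem, i.e. binary classification of whether two images show the same person. A common choice is the contrastive loss.
  - It can only judge the similarity of two images at a time, so it is hard to apply to clustering and retrieval, because pairwise comparison is too slow.
- Triplet Loss: treats Re-ID training as an image retrieval problem, in which the feature distance between images of the same person must be smaller than the feature distance between images of different people. The basic idea is that the distance between a positive pair should be smaller than the distance between a negative pair by at least a predefined margin. A triplet consists of an anchor sample, a positive sample (same ID as the anchor), and a negative sample. The triplet loss originated in Google's FaceNet, and the idea is simple: make intra-class distances small and inter-class distances large. It is now a very widely used loss function. FaceNet builds an embedding that maps face images directly into Euclidean space. The embedding is optimized by constructing many triplets (Anchor, Positive, Negative), where Anchor and Positive share a label and Anchor and Negative do not (in face recognition, Anchor and Positive are the same individual and Negative is a different one), and then training so that in Euclidean space the Anchor is closer to the Positive than to the Negative.
  - End-to-end, simple, and direct; comes with a built-in clustering property; yields highly embedded features; but it is hard to train.
  - The triplet loss usually yields better features than a classification loss. Another advantage is that it lends itself to thresholding: training requires a margin, which controls the separation between positive and negative samples, and once the features are normalized it is convenient to pick a distance threshold to decide whether two images belong to the same ID.
  - The traditional triplet loss is trained on triplets: anchor (a), positive (p), and negative (n).
  - The drawback of the triplet loss is that when the three images are drawn at random from the training set, the resulting triplet is likely to be an easy combination, i.e. a very similar positive and a very dissimilar negative. Convergence is also slow, and it is more prone to overfitting than classification.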
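The authors therefore argue that letting the network keep learning from easy samples limits its generalization ability. They propose an improved triplet loss with online batch hard sample mining, and extensive experiments show that this variant works very well.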
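
To make the batch hard idea concrete, here is a minimal NumPy sketch. For every anchor in a batch it picks the farthest sample with the same ID (hardest positive) and the closest sample with a different ID (hardest negative), then applies either the margin hinge or the soft-margin. The function names `pairwise_distances` and `batch_hard_triplet_loss`, the default margin of 0.2, and averaging over anchors are assumptions made for illustration; this is not the authors' implementation.

```python
import numpy as np


def pairwise_distances(emb):
    """Non-squared Euclidean distance between all rows of emb (shape [B, D])."""
    diff = emb[:, None, :] - emb[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))


def batch_hard_triplet_loss(emb, labels, margin=0.2, soft=False):
    """Batch hard triplet loss, averaged over all anchors in the batch.

    emb:    array of shape [B, D], one embedding per image
    labels: length-B sequence of person IDs
    margin: hinge margin (illustrative default, ignored when soft=True)
    soft:   if True, use the soft-margin ln(1 + exp(x)) instead of the hinge
    """
    labels = np.asarray(labels)
    dist = pairwise_distances(np.asarray(emb, dtype=float))
    same_id = labels[:, None] == labels[None, :]

    # Hardest positive: the farthest sample sharing the anchor's ID.
    # The anchor itself is included but only contributes distance 0.
    hardest_pos = np.where(same_id, dist, -np.inf).max(axis=1)
    # Hardest negative: the closest sample with a different ID.
    hardest_neg = np.where(~same_id, dist, np.inf).min(axis=1)

    x = hardest_pos - hardest_neg
    if soft:
        per_anchor = np.logaddexp(0.0, x)        # ln(1 + exp(x)), numerically stable
    else:
        per_anchor = np.maximum(margin + x, 0.0)  # hinge with margin
    return per_anchor.mean()


if __name__ == "__main__":
    # Toy batch: 2 identities with 2 images each, on a line in 2-D.
    emb = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0], [6.0, 0.0]])
    labels = [0, 0, 1, 1]
    print("hinge:", batch_hard_triplet_loss(emb, labels))
    print("soft :", batch_hard_triplet_loss(emb, labels, soft=True))
```

In the toy batch the two identities are already well separated, so the hinge loss is zero while the soft-margin loss remains slightly positive, which illustrates why the soft-margin keeps pulling same-ID samples together. Batch hard mining is only meaningful when every ID in the batch has at least two images and the batch contains at least two IDs.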

II. Research Background