**Authors:** Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, Serge Belongie

Published: 2017

Venue: CVPR

**Affiliation:** Facebook

Paper title: Feature Pyramid Networks for Object Detection

Paper URL: https://arxiv.org/abs/1612.03144

Code:

https://github.com/facebookresearch/detectron

https://github.com/open-mmlab/mmdetection

Known for: feature pyramids

I. Abstract

Feature pyramids are a basic component in recognition systems for detecting objects at different scales. But recent deep learning object detectors have avoided pyramid representations, in part because they are compute and memory intensive. In this paper, we exploit the inherent multi-scale, pyramidal hierarchy of deep convolutional networks to construct feature pyramids with marginal extra cost. A top-down architecture with lateral connections is developed for building high-level semantic feature maps at all scales. This architecture, called a Feature Pyramid Network (FPN), shows significant improvement as a generic feature extractor in several applications. Using FPN in a basic Faster R-CNN system, our method achieves state-of-the-art single model results on the COCO detection benchmark without bells and whistles, surpassing all existing single-model entries including those from the COCO 2016 challenge winners. In addition, our method can run at 6 FPS on a GPU and thus is a practical and accurate solution to multi-scale object detection. Code will be made publicly available.

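Key points: feature pyramids are a basic component of recognition systems for detecting objects at different scales.
Recent deep learning object detectors have avoided pyramid representations, partly because they are expensive in both compute and memory.
This paper exploits the inherent multi-scale, pyramidal hierarchy of deep convolutional networks to build feature pyramids at marginal extra cost. It develops a top-down architecture with lateral connections to build high-level semantic feature maps at all scales. This architecture is called a Feature Pyramid Network (FPN).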
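
The top-down pathway with lateral connections can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: it uses single-channel feature maps stored as nested lists, and it omits the 1x1 lateral convolutions and the 3x3 output convolutions that the paper applies. The names `upsample_nearest_2x`, `add_maps`, and `build_top_down` are hypothetical helpers introduced here.

```python
from typing import List

Grid = List[List[float]]  # one single-channel feature map, stored row by row


def upsample_nearest_2x(fmap: Grid) -> Grid:
    """Double the spatial resolution by repeating every cell in a 2x2 block."""
    out: Grid = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each value horizontally
        out.append(wide)
        out.append(list(wide))  # repeat the widened row vertically
    return out


def add_maps(a: Grid, b: Grid) -> Grid:
    """Element-wise sum of two maps with identical shape."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("feature maps must have the same shape")
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def build_top_down(laterals: List[Grid]) -> List[Grid]:
    """Merge bottom-up maps (ordered fine to coarse) into a pyramid.

    Assumes each level is exactly half the resolution of the previous one.
    Simplification: the lateral 1x1 and output 3x3 convolutions are skipped.
    """
    merged = [laterals[-1]]  # the coarsest map starts the top-down pathway
    for lateral in reversed(laterals[:-1]):
        # Upsample the coarser merged map and add the same-resolution lateral map.
        merged.append(add_maps(lateral, upsample_nearest_2x(merged[-1])))
    return merged[::-1]  # back to fine-to-coarse order


# Toy example: three levels of size 4x4, 2x2, and 1x1.
fine = [[0.0] * 4 for _ in range(4)]
mid = [[1.0, 2.0], [3.0, 4.0]]
coarse = [[1.0]]
pyramid = build_top_down([fine, mid, coarse])
for level in pyramid:
    print(len(level), "x", len(level[0]))
```

In the toy example, information from the coarsest map reaches every finer level through repeated upsampling and addition, which is how semantics from deep layers end up in the high-resolution maps.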

II. Background

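Object detection faces a multi-scale problem: a single image can contain both large and small objects. Under a limited compute budget, network depth (which determines the receptive field) and stride usually pull against each other. Commonly used backbones have a fairly large overall stride (for example 32), and a small object in the image can be smaller than the stride itself, so detection performance on small objects drops sharply. There are two main traditional approaches to this problem: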
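
The stride argument above comes down to simple arithmetic. The sketch below is illustrative only: the stride list and the object sizes are assumed values chosen for the example, and `footprint_in_cells` is a hypothetical helper.

```python
def footprint_in_cells(object_size_px: float, stride: int) -> float:
    """Side length, in feature-map cells, covered by an object of the given pixel size."""
    return object_size_px / stride


# Assumed example values: typical backbone strides and three object sizes in pixels.
strides = [4, 8, 16, 32]
for size in (16, 64, 256):
    cells = {s: footprint_in_cells(size, s) for s in strides}
    print(f"{size}px object:", cells)
```

A 16-pixel object covers only half a cell at stride 32, so it barely registers on the coarsest map, while the same object spans four cells at stride 4.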

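Multi-scale training and testing, also known as an image pyramid (left in the figure below). Nearly all methods that have achieved top results on the ImageNet and COCO detection tasks use image pyramids.
1. **Pros:** Features are extracted from the image at every scale, which gives a multi-scale feature representation in which every level, including the high-resolution ones, carries strong semantic information.
2. **Cons:** Inference time increases substantially and memory use is very large, so end-to-end training is infeasible and the approach is hard to apply in practice.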
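Feature hierarchy: each layer separately predicts detections at its own scale and resolution (right in the figure below).
1. **Pros:** Produces feature maps at different spatial resolutions.
2. **Cons:** It forces layers at different depths to learn the same semantic information, which introduces large semantic gaps between levels. In a convolutional network, different depths correspond to different levels of semantic features: shallow layers have high resolution and mostly learn fine detail, while deep layers have low resolution and mostly learn semantic features.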
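
To give a rough sense of why the image pyramid described above is expensive, the sketch below estimates its cost relative to a single-scale pass. It rests on a loud assumption: that backbone cost grows in proportion to the number of input pixels. The scale factors are made-up examples, and `relative_pyramid_cost` is a hypothetical helper.

```python
from typing import Sequence


def relative_pyramid_cost(scales: Sequence[float]) -> float:
    """Cost of running the backbone once per scale, relative to one pass at scale 1.0.

    Assumption: cost is proportional to pixel count, so a scale factor s costs s * s.
    """
    return sum(s * s for s in scales)


# Assumed example: test at half, original, and double resolution.
print(relative_pyramid_cost([0.5, 1.0, 2.0]))
```

Under this assumption, most of the cost comes from the largest scale, which is also the scale that helps small objects the most.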