**作者：**Christian Szegedy，Vincent Vanhoucke，Sergey Ioffe，Jonathon Shlens，Zbigniew Wojna

发表期刊：Arxiv

**发表年份：**2015

论文全称：Rethinking the Inception Architecture for Computer Vision

论文地址：https://arxiv.org/abs/1512.00567

代码：https://github.com/pytorch/vision/blob/master/torchvision/models/inception.py

**地位：**本论文在GoogLeNet和BN-Inception的基础上，对Inception模块的结构、性能、参数量和计算效率进行了重新思考和重新设计。提出了Inception V2和Inception V3模型，取得了3.5%左右的Top-5错误率。

一、摘要

Convolutional networks are at the core of most state-of-the-art computer vision solutions for a wide variety of tasks. Since 2014 very deep convolutional networks started to become mainstream, yielding substantial gains in various benchmarks. Although increased model size and computational cost tend to translate to immediate quality gains for most tasks (as long as enough labeled data is provided for training), computational effificiency and low parameter count are still enabling factors for various use cases such as mobile vision and big-data scenarios. Here we are exploring ways to scale up networks in ways that aim at utilizing the added computation as effificiently as possible by suitably factorized convolutions and aggressive regularization. We benchmark our methods on the ILSVRC 2012 classifification challenge validation set demonstrate substantial gains over the state of the art: 21*.*2% *top-1 and 5.*6% *top-5 error for single frame evaluation using a network with a computational cost of 5 billion multiply-adds per inference and with using less than 25 million parameters. With an ensemble of 4 models and multi-crop evaluation, we report 3.*5% *top-5 error and 17.*3% *top-*1 error.

非常深的卷积神经网络成为计算机视觉领域的主流（主要指2014年的VGG模型和GoogLeNet模型）
移动设备这种终端计算的设备（对功耗、延迟、带宽的要求比较高的设备）对于网络的要求是性能高、参数少、计算量小，本文就是通过使用分解卷积和大量正则化技术来提高深度卷积网络的性能以及减少其计算量

二、背景

CNN在计算机视觉领域的成功应用促进了对高性能CNN模块（例如Inception，resnet残差模块）本身的研究
好的分类模型可以迁移到各种计算机视觉领域应用，例如语义分割、目标检测、人体姿态估计、超分辨率等
- 这是因为计算机视觉的这些应用都需要靠CNN提取图像特征，即使用CNN作为骨干网络
VGG模型虽然模型简单，性能好，但是其架构非常臃肿，消耗算力大，大概有1亿3000万个参数