Paper title: Going Deeper with Convolutions

Authors: Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich

Paper URL: https://arxiv.org/abs/1409.4842

Affiliation:

Venue: arXiv

**Year:** 2014

I. Abstract

We propose a deep convolutional neural network architecture codenamed Inception, which was responsible for setting the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14). The main hallmark of this architecture is the improved utilization of the computing resources inside the network. This was achieved by a carefully crafted design that allows for increasing the depth and width of the network while keeping the computational budget constant. To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing. One particular incarnation used in our submission for ILSVRC14 is called GoogLeNet, a 22 layers deep network, the quality of which is assessed in the context of classification and detection.

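The authors design Inception v1 and enter it in the classification and detection tracks of the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC2014), taking first place in both.
Inception modules let the network grow deeper and wider while keeping the computational budget roughly constant and the parameter count low.
The network built from Inception modules in this paper is called GoogLeNet and is 22 layers deep.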
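One mechanism the paper uses to keep the cost of a wider network in check is a 1×1 convolution that shrinks the channel count before an expensive larger convolution. The arithmetic below is a minimal sketch of that effect. The helper `conv_params` and the channel counts are illustrative assumptions, not figures taken from these notes, and biases are ignored.

```python
def conv_params(in_channels, out_channels, kernel_size):
    """Number of weights in a 2D convolution (biases ignored)."""
    return in_channels * out_channels * kernel_size * kernel_size


# Illustrative channel counts (assumed for this example).
in_ch, out_ch, reduce_ch = 192, 32, 16

# A 5x5 convolution applied directly to the input.
direct = conv_params(in_ch, out_ch, 5)

# A 1x1 reduction to reduce_ch channels, followed by the 5x5 convolution.
reduced = conv_params(in_ch, reduce_ch, 1) + conv_params(reduce_ch, out_ch, 5)

print(f"direct 5x5:      {direct} weights")
print(f"1x1 then 5x5:    {reduced} weights")
print(f"reduction ratio: {direct / reduced:.1f}x")
```

With these assumed sizes, the reduced path needs roughly a tenth of the weights of the direct one, and the saving grows with the input channel count.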

II. Introduction

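In the three years from 2011 to 2014, computer vision advanced rapidly. This was not only the result of better hardware, larger datasets, and bigger, more complex models, but mainly of new modeling ideas and improved architectures.
GoogLeNet (the ILSVRC2014 winner) uses 12 times fewer parameters than AlexNet (the ILSVRC2012 winner) while being more accurate.
In 2014, the leading object detection method was R-CNN, proposed by Ross Girshick (known as rbg).
When building better models, accuracy should not be pursued at any cost; computational efficiency, power consumption, and memory footprint matter as well.
The key works that inspired the Inception module:
1. "Network in Network": this paper uses 1×1 convolutions to reduce or increase the channel dimension, and replaces fully connected layers with global average pooling.
2. "Provable Bounds for Learning Some Deep Representations": this theoretical paper suggests that a sparse, loosely connected network can replace a large, dense, bloated one.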
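The two ideas credited to "Network in Network" above can be sketched in plain Python. A 1×1 convolution is a linear map applied across channels at every spatial position, and global average pooling collapses each channel to its mean. The function names `conv1x1` and `global_average_pool`, the toy input, and the weights are all assumptions made for illustration; a real network would use a tensor library and learned weights.

```python
def conv1x1(feature_map, weights):
    """Apply a 1x1 convolution (no bias).

    feature_map: list of C_in channels, each an H x W list of lists.
    weights: list of C_out rows, each holding C_in coefficients.
    Returns C_out channels of size H x W.
    """
    height, width = len(feature_map[0]), len(feature_map[0][0])
    return [
        [
            [
                # Mix the C_in channel values at position (y, x).
                sum(w * channel[y][x] for w, channel in zip(row, feature_map))
                for x in range(width)
            ]
            for y in range(height)
        ]
        for row in weights
    ]


def global_average_pool(feature_map):
    """Reduce each H x W channel to its mean: one value per channel."""
    return [
        sum(sum(r) for r in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


# Toy input: 3 channels of size 2 x 2 (arbitrary values).
x = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 1.0], [0.0, 1.0]],
    [[2.0, 2.0], [2.0, 2.0]],
]
# Hypothetical weights that reduce 3 channels to 2.
w = [
    [1.0, 0.0, 0.0],  # copies channel 0
    [0.0, 1.0, 1.0],  # sums channels 1 and 2
]

reduced = conv1x1(x, w)                # 2 channels, still 2 x 2
pooled = global_average_pool(reduced)  # 2 values, one per channel
print(reduced)
print(pooled)
```

The spatial size is unchanged by `conv1x1`; only the channel count changes. `global_average_pool` has no weights at all, which is why it removes parameters when it takes the place of a fully connected layer.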