【论文翻译】Fully Convolutional Networks for Semantic Segmentation

论文题目:Fully Convolutional Networks for Semantic Segmentation
论文来源:Fully Convolutional Networks for Semantic Segmentation_2015_CVPR
翻译人:BDML@CQUT实验室

Fully Convolutional Networks for Semantic Segmentation

用于语义分割的全卷积网络

Jonathan Long∗ Evan Shelhamer∗ Trevor Darrell

Abstract

Convolutional networks are powerful visual models that yield hierarchies of features. We show that convolutional networks by themselves, trained end-to-end, pixels-to-pixels, exceed the state-of-the-art in semantic segmentation. Our key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning. We define and detail the space of fully convolutional networks, explain their application to spatially dense prediction tasks, and draw connections to prior models. We adapt contemporary classification networks (AlexNet [20], the VGG net [31], and GoogLeNet [32]) into fully convolutional networks and transfer their learned representations by fine-tuning [3] to the segmentation task. We then define a skip architecture that combines semantic information from a deep, coarse layer with appearance information from a shallow, fine layer to produce accurate and detailed segmentations. Our fully convolutional network achieves state-of-the-art segmentation of PASCAL VOC (20% relative improvement to 62.2% mean IU on 2012), NYUDv2, and SIFT Flow, while inference takes less than one fifth of a second for a typical image.

摘要

卷积网络是能够产生特征层次结构的强大视觉模型。我们证明,仅靠卷积网络本身,经过端到端、像素到像素的训练,就能在语义分割上超过当前最先进的方法。我们的核心思想是构建“全卷积”网络:它可以接受任意大小的输入,并通过高效的推理和学习产生相应大小的输出。我们定义并详细阐述了全卷积网络的设计空间,解释了它们在空间密集预测任务中的应用,并阐明了与先前模型的联系。我们将当代的分类网络(AlexNet [20]、VGG net [31] 和 GoogLeNet [32])改造为全卷积网络,并通过微调 [3] 将其学到的表示迁移到分割任务中。随后我们定义了一种跳跃结构,将来自较深、较粗糙层的语义信息与来自较浅、较精细层的外观信息相结合,以产生准确而细致的分割。我们的全卷积网络在 PASCAL VOC(2012 年数据集上平均 IU 达到 62.2%,相对提升 20%)、NYUDv2 和 SIFT Flow 上实现了最先进的分割效果,而对一幅典型图像的推理时间不到五分之一秒。

1. Introduction

Convolutional networks are driving advances in recognition. Convnets are not only improving for whole-image classification [20, 31, 32], but also making progress on local tasks with structured output. These include advances in bounding box object detection [29, 10, 17], part and keypoint prediction [39, 24], and local correspondence [24, 8].

The natural next step in the progression from coarse to fine inference is to make a prediction at every pixel. Prior approaches have used convnets for semantic segmentation [27, 2, 7, 28, 15, 13, 9], in which each pixel is labeled with the class of its enclosing object or region, but with shortcomings that this work addresses.

We show that a fully convolutional network (FCN) trained end-to-end, pixels-to-pixels on semantic segmentation exceeds the state-of-the-art without further machinery. To our knowledge, this is the first work to train FCNs end-to-end (1) for pixelwise prediction and (2) from supervised pre-training. Fully convolutional versions of existing networks predict dense outputs from arbitrary-sized inputs. Both learning and inference are performed whole-image-at-a-time by dense feedforward computation and backpropagation. In-network upsampling layers enable pixelwise prediction and learning in nets with subsampled pooling.

This method is efficient, both asymptotically and absolutely, and precludes the need for the complications in other works. Patchwise training is common [27, 2, 7, 28, 9], but lacks the efficiency of fully convolutional training. Our approach does not make use of pre- and post-processing complications, including superpixels [7, 15], proposals [15, 13], or post-hoc refinement by random fields or local classifiers [7, 15]. Our model transfers recent success in classification [20, 31, 32] to dense prediction by reinterpreting classification nets as fully convolutional and fine-tuning from their learned representations. In contrast, previous works have applied small convnets without supervised pre-training [7, 28, 27].

Semantic segmentation faces an inherent tension between semantics and location: global information resolves what while local information resolves where. Deep feature hierarchies encode location and semantics in a nonlinear local-to-global pyramid. We define a skip architecture to take advantage of this feature spectrum that combines deep, coarse, semantic information and shallow, fine, appearance information in Section 4.2 (see Figure 3).

In the next section, we review related work on deep classification nets, FCNs, and recent approaches to semantic segmentation using convnets. The following sections explain FCN design and dense prediction tradeoffs, introduce our architecture with in-network upsampling and multilayer combinations, and describe our experimental framework. Finally, we demonstrate state-of-the-art results on PASCAL VOC 2011-2, NYUDv2, and SIFT Flow.

1. 介绍

卷积网络正在推动识别技术的进步。卷积网络不仅在整图分类上不断进步 [20, 31, 32],而且在具有结构化输出的局部任务上也取得了进展。这些进展包括边界框目标检测 [29, 10, 17]、部件和关键点预测 [39, 24] 以及局部对应 [24, 8]。

从粗略推理到精细推理,下一步自然就是对每个像素进行预测。先前的方法已经将卷积网络用于语义分割 [27, 2, 7, 28, 15, 13, 9],其中每个像素都用其所属对象或区域的类别来标注,但这些方法存在本文工作所要解决的缺点。

我们证明,在语义分割上经过端到端、像素到像素训练的全卷积网络(FCN)无需额外机制即可超过当前最先进的方法。据我们所知,这是第一个(1)针对逐像素预测、(2)从有监督预训练出发,端到端训练 FCN 的工作。现有网络的全卷积版本可以对任意大小的输入预测密集输出。学习和推理都通过密集的前馈计算和反向传播在整幅图像上一次完成。网络内的上采样层使得在带有下采样池化的网络中进行逐像素的预测和学习成为可能。

这种方法无论在渐近意义上还是在绝对意义上都是高效的,并且不需要其他工作中的那些复杂处理。Patchwise(逐块)训练很常见 [27, 2, 7, 28, 9],但缺乏全卷积训练的效率。我们的方法没有使用复杂的前处理和后处理,包括超像素 [7, 15]、proposal [15, 13],或通过随机场或局部分类器进行的事后细化 [7, 15]。我们的模型把分类任务 [20, 31, 32] 上最近的成功迁移到密集预测上,做法是将分类网络重新解释为全卷积网络,并在其学到的表示上进行微调。相比之下,以前的工作是在没有监督预训练的情况下使用小型卷积网络 [7, 28, 27]。

语义分割面临语义与位置之间的固有矛盾:全局信息解决“是什么”(what),而局部信息解决“在哪里”(where)。深度特征层次结构在一个非线性的由局部到全局的金字塔中同时编码了位置和语义。在 4.2 节中,我们定义了一种跳跃结构来利用这一特征谱,它结合了深层、粗糙的语义信息与浅层、精细的外观信息(见图 3)。

在下一节中,我们将回顾有关深度分类网络、FCN 以及近期使用卷积网络进行语义分割方法的相关工作。随后各节介绍 FCN 的设计与密集预测的权衡,介绍我们带有网络内上采样和多层融合的结构,并描述我们的实验框架。最后,我们在 PASCAL VOC 2011-2、NYUDv2 和 SIFT Flow 上展示了最先进的结果。

【论文翻译】Fully Convolutional Networks for Semantic Segmentation_第1张图片

2. Related work

Our approach draws on recent successes of deep nets for image classification [20, 31, 32] and transfer learning [3, 38]. Transfer was first demonstrated on various visual recognition tasks [3, 38], then on detection, and on both instance and semantic segmentation in hybrid proposalclassifier models [10, 15, 13]. We now re-architect and fine-tune classification nets to direct, dense prediction of semantic segmentation. We chart the space of FCNs and situate prior models, both historical and recent, in this framework.

Fully convolutional networks To our knowledge, the idea of extending a convnet to arbitrary-sized inputs first appeared in Matan et al. [26], which extended the classic LeNet [21] to recognize strings of digits. Because their net was limited to one-dimensional input strings, Matan et al. used Viterbi decoding to obtain their outputs. Wolf and Platt [37] expand convnet outputs to 2-dimensional maps of detection scores for the four corners of postal address blocks. Both of these historical works do inference and learning fully convolutionally for detection. Ning et al. [27] define a convnet for coarse multiclass segmentation of C. elegans tissues with fully convolutional inference.

Fully convolutional computation has also been exploited in the present era of many-layered nets. Sliding window detection by Sermanet et al. [29], semantic segmentation by Pinheiro and Collobert [28], and image restoration by Eigen et al. [4] do fully convolutional inference. Fully convolutional training is rare, but used effectively by Tompson et al. [35] to learn an end-to-end part detector and spatial model for pose estimation, although they do not exposit on or analyze this method.

Alternatively, He et al. [17] discard the nonconvolutional portion of classification nets to make a feature extractor. They combine proposals and spatial pyramid pooling to yield a localized, fixed-length feature for classification. While fast and effective, this hybrid model cannot be learned end-to-end.

Dense prediction with convnets Several recent works have applied convnets to dense prediction problems, including semantic segmentation by Ning et al. [27], Farabet et al. [7], and Pinheiro and Collobert [28]; boundary prediction for electron microscopy by Ciresan et al. [2] and for natural images by a hybrid convnet/nearest neighbor model by Ganin and Lempitsky [9]; and image restoration and depth estimation by Eigen et al. [4, 5]. Common elements of these approaches include

  • small models restricting capacity and receptive fields;

  • patchwise training [27, 2, 7, 28, 9];

  • post-processing by superpixel projection, random field regularization, filtering, or local classification [7, 2, 9];

  • input shifting and output interlacing for dense output [29, 28, 9];

  • multi-scale pyramid processing [7, 28, 9];

  • saturating tanh nonlinearities [7, 4, 28]; and

  • ensembles [2, 9],

whereas our method does without this machinery. However, we do study patchwise training 3.4 and “shift-and-stitch” dense output 3.2 from the perspective of FCNs. We also discuss in-network upsampling 3.3, of which the fully connected prediction by Eigen et al. [5] is a special case.

Unlike these existing methods, we adapt and extend deep classification architectures, using image classification as supervised pre-training, and fine-tune fully convolutionally to learn simply and efficiently from whole image inputs and whole image ground truths.

Hariharan et al. [15] and Gupta et al. [13] likewise adapt deep classification nets to semantic segmentation, but do so in hybrid proposal-classifier models. These approaches fine-tune an R-CNN system [10] by sampling bounding boxes and/or region proposals for detection, semantic segmentation, and instance segmentation. Neither method is learned end-to-end. They achieve state-of-the-art segmentation results on PASCAL VOC and NYUDv2 respectively, so we directly compare our standalone, end-to-end FCN to their semantic segmentation results in Section 5.

We fuse features across layers to define a nonlinear local-to-global representation that we tune end-to-end. In contemporary work Hariharan et al. [16] also use multiple layers in their hybrid model for semantic segmentation.

2. 相关工作

我们的方法借鉴了深度网络最近在图像分类 [20, 31, 32] 和迁移学习 [3, 38] 上的成功。迁移首先在各类视觉识别任务上得到验证 [3, 38],随后被用于检测,并在混合 proposal-classifier 模型中被用于实例分割和语义分割 [10, 15, 13]。我们现在重新设计并微调分类网络,使其直接进行语义分割的密集预测。我们刻画了 FCN 的设计空间,并把过去和近期的模型都放到这一框架中。

全卷积网络 据我们所知,将卷积网络扩展到任意大小输入的想法最早出现在 Matan 等人 [26] 的工作中,它扩展了经典的 LeNet [21] 来识别数字串。由于他们的网络仅限于一维的输入串,Matan 等人使用维特比解码来获得输出。Wolf 和 Platt [37] 将卷积网络的输出扩展为邮政地址块四个角的检测分数的二维图。这两项早期工作都以全卷积的方式进行推理和学习,用于检测。Ning 等人 [27] 定义了一个卷积网络,用全卷积推理对线虫(C. elegans)组织进行粗糙的多类分割。

全卷积计算在当今的多层网络时代也得到了利用。比如 Sermanet 等人 [29] 的滑动窗口检测、Pinheiro 和 Collobert [28] 的语义分割以及 Eigen 等人 [4] 的图像复原都采用了全卷积推理。全卷积训练则很少见,但 Tompson 等人 [35] 有效地用它来学习一个端到端的部件检测器和用于姿态估计的空间模型,尽管他们没有对这种方法进行阐述或分析。

除此之外,He等人[17]丢弃分类网络的非卷积部分来制作特征提取器。他们将proposals和空间金字塔池合并在一起,以产生用于分类的本地化的固定长度特征。尽管快速且有效,但是这种混合模型不能进行端到端的学习。

基于卷积网络的 dense prediction 近期的一些工作已经把卷积网络应用于密集预测问题,其中包括 Ning 等人 [27]、Farabet 等人 [7] 以及 Pinheiro 和 Collobert [28] 的语义分割;Ciresan 等人 [2] 对电子显微镜图像的边界预测,以及 Ganin 和 Lempitsky [9] 用混合卷积网络/最近邻模型对自然图像的边界预测;还有 Eigen 等人 [4, 5] 的图像复原和深度估计。这些方法的共同点包括:

  • 限制容量和接受范围的小模型;

  • patchwise 学习[27, 2, 7, 28, 9];

  • 通过超像素投影、随机场正则化、滤波或局部分类进行后处理[7, 2, 9];

  • 输入移位和dense输出的隔行交错输出[29, 28, 9];

  • 多尺度金字塔处理[7, 28, 9];

  • 饱和双曲线正切非线性[7, 4, 28];

  • 集成[2, 9]

而我们的方法没有这个机制。但是,我们从FCN的角度来研究patchwise训练(3.4节)和“shift-and-stitch”dense输出(3.2节)。我们还讨论了网内上采样(3.3节),其中Eigen等人[5]完全连接的预测是一个特例。

与这些现有方法不同,我们改造并扩展深度分类体系结构,使用图像分类作为有监督的预训练,并以全卷积的方式进行微调,以便从整幅图像的输入及其 ground truth 中简单而高效地学习。

Hariharan 等人 [15] 和 Gupta 等人 [13] 同样把深度分类网络改造用于语义分割,但都是在混合 proposal-classifier 模型中进行的。这些方法通过对边界框和/或 region proposal 采样来微调 R-CNN 系统 [10],用于检测、语义分割和实例分割。这两种方法都不是端到端学习的。它们分别在 PASCAL VOC 和 NYUDv2 上取得了最好的分割结果,因此在第 5 节中,我们直接把我们独立的、端到端的 FCN 与它们的语义分割结果进行比较。

我们融合不同层的特征来定义一个非线性的由局部到全局的表示,并对其进行端到端的调优。在同期工作中,Hariharan 等人 [16] 也在他们的混合模型中使用多层特征进行语义分割。

3. Fully convolutional networks

Each layer of data in a convnet is a three-dimensional array of size h × w × d, where h and w are spatial dimensions, and d is the feature or channel dimension. The first layer is the image, with pixel size h × w, and d color channels. Locations in higher layers correspond to the locations in the image they are path-connected to, which are called their receptive fields.

Convnets are built on translation invariance. Their basic components (convolution, pooling, and activation functions) operate on local input regions, and depend only on relative spatial coordinates. Writing $\mathbf{x}_{ij}$ for the data vector at location $(i, j)$ in a particular layer, and $\mathbf{y}_{ij}$ for the following layer, these functions compute outputs $\mathbf{y}_{ij}$ by

$$\mathbf{y}_{ij} = f_{ks}\left(\{\mathbf{x}_{si+\delta i,\, sj+\delta j}\}_{0 \le \delta i, \delta j \le k}\right)$$

where $k$ is called the kernel size, $s$ is the stride or subsampling factor, and $f_{ks}$ determines the layer type: a matrix multiplication for convolution or average pooling, a spatial max for max pooling, or an elementwise nonlinearity for an activation function, and so on for other types of layers.

This functional form is maintained under composition, with kernel size and stride obeying the transformation rule

$$f_{ks} \circ g_{k's'} = (f \circ g)_{k' + (k-1)s',\; ss'}$$

While a general deep net computes a general nonlinear function, a net with only layers of this form computes a nonlinear filter, which we call a deep filter or fully convolutional network. An FCN naturally operates on an input of any size, and produces an output of corresponding (possibly resampled) spatial dimensions.

A real-valued loss function composed with an FCN defines a task. If the loss function is a sum over the spatial dimensions of the final layer,

$$\ell(\mathbf{x}; \theta) = \sum_{ij} \ell'(\mathbf{x}_{ij}; \theta),$$

its gradient will be a sum over the gradients of each of its spatial components. Thus stochastic gradient descent on $\ell$ computed on whole images will be the same as stochastic gradient descent on $\ell'$, taking all of the final layer receptive fields as a minibatch.

When these receptive fields overlap significantly, both feedforward computation and backpropagation are much more efficient when computed layer-by-layer over an entire image instead of independently patch-by-patch.

We next explain how to convert classification nets into fully convolutional nets that produce coarse output maps. For pixelwise prediction, we need to connect these coarse outputs back to the pixels. Section 3.2 describes a trick, fast scanning [11], introduced for this purpose. We gain insight into this trick by reinterpreting it as an equivalent network modification. As an efficient, effective alternative, we introduce deconvolution layers for upsampling in Section 3.3. In Section 3.4 we consider training by patchwise sampling, and give evidence in Section 4.3 that our whole image training is faster and equally effective.

3. 全卷积网络

卷积网络中的每一层数据都是尺寸为h×w×d的三维数组,其中h和w是空间维度,d是特征或通道维度。第一层是图像,像素大小为h×w,以及d个颜色通道。较高层中的位置对应于它们路径连接的图像中的位置,这些位置称为它们的接受域

卷积网络建立在平移不变性的基础上。它们的基本组成部分(卷积、池化和激活函数)作用于局部输入区域,并且只依赖于相对的空间坐标。记 $\mathbf{x}_{ij}$ 为某一层中位于坐标 $(i, j)$ 处的数据向量,$\mathbf{y}_{ij}$ 为下一层对应位置的数据向量,则这些函数按如下公式计算输出 $\mathbf{y}_{ij}$:

$$\mathbf{y}_{ij} = f_{ks}\left(\{\mathbf{x}_{si+\delta i,\, sj+\delta j}\}_{0 \le \delta i, \delta j \le k}\right)$$

其中 $k$ 称为卷积核大小,$s$ 是步长或下采样因子,$f_{ks}$ 决定该层的类型:卷积或平均池化对应矩阵乘法,最大池化对应空间最大值,激活函数对应逐元素的非线性运算,其他类型的层以此类推。

这种函数形式在复合运算下保持不变,其卷积核大小和步长遵循如下变换规则:

$$f_{ks} \circ g_{k's'} = (f \circ g)_{k' + (k-1)s',\; ss'}$$

虽然一般深度网络计算一般非线性函数,但只有这种形式的层的网络计算非线性滤波器,我们称之为深度滤波器或全卷积网络。FCN自然地对任何大小的输入进行操作,并产生相应的(可能重新采样的)空间维度的输出。

一个实值损失函数与 FCN 复合就定义了一项任务。如果该损失函数是对最后一层空间维度的求和,即

$$\ell(\mathbf{x}; \theta) = \sum_{ij} \ell'(\mathbf{x}_{ij}; \theta),$$

那么它的梯度就是其每个空间分量梯度的总和。因此,在整幅图像上对 $\ell$ 进行的随机梯度下降,与把最后一层的所有感受野作为一个 minibatch、对 $\ell'$ 进行的随机梯度下降是等价的。

当这些感受野显著重叠时,逐层地在整幅图像上进行前馈计算和反向传播,要比逐个 patch 独立计算高效得多。

接下来我们将解释如何将分类网转换为生成粗略输出图的全卷积网。对于像素级预测,我们需要将这些粗略输出连接回像素。第3.2节描述了一个技巧,快速扫描[11],为此目的而引入。我们通过将其重新解释为等效的网络修改来深入了解这一技巧。作为一种有效的替代方法,我们在3.3节介绍了用于上采样的去卷积层。在第3.4节中,我们考虑采用patchwise抽样进行训练,并在第4.3节中给出证据,来证明我们的整个图像训练速度更快且同样有效。
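The kernel/stride transformation rule above composes mechanically. The small Python sketch below (our own illustration, not code from the paper) applies it to compute the effective kernel size and stride of a hypothetical stack of convolution/pooling layers, i.e. of the equivalent single "deep filter".

```python
# Composition rule: f_{ks} o g_{k's'} = (f o g)_{k' + (k-1)s', ss'}
def compose(outer, inner):
    """Effective (kernel, stride) of applying `outer` after `inner`."""
    k, s = outer
    k_in, s_in = inner
    return (k_in + (k - 1) * s_in, s * s_in)

# Hypothetical stack: 7x7 conv stride 2, then 3x3 pool stride 2, then 3x3 conv stride 1.
layers = [(7, 2), (3, 2), (3, 1)]
effective = layers[0]
for layer in layers[1:]:
    effective = compose(layer, effective)

print(effective)  # -> (19, 4): the stack behaves like one 19x19 filter with stride 4
```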

3.1. Adapting classifiers for dense prediction

Typical recognition nets, including LeNet [21], AlexNet [20], and its deeper successors [31, 32], ostensibly take fixed-sized inputs and produce non-spatial outputs. The fully connected layers of these nets have fixed dimensions and throw away spatial coordinates. However, these fully connected layers can also be viewed as convolutions with kernels that cover their entire input regions. Doing so casts them into fully convolutional networks that take input of any size and output classification maps. This transformation is illustrated in Figure 2.

Furthermore, while the resulting maps are equivalent to the evaluation of the original net on particular input patches, the computation is highly amortized over the overlapping regions of those patches. For example, while AlexNet takes 1.2 ms (on a typical GPU) to infer the classification scores of a 227×227 image, the fully convolutional net takes 22 ms to produce a 10×10 grid of outputs from a 500×500 image, which is more than 5 times faster than the naïve approach.

The spatial output maps of these convolutionalized models make them a natural choice for dense problems like semantic segmentation. With ground truth available at every output cell, both the forward and backward passes are straightforward, and both take advantage of the inherent computational efficiency (and aggressive optimization) of convolution. The corresponding backward times for the AlexNet example are 2.4 ms for a single image and 37 ms for a fully convolutional 10 × 10 output map, resulting in a speedup similar to that of the forward pass.

While our reinterpretation of classification nets as fully convolutional yields output maps for inputs of any size, the output dimensions are typically reduced by subsampling. The classification nets subsample to keep filters small and computational requirements reasonable. This coarsens the output of a fully convolutional version of these nets, reducing it from the size of the input by a factor equal to the pixel stride of the receptive fields of the output units.

3.1. 适用分类器用于dense prediction

典型的识别网络,包括 LeNet [21]、AlexNet [20] 及其更深的后继者 [31, 32],表面上只接受固定大小的输入并产生非空间的输出。这些网络的全连接层具有固定的维度并丢弃了空间坐标。然而,这些全连接层也可以被看作卷积核覆盖其整个输入区域的卷积。这样做把它们转换成了全卷积网络,可以接受任意大小的输入并输出分类图。图 2 说明了这种转换。

此外,虽然得到的输出图等价于在特定输入 patch 上对原始网络求值,但计算在这些 patch 的重叠区域上被高度摊销。例如,AlexNet 推算一张 227×227 图像的分类得分需要 1.2 ms(在典型的 GPU 上),而全卷积网络从一张 500×500 的图像产生 10×10 的输出网格只需 22 ms,比朴素方法快 5 倍以上。

这些卷积化模型的空间输出图使它们成为语义分割这类密集问题的自然选择。由于每个输出单元都有 ground truth 可用,前向和反向传播都很直接,而且都利用了卷积固有的计算效率(以及可被高度优化的特性)。对于 AlexNet 的例子,相应的反向传播时间为:单张图像 2.4 ms,全卷积的 10×10 输出图 37 ms,加速比与前向传播相近。

虽然我们把分类网络重新解释为全卷积网络后可以为任意大小的输入生成输出图,但输出维度通常会因下采样而降低。分类网络通过下采样来保持滤波器较小、计算需求合理。这使得这些网络的全卷积版本的输出变得粗糙:输出相对于输入缩小的倍数,等于输出单元感受野的像素步长。

【论文翻译】Fully Convolutional Networks for Semantic Segmentation_第2张图片
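As a concrete illustration of this conversion, the sketch below (ours, in PyTorch rather than the authors' Caffe) casts the fully connected layers fc6 and fc7 of a VGG16 classifier into convolutions whose kernels cover their entire input region, so the resulting net accepts inputs of any size and emits a spatial grid of scores.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

clf = vgg16(weights=None)      # pretrained weights could be loaded here instead
features = clf.features        # the convolutional body is kept unchanged

fc6, fc7 = clf.classifier[0], clf.classifier[3]

conv6 = nn.Conv2d(512, 4096, kernel_size=7)    # fc6 originally saw a 7x7x512 block
conv6.weight.data = fc6.weight.data.view(4096, 512, 7, 7)
conv6.bias.data = fc6.bias.data

conv7 = nn.Conv2d(4096, 4096, kernel_size=1)   # fc7 becomes a 1x1 convolution
conv7.weight.data = fc7.weight.data.view(4096, 4096, 1, 1)
conv7.bias.data = fc7.bias.data

fully_conv = nn.Sequential(features, conv6, nn.ReLU(inplace=True),
                           conv7, nn.ReLU(inplace=True))

scores = fully_conv(torch.randn(1, 3, 500, 500))   # a spatial map, not a single vector
print(scores.shape)
```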

3.2. Shift-and-stitch is filter rarefaction

Dense predictions can be obtained from coarse outputs by stitching together output from shifted versions of the input. If the output is downsampled by a factor of f, shift the input x pixels to the right and y pixels down, once for every (x, y) s.t. 0 ≤ x, y < f. Process each of these $f^2$ inputs, and interlace the outputs so that the predictions correspond to the pixels at the centers of their receptive fields.

Although performing this transformation naïvely increases the cost by a factor of $f^2$, there is a well-known trick for efficiently producing identical results [11, 29] known to the wavelet community as the à trous algorithm [25]. Consider a layer (convolution or pooling) with input stride s, and a subsequent convolution layer with filter weights $f_{ij}$ (eliding the irrelevant feature dimensions). Setting the lower layer’s input stride to 1 upsamples its output by a factor of s. However, convolving the original filter with the upsampled output does not produce the same result as shift-and-stitch, because the original filter only sees a reduced portion of its (now upsampled) input. To reproduce the trick, rarefy the filter by enlarging it as

$$f'_{ij} = \begin{cases} f_{i/s,\, j/s} & \text{if } s \text{ divides both } i \text{ and } j, \\ 0 & \text{otherwise} \end{cases}$$

(with i and j zero-based). Reproducing the full net output of the trick involves repeating this filter enlargement layerby-layer until all subsampling is removed. (In practice, this can be done efficiently by processing subsampled versions of the upsampled input.)

Decreasing subsampling within a net is a tradeoff: the filters see finer information, but have smaller receptive fields and take longer to compute. The shift-and-stitch trick is another kind of tradeoff: the output is denser without decreasing the receptive field sizes of the filters, but the filters are prohibited from accessing information at a finer scale than their original design.

Although we have done preliminary experiments with this trick, we do not use it in our model. We find learning through upsampling, as described in the next section, to be more effective and efficient, especially when combined with the skip layer fusion described later on. 

3.2. Shift-and-stitch是滤波稀疏的

密集预测可以通过把输入的各个平移版本的输出拼接在一起,从粗糙输出中得到。如果输出相对输入下采样了 f 倍,就把输入向右平移 x 个像素、向下平移 y 个像素,对每一组满足 0 ≤ x, y < f 的 (x, y) 各做一次。处理这 $f^2$ 个输入,并将输出交错排列,使预测与其感受野中心的像素一一对应。

尽管朴素地执行这种变换会把代价提高 $f^2$ 倍,但有一个众所周知的技巧可以高效地产生完全相同的结果 [11, 29],它在小波领域被称为 à trous(多孔)算法 [25]。考虑一个输入步长为 s 的层(卷积或池化),以及其后一个滤波权重为 $f_{ij}$ 的卷积层(忽略无关的特征维度)。把较低层的输入步长设为 1,会把它的输出上采样 s 倍。然而,用原始滤波器对上采样后的输出做卷积,并不能得到与 shift-and-stitch 相同的结果,因为原始滤波器只能看到其(现已上采样的)输入中缩减后的一部分。为了重现这一技巧,需要把滤波器放大来使其稀疏化,如下:

$$f'_{ij} = \begin{cases} f_{i/s,\, j/s} & \text{if } s \text{ divides both } i \text{ and } j, \\ 0 & \text{otherwise} \end{cases}$$

(其中 i 和 j 从零开始计数。)要重现该技巧的完整网络输出,需要逐层重复这种滤波器放大,直到所有的下采样都被去除。(在实践中,通过处理上采样输入的下采样版本可以高效地完成。)

在网内减少二次采样是一种折衷的做法:filter能看到更细节的信息,但是接受域更小而且需要花费很长时间计算。Shift-and-stitch技巧是另外一种折衷做法:输出更加密集且没有减小filter的接受域范围,但是相对于原始的设计filter不能感受更精细的信息。

尽管我们已经利用这个技巧做了初步的实验,但是我们没有在我们的模型中使用它。正如在下一节中描述的,我们发现从上采样中学习更有效和高效,特别是接下来要描述的结合了跨层融合。
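The filter enlargement above is what is now usually called dilation; the small NumPy sketch below (our illustration, not the paper's code) builds the rarefied filter $f'$ from $f$ by interleaving zeros.

```python
import numpy as np

def rarefy(f, s):
    """Enlarge filter f by stride s: f'[i, j] = f[i/s, j/s] if s divides i and j, else 0."""
    k = f.shape[0]
    size = (k - 1) * s + 1
    f_prime = np.zeros((size, size), dtype=f.dtype)
    f_prime[::s, ::s] = f        # keep the original taps, zeros everywhere else
    return f_prime

f = np.arange(1, 10).reshape(3, 3)   # a 3x3 filter
print(rarefy(f, 2))                  # a 5x5 filter with zeros interleaved
```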

3.3. Upsampling is backwards strided convolution

Another way to connect coarse outputs to dense pixels is interpolation. For instance, simple bilinear interpolation computes each output $y_{ij}$ from the nearest four inputs by a linear map that depends only on the relative positions of the input and output cells.

In a sense, upsampling with factor f is convolution with a fractional input stride of 1/f. So long as f is integral, a natural way to upsample is therefore backwards convolution (sometimes called deconvolution) with an output stride of f. Such an operation is trivial to implement, since it simply reverses the forward and backward passes of convolution. Thus upsampling is performed in-network for end-to-end learning by backpropagation from the pixelwise loss.

Note that the deconvolution filter in such a layer need not be fixed (e.g., to bilinear upsampling), but can be learned. A stack of deconvolution layers and activation functions can even learn a nonlinear upsampling.

In our experiments, we find that in-network upsampling is fast and effective for learning dense prediction. Our best segmentation architecture uses these layers to learn to upsample for refined prediction in Section 4.2.

3.3. 上采样是反向的步进卷积

将粗输出连接到密集像素的另一种方法是内插。例如,简单的双线性插值通过线性映射来计算来自最近四个输入的每个输出,线性映射仅依赖于输入单元和输出单元的相对位置。

从某种意义上讲,以因子 f 进行的上采样就是输入步长为 1/f 的分数步长卷积。只要 f 是整数,上采样的一种自然方式就是输出步长为 f 的反向卷积(有时称为反卷积)。这样的操作实现起来很简单,因为它只是把卷积的前向和反向过程颠倒过来。因此,上采样可以在网络内部进行,通过从逐像素损失反向传播来实现端到端学习。

注意,这种层中的去卷积滤波器不需要是固定的(例如对于双线性上采样),但是可以被学习。一堆去卷积层和激活函数甚至可以学习非线性上采样。

在我们的实验中,我们发现网络内上采样对于学习密集预测是快速而有效的。我们最好的分割体系结构使用这些层来学习上采样,以实现 4.2 节中的精细预测。
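A common way to realize this in practice is a transposed (backwards strided) convolution whose weights start as bilinear interpolation but remain learnable. The sketch below assumes PyTorch (the paper's implementation was in Caffe) and uses a hypothetical 32x upsampling layer for 21 classes.

```python
import torch
import torch.nn as nn

def bilinear_kernel(channels, kernel_size):
    """A (channels, channels, k, k) weight that performs per-channel bilinear upsampling."""
    factor = (kernel_size + 1) // 2
    center = factor - 1 if kernel_size % 2 == 1 else factor - 0.5
    og = torch.arange(kernel_size, dtype=torch.float32)
    filt1d = 1 - torch.abs(og - center) / factor
    filt = torch.outer(filt1d, filt1d)
    weight = torch.zeros(channels, channels, kernel_size, kernel_size)
    for c in range(channels):            # no cross-channel mixing at initialization
        weight[c, c] = filt
    return weight

num_classes = 21
upsample = nn.ConvTranspose2d(num_classes, num_classes,
                              kernel_size=64, stride=32, bias=False)  # 32x upsampling
upsample.weight.data.copy_(bilinear_kernel(num_classes, 64))          # learnable afterwards

coarse = torch.randn(1, num_classes, 16, 16)   # coarse class scores
print(upsample(coarse).shape)                  # upsampled back toward input resolution
```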

3.4. Patchwise training is loss sampling

In stochastic optimization, gradient computation is driven by the training distribution. Both patchwise training and fully convolutional training can be made to produce any distribution, although their relative computational efficiency depends on overlap and minibatch size. Whole image fully convolutional training is identical to patchwise training where each batch consists of all the receptive fields of the units below the loss for an image (or collection of images). While this is more efficient than uniform sampling of patches, it reduces the number of possible batches. However, random selection of patches within an image may be recovered simply. Restricting the loss to a randomly sampled subset of its spatial terms (or, equivalently applying a DropConnect mask [36] between the output and the loss) excludes patches from the gradient computation.

If the kept patches still have significant overlap, fully convolutional computation will still speed up training. If gradients are accumulated over multiple backward passes, batches can include patches from several images.2

Sampling in patchwise training can correct class imbalance [27, 7, 2] and mitigate the spatial correlation of dense patches [28, 15]. In fully convolutional training, class balance can also be achieved by weighting the loss, and loss sampling can be used to address spatial correlation.

We explore training with sampling in Section 4.3, and do not find that it yields faster or better convergence for dense prediction. Whole image training is effective and efficient.

3.4. Patchwise训练是一种损失采样

在随机优化中,梯度计算由训练数据的分布决定。Patchwise 训练和全卷积训练都可以产生任意分布,尽管二者的相对计算效率取决于重叠程度和 minibatch 的大小。整幅图像的全卷积训练等同于这样一种 patchwise 训练:每个 batch 由一幅图像(或一组图像)损失之下所有单元的感受野组成。虽然这比对 patch 均匀采样更高效,但它减少了可能的 batch 数量。不过,图像内 patch 的随机选择可以很容易地恢复:把损失限制在其空间项的一个随机采样子集上(或者等价地,在输出和损失之间应用 DropConnect mask [36]),就把一部分 patch 排除在梯度计算之外了。

如果保留下来的 patch 之间仍有显著重叠,全卷积计算仍将加速训练。如果梯度在多次反向传播中累积,batch 就可以包含来自多幅图像的 patch。

Patchwise 训练中的采样可以纠正类别不平衡 [27, 7, 2],并减轻密集 patch 之间的空间相关性 [28, 15]。在全卷积训练中,类别平衡也可以通过对损失加权来实现,而损失采样可以用来处理空间相关性。

我们研究了4.3节中的伴有采样的训练,没有发现对于dense prediction它有更快或是更好的收敛效果。全图式训练是有效且高效的。
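The loss sampling described above can be written as a random mask over the per-pixel loss; the sketch below is our own illustration in PyTorch, with p, the image sizes, and the class count chosen arbitrarily.

```python
import torch
import torch.nn.functional as F

def sampled_pixel_loss(scores, target, p=0.5):
    """Per-pixel cross entropy where each final-layer cell is kept with probability p."""
    loss_map = F.cross_entropy(scores, target, reduction='none')   # shape (N, H, W)
    keep = (torch.rand_like(loss_map) < p).float()                 # DropConnect-style mask
    return (loss_map * keep).sum() / keep.sum().clamp(min=1.0)

scores = torch.randn(2, 21, 32, 32)             # class scores for 2 images
target = torch.randint(0, 21, (2, 32, 32))      # ground-truth labels
print(sampled_pixel_loss(scores, target, p=0.5))
```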

4. 分割架构

我们将 ILSVRC 分类器改造为 FCN,并通过网络内上采样和逐像素损失对其进行增强,以用于密集预测。我们通过微调来训练分割。接下来,我们在层与层之间添加跨层连接,来融合粗糙的语义信息与局部的外观信息。这种跳跃结构以端到端方式学习,以改进输出的语义和空间精度。

在这项研究中,我们在 PASCAL VOC 2011 分割挑战赛上进行训练和验证。我们使用逐像素的多项逻辑斯蒂损失进行训练,并用平均像素交并比(mean IU)这一标准度量进行验证,均值在包括背景在内的所有类别上计算。训练时忽略在 ground truth 中被掩盖(标注模糊或困难)的像素。

【论文翻译】Fully Convolutional Networks for Semantic Segmentation_第5张图片
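For reference, the per-pixel multinomial logistic loss with masked-out pixels ignored can be written as below; this is our own sketch, assuming PyTorch and the common VOC convention of marking ambiguous/difficult pixels with label 255.

```python
import torch
import torch.nn.functional as F

scores = torch.randn(1, 21, 256, 256)            # per-pixel class scores from the net
labels = torch.randint(0, 21, (1, 256, 256))     # ground-truth segmentation
labels[:, :8, :8] = 255                          # pretend these pixels are masked out

loss = F.cross_entropy(scores, labels, ignore_index=255)   # masked pixels contribute nothing
print(loss)
```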

4.1. From classifier to dense FCN

We begin by convolutionalizing proven classification architectures as in Section 3. We consider the AlexNet 3 architecture [20] that won ILSVRC12, as well as the VGG nets [31] and the GoogLeNet 4 [32] which did exceptionally well in ILSVRC14. We pick the VGG 16-layer net 5 , which we found to be equivalent to the 19-layer net on this task. For GoogLeNet, we use only the final loss layer, and improve performance by discarding the final average pooling layer. We decapitate each net by discarding the final classifier layer, and convert all fully connected layers to convolutions. We append a 1 × 1 convolution with channel dimension 21 to predict scores for each of the PASCAL classes (including background) at each of the coarse output locations, followed by a deconvolution layer to bilinearly upsample the coarse outputs to pixel-dense outputs as described in Section 3.3. Table 1 compares the preliminary validation results along with the basic characteristics of each net. We report the best results achieved after convergence at a fixed learning rate (at least 175 epochs).

Fine-tuning from classification to segmentation gave reasonable predictions for each net. Even the worst model achieved ∼ 75% of state-of-the-art performance. The segmentation-equipped VGG net (FCN-VGG16) already appears to be state-of-the-art at 56.0 mean IU on val, compared to 52.6 on test [15]. Training on extra data raises FCN-VGG16 to 59.4 mean IU and FCN-AlexNet to 48.0 mean IU on a subset of val 7 . Despite similar classification accuracy, our implementation of GoogLeNet did not match the VGG16 segmentation result.

4.1. 从分类器到密集的FCN

我们首先按照第 3 节的方法对经过验证的分类体系结构进行卷积化。我们考虑赢得 ILSVRC12 的 AlexNet 体系结构 [20],以及在 ILSVRC14 中表现突出的 VGG 网络 [31] 和 GoogLeNet [32]。我们选择 VGG 的 16 层网络,我们发现它在此任务上与 19 层网络效果相当。对于 GoogLeNet,我们只使用最终的损失层,并通过丢弃最后的平均池化层来提高性能。我们去掉每个网络最后的分类器层,并把所有全连接层转换为卷积。我们附加一个通道数为 21 的 1×1 卷积,在每个粗糙输出位置上预测各个 PASCAL 类别(包括背景)的分数,随后接一个反卷积层,按 3.3 节所述将粗糙输出双线性上采样为像素级的密集输出。表 1 比较了初步的验证结果以及每个网络的基本特征。我们报告在固定学习率下训练收敛后(至少 175 个 epoch)取得的最佳结果。

从分类到分割的微调为每个网络都给出了合理的预测。即使是最差的模型也达到了最先进水平约 75% 的性能。配备分割功能的 VGG 网络(FCN-VGG16)在验证集上已达到 56.0 的平均 IU,而已有方法在测试集上的结果为 52.6 [15],因此它已经达到最先进水平。在额外数据上训练使 FCN-VGG16 在验证子集上的平均 IU 提升到 59.4,FCN-AlexNet 提升到 48.0。尽管分类准确率相近,我们实现的 GoogLeNet 在分割结果上没有达到 VGG16 的水平。

表 1:我们改造并扩展了三个分类卷积网络。我们通过 PASCAL VOC 2011 验证集上的平均交并比和推理时间(在 NVIDIA Tesla K40c 上对 500×500 输入进行 20 次试验的平均值)来比较性能,并详细列出了适配后网络与密集预测相关的结构:参数层的数量、输出单元的感受野大小以及网络内最粗的步幅。(这些数字是在固定学习率下得到的最佳性能,而不是可能的最佳性能。)

【论文翻译】Fully Convolutional Networks for Semantic Segmentation_第6张图片
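The sketch below (ours, simplified) shows the shape of the resulting FCN-32s head: a 1x1 convolution scoring 21 classes on top of the coarse features, followed by a learnable deconvolution back to pixel-dense output. For brevity it scores the raw VGG16 features directly; in the paper the scoring layer sits on the convolutionalized fc7 (4096 channels) instead.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

num_classes = 21
backbone = vgg16(weights=None).features               # stride-32 feature extractor
score = nn.Conv2d(512, num_classes, kernel_size=1)    # per-class scores at each coarse location
upsample = nn.ConvTranspose2d(num_classes, num_classes,
                              kernel_size=64, stride=32, bias=False)  # bilinear-initialized in practice

x = torch.randn(1, 3, 500, 500)
coarse = score(backbone(x))    # coarse score maps
dense = upsample(coarse)       # pixel-dense scores (cropped to the input size in a full model)
print(coarse.shape, dense.shape)
```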

4.2. Combining what and where

We define a new fully convolutional net (FCN) for segmentation that combines layers of the feature hierarchy and refines the spatial precision of the output. See Figure 3.

While fully convolutionalized classifiers can be finetuned to segmentation as shown in 4.1, and even score highly on the standard metric, their output is dissatisfyingly coarse (see Figure 4). The 32 pixel stride at the final prediction layer limits the scale of detail in the upsampled output.

We address this by adding skips [1] that combine the final prediction layer with lower layers with finer strides. This turns a line topology into a DAG, with edges that skip ahead from lower layers to higher ones (Figure 3). As they see fewer pixels, the finer scale predictions should need fewer layers, so it makes sense to make them from shallower net outputs. Combining fine layers and coarse layers lets the model make local predictions that respect global structure. By analogy to the jet of Koenderick and van Doorn [19], we call our nonlinear feature hierarchy the deep jet.

We first divide the output stride in half by predicting from a 16 pixel stride layer. We add a 1 × 1 convolution layer on top of pool4 to produce additional class predictions. We fuse this output with the predictions computed on top of conv7 (convolutionalized fc7) at stride 32 by adding a 2× upsampling layer and summing both predictions (see Figure 3). We initialize the 2× upsampling to bilinear interpolation, but allow the parameters to be learned as described in Section 3.3. Finally, the stride 16 predictions are upsampled back to the image. We call this net FCN-16s. FCN-16s is learned end-to-end, initialized with the parameters of the last, coarser net, which we now call FCN-32s. The new parameters acting on pool4 are zero-initialized so that the net starts with unmodified predictions. The learning rate is decreased by a factor of 100.

Learning this skip net improves performance on the validation set by 3.0 mean IU to 62.4. Figure 4 shows improvement in the fine structure of the output. We compared this fusion with learning only from the pool4 layer, which resulted in poor performance, and simply decreasing the learning rate without adding the skip, which resulted in an insignificant performance improvement without improving the quality of the output.

We continue in this fashion by fusing predictions from pool3 with a 2× upsampling of predictions fused from pool4 and conv7, building the net FCN-8s. We obtain a minor additional improvement to 62.7 mean IU, and find a slight improvement in the smoothness and detail of our output. At this point our fusion improvements have met diminishing returns, both with respect to the IU metric which emphasizes large-scale correctness, and also in terms of the improvement visible e.g. in Figure 4, so we do not continue fusing even lower layers.

Refinement by other means Decreasing the stride of pooling layers is the most straightforward way to obtain finer predictions. However, doing so is problematic for our VGG16-based net. Setting the pool5 stride to 1 requires our convolutionalized fc6 to have kernel size 14 × 14 to maintain its receptive field size. In addition to their computational cost, we had difficulty learning such large filters. We attempted to re-architect the layers above pool5 with smaller filters, but did not achieve comparable performance; one possible explanation is that the ILSVRC initialization of the upper layers is important.

Another way to obtain finer predictions is to use the shift-and-stitch trick described in Section 3.2. In limited experiments, we found the cost to improvement ratio from this method to be worse than layer fusion.

4.2. 结合“是什么”与“在哪里”

我们为分割定义了一个新的全卷积网络(FCN),它结合了特征层次结构的层次并提高了输出的空间精度。参见图3。

虽然全卷积化的分类器可以像 4.1 节那样微调用于分割,甚至在标准度量上得分很高,但它们的输出却不尽如人意地粗糙(见图 4)。最终预测层 32 像素的步长限制了上采样输出中细节的尺度。

我们通过添加跨层连接 [1] 来解决这一问题,它把最终预测层与步长更小的较低层结合起来。这把线状拓扑变成了 DAG(有向无环图),其中的边从较低层向前跳跃到较高层(图 3)。由于更精细尺度的预测所看到的像素更少,它们应该需要更少的层,所以从较浅的网络输出来得到它们是合理的。把精细层和粗糙层结合起来,让模型能够做出遵从全局结构的局部预测。与 Koenderick 和 van Doorn [19] 的 jet 类似,我们把这种非线性特征层次称为 deep jet。

我们首先通过从步长为 16 像素的层进行预测,把输出步长减半。我们在 pool4 之上添加一个 1×1 卷积层来产生额外的类别预测。我们添加一个 2 倍上采样层并把两个预测相加,从而把这一输出与在步长为 32 的 conv7(卷积化的 fc7)之上计算的预测融合起来(见图 3)。我们把 2 倍上采样初始化为双线性插值,但允许其参数按照 3.3 节所述进行学习。最后,把步长为 16 的预测上采样回图像尺寸。我们把这个网络称为 FCN-16s。FCN-16s 以端到端方式学习,并用上一个较粗网络(现称 FCN-32s)的参数进行初始化。作用于 pool4 的新参数被初始化为零,从而网络从未经修改的预测开始。学习率降低为原来的 1/100。

学习这个跳跃网络使验证集上的性能提高了 3.0,平均 IU 达到 62.4。图 4 显示了输出精细结构的改进。我们将这种融合与只从 pool4 层学习进行了比较,后者性能较差;也与不添加跨层、只降低学习率进行了比较,后者只带来微不足道的性能提升,且没有改善输出质量。

我们按这种方式继续,把 pool3 的预测与 pool4 和 conv7 融合后再 2 倍上采样的预测相融合,构建网络 FCN-8s。我们得到了微小的额外提升,平均 IU 达到 62.7,并发现输出的平滑度和细节略有改善。至此,无论是就强调大尺度正确性的 IU 度量而言,还是就图 4 中可见的改进而言,我们的融合改进都已收益递减,因此我们不再继续融合更低的层。

通过其他方式进行细化 减小池化层的步长是获得更精细预测最直接的方法。然而,这样做对我们基于 VGG16 的网络来说是有问题的。把 pool5 的步长设为 1,要求卷积化后的 fc6 的卷积核大小为 14×14,才能保持其感受野大小。除了计算代价之外,我们还发现很难学习这么大的滤波器。我们尝试用较小的滤波器重新构造 pool5 之上的各层,但没有达到可比的性能;一种可能的解释是,来自 ILSVRC 的上层初始化很重要。

获得更精细预测的另一种方法是使用 3.2 节中描述的 shift-and-stitch 技巧。在有限的实验中,我们发现这种方法的成本与改进之比要比层融合更差。

【论文翻译】Fully Convolutional Networks for Semantic Segmentation_第7张图片

【论文翻译】Fully Convolutional Networks for Semantic Segmentation_第8张图片
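To make the fusion concrete, the following is a simplified PyTorch sketch of the FCN-16s style skip (our own illustration; sizes and crop offsets are hypothetical, and a real implementation crops precisely to align the grids): score pool4 with a zero-initialized 1x1 convolution, 2x-upsample the stride-32 scores, sum, then upsample the fused stride-16 predictions to the image.

```python
import torch
import torch.nn as nn

num_classes = 21
score_pool4 = nn.Conv2d(512, num_classes, kernel_size=1)   # zero-initialized, as in the paper
nn.init.zeros_(score_pool4.weight)
nn.init.zeros_(score_pool4.bias)
up2 = nn.ConvTranspose2d(num_classes, num_classes, kernel_size=4, stride=2, bias=False)
up16 = nn.ConvTranspose2d(num_classes, num_classes, kernel_size=32, stride=16, bias=False)

pool4 = torch.randn(1, 512, 32, 32)            # stride-16 features (hypothetical size)
score32 = torch.randn(1, num_classes, 16, 16)  # stride-32 scores from conv7

fused = score_pool4(pool4) + up2(score32)[:, :, 1:33, 1:33]  # crop the 2x map to align
output = up16(fused)                                          # stride-16 predictions -> image
print(output.shape)
```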

4.3. Experimental framework

Optimization We train by SGD with momentum. We use a minibatch size of 20 images and fixed learning rates of $10^{-3}$, $10^{-4}$, and $5^{-5}$ for FCN-AlexNet, FCN-VGG16, and FCN-GoogLeNet, respectively, chosen by line search. We use momentum 0.9, weight decay of $5^{-4}$ or $2^{-4}$, and doubled learning rate for biases, although we found training to be sensitive to the learning rate alone. We zero-initialize the class scoring layer, as random initialization yielded neither better performance nor faster convergence. Dropout was included where used in the original classifier nets.

Fine-tuning We fine-tune all layers by backpropagation through the whole net. Fine-tuning the output classifier alone yields only 70% of the full fine-tuning performance as compared in Table 2. Training from scratch is not feasible considering the time required to learn the base classification nets. (Note that the VGG net is trained in stages, while we initialize from the full 16-layer version.) Fine-tuning takes three days on a single GPU for the coarse FCN-32s version, and about one day each to upgrade to the FCN-16s and FCN-8s versions.

More Training Data The PASCAL VOC 2011 segmentation training set labels 1112 images. Hariharan et al. [14] collected labels for a larger set of 8498 PASCAL training images, which was used to train the previous state-of-the-art system, SDS [15]. This training data improves the FCN-VGG16 validation score by 3.4 points to 59.4 mean IU.

Patch Sampling As explained in Section 3.4, our full image training effectively batches each image into a regular grid of large, overlapping patches. By contrast, prior work randomly samples patches over a full dataset [27, 2, 7, 28, 9], potentially resulting in higher variance batches that may accelerate convergence [22]. We study this tradeoff by spatially sampling the loss in the manner described earlier, making an independent choice to ignore each final layer cell with some probability 1 − p. To avoid changing the effective batch size, we simultaneously increase the number of images per batch by a factor 1/p. Note that due to the efficiency of convolution, this form of rejection sampling is still faster than patchwise training for large enough values of p (e.g., at least for p > 0.2 according to the numbers in Section 3.1). Figure 5 shows the effect of this form of sampling on convergence. We find that sampling does not have a significant effect on convergence rate compared to whole image training, but takes significantly more time due to the larger number of images that need to be considered per batch. We therefore choose unsampled, whole image training in our other experiments.

Class Balancing Fully convolutional training can balance classes by weighting or sampling the loss. Although our labels are mildly unbalanced (about 3/4 are background), we find class balancing unnecessary.

Dense Prediction The scores are upsampled to the input dimensions by deconvolution layers within the net. Final layer deconvolutional filters are fixed to bilinear interpolation, while intermediate upsampling layers are initialized to bilinear upsampling, and then learned.

Augmentation We tried augmenting the training data by randomly mirroring and “jittering” the images by translating them up to 32 pixels (the coarsest scale of prediction) in each direction. This yielded no noticeable improvement.

Implementation All models are trained and tested with Caffe [18] on a single NVIDIA Tesla K40c. Our models and code are publicly available at http://fcn.berkeleyvision.org.

4.3. 实验框架

优化 我们使用带动量的 SGD 进行训练。我们使用 20 张图像的 minibatch,并分别为 FCN-AlexNet、FCN-VGG16 和 FCN-GoogLeNet 使用 $10^{-3}$、$10^{-4}$ 和 $5^{-5}$ 的固定学习率,这些值通过线搜索选定。我们使用 0.9 的动量、$5^{-4}$ 或 $2^{-4}$ 的权重衰减,并把偏置的学习率加倍,尽管我们发现训练只对学习率本身敏感。我们把类别打分层初始化为零,因为随机初始化既不能带来更好的性能,也不能带来更快的收敛。原始分类网络中使用 dropout 的地方我们也保留了 dropout。

微调 我们通过在整个网络上进行反向传播来微调所有层。如表 2 所示,只微调输出分类器仅能达到完整微调性能的 70%。考虑到学习基础分类网络所需的时间,从头训练是不可行的。(请注意:VGG 网络是分阶段训练的,而我们从完整的 16 层版本进行初始化。)对于粗糙的 FCN-32s 版本,在单个 GPU 上微调需要三天;升级到 FCN-16s 和 FCN-8s 版本各需再约一天。

更多训练数据 PASCAL VOC 2011 分割训练集标注了 1112 张图像。Hariharan 等人 [14] 为更大的一组 8498 张 PASCAL 训练图像收集了标签,这些图像曾被用于训练先前最先进的系统 SDS [15]。该训练数据把 FCN-VGG16 的验证得分提高了 3.4 个点,平均 IU 达到 59.4。

补丁采样 如第3.4节所述,我们的完整图像训练可以将每个图像有效地批量成一个大的,重叠的补丁的规则网格。相比之下,先前的工作在整个数据集上随机采样补丁[27、2、7、28、9],可能会导致更高的方差批次,从而可能加速收敛[22]。我们通过以前面描述的方式对损失进行空间采样来研究这种折衷,并做出独立选择,以1 − p的概率忽略每个最终层单元。为了避免更改有效的批次大小,我们同时将每批次的图像数量增加了1 / p。请注意,由于卷积效率高,对于足够大的p值(例如,至少根据3.1节中的p> 0.2而言),这种形式的拒绝采样仍比分片训练更快。图5显示了这种形式的抽样对收敛的影响。我们发现,与整个图像训练相比,采样对收敛速度没有显着影响,但是由于每批需要考虑的图像数量更多,因此采样花费的时间明显更多。因此我们在其他实验中选择未采样的整体图像训练。

类别平衡 全卷积训练可以通过对损失加权或采样来平衡类别。尽管我们的标注略有不平衡(大约 3/4 是背景),但我们发现不需要类别平衡。

密集预测 通过网络中的反卷积层将分数上采样到输入维度。最终层反卷积滤波器固定为双线性插值,而中间上采样层则初始化为双线性上采样然后学习。

增强 我们尝试通过随机镜像和“抖动”来增强训练数据,即把图像在每个方向上平移最多 32 个像素(最粗预测的尺度)。这没有带来明显的改善。

实现 所有模型都在单个 NVIDIA Tesla K40c 上使用 Caffe [18] 进行训练和测试。我们的模型和代码公开于 http://fcn.berkeleyvision.org。

【论文翻译】Fully Convolutional Networks for Semantic Segmentation_第9张图片
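Transcribed to PyTorch (the original used Caffe solver files), the optimization setup above might look like the following sketch; the base learning rate shown is the FCN-VGG16 value, and giving biases no weight decay is our assumption rather than something stated in the paper.

```python
import torch

def make_optimizer(model, base_lr=1e-4, weight_decay=5e-4):
    """SGD with momentum 0.9, weight decay, and a doubled learning rate for biases."""
    weights = [p for n, p in model.named_parameters() if not n.endswith("bias")]
    biases = [p for n, p in model.named_parameters() if n.endswith("bias")]
    return torch.optim.SGD(
        [{"params": weights},
         {"params": biases, "lr": 2 * base_lr, "weight_decay": 0.0}],
        lr=base_lr, momentum=0.9, weight_decay=weight_decay)
```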

5. Results

We test our FCN on semantic segmentation and scene parsing, exploring PASCAL VOC, NYUDv2, and SIFT Flow. Although these tasks have historically distinguished between objects and regions, we treat both uniformly as pixel prediction. We evaluate our FCN skip architecture on each of these datasets, and then extend it to multi-modal input for NYUDv2 and multi-task prediction for the semantic and geometric labels of SIFT Flow.

Metrics We report four metrics from common semantic segmentation and scene parsing evaluations that are variations on pixel accuracy and region intersection over union (IU). Let $n_{ij}$ be the number of pixels of class $i$ predicted to belong to class $j$, where there are $n_{cl}$ different classes, and let $t_i = \sum_j n_{ij}$ be the total number of pixels of class $i$. We compute:

  • pixel accuracy: $\sum_i n_{ii} / \sum_i t_i$

  • mean accuracy: $(1/n_{cl}) \sum_i n_{ii} / t_i$

  • mean IU: $(1/n_{cl}) \sum_i n_{ii} / \left(t_i + \sum_j n_{ji} - n_{ii}\right)$

  • frequency weighted IU: $\left(\sum_k t_k\right)^{-1} \sum_i t_i\, n_{ii} / \left(t_i + \sum_j n_{ji} - n_{ii}\right)$

PASCAL VOC Table 3 gives the performance of our FCN-8s on the test sets of PASCAL VOC 2011 and 2012, and compares it to the previous state-of-the-art, SDS [15], and the well-known R-CNN [10]. We achieve the best results on mean IU 8 by a relative margin of 20%. Inference time is reduced 114× (convnet only, ignoring proposals and refinement) or 286× (overall).

NYUDv2 [30] is an RGB-D dataset collected using the Microsoft Kinect. It has 1449 RGB-D images, with pixelwise labels that have been coalesced into a 40 class semantic segmentation task by Gupta et al. [12]. We report results on the standard split of 795 training images and 654 testing images. (Note: all model selection is performed on PASCAL 2011 val.) Table 4 gives the performance of our model in several variations. First we train our unmodified coarse model (FCN-32s) on RGB images. To add depth information, we train on a model upgraded to take four-channel RGB-D input (early fusion). This provides little benefit, perhaps due to the difficultly of propagating meaningful gradients all the way through the model. Following the success of Gupta et al. [13], we try the three-dimensional HHA encoding of depth, training nets on just this information, as well as a “late fusion” of RGB and HHA where the predictions from both nets are summed at the final layer, and the resulting two-stream net is learned end-to-end. Finally we upgrade this late fusion net to a 16-stride version.

SIFT Flow is a dataset of 2,688 images with pixel labels for 33 semantic categories (“bridge”, “mountain”, “sun”), as well as three geometric categories (“horizontal”, “vertical”, and “sky”). An FCN can naturally learn a joint representation that simultaneously predicts both types of labels. We learn a two-headed version of FCN-16s with semantic and geometric prediction layers and losses. The learned model performs as well on both tasks as two independently trained models, while learning and inference are essentially as fast as each independent model by itself. The results in Table 5, computed on the standard split into 2,488 training and 200 test images, 9 show state-of-the-art performance on both tasks.

5. 结果

我们测试FCN的语义分割和场景解析,探索PASCAL VOC,NYUDv2和SIFT Flow。尽管这些任务历来在对象和区域之间有所区别,但我们将两者均视为像素预测。我们在每个数据集上评估我们的FCN跳过体系结构,然后将其扩展到NYUDv2的多模式输入,以及SIFT Flow的语义和几何标签的多任务预测。

指标 我们报告来自常见语义分割和场景解析评估的四个指标,它们是像素精度和区域交并比(IU)的变体。令 $n_{ij}$ 为本属于类别 i 但被预测为类别 j 的像素数,共有 $n_{cl}$ 个不同的类别,并令 $t_i = \sum_j n_{ij}$ 为类别 i 的像素总数。我们计算:

  • 像素精度:$\sum_i n_{ii} / \sum_i t_i$

  • 平均精度:$(1/n_{cl}) \sum_i n_{ii} / t_i$

  • 平均 IU:$(1/n_{cl}) \sum_i n_{ii} / \left(t_i + \sum_j n_{ji} - n_{ii}\right)$

  • 频率加权 IU:$\left(\sum_k t_k\right)^{-1} \sum_i t_i\, n_{ii} / \left(t_i + \sum_j n_{ji} - n_{ii}\right)$
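A small NumPy sketch of these four metrics (our illustration), computed from an $n_{cl} \times n_{cl}$ matrix whose entry n[i, j] counts pixels of class i predicted as class j:

```python
import numpy as np

def segmentation_metrics(n):
    """Pixel accuracy, mean accuracy, mean IU and frequency weighted IU from counts n[i, j]."""
    t = n.sum(axis=1)                      # t_i: total pixels of class i
    n_ii = np.diag(n)
    denom = t + n.sum(axis=0) - n_ii       # t_i + sum_j n_ji - n_ii
    iu = n_ii / denom
    return {
        "pixel_acc": n_ii.sum() / t.sum(),
        "mean_acc": np.mean(n_ii / t),
        "mean_iu": np.mean(iu),
        "fw_iu": (t * iu).sum() / t.sum(),
    }

n = np.array([[50.0, 2.0, 1.0],
              [3.0, 30.0, 2.0],
              [0.0, 1.0, 11.0]])           # a toy 3-class count matrix
print(segmentation_metrics(n))
```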

PASCAL VOC 表 3 给出了我们的 FCN-8s 在 PASCAL VOC 2011 和 2012 测试集上的性能,并将其与先前最先进的方法 SDS [15] 以及著名的 R-CNN [10] 进行了比较。我们在平均 IU 上以 20% 的相对优势取得了最佳结果。推理时间减少了 114 倍(仅卷积网络,忽略 proposal 和细化)或 286 倍(整体)。

NYUDv2 [30] 是使用 Microsoft Kinect 采集的 RGB-D 数据集。它包含 1449 张 RGB-D 图像,其逐像素标签已由 Gupta 等人 [12] 合并为 40 类语义分割任务。我们报告在 795 张训练图像和 654 张测试图像的标准划分上的结果。(注意:所有模型选择均在 PASCAL 2011 验证集上进行。)表 4 给出了我们模型几种变体的性能。首先,我们在 RGB 图像上训练未经修改的粗糙模型(FCN-32s)。为了加入深度信息,我们训练了一个升级后接受四通道 RGB-D 输入的模型(早期融合)。这几乎没有带来好处,可能是因为很难让有意义的梯度一路传播到模型的各层。继 Gupta 等人 [13] 的成功之后,我们尝试对深度进行三维 HHA 编码,只在这一信息上训练网络;我们还尝试了 RGB 与 HHA 的“后期融合”,即把来自两个网络的预测在最终层相加,由此得到的双流网络以端到端方式学习。最后,我们把这个后期融合网络升级为 16 步长的版本。

SIFT Flow 是一个包含 2688 张图像的数据集,带有 33 个语义类别(“桥”、“山”、“太阳”等)以及三个几何类别(“水平”、“垂直”和“天空”)的像素标签。FCN 可以自然地学习一个同时预测这两类标签的联合表示。我们学习了带有语义和几何预测层及损失的双头版本 FCN-16s。学到的模型在两个任务上的表现与两个独立训练的模型相当,而学习和推理的速度基本上与每个独立模型本身一样快。表 5 中的结果按照标准划分(2488 张训练图像和 200 张测试图像)计算,显示出在这两项任务上均达到了最先进的水平。
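The two-headed setup can be sketched as below (our own toy illustration, assuming PyTorch; the trunk here is a stand-in, not the actual FCN-16s body): one shared trunk, separate 33-way semantic and 3-way geometric scoring heads, and the two per-pixel losses simply summed.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

trunk = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True),
                      nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True))
sem_head = nn.Conv2d(64, 33, kernel_size=1)   # 33 semantic classes
geo_head = nn.Conv2d(64, 3, kernel_size=1)    # 3 geometric classes

x = torch.randn(2, 3, 64, 64)
sem_gt = torch.randint(0, 33, (2, 64, 64))
geo_gt = torch.randint(0, 3, (2, 64, 64))

feat = trunk(x)
loss = F.cross_entropy(sem_head(feat), sem_gt) + F.cross_entropy(geo_head(feat), geo_gt)
loss.backward()   # one backward pass trains both heads and the shared trunk
print(float(loss))
```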

【论文翻译】Fully Convolutional Networks for Semantic Segmentation_第12张图片

【论文翻译】Fully Convolutional Networks for Semantic Segmentation_第13张图片

6. Conclusion

Fully convolutional networks are a rich class of models, of which modern classification convnets are a special case. Recognizing this, extending these classification nets to segmentation, and improving the architecture with multi-resolution layer combinations dramatically improves the state-of-the-art, while simultaneously simplifying and speeding up learning and inference.

Acknowledgements This work was supported in part by DARPA’s MSEE and SMISC programs, NSF awards IIS1427425, IIS-1212798, IIS-1116411, and the NSF GRFP, Toyota, and the Berkeley Vision and Learning Center. We gratefully acknowledge NVIDIA for GPU donation. We thank Bharath Hariharan and Saurabh Gupta for their advice and dataset tools. We thank Sergio Guadarrama for reproducing GoogLeNet in Caffe. We thank Jitendra Malik for his helpful comments. Thanks to Wei Liu for pointing out an issue with our SIFT Flow mean IU computation and an error in our frequency weighted mean IU formula.

6. 结论

全卷积网络是一类丰富的模型,现代分类卷积网络是其中的一个特例。认识到这一点,把这些分类网络扩展到分割任务,并通过多分辨率的层组合改进体系结构,可以极大地提升现有技术水平,同时简化并加速学习和推理。

致谢 这项工作得到了 DARPA 的 MSEE 和 SMISC 计划、NSF 资助 IIS1427425、IIS-1212798、IIS-1116411 以及 NSF GRFP、丰田和伯克利视觉与学习中心的部分支持。我们非常感谢 NVIDIA 捐赠 GPU。感谢 Bharath Hariharan 和 Saurabh Gupta 的建议和数据集工具。感谢 Sergio Guadarrama 在 Caffe 中复现 GoogLeNet。感谢 Jitendra Malik 的有益评论。感谢 Wei Liu 指出我们 SIFT Flow 平均 IU 计算中的一个问题以及频率加权平均 IU 公式中的一个错误。

【论文翻译】Fully Convolutional Networks for Semantic Segmentation_第14张图片

【论文翻译】Fully Convolutional Networks for Semantic Segmentation_第15张图片
