深度学习论文阅读图像分类篇(六):SENet《Squeeze-and-Excitation Networks》

  • Abstract 摘要
  • 1. Introduction 引言
  • 2. Related Work 相关工作
  • 3. Squeeze-and-Excitation Blocks
    • 3.1. Squeeze: Global Information Embedding 全局信息嵌入
    • 3.2. Excitation: Adaptive Recalibration 自适应重新校正
    • 3.3 Exemplars: SE-Inception and SE-ResNet 模型:SE-Inception 和 SE-ResNet
  • 4. Model and Computational Complexity 模型和计算复杂度
  • 5. Implementation 实现
  • 6 Experiments 实验
    • 6.1 ImageNet Classification ImageNet 分类
    • 6.2 Scene Classification 场景分类
    • 6.3 Analysis and Discussion 分析和讨论
  • 7. Conclusion 结论


Abstract 摘要

  Convolutional neural networks are built upon the convolution operation, which extracts informative features by fusing spatial and channel-wise information together within local receptive fields. In order to boost the representational power of a network, much existing work has shown the benefits of enhancing spatial encoding. In this work, we focus on channels and propose a novel architectural unit, which we term the “Squeeze-and-Excitation”(SE) block, that adaptively recalibrates channelwise feature responses by explicitly modelling interdependencies between channels. We demonstrate that by stacking these blocks together, we can construct SENet architectures that generalise extremely well across challenging datasets. Crucially, we find that SE blocks produce significant performance improvements for existing state-of-the-art deep architectures at slight computational cost. SENets formed the foundation of our ILSVRC 2017 classification submission which won first place and significantly reduced the top-5 error to 2.251%, achieving a ∼25% relative improvement over the winning entry of 2016.
  卷积神经网络建立在卷积运算的基础上,通过融合局部感受野内的空间信息和通道信息来提取信息特征。为了提高网络的表示能力,许多现有的工作已经表明了增强空间编码的好处。在这项工作中,我们专注于通道,并提出了一种新颖的架构单元,我们称之为“Squeeze-and-Excitation”(SE)模块,它通过显式地建模通道之间的相互依赖关系,自适应地重新校准逐通道的特征响应。我们证明,通过将这些模块堆叠在一起,可以构建在具有挑战性的数据集上泛化得非常好的 SENet 架构。关键的是,我们发现 SE 模块能以微小的计算成本为现有最先进的深层架构带来显著的性能提升。SENets 是我们 ILSVRC 2017 分类竞赛提交方案的基础,该方案赢得了第一名,并将 top-5 错误率显著降低到 2.251%,相对 2016 年的获胜方案取得了约 25%的相对改进。

1. Introduction 引言

  Convolutional neural networks (CNNs) have proven to be effective models for tackling a variety of visual tasks [19, 23, 29, 41]. For each convolutional layer, a set of filters are learned to express local spatial connectivity patterns along input channels. In other words, convolutional filters are expected to be informative combinations by fusing spatial and channel-wise information together, while restricted in local receptive fields. By stacking a series of convolutional layers interleaved with non-linearities and downsampling, CNNs are capable of capturing hierarchical patterns with global receptive fields as powerful image descriptions. Recent work has demonstrated the performance of networks can be improved by explicitly embedding learning mechanisms that help capture spatial correlations without requiring additional supervision. One such approach was popularised by the Inception architectures [14, 39], which showed that the network can achieve competitive accuracy by embedding multi-scale processes in its modules. More recent work has sought to better model spatial dependence [1, 27] and incorporate spatial attention [17].
  卷积神经网络(CNNs)已被证明是解决各种视觉任务的有效模型[19,23,29,41]。对于每个卷积层,会沿着输入通道学习一组滤波器来表达局部空间连接模式。换句话说,卷积滤波器被期望在局部感受野内融合空间信息和通道信息,形成具有信息量的组合。通过堆叠一系列与非线性和下采样交织的卷积层,CNN 能够捕获具有全局感受野的分层模式,作为强大的图像描述。最近的工作已经证明,通过显式地嵌入有助于捕捉空间相关性的学习机制,可以在不需要额外监督的情况下改善网络的性能。Inception 架构[14,39]推广了一种这样的方法,它表明网络可以通过在其模块中嵌入多尺度处理来取得有竞争力的准确度。更近的工作则致力于更好地建模空间依赖[1,27]并引入空间注意力[17]。
  In contrast to these methods, we investigate a different aspect of architectural design —— the channel relationship, by introducing a new architectural unit, which we term the “Squeeze-and-Excitation” (SE) block. Our goal is to improve the representational power of a network by explicitly modelling the interdependencies between the channels of its convolutional features. To achieve this, we propose a mechanism that allows the network to perform feature recalibration, through which it can learn to use global information to selectively emphasise informative features and suppress less useful ones.
  与这些方法相反,我们研究了架构设计的另一个方向——通道关系,并为此引入了一个新的架构单元,我们称之为“Squeeze-and-Excitation”(SE)块。我们的目标是通过显式地建模卷积特征通道之间的相互依赖性来提高网络的表示能力。为了达到这个目的,我们提出了一种让网络能够执行特征重新校准的机制,通过这种机制,网络可以学习使用全局信息来选择性地强调有信息量的特征并抑制不太有用的特征。
  The basic structure of the SE building block is illustrated in Fig.1. For any given transformation $F_{tr}: \mathbf{X} \rightarrow \mathbf{U}$, $\mathbf{X} \in \mathbb{R}^{W' \times H' \times C'}$, $\mathbf{U} \in \mathbb{R}^{W \times H \times C}$ (e.g. a convolution or a set of convolutions), we can construct a corresponding SE block to perform feature recalibration as follows. The features $\mathbf{U}$ are first passed through a squeeze operation, which aggregates the feature maps across spatial dimensions $W \times H$ to produce a channel descriptor. This descriptor embeds the global distribution of channel-wise feature responses, enabling information from the global receptive field of the network to be leveraged by its lower layers. This is followed by an excitation operation, in which sample-specific activations, learned for each channel by a self-gating mechanism based on channel dependence, govern the excitation of each channel. The feature maps $\mathbf{U}$ are then reweighted to generate the output of the SE block which can then be fed directly into subsequent layers.
  SE 构建块的基本结构如图 1 所示。对于任何给定的变换 $F_{tr}: \mathbf{X} \rightarrow \mathbf{U}$,$\mathbf{X} \in \mathbb{R}^{W' \times H' \times C'}$,$\mathbf{U} \in \mathbb{R}^{W \times H \times C}$(例如卷积或一组卷积),我们可以构造一个相应的 SE 块来执行特征重新校准,如下所示。特征 $\mathbf{U}$ 首先通过 squeeze 操作,该操作跨越空间维度 $W \times H$ 聚合特征图来产生通道描述符。这个描述符嵌入了逐通道特征响应的全局分布,使来自网络全局感受野的信息能够被其较低层利用。这之后是一个 excitation 操作,其中通过基于通道依赖性的自门控机制为每个通道学习特定于样本的激活,控制每个通道的激励。然后特征图 $\mathbf{U}$ 被重新加权以生成 SE 块的输出,随后可以将其直接输入到后续的层中。

  An SE network can be generated by simply stacking a collection of SE building blocks. SE blocks can also be used as a drop-in replacement for the original block at any depth in the architecture. However, while the template for the building block is generic, as we show in Sec. 6.3, the role it performs at different depths adapts to the needs of the network. In the early layers, it learns to excite informative features in a class agnostic manner, bolstering the quality of the shared lower level representations. In later layers, the SE block becomes increasingly specialised, and responds to different inputs in a highly class-specific manner. Consequently, the benefits of feature recalibration conducted by SE blocks can be accumulated through the entire network.
  SE 网络可以通过简单地堆叠 SE 构建块的集合来生成。SE 块也 可以用作架构中任意深度的原始块的直接替换。然而,虽然构建块的 模板是通用的,正如我们 6.3 节中展示的那样,但它在不同深度的作 用适应于网络的需求。在前面的层中,它学习以类不可知的方式激发 信息特征,增强共享的较低层表示的质量。在后面的层中,SE 块越 来越专业化,并以高度类特定的方式响应不同的输入。因此,SE 块 进行特征重新校准的好处可以通过整个网络进行累积。
  The development of new CNN architectures is a challenging engineering task, typically involving the selection of many new hyperparameters and layer configurations. By contrast, the design of the SE block outlined above is simple, and can be used directly with existing state-of-the-art architectures whose convolutional layers can be strengthened by direct replacement with their SE counterparts. Moreover, as shown in Sec. 4, SE blocks are computationally lightweight and impose only a slight increase in model complexity and computational burden. To support these claims, we develop several SENets, namely SE-ResNet, SE-Inception, SE-ResNeXt and SE-Inception-ResNet and provide an extensive evaluation of SENets on the ImageNet 2012 dataset [30]. Further, to demonstrate the general applicability of SE blocks, we also present results beyond ImageNet, indicating that the proposed approach is not restricted to a specific dataset or a task.
  新 CNN 架构的开发是一项具有挑战性的工程任务,通常涉及许 多新的超参数和层配置的选择。相比之下,上面概述的 SE 块的设计 是简单的,并且可以直接与现有的最新架构一起使用,其卷积层可以 通过直接用对应的 SE 层来替换从而进行加强。另外,如第四节所示, SE 块在计算上是轻量级的,并且在模型复杂性和计算负担方面仅稍 微增加。为了支持这些声明,我们开发了一些 SENets,即 SE-ResNet, SE-Inception,SE-ResNeXt 和 SE-Inception-ResNet,并在 ImageNet 2012 数据集[30]上对 SENets 进行了广泛的评估。此外,为了证明 SE 块的 一般适用性,我们还呈现了 ImageNet 之外的结果,表明所提出的方 法不受限于特定的数据集或任务。
  Using SENets, we won the first place in the ILSVRC 2017 classification competition. Our top performing model ensemble achieves a 2.251% top-5 error on the test set. This represents a ∼25% relative improvement in comparison to the winning entry of the previous year (with a top-5 error of 2.991%). Our models and related materials have been made available to the research community.
  使用 SENets,我们赢得了 ILSVRC 2017 分类竞赛的第一名。我们表现最好的模型集成在测试集上达到了 2.251%的 top-5 错误率。与前一年的获胜方案(top-5 错误率为 2.991%)相比,这表示约 25%的相对改进。我们的模型和相关材料已经提供给研究界。

2. Related Work 相关工作

  Deep architectures. A wide range of work has shown that restructuring the architecture of a convolutional neural network in a manner that eases the learning of deep features can yield substantial improvements in performance. VGGNets [35] and Inception models [39] demonstrated the benefits that could be attained with an increased depth, significantly outperforming previous approaches on ILSVRC 2014. Batch normalization (BN) [14] improved gradient propagation through deep networks by inserting units to regulate layer inputs stabilising the learning process, which enables further experimentation with a greater depth. He et al. [9, 10] showed that it was effective to train deeper networks by restructuring the architecture to learn residual functions through the use of identity-based skip connections which ease the flow of information across units. More recently, reformulations of the connections between network layers [5, 12] have been shown to further improve the learning and representational properties of deep networks.
  深层架构。大量的工作已经表明,以易于学习深度特征的方式重构卷积神经网络的架构可以大大提高性能。VGGNets[35]和 Inception 模型[39]证明了增加深度可以获得的好处,在 ILSVRC 2014 上明显超过了之前的方法。批标准化(BN)[14]通过插入调节层输入的单元来稳定学习过程,改善了梯度在深度网络中的传播,使得可以在更大的深度上进行进一步的实验。He 等人[9,10]表明,通过重构架构、借助基于恒等映射的跳跃连接来学习残差函数,可以促进信息在单元间的流动,从而有效地训练更深的网络。最近,对网络层间连接的重新表述[5,12]已被证明可以进一步改善深度网络的学习和表征属性。
  An alternative line of research has explored ways to tune the functional form of the modular components of a network. Grouped convolutions can be used to increase cardinality (the size of the set of transformations) [13, 43] to learn richer representations. Multi-branch convolutions can be interpreted as a generalisation of this concept, enabling more flexible compositions of convolutional operators [14, 38, 39, 40]. Cross-channel correlations are typically mapped as new combinations of features, either independently of spatial structure [6, 18] or jointly by using standard convolutional filters [22] with 1×1 convolutions, while much of this work has concentrated on the objective of reducing model and computational complexity. This approach reflects an assumption that channel relationships can be formulated as a composition of instance-agnostic functions with local receptive fields. In contrast, we claim that providing the network with a mechanism to explicitly model dynamic, nonlinear dependencies between channels using global information can ease the learning process, and significantly enhance the representational power of the network.
  另一类研究探索了调整网络模块化组件功能形式的方法。可以使用分组卷积来增加基数(变换集合的大小)[13,43],以学习更丰富的表示。多分支卷积可以解释为这一概念的推广,使卷积算子的组合更加灵活[14,38,39,40]。跨通道相关性通常被映射为新的特征组合,或者独立于空间结构[6,18],或者通过将标准卷积滤波器[22]与 1×1 卷积联合使用,但这些工作大多集中于降低模型和计算复杂度的目标。这种方法反映了一个假设,即通道关系可以被表述为具有局部感受野的、与实例无关的函数的组合。相比之下,我们认为,为网络提供一种利用全局信息显式建模通道之间动态、非线性依赖关系的机制,可以简化学习过程,并显著增强网络的表示能力。
  Attention and gating mechanisms. Attention can be viewed, broadly, as a tool to bias the allocation of available processing resources towards the most informative components of an input signal. The development and understanding of such mechanisms has been a longstanding area of research in the neuroscience community [15, 16, 28] and has seen significant interest in recent years as a powerful addition to deep neural networks [20, 25]. Attention has been shown to improve performance across a range of tasks, from localisation and understanding in images [3, 17] to sequence-based models [2, 24]. It is typically implemented in combination with a gating function (e.g. a softmax or sigmoid) and sequential techniques [11, 37]. Recent work has shown its applicability to tasks such as image captioning [4, 44] and lip reading [7], in which it is exploited to efficiently aggregate multi-modal data. In these applications, it is typically used on top of one or more layers representing higher-level abstractions for adaptation between modalities. Highway networks [36] employ a gating mechanism to regulate the shortcut connection, enabling the learning of very deep architectures. Wang et al. [42] introduce a powerful trunk-and-mask attention mechanism using an hourglass module [27], inspired by its success in semantic segmentation. This high capacity unit is inserted into deep residual networks between intermediate stages. In contrast, our proposed SE-block is a lightweight gating mechanism, specialised to model channel-wise relationships in a computationally efficient manner and designed to enhance the representational power of modules throughout the network.
  注意力和门控机制。从广义上讲,可以将注意力视为一种工具,它将可用的处理资源偏向分配给输入信号中信息量最大的部分。这类机制的发展和理解一直是神经科学界的一个长期研究领域[15,16,28],并且近年来作为深度神经网络的强大补充引起了极大的兴趣[20,25]。注意力已被证明可以改善一系列任务的性能,从图像的定位和理解[3,17]到基于序列的模型[2,24]。它通常与门控函数(例如 softmax 或 sigmoid)和序列技术结合实现[11,37]。最近的研究表明,它适用于图像描述(image captioning)[4,44]和唇读[7]等任务,在这些任务中它被用来有效地聚合多模态数据。在这些应用中,它通常用在表示较高层次抽象的一个或多个层之上,以用于模态之间的适应。Highway 网络[36]采用门控机制来调节快捷连接,使得可以学习非常深的架构。Wang 等人[42]受其在语义分割中的成功启发,引入了一个使用沙漏模块[27]的强大的 trunk-and-mask 注意力机制。这个高容量的单元被插入到深度残差网络的中间阶段之间。相比之下,我们提出的 SE 块是一个轻量级的门控机制,专门用于以计算高效的方式对通道关系进行建模,并旨在增强整个网络中模块的表示能力。

3. Squeeze-and-Excitation Blocks

  The Squeeze-and-Excitation block is a computational unit which can be constructed for any given transformation $F_{tr}: \mathbf{X} \rightarrow \mathbf{U}$, $\mathbf{X} \in \mathbb{R}^{W' \times H' \times C'}$, $\mathbf{U} \in \mathbb{R}^{W \times H \times C}$. For simplicity of exposition, in the notation that follows we take $F_{tr}$ to be a standard convolutional operator. Let $\mathbf{V} = [\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_C]$ denote the learned set of filter kernels, where $\mathbf{v}_c$ refers to the parameters of the $c$-th filter. We can then write the outputs of $F_{tr}$ as $\mathbf{U} = [\mathbf{u}_1, \mathbf{u}_2, \dots, \mathbf{u}_C]$, where
$$\mathbf{u}_c = \mathbf{v}_c * \mathbf{X} = \sum_{s=1}^{C'} \mathbf{v}_c^s * \mathbf{x}^s.$$
Here $*$ denotes convolution, $\mathbf{v}_c = [\mathbf{v}_c^1, \mathbf{v}_c^2, \dots, \mathbf{v}_c^{C'}]$ and $\mathbf{X} = [\mathbf{x}^1, \mathbf{x}^2, \dots, \mathbf{x}^{C'}]$ (to simplify the notation, bias terms are omitted). Here $\mathbf{v}_c^s$ is a 2D spatial kernel, and therefore represents a single channel of $\mathbf{v}_c$ which acts on the corresponding channel of $\mathbf{X}$. Since the output is produced by a summation through all channels, the channel dependencies are implicitly embedded in $\mathbf{v}_c$, but these dependencies are entangled with the spatial correlation captured by the filters. Our goal is to ensure that the network is able to increase its sensitivity to informative features so that they can be exploited by subsequent transformations, and to suppress less useful ones. We propose to achieve this by explicitly modelling channel interdependencies to recalibrate filter responses in two steps, squeeze and excitation, before they are fed into the next transformation. A diagram of an SE building block is shown in Fig.1.
  Squeeze-and-Excitation 块是一个计算单元,可以针对任何给定的变换 $F_{tr}: \mathbf{X} \rightarrow \mathbf{U}$,$\mathbf{X} \in \mathbb{R}^{W' \times H' \times C'}$,$\mathbf{U} \in \mathbb{R}^{W \times H \times C}$ 来构建。为了简化说明,在接下来的表示中,我们将 $F_{tr}$ 看作一个标准的卷积算子。设 $\mathbf{V} = [\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_C]$ 表示学习到的一组滤波器核,$\mathbf{v}_c$ 指第 $c$ 个滤波器的参数。然后我们可以将 $F_{tr}$ 的输出写作 $\mathbf{U} = [\mathbf{u}_1, \mathbf{u}_2, \dots, \mathbf{u}_C]$,其中
$$\mathbf{u}_c = \mathbf{v}_c * \mathbf{X} = \sum_{s=1}^{C'} \mathbf{v}_c^s * \mathbf{x}^s.$$
这里 $*$ 表示卷积,$\mathbf{v}_c = [\mathbf{v}_c^1, \mathbf{v}_c^2, \dots, \mathbf{v}_c^{C'}]$,$\mathbf{X} = [\mathbf{x}^1, \mathbf{x}^2, \dots, \mathbf{x}^{C'}]$(为了简化表示,忽略偏置项)。这里 $\mathbf{v}_c^s$ 是一个 2D 空间核,因此表示 $\mathbf{v}_c$ 的单个通道,作用于 $\mathbf{X}$ 的对应通道。由于输出是通过对所有通道求和产生的,所以通道依赖性被隐式地嵌入到 $\mathbf{v}_c$ 中,但是这些依赖性与滤波器捕获的空间相关性纠缠在一起。我们的目标是确保网络能够提高对有信息量特征的敏感度,以便后续变换可以利用它们,并抑制不太有用的特征。我们建议通过显式地建模通道相互依赖性,在特征进入下一个变换之前分 squeeze 和 excitation 两步对滤波器响应进行重新校准来实现这一点。SE 构建块的示意图如图 1 所示。
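  为帮助理解上式(每个输出通道都是对全部输入通道做空间卷积后再求和,通道依赖因此被隐式地混合在一起),下面给出一个笔者补充的最小 PyTorch 数值验证示例;张量尺寸为任意假设,非论文原文内容。

```python
import torch
import torch.nn.functional as F

# 随机输入:C'=3 个输入通道,空间尺寸 8×8;一个输出通道的 3×3 卷积核
x = torch.randn(1, 3, 8, 8)
v_c = torch.randn(1, 3, 3, 3)   # 形状 [C=1, C'=3, k, k]

# 标准卷积:直接得到 u_c
u_c = F.conv2d(x, v_c, padding=1)

# 按公式逐通道做 2D 空间卷积后再求和:u_c = Σ_s v_c^s ∗ x^s
u_sum = sum(
    F.conv2d(x[:, s:s + 1], v_c[:, s:s + 1], padding=1)
    for s in range(3)
)

print(torch.allclose(u_c, u_sum, atol=1e-5))   # True:两种写法等价
```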

3.1. Squeeze: Global Information Embedding 全局信息嵌入

  In order to tackle the issue of exploiting channel dependencies, we first consider the signal to each channel in the output features. Each of the learned filters operates with a local receptive field and consequently each unit of the transformation output $\mathbf{U}$ is unable to exploit contextual information outside of this region. This is an issue that becomes more severe in the lower layers of the network whose receptive field sizes are small.
  为了解决利用通道依赖性的问题,我们首先考虑输出特征中每个通道的信号。每个学习到的滤波器都在局部感受野上进行操作,因此变换输出 $\mathbf{U}$ 的每个单元都无法利用该区域之外的上下文信息。在感受野尺寸较小的网络较低层中,这个问题变得更加严重。
  To mitigate this problem, we propose to squeeze global spatial information into a channel descriptor. This is achieved by using global average pooling to generate channel-wise statistics. Formally, a statistic $\mathbf{z} \in \mathbb{R}^C$ is generated by shrinking $\mathbf{U}$ through spatial dimensions $W \times H$, where the $c$-th element of $\mathbf{z}$ is calculated by:
$$z_c = F_{sq}(\mathbf{u}_c) = \frac{1}{W \times H} \sum_{i=1}^{W} \sum_{j=1}^{H} u_c(i, j).$$
  为了缓解这个问题,我们提出将全局空间信息压缩成一个通道描述符。这是通过使用全局平均池化生成逐通道统计量来实现的。形式上,统计量 $\mathbf{z} \in \mathbb{R}^C$ 是通过在空间维度 $W \times H$ 上收缩 $\mathbf{U}$ 生成的,其中 $\mathbf{z}$ 的第 $c$ 个元素通过下式计算:
$$z_c = F_{sq}(\mathbf{u}_c) = \frac{1}{W \times H} \sum_{i=1}^{W} \sum_{j=1}^{H} u_c(i, j).$$
  Discussion. The transformation output $\mathbf{U}$ can be interpreted as a collection of the local descriptors whose statistics are expressive for the whole image. Exploiting such information is prevalent in feature engineering work [31, 34, 45]. We opt for the simplest, global average pooling, while more sophisticated aggregation strategies could be employed here as well.
  讨论。变换输出 $\mathbf{U}$ 可以被解释为局部描述子的集合,这些描述子的统计信息对于整个图像来说是有表现力的。特征工程工作中[31,34,45]普遍使用这些信息。我们选择最简单的全局平均池化,同时也可以采用更复杂的汇聚策略。
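  下面是笔者按上式补充的 squeeze 操作示意代码(PyTorch),它等价于 nn.AdaptiveAvgPool2d(1) 后再展平;其中的特征图尺寸为假设值,仅作演示。

```python
import torch

def squeeze(u: torch.Tensor) -> torch.Tensor:
    """Squeeze:在空间维度 W×H 上做全局平均池化,得到通道描述符 z ∈ R^C。"""
    # u 的形状为 [N, C, H, W],返回形状为 [N, C]
    return u.mean(dim=(2, 3))

u = torch.randn(2, 64, 56, 56)   # 假设的特征图尺寸
z = squeeze(u)
print(z.shape)                    # torch.Size([2, 64])
```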

3.2. Excitation: Adaptive Recalibration 自适应重新校正

  To make use of the information aggregated in the squeeze operation, we follow it with a second operation which aims to fully capture channel-wise dependencies. To fulfil this objective, the function must meet two criteria: first, it must be flexible (in particular, it must be capable of learning a nonlinear interaction between channels) and second, it must learn a non-mutually-exclusive relationship as multiple channels are allowed to be emphasised, as opposed to one-hot activation. To meet these criteria, we opt to employ a simple gating mechanism with a sigmoid activation:
$$\mathbf{s} = F_{ex}(\mathbf{z}, \mathbf{W}) = \sigma(g(\mathbf{z}, \mathbf{W})) = \sigma(\mathbf{W}_2 \delta(\mathbf{W}_1 \mathbf{z}))$$
where $\delta$ refers to the ReLU [26] function, $\mathbf{W}_1 \in \mathbb{R}^{\frac{C}{r} \times C}$ and $\mathbf{W}_2 \in \mathbb{R}^{C \times \frac{C}{r}}$. To limit model complexity and aid generalisation, we parameterise the gating mechanism by forming a bottleneck with two fully-connected (FC) layers around the non-linearity, i.e. a dimensionality-reduction layer with parameters $\mathbf{W}_1$ with reduction ratio $r$ (we set it to be 16, and this parameter choice is discussed in Sec. 6.3), a ReLU and then a dimensionality-increasing layer with parameters $\mathbf{W}_2$. The final output of the block is obtained by rescaling the transformation output $\mathbf{U}$ with the activations:
$$\tilde{\mathbf{x}}_c = F_{scale}(\mathbf{u}_c, s_c) = s_c \cdot \mathbf{u}_c$$
where $\tilde{\mathbf{X}} = [\tilde{\mathbf{x}}_1, \tilde{\mathbf{x}}_2, \dots, \tilde{\mathbf{x}}_C]$ and $F_{scale}(\mathbf{u}_c, s_c)$ refers to channel-wise multiplication between the feature map $\mathbf{u}_c \in \mathbb{R}^{W \times H}$ and the scalar $s_c$.
  为了利用 squeeze 操作中汇聚的信息,我们接着进行第二个操作,其目的是全面捕获通道依赖性。为了实现这个目标,这个函数必须满足两个标准:第一,它必须是灵活的(特别是,它必须能够学习通道之间的非线性交互);第二,它必须学习一种非互斥的关系,因为与独热(one-hot)激活相反,这里允许同时强调多个通道。为了满足这些标准,我们选择采用一个带 sigmoid 激活的简单门控机制:
$$\mathbf{s} = F_{ex}(\mathbf{z}, \mathbf{W}) = \sigma(g(\mathbf{z}, \mathbf{W})) = \sigma(\mathbf{W}_2 \delta(\mathbf{W}_1 \mathbf{z}))$$
其中 $\delta$ 指 ReLU[26] 函数,$\mathbf{W}_1 \in \mathbb{R}^{\frac{C}{r} \times C}$,$\mathbf{W}_2 \in \mathbb{R}^{C \times \frac{C}{r}}$。为了限制模型复杂度并辅助泛化,我们通过在非线性两侧形成两个全连接(FC)层的瓶颈来参数化门控机制,即一个参数为 $\mathbf{W}_1$、降维比例为 $r$ 的降维层(我们将其设置为 16,这个参数选择在 6.3 节中讨论),一个 ReLU,然后是一个参数为 $\mathbf{W}_2$ 的升维层。块的最终输出通过用这些激活对变换输出 $\mathbf{U}$ 进行重新缩放得到:
$$\tilde{\mathbf{x}}_c = F_{scale}(\mathbf{u}_c, s_c) = s_c \cdot \mathbf{u}_c$$
其中 $\tilde{\mathbf{X}} = [\tilde{\mathbf{x}}_1, \tilde{\mathbf{x}}_2, \dots, \tilde{\mathbf{x}}_C]$,$F_{scale}(\mathbf{u}_c, s_c)$ 指特征图 $\mathbf{u}_c \in \mathbb{R}^{W \times H}$ 与标量 $s_c$ 之间的逐通道乘积。
  Discussion. The activations act as channel weights adapted to the input-specific descriptor $\mathbf{z}$. In this regard, SE blocks intrinsically introduce dynamics conditioned on the input, helping to boost feature discriminability.
  讨论。这些激活作为适应特定输入描述符 $\mathbf{z}$ 的通道权重。在这方面,SE 块本质上引入了以输入为条件的动态特性,有助于提高特征辨别力。
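  结合 3.1 节的 squeeze 与本节的 excitation,一个完整 SE 模块的 PyTorch 示意实现如下(笔者补充;全连接层是否带偏置等细节论文未明确,此处按常见写法假设不使用偏置):

```python
import torch
import torch.nn as nn

class SELayer(nn.Module):
    """SE 模块示意:squeeze → FC 瓶颈(W1、ReLU、W2)→ sigmoid → 逐通道缩放。"""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction, bias=False),  # 降维层 W1
            nn.ReLU(inplace=True),                                   # δ
            nn.Linear(channels // reduction, channels, bias=False),  # 升维层 W2
            nn.Sigmoid(),                                            # σ
        )

    def forward(self, u: torch.Tensor) -> torch.Tensor:
        n, c, _, _ = u.shape
        z = u.mean(dim=(2, 3))          # squeeze:得到 [N, C] 的通道描述符
        s = self.fc(z)                  # excitation:得到每个通道的门控值
        return u * s.view(n, c, 1, 1)   # scale:对 U 逐通道重新加权

u = torch.randn(2, 256, 28, 28)
print(SELayer(256)(u).shape)            # torch.Size([2, 256, 28, 28])
```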

3.3 Exemplars: SE-Inception and SE-ResNet 模型:SE-Inception 和 SE-ResNet

  The flexibility of the SE block means that it can be directly applied to transformations beyond standard convolutions. To illustrate this point, we develop SENets by integrating SE blocks into two popular families of architectures, Inception and ResNet. SE blocks are constructed for the Inception network by taking the transformation $F_{tr}$ to be an entire Inception module (see Fig.2). By making this change for each such module in the architecture, we construct an SE-Inception network.
  SE 块的灵活性意味着它可以直接应用于标准卷积之外的变换。为了说明这一点,我们通过将 SE 块集成到 Inception 和 ResNet 这两个流行的网络架构系列中来开发 SENets。通过将变换 $F_{tr}$ 取为一个完整的 Inception 模块(参见图 2),为 Inception 网络构建 SE 块。通过对架构中的每个这样的模块进行更改,我们构建了一个 SE-Inception 网络。
  Residual networks and their variants have shown to be highly effective at learning deep representations. We develop a series of SE blocks that integrate with ResNet [9], ResNeXt [43] and Inception-ResNet [38] respectively. Fig.3 depicts the schema of an SE-ResNet module. Here, the SE block transformation $F_{tr}$ is taken to be the non-identity branch of a residual module. Squeeze and excitation both act before summation with the identity branch.
  残差网络及其变体已被证明在学习深度表示方面非常有效。我们开发了一系列的 SE 块,分别与 ResNet[9]、ResNeXt[43]和 Inception-ResNet[38]集成。图 3 描述了 SE-ResNet 模块的结构。在这里,SE 块的变换 $F_{tr}$ 被取为残差模块的非恒等分支。squeeze 和 excitation 都在与恒等分支相加之前起作用。
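  下面是笔者补充的一个 SE-ResNet 模块示意(对应图 3 的思路):SE 重新校准作用在残差分支的输出上,然后才与恒等分支相加。残差分支这里用两个 3×3 卷积简化代替,与论文中实际使用的 bottleneck 结构并不完全一致,仅作说明。

```python
import torch
import torch.nn as nn

class SEResidualBlock(nn.Module):
    """SE-ResNet 模块示意:residual 分支为 F_tr,SE 在与恒等分支相加之前起作用。"""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.residual = nn.Sequential(               # F_tr:非恒等分支(简化版)
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.gate = nn.Sequential(                   # excitation 的两层 FC 瓶颈
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        u = self.residual(x)                          # 残差分支输出 U
        s = self.gate(u.mean(dim=(2, 3)))             # squeeze + excitation
        u = u * s.view(u.size(0), u.size(1), 1, 1)    # 逐通道缩放
        return self.relu(u + x)                       # 校准后再与恒等分支相加

x = torch.randn(2, 64, 56, 56)
print(SEResidualBlock(64)(x).shape)                   # torch.Size([2, 64, 56, 56])
```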

4. Model and Computational Complexity 模型和计算复杂度

  An SENet is constructed by stacking a set of SE blocks. In practice, it is generated by replacing each original block (i.e. residual block) with its corresponding SE counterpart (i.e. SE-residual block). We describe the architecture of SE-ResNet-50 and SE-ResNeXt-50 in Table 1.
  SENet 通过堆叠一组 SE 块来构建。实际上,它是通过用原始块 的 SE 对应部分(即 SE 残差块)替换每个原始块(即残差块)而产 生的。我们在表 1 中描述了 SE-ResNet-50 和 SE-ResNeXt-50 的架构。

  表 1。(左)ResNet-50,(中)SE-ResNet-50,(右)具有 32×4d 模板的 SE-ResNeXt-50。括号内列出了残差构建块特定参数设置的形状和操作,括号外给出了一个阶段中堆叠块的数量。fc 后面的内括号表示 SE 模块中两个全连接层的输出维度。
  For the proposed SE block to be viable in practice, it must provide an acceptable model complexity and computational overhead which is important for scalability. To illustrate the cost of the module, we take the comparison between ResNet-50 and SE-ResNet-50 as an example, where the accuracy of SE-ResNet-50 is obviously superior to ResNet-50 and approaching a deeper ResNet-101 network (shown in Table 2). ResNet-50 requires ∼3.86 GFLOPs in a single forward pass for a 224×224 pixel input image. Each SE block makes use of a global average pooling operation in the squeeze phase and two small fully connected layers in the excitation phase, followed by an inexpensive channel-wise scaling operation. In aggregate, SE-ResNet-50 requires ∼3.87 GFLOPs, corresponding to only a 0.26% relative increase over the original ResNet-50.
  要使所提出的 SE 块在实践中可行,它必须提供可接受的模型复杂度和计算开销,这对可扩展性很重要。为了说明模块的成本,我们以 ResNet-50 和 SE-ResNet-50 的比较为例,其中 SE-ResNet-50 的精度明显优于 ResNet-50,并接近更深的 ResNet-101 网络(如表 2 所示)。对于 224×224 像素的输入图像,ResNet-50 单次前向传播需要约 3.86 GFLOPs。每个 SE 块在 squeeze 阶段使用一个全局平均池化操作,在 excitation 阶段使用两个小的全连接层,接下来是一个廉价的逐通道缩放操作。总的来说,SE-ResNet-50 需要约 3.87 GFLOPs,相对于原始的 ResNet-50 只增加了 0.26%。

  表 2。ImageNet 验证集上的单裁剪图像错误率(%)和复杂度比 较。original 列是指原始论文中报告的结果。为了进行公平比较,我们 重新训练了基准模型,并在 re-implementation 列中报告分数。SENet 列是指已添加 SE 块后对应的架构。括号内的数字表示与重新实现的 基准数据相比的性能改善。†表示该模型已经在验证集的非黑名单子 集上进行了评估(在[38]中有更详细的讨论),这可能稍微改善结果。
  In practice, with a training mini-batch of 256 images, a single pass forwards and backwards through ResNet-50 takes 190 ms, compared to 209 ms for SE-ResNet-50 (both timings are performed on a server with 8 NVIDIA Titan X GPUs). We argue that it is a reasonable overhead as global pooling and small inner-product operations are less optimised in existing GPU libraries. Moreover, due to its importance for embedded device applications, we also benchmark CPU inference time for each model: for a 224×224 pixel input image, ResNet-50 takes 164 ms, compared to 167 ms for SE-ResNet-50. The small additional computational overhead required by the SE block is justified by its contribution to model performance (discussed in detail in Sec. 6).
  在实践中,训练的小批量大小为 256 张图像时,ResNet-50 的一次前向和反向传播花费 190 ms,而 SE-ResNet-50 则花费 209 ms(两个时间都在具有 8 个 NVIDIA Titan X GPU 的服务器上测得)。我们认为这是一个合理的开销,因为在现有的 GPU 库中,全局池化和小型内积操作的优化程度较低。此外,由于其对嵌入式设备应用的重要性,我们还对每个模型的 CPU 推断时间进行了基准测试:对于 224×224 像素的输入图像,ResNet-50 花费了 164 ms,相比之下 SE-ResNet-50 花费了 167 ms。SE 块所需的少量额外计算开销对于其对模型性能的贡献来说是合理的(在第 6 节中详细讨论)。
  Next, we consider the additional parameters introduced by the proposed block. All additional parameters are contained in the two fully connected layers of the gating mechanism, which constitute a small fraction of the total network capacity. More precisely, the number of additional parameters introduced is given by:
$$\frac{2}{r} \sum_{s=1}^{S} N_s \cdot C_s^2$$
where $r$ denotes the reduction ratio (we set $r$ to 16 in all our experiments), $S$ refers to the number of stages (where each stage refers to the collection of blocks operating on feature maps of a common spatial dimension), $C_s$ denotes the dimension of the output channels for stage $s$ and $N_s$ refers to the number of repeated blocks for stage $s$. In total, SE-ResNet-50 introduces ∼2.5 million additional parameters beyond the ∼25 million parameters required by ResNet-50, corresponding to a ∼10% increase in the total number of parameters. The majority of these additional parameters come from the last stage of the network, where excitation is performed across the greatest channel dimensions. However, we found that the comparatively expensive final stage of SE blocks could be removed at a marginal cost in performance (<0.1% top-1 error on ImageNet dataset) to reduce the relative parameter increase to ∼4%, which may prove useful in cases where parameter usage is a key consideration.
  接下来,我们考虑所提出的块引入的附加参数。所有附加参数都包含在门控机制的两个全连接层中,只占网络总容量的一小部分。更确切地说,引入的附加参数数量由下式给出:
$$\frac{2}{r} \sum_{s=1}^{S} N_s \cdot C_s^2$$
其中 $r$ 表示减少比率(我们在所有的实验中将 $r$ 设置为 16),$S$ 指阶段数量(每个阶段是指在共同空间维度的特征图上运行的块的集合),$C_s$ 表示阶段 $s$ 的输出通道维度,$N_s$ 表示阶段 $s$ 中重复块的数量。总的来说,SE-ResNet-50 在 ResNet-50 所需的约 2500 万参数之外引入了约 250 万附加参数,参数总量相对增加了约 10%。这些附加参数中的大部分来自网络的最后阶段,其中 excitation 在最大的通道维度上执行。然而,我们发现移除相对昂贵的最后阶段的 SE 块只会带来很小的性能损失(ImageNet 数据集上 top-1 错误率增加不到 0.1%),同时可以将参数的相对增加降低到约 4%,这在参数使用是关键考虑因素的情况下可能是有用的。
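  按上式和表 1 中 SE-ResNet-50 各阶段的配置,可以直接验证文中约 250 万附加参数、以及去掉最后阶段后约 4%相对增量的说法。下面是笔者补充的简单计算(各阶段的通道数和块数取自表 1):

```python
# SE-ResNet-50 各阶段的 (输出通道数 C_s, 块数 N_s),r = 16
stages = [(256, 3), (512, 4), (1024, 6), (2048, 3)]
r = 16

extra = sum(2 / r * n * c ** 2 for c, n in stages)
print(f"全部阶段的附加参数:{extra / 1e6:.2f}M")              # ≈ 2.51M,与文中约 250 万一致

extra_wo_last = sum(2 / r * n * c ** 2 for c, n in stages[:-1])
print(f"去掉最后阶段后的附加参数:{extra_wo_last / 1e6:.2f}M")  # ≈ 0.94M,约占 2500 万的 4%
```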

5. Implementation 实现

  During training, we follow standard practice and perform data augmentation with random-size cropping [39] to 224×224 pixels (299×299 for Inception-ResNet-v2 [38] and SE-Inception-ResNet-v2) and random horizontal flipping. Input images are normalised through mean channel subtraction. In addition, we adopt the data balancing strategy described in [32] for mini-batch sampling to compensate for the uneven distribution of classes. The networks are trained on our distributed learning system “ROCS” which is capable of handling efficient parallel training of large networks. Optimisation is performed using synchronous SGD with momentum 0.9 and a mini-batch size of 1024 (split into sub-batches of 32 images per GPU across 4 servers, each containing 8 GPUs). The initial learning rate is set to 0.6 and decreased by a factor of 10 every 30 epochs. All models are trained for 100 epochs from scratch, using the weight initialisation strategy described in [8].
  在训练过程中,我们遵循标准做法,使用随机尺寸裁剪[39]到 224×224 像素(对 Inception-ResNet-v2[38]和 SE-Inception-ResNet-v2 为 299×299)和随机水平翻转进行数据增强。输入图像通过减去各通道均值进行归一化。另外,我们采用[32]中描述的数据均衡策略进行小批量采样,以补偿类别的不均匀分布。网络在我们的分布式学习系统“ROCS”上进行训练,该系统能够处理大型网络的高效并行训练。优化使用同步 SGD,动量为 0.9,小批量大小为 1024(在 4 台各含 8 个 GPU 的服务器上,每个 GPU 分到 32 张图像的子批次)。初始学习率设为 0.6,每 30 个 epoch 除以 10。所有模型都使用[8]中描述的权重初始化策略从零开始训练 100 个 epoch。
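  按本节描述的超参数,优化器与学习率调度在 PyTorch 中大致可以写成如下形式(笔者补充的示意代码:模型和数据用随机占位,分布式训练、数据均衡采样等细节从略):

```python
import torch
import torch.nn as nn

# 占位模型与数据,仅用于演示超参数设置
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.AdaptiveAvgPool2d(1),
                      nn.Flatten(), nn.Linear(8, 1000))

optimizer = torch.optim.SGD(model.parameters(), lr=0.6, momentum=0.9)            # 同步 SGD,动量 0.9,初始学习率 0.6
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)  # 每 30 个 epoch 将学习率除以 10

for epoch in range(100):                      # 从零开始训练 100 个 epoch
    images = torch.randn(4, 3, 224, 224)      # 论文中 mini-batch 总大小为 1024,按 GPU 拆分
    labels = torch.randint(0, 1000, (4,))
    loss = nn.functional.cross_entropy(model(images), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
```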

6 Experiments 实验

  In this section we conduct extensive experiments on the ImageNet 2012 dataset [30] for the purposes: first, to explore the impact of the proposed SE block for the basic networks with different depths and second, to investigate its capacity of integrating with current state-of-the-art network architectures, which aim to a fair comparison between SENets and non-SENets rather than pushing the performance. Next, we present the results and details of the models for ILSVRC 2017 classification task. Furthermore, we perform experiments on the Places365-Challenge scene classification dataset [48] to investigate how well SENets are able to generalise to other datasets. Finally, we investigate the role of excitation and give some analysis based on experimental phenomena.
  在这一部分,我们在 ImageNet 2012 数据集上进行了大量的实验 [30],其目的是:首先探索提出的 SE 块对不同深度基础网络的影响; 其次,调查它与最先进的网络架构集成后的能力,旨在公平比较 SENets 和非 SENets,而不是推动性能。接下来,我们将介绍 ILSVRC 2017 分类任务模型的结果和详细信息。此外,我们在 Places365- Challenge 场景分类数据集[48]上进行了实验,以研究 SENets 是否能 够很好地泛化到其它数据集。最后,我们研究激励的作用,并根据实 验现象给出了一些分析

6.1 ImageNet Classification ImageNet 分类

  The ImageNet 2012 dataset is comprised of 1.28 million training images and 50K validation images from 1000 classes. We train networks on the training set and report the top-1 and the top-5 errors using centre crop evaluations on the validation set, where 224×224 pixels are cropped from each image whose shorter edge is first resized to 256 (299×299 from each image whose shorter edge is first resized to 352 for Inception-ResNet-v2 and SE-Inception-ResNet-v2).
  ImageNet 2012 数据集包含来自 1000 个类别的 128 万张训练图像和 5 万张验证图像。我们在训练集上训练网络,并在验证集上使用中心裁剪评估来报告 top-1 和 top-5 错误率:每张图像的短边先缩放到 256,再从中裁剪出 224×224 像素(对于 Inception-ResNet-v2 和 SE-Inception-ResNet-v2,每张图像的短边先缩放到 352,再裁剪出 299×299 像素)。
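  这里的单裁剪评估流程用 torchvision 大致可以写成下面的形式(笔者补充的示意;论文仅说明减去通道均值,代码中使用的均值/方差数值是常见的 ImageNet 统计量,属假设):

```python
from torchvision import transforms

# 常规网络:短边缩放到 256,再中心裁剪 224×224
eval_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Inception-ResNet-v2 / SE-Inception-ResNet-v2:短边缩放到 352,再中心裁剪 299×299
eval_transform_inception = transforms.Compose([
    transforms.Resize(352),
    transforms.CenterCrop(299),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
```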
  Network depth. We first compare the SE-ResNet against a collection of standard ResNet architectures. Each ResNet and its corresponding SE-ResNet are trained with identical optimisation schemes. The performance of the different networks on the validation set is shown in Table 2, which shows that SE blocks consistently improve performance across different depths with an extremely small increase in computational complexity.
  网络深度。我们首先将 SE-ResNet 与一系列标准 ResNet 架构进行比较。每个 ResNet 及其对应的 SE-ResNet 都使用相同的优化方案进行训练。验证集上不同网络的性能如表 2 所示,表明 SE 块在不同深度的网络上都能以极小的计算复杂度增加带来一致的性能提升。
  Remarkably, SE-ResNet-50 achieves a single-crop top-5 validation error of 6.62%, exceeding ResNet-50 (7.48%) by 0.86% and approaching the performance achieved by the much deeper ResNet-101 network (6.52% top-5 error) with only half of the computational overhead (3.87 GFLOPs vs. 7.58 GFLOPs). This pattern is repeated at greater depth, where SE-ResNet-101 (6.07% top-5 error) not only matches, but outperforms the deeper ResNet-152 network (6.34% top-5 error) by 0.27%. Fig.4 depicts the training and validation curves of SE-ResNets and ResNets, respectively. While it should be noted that the SE blocks themselves add depth, they do so in an extremely computationally efficient manner and yield good returns even at the point at which extending the depth of the base architecture achieves diminishing returns. Moreover, we see that the performance improvements are consistent through training across a range of different depths, suggesting that the improvements induced by SE blocks can be used in combination with adding more depth to the base architecture.
  值得注意的是,SE-ResNet-50 实现了 6.62%的单裁剪 top-5 验证错误率,比 ResNet-50(7.48%)低 0.86%,并接近更深的 ResNet-101 网络(6.52%的 top-5 错误率)的性能,而计算开销只有后者的一半(3.87 GFLOPs vs. 7.58 GFLOPs)。这种模式在更大的深度上重复出现:SE-ResNet-101(6.07%的 top-5 错误率)不仅可以匹配,而且比更深的 ResNet-152 网络(6.34%的 top-5 错误率)高出 0.27%。图 4 分别描绘了 SE-ResNets 和 ResNets 的训练和验证曲线。虽然应该注意 SE 块本身增加了深度,但是它们的计算效率极高,即使在扩展基础架构的深度已经收益递减的情况下也能产生良好的回报。而且,我们看到在各种不同深度的训练过程中,性能改进是一致的,这表明 SE 块带来的改进可以与增加基础架构的深度结合使用。
  Integration with modern architectures. We next investigate the effect of combining SE blocks with another two state-of-the-art architectures, Inception-ResNet-v2 [38] and ResNeXt [43]. The Inception architecture constructs modules of convolutions as multi-branch combinations of factorised filters, reflecting the Inception hypothesis [6] that spatial correlations and cross-channel correlations can be mapped independently. In contrast, the ResNeXt architecture asserts that richer representations can be obtained by aggregating combinations of sparsely connected (in the channel dimension) convolutional features. Both approaches introduce prior-structured correlations in modules. We construct SENet equivalents of these networks, SE-Inception-ResNet-v2 and SE-ResNeXt (the configuration of SE-ResNeXt-50 (32×4d) is given in Table 1). Like previous experiments, the same optimisation scheme is used for both the original networks and their SENet counterparts.
  与现代架构集成。接下来我们研究将 SE 块与另外两种最先进的架构 Inception-ResNet-v2[38]和 ResNeXt[43]结合的效果。Inception 架构将卷积模块构造为分解滤波器的多分支组合,反映了 Inception 假设[6],即空间相关性和跨通道相关性可以独立映射。相比之下,ResNeXt 架构断言,可以通过聚合稀疏连接(在通道维度上)的卷积特征的组合来获得更丰富的表示。两种方法都在模块中引入了先验结构化的相关性。我们构造了这些网络的 SENet 等价物:SE-Inception-ResNet-v2 和 SE-ResNeXt(表 1 给出了 SE-ResNeXt-50(32×4d)的配置)。与前面的实验一样,原始网络和它们对应的 SENet 网络都使用相同的优化方案。
  The results given in Table 2 illustrate the significant performance improvement induced by SE blocks when introduced into both architectures. In particular, SE-ResNeXt-50 has a top-5 error of 5.49% which is superior to both its direct counterpart ResNeXt-50 (5.90% top-5 error) as well as the deeper ResNeXt-101 (5.57% top-5 error), a model which has almost double the number of parameters and computational overhead. As for the experiments of Inception-ResNet-v2, we conjecture the difference of cropping strategy might lead to the gap between their reported result and our re-implemented one, as their original image size has not been clarified in [38] while we crop the 299×299 region from a relatively larger image (where the shorter edge is resized to 352). SE-Inception-ResNet-v2 (4.79% top-5 error) outperforms our reimplemented Inception-ResNet-v2 (5.21% top-5 error) by 0.42% (a relative improvement of 8.1%) as well as the reported result in [38]. The optimisation curves for each network are depicted in Fig. 5, illustrating the consistency of the improvement yielded by SE blocks throughout the training process.
  表 2 中给出的结果说明,将 SE 块引入这两种架构都会带来显著的性能改善。尤其是 SE-ResNeXt-50 的 top-5 错误率为 5.49%,优于其直接对应的 ResNeXt-50(5.90%的 top-5 错误率)以及更深的 ResNeXt-101(5.57%的 top-5 错误率),而后者的参数量和计算开销几乎是前者的两倍。对于 Inception-ResNet-v2 的实验,我们猜测可能是裁剪策略的差异导致了其报告结果与我们重新实现的结果之间的差距,因为它们的原始图像尺寸在[38]中没有说明,而我们是从相对较大的图像(短边缩放到 352)中裁剪出 299×299 的区域。SE-Inception-ResNet-v2(4.79%的 top-5 错误率)比我们重新实现的 Inception-ResNet-v2(5.21%的 top-5 错误率)低 0.42%(相对改进 8.1%),也优于[38]中报告的结果。每个网络的优化曲线如图 5 所示,说明了在整个训练过程中 SE 块带来的改进是一致的。
  Finally, we assess the effect of SE blocks when operating on a non-residual network by conducting experiments with the BN-Inception architecture [14] which provides good performance at a lower model complexity. The results of the comparison are shown in Table 2 and the training curves are shown in Fig. 6, exhibiting the same phenomena that emerged in the residual architectures. In particular, SE-BN-Inception achieves a lower top-5 error of 7.14% in comparison to BN-Inception whose error rate is 7.89%. These experiments demonstrate that improvements induced by SE blocks can be used in combination with a wide range of architectures. Moreover, this result holds for both residual and non-residual foundations.
  最后,我们通过对 BN-Inception 架构[14]进行实验来评估 SE 块在非残差网络上的效果,该架构在较低的模型复杂度下提供了良好的性能。比较结果如表 2 所示,训练曲线如图 6 所示,表现出与残差架构中相同的现象。尤其是,与错误率为 7.89%的 BN-Inception 相比,SE-BN-Inception 获得了更低的 7.14%的 top-5 错误率。这些实验表明,SE 块带来的改进可以与多种架构结合使用。而且,这个结果对残差和非残差基础都成立。
  Results on ILSVRC 2017 Classification Competition. ILSVRC [30] is an annual computer vision competition which has proved to be a fertile ground for model developments in image classification. The training and validation data of the ILSVRC 2017 classification task are drawn from the ImageNet 2012 dataset, while the test set consists of an additional unlabelled 100K images. For the purposes of the competition, the top-5 error metric is used to rank entries.
  ILSVRC 2017 分类竞赛的结果。ILSVRC[30]是一个年度计算机视觉竞赛,已被证明是图像分类模型发展的沃土。ILSVRC 2017 分类任务的训练和验证数据来自 ImageNet 2012 数据集,而测试集包含额外的未标注的 10 万张图像。在竞赛中,使用 top-5 错误率这一度量来对参赛条目进行排名。
  SENets formed the foundation of our submission to the challenge where we won first place. Our winning entry comprised a small ensemble of SENets that employed a standard multi-scale and multi-crop fusion strategy to obtain a 2.251% top-5 error on the test set. This result represents a ∼25% relative improvement on the winning entry of 2016 (2.99% top-5 error). One of our high-performing networks is constructed by integrating SE blocks with a modified ResNeXt [43] (details of the modifications are provided in Appendix A). We compare the proposed architecture with the state-of-the-art models on the ImageNet validation set in Table 3. Our model achieves a top-1 error of 18.68% and a top-5 error of 4.47% using a 224×224 centre crop evaluation on each image (where the shorter edge is first resized to 256). To enable a fair comparison with previous models, we also provide a 320×320 centre crop evaluation, obtaining the lowest error rate under both the top-1 (17.28%) and the top-5 (3.79%) error metrics.
  SENets 是我们在这一挑战赛中赢得第一名的提交方案的基础。我们的获胜方案由一小组 SENets 集成而成,采用标准的多尺度和多裁剪融合策略,在测试集上获得了 2.251%的 top-5 错误率。这个结果相对于 2016 年的获胜方案(2.99%的 top-5 错误率)改进了约 25%。我们的高性能网络之一是将 SE 块与修改后的 ResNeXt[43]集成在一起构建的(附录 A 提供了这些修改的细节)。在表 3 中,我们将提出的架构与最新的模型在 ImageNet 验证集上进行了比较。我们的模型在对每张图像使用 224×224 中心裁剪评估(短边先缩放到 256)时取得了 18.68%的 top-1 错误率和 4.47%的 top-5 错误率。为了与之前的模型进行公平比较,我们还提供了 320×320 的中心裁剪评估结果,在 top-1(17.28%)和 top-5(3.79%)两个错误率度量下均获得了最低的错误率。

6.2 Scene Classification 场景分类

  Large portions of the ImageNet dataset consist of images dominated by single objects. To evaluate our proposed model in more diverse scenarios, we also evaluate it on the Places365-Challenge dataset [48] for scene classification. This dataset comprises 8 million training images and 36, 500 validation images across 365 categories. Relative to classification, the task of scene understanding can provide a better assessment of the ability of a model to generalise well and handle abstraction, since it requires the capture of more complex data associations and robustness to a greater level of appearance variation.
  ImageNet 数据集的大部分由单个对象支配的图像组成。为了在 更多不同的场景下评估我们提出的模型,我们还在 Places365- Challenge 数据集[48]上对场景分类进行评估。该数据集包含 800 万张 训练图像和 365 个类别的 36500 张验证图像。相对于分类,场景理解 的任务可以更好地评估模型泛化和处理抽象的能力,因为它需要捕获 更复杂的数据关联以及对更大程度外观变化的鲁棒性。
  We use ResNet-152 as a strong baseline to assess the effectiveness of SE blocks and follow the evaluation protocol in [33]. Table 4 shows the results of training a ResNet-152 model and a SE-ResNet-152 for the given task. Specifically, SE-ResNet-152 (11.01% top-5 error) achieves a lower validation error than ResNet-152 (11.61% top-5 error), providing evidence that SE blocks can perform well on different datasets. This SENet surpasses the previous state-of-the-art model Places-365-CNN [33] which has a top-5 error of 11.48% on this task.
  我们使用 ResNet-152 作为强基线来评估 SE 块的有效性,并遵循[33]中的评估协议。表 4 显示了针对给定任务训练 ResNet-152 模型和 SE-ResNet-152 的结果。具体而言,SE-ResNet-152(11.01%的 top-5 错误率)取得了比 ResNet-152(11.61%的 top-5 错误率)更低的验证错误率,证明了 SE 块可以在不同的数据集上表现良好。这个 SENet 超过了先前最先进的模型 Places-365-CNN[33],后者在这个任务上的 top-5 错误率为 11.48%。

6.3 Analysis and Discussion 分析和讨论

  Reduction ratio. The reduction ratio $r$ introduced in Eqn. (5) is an important hyperparameter which allows us to vary the capacity and computational cost of the SE blocks in the model. To investigate this relationship, we conduct experiments based on the SE-ResNet-50 architecture for a range of different $r$ values. The comparison in Table 5 reveals that performance does not improve monotonically with increased capacity. This is likely to be a result of enabling the SE block to overfit the channel interdependencies of the training set. In particular, we found that setting $r = 16$ achieved a good tradeoff between accuracy and complexity and consequently, we used this value for all experiments.
  减少比率。公式(5)中引入的减少比率 $r$ 是一个重要的超参数,它允许我们改变模型中 SE 块的容量和计算成本。为了研究这种关系,我们基于 SE-ResNet-50 架构进行了一系列不同 $r$ 值的实验。表 5 中的比较表明,性能并没有随着容量的增加而单调上升。这可能是 SE 块对训练集的通道相互依赖性过拟合的结果。尤其是我们发现设置 $r = 16$ 在精度和复杂度之间取得了很好的平衡,因此我们将这个值用于所有的实验。
  The role of Excitation. While SE blocks have been empirically shown to improve network performance, we would also like to understand how the self-gating excitation mechanism operates in practice. To provide a clearer picture of the behaviour of SE blocks, in this section we study example activations from the SE-ResNet-50 model and examine their distribution with respect to different classes at different blocks. Specifically, we sample four classes from the ImageNet dataset that exhibit semantic and appearance diversity, namely goldfish, pug, plane and cliff (example images from these classes are shown in Fig. 7). We then draw fifty samples for each class from the validation set and compute the average activations for fifty uniformly sampled channels in the last SE block in each stage (immediately prior to downsampling) and plot their distribution in Fig. 8. For reference, we also plot the distribution of average activations across all 1000 classes.
  激励的作用。虽然 SE 块从经验上显示出可以改善网络性能,但我们也想了解自门控激励机制在实践中是如何运作的。为了更清楚地描述 SE 块的行为,本节我们研究 SE-ResNet-50 模型的样本激活,并考察它们在不同块、不同类别下的分布情况。具体而言,我们从 ImageNet 数据集中抽取了四个在语义和外观上具有多样性的类别,即金鱼、哈巴狗、飞机和悬崖(图 7 中显示了这些类别的示例图像)。然后,我们从验证集中为每个类别抽取 50 个样本,并计算每个阶段最后的 SE 块中(紧接在下采样之前)50 个均匀采样通道的平均激活,并在图 8 中绘制它们的分布。作为参考,我们还绘制了所有 1000 个类别的平均激活分布。
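  如果想在自己的模型上复现这种按类别统计 SE 激活分布的分析,一个简单的做法是给每个 SE 模块中的 sigmoid 注册 forward hook,抓取其输出(即逐通道门控值 s)。下面是笔者补充的示意代码,其中的小模型完全是随意假设,仅演示抓取方式:

```python
import torch
import torch.nn as nn

class TinySE(nn.Module):
    """极简 SE 模块,仅用于演示 hook 的用法。"""
    def __init__(self, c, r=16):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(c, c // r), nn.ReLU(),
                                nn.Linear(c // r, c), nn.Sigmoid())
    def forward(self, u):
        s = self.fc(u.mean(dim=(2, 3)))
        return u * s.view(s.size(0), s.size(1), 1, 1)

model = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), TinySE(64))

acts = {}
def make_hook(name):
    # 抓取 sigmoid 的输出,形状为 [N, C],即每个样本每个通道的激活值
    return lambda module, inputs, output: acts.setdefault(name, []).append(output.detach())

for name, module in model.named_modules():
    if isinstance(module, nn.Sigmoid):
        module.register_forward_hook(make_hook(name))

model(torch.randn(8, 3, 32, 32))
for name, outs in acts.items():
    print(name, outs[0].mean(dim=0).shape)   # 对样本取平均后得到每个通道的平均激活
```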

  We make the following three observations about the role of Excitation in SENets. First, the distribution across different classes is nearly identical in lower layers, e.g. SE_2_3. This suggests that the importance of feature channels is likely to be shared by different classes in the early stages of the network. Interestingly however, the second observation is that at greater depth, the value of each channel becomes much more class-specific as different classes exhibit different preferences to the discriminative value of features, e.g. SE_4_6 and SE_5_1. The two observations are consistent with findings in previous work [21, 46], namely that lower layer features are typically more general (i.e. class agnostic in the context of classification) while higher layer features have greater specificity. As a result, representation learning benefits from the recalibration induced by SE blocks which adaptively facilitates feature extraction and specialisation to the extent that it is needed. Finally, we observe a somewhat different phenomenon in the last stage of the network. SE_5_2 exhibits an interesting tendency towards a saturated state in which most of the activations are close to 1 and the remainder are close to 0. At the point at which all activations take the value 1, this block would become a standard residual block. At the end of the network in the SE_5_3 (which is immediately followed by global pooling prior to the classifiers), a similar pattern emerges over different classes, up to a slight change in scale (which could be tuned by the classifiers). This suggests that SE_5_2 and SE_5_3 are less important than previous blocks in providing recalibration to the network. This finding is consistent with the result of the empirical investigation in Sec. 4 which demonstrated that the overall parameter count could be significantly reduced by removing the SE blocks for the last stage with only a marginal loss of performance (<0.1% top-1 error).
  我们对 SENets 中 excitation 的作用提出以下三点观察。首先,不同类别的分布在较低的层中几乎相同,例如 SE_2_3。这表明在网络的最初阶段,特征通道的重要性很可能由不同的类别共享。然而有趣的是,第二个观察结果是,在更大的深度上,每个通道的值变得更具类别特异性,因为不同类别对特征的判别价值表现出不同的偏好,例如 SE_4_6 和 SE_5_1。这两个观察结果与之前工作[21,46]中的发现一致,即低层特征通常更通用(即在分类的语境下与类别无关),而高层特征具有更高的特异性。因此,表示学习从 SE 块引起的重新校准中受益,它按需自适应地促进特征提取和特化。最后,我们在网络的最后阶段观察到一个有些不同的现象。SE_5_2 呈现出趋向饱和状态的有趣趋势,其中大部分激活接近于 1,其余激活接近于 0。在所有激活都取值为 1 的情况下,该块将退化为标准残差块。在网络末端的 SE_5_3 中(紧接其后的是分类器之前的全局池化),类似的模式出现在不同的类别上,只是尺度上有轻微的变化(这可以由分类器来调整)。这表明,SE_5_2 和 SE_5_3 在为网络提供重新校准方面不如之前的块重要。这一发现与第 4 节实证研究的结果一致,即通过移除最后一个阶段的 SE 块,总体参数数量可以显著减少,而性能只有很小的损失(top-1 错误率增加不到 0.1%)。

7. Conclusion 结论

  In this paper we proposed the SE block, a novel architectural unit designed to improve the representational capacity of a network by enabling it to perform dynamic channel-wise feature recalibration. Extensive experiments demonstrate the effectiveness of SENets which achieve state-of-the-art performance on multiple datasets. In addition, they provide some insight into the limitations of previous architectures in modelling channel-wise feature dependencies, which we hope may prove useful for other tasks requiring strong discriminative features. Finally, the feature importance induced by SE blocks may be helpful to related fields such as network pruning for compression.
  在本文中,我们提出了 SE 块,这是一种新颖的架构单元,旨在通过让网络能够执行动态的逐通道特征重新校准来提高网络的表示能力。大量实验证明了 SENets 的有效性,它们在多个数据集上取得了最先进的性能。此外,这些实验还提供了一些关于以前的架构在建模逐通道特征依赖性方面的局限性的洞察,我们希望这对其它需要强判别性特征的任务也会有用。最后,由 SE 块产生的特征重要性可能有助于相关领域,例如用于压缩的网络剪枝。
