datamonday

[CV-Paper 20] Object Detection 05: Fast R-CNN (2015)

Original paper: LINK
Citations: 10839 (08/09/2020)

Table of contents

Fast R-CNN
Abstract
1. Introduction
- 1.1. R-CNN and SPPnet
- 1.2. Contributions
2. Fast R-CNN architecture and training
- 2.1. The RoI pooling layer
- 2.2. Initializing from pre-trained networks
- 2.3. Fine-tuning for detection
- 2.4. Scale invariance
3. Fast R-CNN detection
- 3.1. Truncated SVD for faster detection
4. Main results
- 4.1. Experimental setup
- 4.2. VOC 2010 and 2012 results
- 4.3. VOC 2007 results
- 4.4. Training and testing time
- 4.5. Which layers to fine-tune?
5. Design evaluation
- 5.1. Does multi-task training help?
- 5.2. Scale invariance: to brute force or finesse?
- 5.3. Do we need more training data?
- 5.4. Do SVMs outperform softmax?
- 5.5. Are more proposals always better?
- 5.6. Preliminary MS COCO results
6. Conclusion

Fast R-CNN

Abstract

This paper proposes a Fast Region-based Convolutional Network method (Fast R-CNN) for object detection. Fast R-CNN builds on previous work to efficiently classify object proposals using deep convolutional networks. Compared to previous work, Fast R-CNN employs several innovations to improve training and testing speed while also increasing detection accuracy. Fast R-CNN trains the very deep VGG16 network 9× faster than R-CNN, is 213× faster at test-time, and achieves a higher mAP on PASCAL VOC 2012. Compared to SPPnet, Fast R-CNN trains VGG16 3× faster, tests 10× faster, and is more accurate. Fast R-CNN is implemented in Python and C++ (using Caffe) and is available under the open-source MIT License at https://github.com/rbgirshick/fast-rcnn.

1. Introduction

Recently, deep ConvNets [14, 16] have significantly improved image classification [14] and object detection [9, 19] accuracy. Compared to image classification, object detection is a more challenging task that requires more complex methods to solve. Due to this complexity, current approaches (e.g., [9, 11, 19, 25]) train models in multi-stage pipelines that are slow and inelegant.

Complexity arises because detection requires the accurate localization of objects, creating two primary challenges. First, numerous candidate object locations (often called “proposals”) must be processed. Second, these candidates provide only rough localization that must be refined to achieve precise localization. Solutions to these problems often compromise speed, accuracy, or simplicity.

In this paper, we streamline the training process for state-of-the-art ConvNet-based object detectors [9, 11]. We propose a single-stage training algorithm that jointly learns to classify object proposals and refine their spatial locations.

The resulting method can train a very deep detection network (VGG16 [20]) 9× faster than R-CNN [9] and 3× faster than SPPnet [11]. At runtime, the detection network processes images in 0.3s (excluding object proposal time) while achieving top accuracy on PASCAL VOC 2012 [7] with a mAP of 66% (vs. 62% for R-CNN).

1.1. R-CNN and SPPnet

The Region-based Convolutional Network method (R-CNN) [9] achieves excellent object detection accuracy by using a deep ConvNet to classify object proposals. R-CNN, however, has notable drawbacks:

Training is a multi-stage pipeline. R-CNN first fine-tunes a ConvNet on object proposals using log loss. Then, it fits SVMs to ConvNet features. These SVMs act as object detectors, replacing the softmax classifier learnt by fine-tuning. In the third training stage, bounding-box regressors are learned.
Training is expensive in space and time. For SVM and bounding-box regressor training, features are extracted from each object proposal in each image and written to disk. With very deep networks, such as VGG16, this process takes 2.5 GPU-days for the 5k images of the VOC07 trainval set. These features require hundreds of gigabytes of storage.
Object detection is slow. At test-time, features are extracted from each object proposal in each test image. Detection with VGG16 takes 47s / image (on a GPU).

R-CNN is slow because it performs a ConvNet forward pass for each object proposal, without sharing computation. Spatial pyramid pooling networks (SPPnets) [11] were proposed to speed up R-CNN by sharing computation. The SPPnet method computes a convolutional feature map for the entire input image and then classifies each object proposal using a feature vector extracted from the shared feature map. Features are extracted for a proposal by max-pooling the portion of the feature map inside the proposal into a fixed-size output (e.g., 6 × 6). Multiple output sizes are pooled and then concatenated as in spatial pyramid pooling [15]. SPPnet accelerates R-CNN by 10 to 100× at test time. Training time is also reduced by 3× due to faster proposal feature extraction.

SPPnet also has notable drawbacks. Like R-CNN, training is a multi-stage pipeline that involves extracting features, fine-tuning a network with log loss, training SVMs, and finally fitting bounding-box regressors. Features are also written to disk. But unlike R-CNN, the fine-tuning algorithm proposed in [11] cannot update the convolutional layers that precede the spatial pyramid pooling. Unsurprisingly, this limitation (fixed convolutional layers) limits the accuracy of very deep networks.

1.2. Contributions

We propose a new training algorithm that fixes the disadvantages of R-CNN and SPPnet, while improving on their speed and accuracy. We call this method Fast R-CNN because it’s comparatively fast to train and test. The Fast R-CNN method has several advantages:

Higher detection quality (mAP) than R-CNN, SPPnet
Training is single-stage, using a multi-task loss
Training can update all network layers
No disk storage is required for feature caching

2. Fast R-CNN architecture and training

Fig. 1 illustrates the Fast R-CNN architecture. A Fast R-CNN network takes as input an entire image and a set of object proposals. The network first processes the whole image with several convolutional (conv) and max pooling layers to produce a conv feature map. Then, for each object proposal a region of interest (RoI) pooling layer extracts a fixed-length feature vector from the feature map. Each feature vector is fed into a sequence of fully connected (fc) layers that finally branch into two sibling output layers: one that produces softmax probability estimates over K object classes plus a catch-all “background” class and another layer that outputs four real-valued numbers for each of the K object classes. Each set of 4 values encodes refined bounding-box positions for one of the K classes.

Figure 1. Fast R-CNN architecture. An input image and multiple regions of interest (RoIs) are input into a fully convolutional network. Each RoI is pooled into a fixed-size feature map and then mapped to a feature vector by fully connected layers (FCs). The network has two output vectors per RoI: softmax probabilities and per-class bounding-box regression offsets. The architecture is trained end-to-end with a multi-task loss.

2.1. The RoI pooling layer

The RoI pooling layer uses max pooling to convert the features inside any valid region of interest into a small feature map with a fixed spatial extent of H × W (e.g., 7 × 7), where H and W are layer hyper-parameters that are independent of any particular RoI. In this paper, an RoI is a rectangular window into a conv feature map. Each RoI is defined by a four-tuple (r, c, h, w) that specifies its top-left corner (r, c) and its height and width (h, w).

RoI max pooling works by dividing the h × w RoI window into an H × W grid of sub-windows of approximate size h/H × w/W and then max-pooling the values in each sub-window into the corresponding output grid cell. Pooling is applied independently to each feature map channel, as in standard max pooling. The RoI layer is simply the special case of the spatial pyramid pooling layer used in SPPnets [11] in which there is only one pyramid level. We use the pooling sub-window calculation given in [11].
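The forward pass of RoI max pooling can be sketched in a few lines of NumPy. This is a minimal illustration, not the Caffe layer from the paper: the function name `roi_max_pool` is hypothetical, and the floor/ceil rule for the sub-window edges is an assumed convention (the exact calculation is the one referenced in [11]).

```python
import numpy as np

def roi_max_pool(feature_map, roi, out_h=7, out_w=7):
    """Max-pool one RoI of a (C, H_f, W_f) feature map into (C, out_h, out_w).

    roi = (r, c, h, w): top-left corner and height/width in feature-map cells.
    """
    r, c, h, w = roi
    channels = feature_map.shape[0]
    out = np.empty((channels, out_h, out_w), dtype=feature_map.dtype)
    for i in range(out_h):
        # Assumed convention: floor for the start edge, ceil for the end edge,
        # which keeps every sub-window non-empty when h >= 1 and w >= 1.
        r0 = r + (i * h) // out_h
        r1 = r + -((-(i + 1) * h) // out_h)  # ceil((i + 1) * h / out_h)
        for j in range(out_w):
            c0 = c + (j * w) // out_w
            c1 = c + -((-(j + 1) * w) // out_w)
            # Pooling is applied independently to each channel.
            out[:, i, j] = feature_map[:, r0:r1, c0:c1].max(axis=(1, 2))
    return out
```

Whatever the RoI's size, the output has the same fixed shape, which is what lets the following fully connected layers accept proposals of arbitrary extent.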

2.2. Initializing from pre-trained networks

We experiment with three pre-trained ImageNet [4] networks, each with five max pooling layers and between five and thirteen conv layers (see Section 4.1 for network details). When a pre-trained network initializes a Fast R-CNN network, it undergoes three transformations.

First, the last max pooling layer is replaced by a RoI pooling layer that is configured by setting H and W to be compatible with the net’s first fully connected layer (e.g., H = W = 7 for VGG16).

Second, the network’s last fully connected layer and softmax (which were trained for 1000-way ImageNet classification) are replaced with the two sibling layers described earlier (a fully connected layer and softmax over K + 1 categories and category-specific bounding-box regressors).

Third, the network is modified to take two data inputs: a list of images and a list of RoIs in those images.

2.3. Fine-tuning for detection

Training all network weights with back-propagation is an important capability of Fast R-CNN. First, let’s elucidate why SPPnet is unable to update weights below the spatial pyramid pooling layer.

The root cause is that back-propagation through the SPP layer is highly inefficient when each training sample (i.e. RoI) comes from a different image, which is exactly how R-CNN and SPPnet networks are trained. The inefficiency stems from the fact that each RoI may have a very large receptive field, often spanning the entire input image. Since the forward pass must process the entire receptive field, the training inputs are large (often the entire image).

We propose a more efficient training method that takes advantage of feature sharing during training. In Fast R-CNN training, stochastic gradient descent (SGD) mini-batches are sampled hierarchically, first by sampling N images and then by sampling R/N RoIs from each image. Critically, RoIs from the same image share computation and memory in the forward and backward passes. Making N small decreases mini-batch computation. For example, when using N = 2 and R = 128, the proposed training scheme is roughly 64× faster than sampling one RoI from 128 different images (i.e., the R-CNN and SPPnet strategy).

One concern over this strategy is it may cause slow training convergence because RoIs from the same image are correlated. This concern does not appear to be a practical issue and we achieve good results with N = 2 and R = 128 using fewer SGD iterations than R-CNN.

In addition to hierarchical sampling, Fast R-CNN uses a streamlined training process with one fine-tuning stage that jointly optimizes a softmax classifier and bounding-box regressors, rather than training a softmax classifier, SVMs, and regressors in three separate stages [9, 11]. The components of this procedure (the loss, mini-batch sampling strategy, back-propagation through RoI pooling layers, and SGD hyper-parameters) are described below.

Multi-task loss. A Fast R-CNN network has two sibling output layers. The first outputs a discrete probability distribution (per RoI), $p = (p_0, \ldots, p_K)$, over $K + 1$ categories. As usual, $p$ is computed by a softmax over the $K + 1$ outputs of a fully connected layer. The second sibling layer outputs bounding-box regression offsets, $t^k = (t^k_x, t^k_y, t^k_w, t^k_h)$, for each of the K object classes, indexed by k. We use the parameterization for $t^k$ given in [9], in which $t^k$ specifies a scale-invariant translation and log-space height/width shift relative to an object proposal.

Each training RoI is labeled with a ground-truth class $u$ and a ground-truth bounding-box regression target $v$. We use a multi-task loss $L$ on each labeled RoI to jointly train for classification and bounding-box regression:

$$L(p, u, t^u, v) = L_{cls}(p, u) + \lambda [u \geq 1] L_{loc}(t^u, v), \quad (1)$$

in which $L_{cls}(p, u) = -\log p_u$ is log loss for true class $u$.

The second task loss, $L_{loc}$, is defined over a tuple of true bounding-box regression targets for class $u$, $v = (v_x, v_y, v_w, v_h)$, and a predicted tuple $t^u = (t^u_x, t^u_y, t^u_w, t^u_h)$, again for class $u$. The Iverson bracket indicator function $[u \geq 1]$ evaluates to 1 when $u \geq 1$ and 0 otherwise. By convention the catch-all background class is labeled $u = 0$. For background RoIs there is no notion of a ground-truth bounding box and hence $L_{loc}$ is ignored. For bounding-box regression, we use the loss

$$L_{loc}(t^u, v) = \sum_{i \in \{x, y, w, h\}} \mathrm{smooth}_{L_1}(t^u_i - v_i), \quad (2)$$

in which

$$\mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5x^2 & \text{if } |x| < 1 \\ |x| - 0.5 & \text{otherwise,} \end{cases} \quad (3)$$

is a robust $L_1$ loss that is less sensitive to outliers than the $L_2$ loss used in R-CNN and SPPnet. When the regression targets are unbounded, training with $L_2$ loss can require careful tuning of learning rates in order to prevent exploding gradients. Eq. 3 eliminates this sensitivity.
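To make the loss concrete, here is a small NumPy sketch for a single RoI: log loss on the true class plus, for non-background RoIs only, a smooth $L_1$ penalty (quadratic for residuals below 1 in magnitude, linear beyond) summed over the four box coordinates. The function names and the array layout, in which `t` has K + 1 rows and row 0 is unused, are assumptions made for illustration.

```python
import numpy as np

def smooth_l1(x):
    """0.5 * x^2 where |x| < 1, and |x| - 0.5 elsewhere."""
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * x ** 2, ax - 0.5)

def multi_task_loss(p, u, t, v, lam=1.0):
    """Multi-task loss for one RoI.

    p: (K+1,) softmax probabilities; u: true class index (0 = background).
    t: (K+1, 4) predicted offsets per class (row 0 unused; assumed layout).
    v: (4,) ground-truth regression target; lam: balance weight lambda.
    """
    l_cls = -np.log(p[u])
    # The Iverson bracket [u >= 1]: background RoIs contribute no box loss.
    l_loc = smooth_l1(t[u] - v).sum() if u >= 1 else 0.0
    return l_cls + lam * l_loc
```

Because the penalty grows only linearly for large residuals, a single badly localized proposal cannot dominate the gradient the way it would under a squared error.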

The hyper-parameter λ in Eq. 1 controls the balance between the two task losses. We normalize the ground-truth regression targets $v_i$ to have zero mean and unit variance. All experiments use $λ = 1$ .

We note that [6] uses a related loss to train a class-agnostic object proposal network. Different from our approach, [6] advocates for a two-network system that separates localization and classification. OverFeat [19], R-CNN [9], and SPPnet [11] also train classifiers and bounding-box localizers; however, these methods use stage-wise training, which we show is suboptimal for Fast R-CNN (Section 5.1).

Mini-batch sampling. During fine-tuning, each SGD mini-batch is constructed from N = 2 images, chosen uniformly at random (as is common practice, we actually iterate over permutations of the dataset). We use mini-batches of size R = 128, sampling 64 RoIs from each image. As in [9], we take 25% of the RoIs from object proposals that have intersection over union (IoU) overlap with a ground-truth bounding box of at least 0.5. These RoIs comprise the examples labeled with a foreground object class, i.e. u ≥ 1. The remaining RoIs are sampled from object proposals that have a maximum IoU with ground truth in the interval [0.1, 0.5), following [11]. These are the background examples and are labeled with u = 0. The lower threshold of 0.1 appears to act as a heuristic for hard example mining [8]. During training, images are horizontally flipped with probability 0.5. No other data augmentation is used.
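The per-image sampling rule can be sketched as follows. The function name `sample_rois` is hypothetical, and the handling of images with too few foreground or background proposals (simply taking what is available) is an assumption, since the text does not specify it.

```python
import numpy as np

def sample_rois(max_ious, rois_per_image=64, fg_fraction=0.25, rng=None):
    """Return (fg_indices, bg_indices) sampled for one image.

    max_ious[i] is proposal i's highest IoU with any ground-truth box.
    """
    rng = np.random.default_rng() if rng is None else rng
    max_ious = np.asarray(max_ious)
    fg = np.flatnonzero(max_ious >= 0.5)                       # labeled u >= 1
    bg = np.flatnonzero((max_ious >= 0.1) & (max_ious < 0.5))  # labeled u = 0
    # Assumption: if an image has too few candidates, use as many as exist.
    n_fg = min(int(round(fg_fraction * rois_per_image)), fg.size)
    n_bg = min(rois_per_image - n_fg, bg.size)
    fg_sel = rng.permutation(fg)[:n_fg]
    bg_sel = rng.permutation(bg)[:n_bg]
    return fg_sel, bg_sel
```

With N = 2 images per mini-batch, calling this once per image yields the R = 128 RoIs described above, with 16 foreground and 48 background RoIs per image when enough candidates exist.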

Back-propagation through RoI pooling layers. Backpropagation routes derivatives through the RoI pooling layer. For clarity, we assume only one image per mini-batch (N = 1), though the extension to N > 1 is straightforward because the forward pass treats all images independently.

Let $x_i \in \mathbb{R}$ be the i-th activation input into the RoI pooling layer and let $y_{rj}$ be the layer’s j-th output from the r-th RoI. The RoI pooling layer computes $y_{rj} = x_{i^*(r,j)}$, in which $i^*(r,j) = \operatorname{argmax}_{i' \in \mathcal{R}(r,j)} x_{i'}$. $\mathcal{R}(r,j)$ is the index set of inputs in the sub-window over which the output unit $y_{rj}$ max pools. A single $x_i$ may be assigned to several different outputs $y_{rj}$.

The RoI pooling layer’s backwards function computes the partial derivative of the loss function with respect to each input variable $x_i$ by following the argmax switches:

$$\frac{\partial L}{\partial x_i} = \sum_r \sum_j [i = i^*(r,j)] \frac{\partial L}{\partial y_{rj}}. \quad (4)$$

In words, for each mini-batch RoI $r$ and for each pooling output unit $y_{rj}$, the partial derivative $\partial L/\partial y_{rj}$ is accumulated if $i$ is the argmax selected for $y_{rj}$ by max pooling. In back-propagation, the partial derivatives $\partial L/\partial y_{rj}$ are already computed by the backwards function of the layer on top of the RoI pooling layer.
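The accumulation described above maps directly onto a scatter-add. The sketch below assumes the forward pass stored, for every output unit, the flat index of the input it selected; the function name and the flat-index layout are illustrative choices, not the paper's implementation.

```python
import numpy as np

def roi_pool_backward(argmax, grad_y, num_inputs):
    """Route dL/dy back to dL/dx through the argmax switches.

    argmax[r, j]: flat index of the input selected for output y_rj.
    grad_y[r, j]: dL/dy_rj, already computed by the layer above.
    """
    grad_x = np.zeros(num_inputs, dtype=grad_y.dtype)
    # np.add.at accumulates over repeated indices, so an input selected by
    # several outputs (e.g. from overlapping RoIs) receives the sum.
    np.add.at(grad_x, argmax.ravel(), grad_y.ravel())
    return grad_x
```

Inputs that were never the maximum of any sub-window receive zero gradient, while an input shared by overlapping RoIs collects one term per output that selected it.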

SGD hyper-parameters. The fully connected layers used for softmax classification and bounding-box regression are initialized from zero-mean Gaussian distributions with standard deviations 0.01 and 0.001, respectively. Biases are initialized to 0. All layers use a per-layer learning rate of 1 for weights and 2 for biases and a global learning rate of 0.001. When training on VOC07 or VOC12 trainval we run SGD for 30k mini-batch iterations, and then lower the learning rate to 0.0001 and train for another 10k iterations. When we train on larger datasets, we run SGD for more iterations, as described later. A momentum of 0.9 and parameter decay of 0.0005 (on weights and biases) are used.

2.4. Scale invariance

We explore two ways of achieving scale invariant object detection: (1) via “brute force” learning and (2) by using image pyramids. These strategies follow the two approaches in [11]. In the brute-force approach, each image is processed at a pre-defined pixel size during both training and testing. The network must directly learn scale-invariant object detection from the training data.

The multi-scale approach, in contrast, provides approximate scale-invariance to the network through an image pyramid. At test-time, the image pyramid is used to approximately scale-normalize each object proposal. During multi-scale training, we randomly sample a pyramid scale each time an image is sampled, following [11], as a form of data augmentation. We experiment with multi-scale training for smaller networks only, due to GPU memory limits.

3. Fast R-CNN detection

Once a Fast R-CNN network is fine-tuned, detection amounts to little more than running a forward pass (assuming object proposals are pre-computed). The network takes as input an image (or an image pyramid, encoded as a list of images) and a list of R object proposals to score. At test-time, R is typically around 2000, although we will consider cases in which it is larger (≈ 45k). When using an image pyramid, each RoI is assigned to the scale such that the scaled RoI is closest to $224^2$ pixels in area [11].
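The scale-assignment rule can be illustrated with a short helper. It assumes each pyramid level is described by the length of the image's shorter side after resizing and that "closest" means the smallest absolute difference in area; both the function name and these conventions are assumptions for the sketch.

```python
def best_scale_index(roi_h, roi_w, image_short_side, scales, target_area=224 * 224):
    """Pick the pyramid level whose scaled RoI area is closest to target_area.

    roi_h, roi_w: RoI size in pixels of the original image.
    scales: shorter-side lengths of the pyramid levels (hypothetical values).
    """
    # Resizing by factor s / image_short_side scales areas by its square.
    areas = [roi_h * roi_w * (s / image_short_side) ** 2 for s in scales]
    return min(range(len(scales)), key=lambda k: abs(areas[k] - target_area))
```

Small objects are therefore read from the enlarged pyramid levels and large objects from the reduced ones, so every RoI reaches the network at roughly the size it saw during ImageNet pre-training.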

For each test RoI $r$, the forward pass outputs a class posterior probability distribution $p$ and a set of predicted bounding-box offsets relative to $r$ (each of the K classes gets its own refined bounding-box prediction). We assign a detection confidence to $r$ for each object class $k$ using the estimated probability $\Pr(\text{class} = k \mid r) \triangleq p_k$. We then perform non-maximum suppression independently for each class using the algorithm and settings from R-CNN [9].
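For readers unfamiliar with non-maximum suppression, the following is a generic greedy version applied to the boxes of a single class. It is a sketch of the standard technique, not a reproduction of the R-CNN code; the IoU threshold is left as a parameter because the R-CNN settings are not stated in this text.

```python
import numpy as np

def nms(boxes, scores, iou_threshold):
    """Greedy NMS for one class.

    boxes: (n, 4) array of (x1, y1, x2, y2); scores: (n,) confidences p_k.
    Returns the kept indices, highest score first.
    """
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = np.argsort(-scores)
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of the top-scoring box with every remaining box.
        iw = np.clip(np.minimum(x2[i], x2[rest]) - np.maximum(x1[i], x1[rest]), 0, None)
        ih = np.clip(np.minimum(y2[i], y2[rest]) - np.maximum(y1[i], y1[rest]), 0, None)
        inter = iw * ih
        iou = inter / (areas[i] + areas[rest] - inter)
        # Drop boxes that overlap the kept box too much.
        order = rest[iou <= iou_threshold]
    return keep
```

Running this once per class, on that class's refined boxes and scores, gives the independent per-class suppression described above.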

3.1. Truncated SVD for faster detection

For whole-image classification, the time spent computing the fully connected layers is small compared to the conv layers. On the contrary, for detection the number of RoIs to process is large and nearly half of the forward pass time is spent computing the fully connected layers (see Fig. 2). Large fully connected layers are easily accelerated by compressing them with truncated SVD [5, 23].

In this technique, a layer parameterized by the $u \times v$ weight matrix $W$ is approximately factorized as

$$W \approx U \Sigma_t V^T \quad (5)$$

using SVD. In this factorization, $U$ is a u × t matrix comprising the first t left-singular vectors of $W$, $\Sigma_t$ is a t × t diagonal matrix containing the top t singular values of $W$, and $V$ is a v × t matrix comprising the first t right-singular vectors of $W$. Truncated SVD reduces the parameter count from uv to t(u + v), which can be significant if t is much smaller than min(u, v). To compress a network, the single fully connected layer corresponding to $W$ is replaced by two fully connected layers, without a non-linearity between them. The first of these layers uses the weight matrix $\Sigma_t V^T$ (and no biases) and the second uses $U$ (with the original biases associated with $W$). This simple compression method gives good speedups when the number of RoIs is large.
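The layer split can be sketched with `numpy.linalg.svd`. The function names are hypothetical, and the layer is written as y = W x + b with W of shape u × v, matching the notation above.

```python
import numpy as np

def compress_fc(W, b, t):
    """Split y = W x + b (W is u x v) into two layers via rank-t truncated SVD."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    W1 = s[:t, None] * Vt[:t]   # Sigma_t V^T, shape (t, v); first layer, no bias
    W2 = U[:, :t]               # first t left-singular vectors, shape (u, t)
    return W1, W2, b            # the second layer keeps the original bias

def compressed_forward(W1, W2, b, x):
    # Two matrix products with no non-linearity in between.
    return W2 @ (W1 @ x) + b
```

When W has rank at most t the two-layer form is exact; otherwise it is the best rank-t approximation in the least-squares sense, and the cost per RoI falls from uv to t(u + v) multiply-adds.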

4. Main results

Three main results support this paper’s contributions:

State-of-the-art mAP on VOC07, 2010, and 2012
Fast training and testing compared to R-CNN, SPPnet
Fine-tuning conv layers in VGG16 improves mAP

4.1. Experimental setup

Our experiments use three pre-trained ImageNet models that are available online. The first is the CaffeNet (essentially AlexNet [14]) from R-CNN [9]. We alternatively refer to this CaffeNet as model S, for “small.” The second network is VGG_CNN_M_1024 from [3], which has the same depth as S, but is wider. We call this network model M, for “medium.” The final network is the very deep VGG16 model from [20]. Since this model is the largest, we call it model L. In this section, all experiments use single-scale training and testing (s = 600; see Section 5.2 for details).

4.2. VOC 2010 and 2012 results

On these datasets, we compare Fast R-CNN (FRCN, for short) against the top methods on the comp4 (outside data) track from the public leaderboard (Table 2, Table 3). For the NUS_NIN_c2000 and BabyLearning methods, there are no associated publications at this time and we could not find exact information on the ConvNet architectures used; they are variants of the Network-in-Network design [17]. All other methods are initialized from the same pre-trained VGG16 network.

Fast R-CNN achieves the top result on VOC12 with a mAP of 65.7% (and 68.4% with extra data). It is also two orders of magnitude faster than the other methods, which are all based on the “slow” R-CNN pipeline. On VOC10, SegDeepM [25] achieves a higher mAP than Fast R-CNN (67.2% vs. 66.1%). SegDeepM is trained on VOC12 trainval plus segmentation annotations; it is designed to boost R-CNN accuracy by using a Markov random field to reason over R-CNN detections and segmentations from the O2P [1] semantic-segmentation method. Fast R-CNN can be swapped into SegDeepM in place of R-CNN, which may lead to better results. When using the enlarged 07++12 training set (see Table 2 caption), Fast R-CNN’s mAP increases to 68.8%, surpassing SegDeepM.

4.3. VOC 2007 results

On VOC07, we compare Fast R-CNN to R-CNN and SPPnet. All methods start from the same pre-trained VGG16 network and use bounding-box regression. The VGG16 SPPnet results were computed by the authors of [11]. SPPnet uses five scales during both training and testing. The improvement of Fast R-CNN over SPPnet illustrates that even though Fast R-CNN uses single-scale training and testing, fine-tuning the conv layers provides a large improvement in mAP (from 63.1% to 66.9%). R-CNN achieves a mAP of 66.0%. As a minor point, SPPnet was trained without examples marked as “difficult” in PASCAL. Removing these examples improves Fast R-CNN mAP to 68.1%. All other experiments use “difficult” examples.

4.4. Training and testing time

Fast training and testing times are our second main result. Table 4 compares training time (hours), testing rate (seconds per image), and mAP on VOC07 between Fast R-CNN, R-CNN, and SPPnet. For VGG16, Fast R-CNN processes images 146× faster than R-CNN without truncated SVD and 213× faster with it. Training time is reduced by 9×, from 84 hours to 9.5. Compared to SPPnet, Fast R-CNN trains VGG16 2.7× faster (in 9.5 vs. 25.5 hours) and tests 7× faster without truncated SVD or 10× faster with it. Fast R-CNN also eliminates hundreds of gigabytes of disk storage, because it does not cache features.

Truncated SVD. Truncated SVD can reduce detection time by more than 30% with only a small (0.3 percentage point) drop in mAP and without needing to perform additional fine-tuning after model compression. Fig. 2 illustrates how using the top 1024 singular values from the 25088 × 4096 matrix in VGG16’s fc6 layer and the top 256 singular values from the 4096 × 4096 fc7 layer reduces runtime with little loss in mAP. Further speed-ups are possible with smaller drops in mAP if one fine-tunes again after compression.
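As a quick check of how much these truncations shrink the two layers, the parameter counts uv and t(u + v) can be computed for the sizes quoted above. The helper name is hypothetical, and biases are ignored.

```python
def svd_param_counts(u, v, t):
    """Weight counts before (u * v) and after (t * (u + v)) rank-t truncation."""
    return u * v, t * (u + v)

fc6_before, fc6_after = svd_param_counts(25088, 4096, 1024)
fc7_before, fc7_after = svd_param_counts(4096, 4096, 256)
print(fc6_before, fc6_after)  # 102760448 29884416
print(fc7_before, fc7_after)  # 16777216 2097152
```

Under these settings fc6 keeps roughly 29% of its weights and fc7 exactly one eighth, which is consistent with the fully connected layers no longer dominating the forward pass.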

4.5. Which layers to fine-tune?

For the less deep networks considered in the SPPnet paper [11], fine-tuning only the fully connected layers appeared to be sufficient for good accuracy. We hypothesized that this result would not hold for very deep networks. To validate that fine-tuning the conv layers is important for VGG16, we use Fast R-CNN to fine-tune, but freeze the thirteen conv layers so that only the fully connected layers learn. This ablation emulates single-scale SPPnet training and decreases mAP from 66.9% to 61.4% (Table 5). This experiment verifies our hypothesis: training through the RoI pooling layer is important for very deep nets.

Does this mean that all conv layers should be fine-tuned? In short, no. In the smaller networks (S and M) we find that conv1 is generic and task independent (a well-known fact [14]). Allowing conv1 to learn, or not, has no meaningful effect on mAP . For VGG16, we found it only necessary to update layers from conv3 1 and up (9 of the 13 conv layers). This observation is pragmatic: (1) updating from conv2 1 slows training by 1.3× (12.5 vs. 9.5 hours) compared to learning from conv3 1; and (2) updating from conv1 1 over-runs GPU memory. The difference in mAP when learning from conv2 1 up was only +0.3 points (Table 5, last column). All Fast R-CNN results in this paper using VGG16 fine-tune layers conv3 1 and up; all experiments with models S and M fine-tune layers conv2 and up.

这是否意味着所有卷积层都应进行微调？简而言之，不是。在较小的网络（S和M）中，我们发现conv1是通用的且与任务无关（众所周知的事实[14]）。是否允许conv1学习，对mAP没有实质性影响。对于VGG16，我们发现只需要更新conv3_1及以上的层（13个conv层中的9个）。这一观察是出于实用的考虑：（1）与从conv3_1开始学习相比，从conv2_1开始更新会使训练速度慢1.3倍（12.5小时对9.5小时）；（2）从conv1_1开始更新会超出GPU内存。从conv2_1开始学习时，mAP的差异仅为+0.3个百分点（表5，最后一列）。本文中所有使用VGG16的Fast R-CNN结果均微调conv3_1及以上的层；使用模型S和M的所有实验均微调conv2及以上的层。
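
The "9 of the 13 conv layers" count can be checked with a short sketch. The layer list is the standard VGG16 conv layout; `split_layers` is a hypothetical helper, and a real framework would freeze layers through its own mechanism (for example, per-layer learning-rate multipliers).

```python
# VGG16's thirteen conv layers in forward order (conv<block>_<index> naming).
VGG16_CONV_LAYERS = [
    "conv1_1", "conv1_2",
    "conv2_1", "conv2_2",
    "conv3_1", "conv3_2", "conv3_3",
    "conv4_1", "conv4_2", "conv4_3",
    "conv5_1", "conv5_2", "conv5_3",
]


def split_layers(layers, first_trainable):
    """Return (frozen, trainable): everything before `first_trainable` stays fixed."""
    start = layers.index(first_trainable)
    return layers[:start], layers[start:]


frozen, trainable = split_layers(VGG16_CONV_LAYERS, "conv3_1")
print(f"frozen: {frozen}")
print(f"trainable: {len(trainable)} of {len(VGG16_CONV_LAYERS)} conv layers")

# Training-time ratio quoted in the text: 12.5 h (from conv2_1) vs. 9.5 h (from conv3_1).
print(f"slowdown: {12.5 / 9.5:.1f}x")
```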

5. Design evaluation

We conducted experiments to understand how Fast R-CNN compares to R-CNN and SPPnet, as well as to evaluate design decisions. Following best practices, we performed these experiments on the PASCAL VOC07 dataset.

我们进行了实验，以了解Fast RCNN与R-CNN和SPPnet的比较，以及评估设计决策。按照最佳做法，我们在PASCAL VOC07数据集上进行了这些实验。

5.1. Does multi-task training help?

Multi-task training is convenient because it avoids managing a pipeline of sequentially-trained tasks. But it also has the potential to improve results because the tasks influence each other through a shared representation (the ConvNet) [2]. Does multi-task training improve object detection accuracy in Fast R-CNN?

多任务训练很方便，因为它避免了管理顺序训练的任务的流水线。但它也有可能改善结果，因为任务通过共享表示（ConvNet）相互影响[2]。多任务训练是否可以提高Fast R-CNN中的目标检测精度？

To test this question, we train baseline networks that use only the classification loss, $L_{cls}$, in Eq. 1 (i.e., setting λ = 0). These baselines are printed for models S, M, and L in the first column of each group in Table 6. Note that these models do not have bounding-box regressors. Next (second column per group), we take networks that were trained with the multi-task loss (Eq. 1, λ = 1), but we disable bounding-box regression at test time. This isolates the networks’ classification accuracy and allows an apples-to-apples comparison with the baseline networks.

为了检验这个问题，我们训练了仅使用式1中分类损失 $L_{cls}$ 的基线网络（即设定λ = 0）。表6中每组的第一列给出了模型S、M和L的这些基线结果。请注意，这些模型没有边界框回归器。接下来（每组第二列），我们采用以多任务损失（式1，λ = 1）训练的网络，但在测试时禁用边界框回归。这样可以单独考察网络的分类精度，并与基线网络进行同等条件下的比较。
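
As a reminder of what λ switches on and off, here is a minimal single-RoI sketch of the multi-task loss from Eq. 1, $L = L_{cls} + \lambda [u \ge 1] L_{loc}$, with $L_{cls} = -\log p_u$ and a smooth L1 localization term. Function and argument names are illustrative, not taken from the released code.

```python
import math


def smooth_l1(x):
    """Smooth L1: quadratic for |x| < 1, linear otherwise."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def multi_task_loss(probs, u, t_u, v, lam=1.0):
    """Loss for one RoI.

    probs: softmax probabilities over K+1 classes (index 0 is background)
    u:     ground-truth class index
    t_u:   predicted box offsets for class u (4 values)
    v:     ground-truth regression targets (4 values)
    lam:   weight on the localization term (0 gives the classification-only baseline)
    """
    l_cls = -math.log(probs[u])
    l_loc = sum(smooth_l1(t - g) for t, g in zip(t_u, v))
    indicator = 1.0 if u >= 1 else 0.0  # background RoIs have no box target
    return l_cls + lam * indicator * l_loc


probs = [0.1, 0.7, 0.2]
pred, target = [0.2, 0.0, 1.5, 0.0], [0.0, 0.0, 0.0, 0.0]
print(multi_task_loss(probs, 1, pred, target, lam=0.0))  # classification loss only
print(multi_task_loss(probs, 1, pred, target, lam=1.0))  # classification + localization
```

Setting `lam=0.0` corresponds to the first-column baselines; `lam=1.0` corresponds to the multi-task networks.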

Across all three networks we observe that multi-task training improves pure classification accuracy relative to training for classification alone. The improvement ranges from +0.8 to +1.1 mAP points, showing a consistent positive effect from multi-task learning.

在所有三个网络中，我们观察到多任务训练相对于单独的分类训练可以提高纯分类精度。改进范围从+0.8到+1.1 mAP点，显示了多任务学习的一致积极效果。

Finally, we take the baseline models (trained with only the classification loss), tack on the bounding-box regression layer, and train them with $L_{loc}$ while keeping all other network parameters frozen. The third column in each group shows the results of this stage-wise training scheme: mAP improves over column one, but stage-wise training underperforms multi-task training (fourth column per group).

最后，我们采用基线模型（仅用分类损失训练），加上边界框回归层，并在保持所有其他网络参数冻结的情况下用 $L_{loc}$ 对其进行训练。每组的第三列显示了这种分阶段训练方案的结果：mAP比第一列有所提高，但分阶段训练的效果不如多任务训练（每组第四列）。

5.2. Scale invariance: to brute force or finesse?

We compare two strategies for achieving scale-invariant object detection: brute-force learning (single scale) and image pyramids (multi-scale). In either case, we define the scale s of an image to be the length of its shortest side.

我们比较了实现尺度不变的对象检测的两种策略：蛮力学习（单尺度）和图像金字塔（多尺度）。无论哪种情况，我们都将图像的尺度s定义为其最短边的长度。

All single-scale experiments use s = 600 pixels; s may be less than 600 for some images as we cap the longest image side at 1000 pixels and maintain the image’s aspect ratio. These values were selected so that VGG16 fits in GPU memory during fine-tuning. The smaller models are not memory bound and can benefit from larger values of s; however, optimizing s for each model is not our main concern. We note that PASCAL images are 384 × 473 pixels on average and thus the single-scale setting typically upsamples images by a factor of 1.6. The average effective stride at the RoI pooling layer is thus ≈ 10 pixels.

所有单尺度实验都使用s = 600像素；对于某些图像，s可能小于600，因为我们将图像最长边的上限设为1000像素，并保持图像的长宽比。选择这些值是为了使VGG16在微调期间能够放入GPU内存。较小的模型不受内存限制，可以从更大的s值中受益；但是，为每个模型优化s并不是我们主要关注的问题。我们注意到，PASCAL图像平均为384×473像素，因此单尺度设置通常会将图像上采样约1.6倍。因此，RoI池化层的平均有效步幅约为10个像素。
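
The 1.6× upsampling and the ≈ 10 pixel effective stride follow from simple arithmetic, sketched below. The function name is made up, and the stride of 16 is an assumption not stated in this section: it is VGG16's total stride at its last conv layer, after four 2×2 poolings.

```python
def rescale_factor(height, width, s=600, max_side=1000):
    """Scale so the shortest side becomes s, unless the longest side would exceed max_side."""
    short_side, long_side = min(height, width), max(height, width)
    scale = s / short_side
    if scale * long_side > max_side:
        scale = max_side / long_side  # the cap wins, so the short side ends up below s
    return scale


VGG16_CONV_STRIDE = 16  # assumption: total stride at VGG16's last conv layer

scale = rescale_factor(384, 473)  # average PASCAL image size quoted above
print(f"scale factor: {scale:.2f}")
print(f"effective stride in original pixels: {VGG16_CONV_STRIDE / scale:.2f}")

# A very elongated image hits the 1000-pixel cap instead.
print(f"300x1000 image scale: {rescale_factor(300, 1000):.2f}")
```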

In the multi-scale setting, we use the same five scales specified in [11] (s ∈ {480,576,688,864,1200}) to facilitate comparison with SPPnet. However, we cap the longest side at 2000 pixels to avoid exceeding GPU memory.

在多尺度设置中，我们使用与[11]中指定的相同的五个尺度（s∈{480,576,688,864,1200}），以便于与SPPnet进行比较。但是，我们将最长边的上限设置为2000像素，以避免超出GPU内存。

Table 7 shows models S and M when trained and tested with either one or five scales. Perhaps the most surprising result in [11] was that single-scale detection performs almost as well as multi-scale detection. Our findings confirm their result: deep ConvNets are adept at directly learning scale invariance. The multi-scale approach offers only a small increase in mAP at a large cost in compute time (Table 7). In the case of VGG16 (model L), we are limited to using a single scale by implementation details. Yet it achieves a mAP of 66.9%, which is slightly higher than the 66.0% reported for R-CNN [10], even though R-CNN uses “infinite” scales in the sense that each proposal is warped to a canonical size.

表7显示了使用一个或五个尺度进行训练和测试时的模型S和M。也许[11]中最令人惊讶的结果是单尺度检测的性能几乎与多尺度检测相同。我们的发现证实了他们的结果：深度ConvNets擅长直接学习尺度不变性。多尺度方法只带来了很小的mAP提升，却要付出很大的计算时间代价（表7）。对于VGG16（模型L），受实现细节所限，我们只能使用单一尺度。然而，它仍达到了66.9％的mAP，略高于R-CNN报告的66.0％[10]，尽管R-CNN在每个建议区域都被扭曲为规范大小的意义上使用了“无限”尺度。

Since single-scale processing offers the best tradeoff between speed and accuracy, especially for very deep models, all experiments outside of this sub-section use single-scale training and testing with s = 600 pixels.

由于单尺度处理可在速度和精度之间取得最佳平衡，尤其是对于非常深的模型，因此，本小节以外的所有实验均使用s = 600像素的单标度训练和测试。

5.3. Do we need more training data?

A good object detector should improve when supplied with more training data. Zhu et al. [24] found that DPM [8] mAP saturates after only a few hundred to thousand training examples. Here we augment the VOC07 trainval set with the VOC12 trainval set, roughly tripling the number of images to 16.5k, to evaluate Fast R-CNN. Enlarging the training set improves mAP on VOC07 test from 66.9% to 70.0% (Table 1). When training on this dataset we use 60k mini-batch iterations instead of 40k.

一个好的目标检测器在获得更多训练数据时应该有所改进。Zhu等人[24]发现DPM [8]的mAP在仅有几百到几千个训练样本后就饱和了。在这里，我们用VOC12 trainval集扩充VOC07 trainval集，将图像数量大约增加到三倍，达到16.5k，以评估Fast R-CNN。扩大训练集将VOC07测试集上的mAP从66.9％提高到70.0％（表1）。在该数据集上训练时，我们使用60k次小批量迭代，而不是40k次。

We perform similar experiments for VOC10 and 2012, for which we construct a dataset of 21.5k images from the union of VOC07 trainval, test, and VOC12 trainval. When training on this dataset, we use 100k SGD iterations and lower the learning rate by 0.1× each 40k iterations (instead of each 30k). For VOC10 and 2012, mAP improves from 66.1% to 68.8% and from 65.7% to 68.4%, respectively.

我们对VOC10和2012进行了类似的实验，为此我们从VOC07训练，测试和VOC12训练的结合中构建了一个21.5k图像的数据集。在此数据集上进行训练时，我们使用100k SGD迭代，并且每40k迭代（而不是每30k）将学习率降低0.1倍。对于VOC10和2012，mAP分别从66.1％提高到68.8％，从65.7％提高到68.4％。
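
The schedule described above (100k iterations, learning rate multiplied by 0.1 every 40k iterations) is a plain step decay, sketched below. The base learning rate of 0.001 is an assumption for illustration, since this section does not state it, and `step_lr` is a hypothetical helper rather than a framework call.

```python
def step_lr(iteration, base_lr=0.001, step=40_000, gamma=0.1):
    """Step decay: multiply the rate by gamma once per completed `step` iterations."""
    return base_lr * gamma ** (iteration // step)


# Print the rate at the start of each phase of a 100k-iteration run.
for it in (0, 40_000, 80_000, 99_999):
    print(f"iteration {it:>6}: lr = {step_lr(it):.0e}")
```

With a 40k step, a 100k run spends 40k iterations at the base rate, 40k at one tenth of it, and the final 20k at one hundredth.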

5.4. Do SVMs outperform softmax?

Fast R-CNN uses the softmax classifier learnt during fine-tuning instead of training one-vs-rest linear SVMs post-hoc, as was done in R-CNN and SPPnet. To understand the impact of this choice, we implemented post-hoc SVM training with hard negative mining in Fast R-CNN. We use the same training algorithm and hyper-parameters as in R-CNN.

Fast R-CNN使用在微调过程中学习到的softmax分类器，而不是像R-CNN和SPPnet那样事后训练一对多（one-vs-rest）的线性SVM。为了了解这一选择的影响，我们在Fast R-CNN中实现了带有难负例挖掘的事后SVM训练。我们使用与R-CNN中相同的训练算法和超参数。

Table 8 shows softmax slightly outperforming SVM for all three networks, by +0.1 to +0.8 mAP points. This effect is small, but it demonstrates that “one-shot” fine-tuning is sufficient compared to previous multi-stage training approaches. We note that softmax, unlike one-vs-rest SVMs, introduces competition between classes when scoring a RoI.

表8显示，在所有三个网络上softmax都略优于SVM，高出+0.1至+0.8个mAP点。这种效果很小，但表明与以前的多阶段训练方法相比，“一次性”微调就足够了。我们注意到，softmax与一对多的SVM不同，在为RoI评分时会引入类别之间的竞争。
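
The "competition between classes" remark can be made concrete with a toy example. The raw scores below are invented for illustration: under softmax, raising one class's score lowers every other class's probability, whereas independent one-vs-rest scorers would leave the other classes' scores unchanged.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


before = softmax([2.0, 1.0, 0.0])
after = softmax([2.0, 3.0, 0.0])  # only class 1's raw score changed

print(f"class 0 probability before: {before[0]:.3f}")
print(f"class 0 probability after:  {after[0]:.3f}")
# A one-vs-rest scorer for class 0 would still report its raw score of 2.0 in both cases.
```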

5.5. Are more proposals always better?

There are (broadly) two types of object detectors: those that use a sparse set of object proposals (e.g., selective search [21]) and those that use a dense set (e.g., DPM [8]). Classifying sparse proposals is a type of cascade [22] in which the proposal mechanism first rejects a vast number of candidates leaving the classifier with a small set to evaluate. This cascade improves detection accuracy when applied to DPM detections [21]. We find evidence that the proposal-classifier cascade also improves Fast R-CNN accuracy.

有两种类型的对象检测器：使用稀疏对象建议集的对象检测器（例如，选择性搜索[21]）和使用密集对象的检测器（例如DPM [8]）。对稀疏提议进行分类是一种级联[22]，其中建议机制首先会拒绝大量候选，而给分类器留下一小集进行评估。当应用于DPM检测时，这种级联提高了检测精度[21]。我们发现，建议分类器级联还提高了快速R-CNN的准确性。

Using selective search’s quality mode, we sweep from 1k to 10k proposals per image, each time re-training and retesting model M. If proposals serve a purely computational role, increasing the number of proposals per image should not harm mAP.

使用选择性搜索的质量模式，我们将每张图像的建议数量从1k逐步增加到10k，每次都重新训练并重新测试模型M。如果建议仅起纯粹的计算作用，那么增加每张图像的建议数量不应损害mAP。

This result is difficult to predict without actually running the experiment. The state-of-the-art for measuring object proposal quality is Average Recall (AR) [12]. AR correlates well with mAP for several proposal methods using R-CNN, when using a fixed number of proposals per image. Fig. 3 shows that AR (solid red line) does not correlate well with mAP as the number of proposals per image is varied. AR must be used with care; higher AR due to more proposals does not imply that mAP will increase. Fortunately, training and testing with model M takes less than 2.5 hours. Fast R-CNN thus enables efficient, direct evaluation of object proposal mAP, which is preferable to proxy metrics.

如果不实际运行实验，很难预测此结果。测量对象建议质量的最新技术是平均召回率（AR）[12]。当每个图像使用固定数量的建议时，AR与使用R-CNN的几种建议方法的mAP关联良好。图3显示，随着每幅图像的建议数量变化，AR（红色实线）与mAP的关联性不高。必须谨慎使用AR；由于有更多建议，因此更高的AR并不意味着mAP会增加。幸运的是，使用M型进行训练和测试的时间少于2.5小时。因此，快速R-CNN可以高效，直接地评估对象建议mAP，这比代理指标更可取。

We also investigate Fast R-CNN when using densely generated boxes (over scale, position, and aspect ratio), at a rate of about 45k boxes / image. This dense set is rich enough that when each selective search box is replaced by its closest (in IoU) dense box, mAP drops only 1 point (to 57.7%, Fig. 3, blue triangle).

我们还研究了使用密集生成的框（覆盖不同尺度、位置和长宽比）时的Fast R-CNN，密度约为每张图像45k个框。这个密集集合足够丰富，以至于当每个选择性搜索框被与其最接近（按IoU衡量）的密集框替换时，mAP仅下降1个百分点（降至57.7％，图3，蓝色三角形）。
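
Replacing each selective search box by its closest dense box amounts to a nearest-neighbour lookup under IoU. The sketch below assumes boxes are `(x1, y1, x2, y2)` tuples with continuous coordinates (no +1 pixel convention); the function names and the toy boxes are illustrative only.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def snap_to_dense(proposals, dense_boxes):
    """Replace each proposal by the dense box that overlaps it most (highest IoU)."""
    return [max(dense_boxes, key=lambda d: iou(p, d)) for p in proposals]


dense = [(0, 0, 10, 10), (5, 0, 15, 10), (20, 20, 30, 30)]
proposals = [(1, 1, 9, 9), (6, 1, 16, 9)]
print(snap_to_dense(proposals, dense))
```

This brute-force lookup is quadratic in the number of boxes, which is fine for a sketch but would need vectorizing at 45k dense boxes per image.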

The statistics of the dense boxes differ from those of selective search boxes. Starting with 2k selective search boxes, we test mAP when adding a random sample of 1000 × {2,4,6,8,10,32,45} dense boxes. For each experiment we re-train and re-test model M. When these dense boxes are added, mAP falls more strongly than when adding more selective search boxes, eventually reaching 53.0%.

密集框的统计信息与选择性搜索框的统计信息不同。从2k个选择性搜索框开始，我们在添加1000×{2,4,6,8,10,32,45}密集框的随机样本时测试mAP。对于每个实验，我们都会重新训练和重新测试模型M。添加这些密集框时，与添加更多选择性搜索框相比，mAP下降的幅度更大，最终达到53.0％。

We also train and test Fast R-CNN using only dense boxes (45k / image). This setting yields a mAP of 52.9% (blue diamond). Finally, we check if SVMs with hard negative mining are needed to cope with the dense box distribution. SVMs do even worse: 49.3% (blue circle).

我们还仅使用密集框（45k /图像）训练和测试Fast R-CNN。此设置产生的mAP为52.9％（蓝色菱形）。最后，我们检查是否需要使用带有硬负例挖掘的SVM来处理密集的框分布。 SVM甚至更糟：49.3％（蓝色圆圈）。

5.6. Preliminary MS COCO results

We applied Fast R-CNN (with VGG16) to the MS COCO dataset [18] to establish a preliminary baseline. We trained on the 80k image training set for 240k iterations and evaluated on the “test-dev” set using the evaluation server. The PASCAL-style mAP is 35.9%; the new COCO-style AP, which also averages over IoU thresholds, is 19.7%.

我们将Fast R-CNN（使用VGG16）应用于MS COCO数据集[18]，以建立初步基线。我们在80k图像的训练集上训练了240k次迭代，并使用评估服务器在“test-dev”集上进行了评估。PASCAL风格的mAP为35.9％；新的COCO风格的AP（还会在多个IoU阈值上取平均）为19.7％。

6. Conclusion

This paper proposes Fast R-CNN, a clean and fast update to R-CNN and SPPnet. In addition to reporting state-of-the-art detection results, we present detailed experiments that we hope provide new insights. Of particular note, sparse object proposals appear to improve detector quality. This issue was too costly (in time) to probe in the past, but becomes practical with Fast R-CNN. Of course, there may exist yet undiscovered techniques that allow dense boxes to perform as well as sparse proposals. Such methods, if developed, may help further accelerate object detection.

本文提出了Fast R-CNN，它是对R-CNN和SPPnet的一次简洁而快速的更新。除了报告最先进的检测结果外，我们还给出了详细的实验，希望能提供新的见解。特别值得注意的是，稀疏的目标建议似乎可以提高检测器的质量。这个问题在过去探究起来（时间）代价太高，但有了Fast R-CNN就变得可行。当然，可能存在尚未发现的技术，使密集框的表现与稀疏建议一样好。如果开发出此类方法，可能有助于进一步加速目标检测。

Acknowledgements. I thank Kaiming He, Larry Zitnick, and Piotr Dollár for helpful discussions and encouragement.
