论文原文:LINK
论文被引:20864(08/09/2020)
State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet [1] and Fast R-CNN [2] have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. An RPN is a fully convolutional network that simultaneously predicts object bounds and objectness scores at each position. The RPN is trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection. We further merge RPN and Fast R-CNN into a single network by sharing their convolutional features—using the recently popular terminology of neural networks with “attention” mechanisms, the RPN component tells the unified network where to look. For the very deep VGG-16 model [3], our detection system has a frame rate of 5fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007, 2012, and MS COCO datasets with only 300 proposals per image. In ILSVRC and COCO 2015 competitions, Faster R-CNN and RPN are the foundations of the 1st-place winning entries in several tracks. Code has been made publicly available.
Index Terms—Object Detection, Region Proposal, Convolutional Neural Network.
最新的目标检测网络依靠区域提议算法来假设目标的位置。SPPnet [1]和Fast R-CNN [2]等进展减少了这些检测网络的运行时间,使区域提议计算成为瓶颈。在这项工作中,我们引入了区域提议网络(Region Proposal Network,RPN),它与检测网络共享全图像卷积特征,从而使区域提议几乎不增加额外开销。RPN是一个全卷积网络,可以同时预测每个位置的目标边界和目标性得分。RPN经过端到端训练以生成高质量的区域提议,再由Fast R-CNN用于检测。通过共享卷积特征,我们将RPN和Fast R-CNN进一步合并为一个网络:借用最近流行的带有“注意力”机制的神经网络术语,RPN组件告诉统一网络应该看哪里。对于非常深的VGG-16模型[3],我们的检测系统在GPU上的帧率为5fps(包括所有步骤),同时在PASCAL VOC 2007、2012和MS COCO数据集上取得最先进的目标检测精度,且每张图像仅使用300个提议。在ILSVRC和COCO 2015竞赛中,Faster R-CNN和RPN是多个赛道第一名作品的基础。代码已公开发布。
Recent advances in object detection are driven by the success of region proposal methods (e.g., [4]) and region-based convolutional neural networks (RCNNs) [5]. Although region-based CNNs were computationally expensive as originally developed in [5], their cost has been drastically reduced thanks to sharing convolutions across proposals [1], [2]. The latest incarnation, Fast R-CNN [2], achieves near real-time rates using very deep networks [3], when ignoring the time spent on region proposals. Now, proposals are the test-time computational bottleneck in state-of-the-art detection systems.
区域提议方法(例如[4])和基于区域的卷积神经网络(RCNN)[5]的成功推动了对象检测的最新进展。尽管基于区域的CNN在计算上很昂贵,如最初在[5]中开发的,但由于在提案[1],[2]中共享卷积,因此其成本已大大降低。最新的化身,Fast R-CNN [2],在忽略区域提议花费的时间时,使用非常深的网络[3]实现了接近实时的速度。现在,提议是最新检测系统中测试时间的计算瓶颈。
Region proposal methods typically rely on inexpensive features and economical inference schemes. Selective Search [4], one of the most popular methods, greedily merges superpixels based on engineered low-level features. Yet when compared to efficient detection networks [2], Selective Search is an order of magnitude slower, at 2 seconds per image in a CPU implementation. EdgeBoxes [6] currently provides the best tradeoff between proposal quality and speed, at 0.2 seconds per image. Nevertheless, the region proposal step still consumes as much running time as the detection network.
区域提议方法通常依赖于便宜的特征和经济的推理方案。选择性搜索[4]是最流行的方法之一,它根据工程化的底层特征贪婪地合并超像素。然而,与高效的检测网络相比[2],选择性搜索的速度要慢一个数量级,在CPU实现中每张图像2秒。 EdgeBoxes [6]当前提供建议质量和速度之间的最佳权衡,每张图像0.2秒。尽管如此,区域提议步骤仍然消耗与检测网络一样多的运行时间。
One may note that fast region-based CNNs take advantage of GPUs, while the region proposal methods used in research are implemented on the CPU, making such runtime comparisons inequitable. An obvious way to accelerate proposal computation is to reimplement it for the GPU. This may be an effective engineering solution, but re-implementation ignores the down-stream detection network and therefore misses important opportunities for sharing computation.
可能会注意到,基于区域的快速CNN充分利用了GPU的优势,而研究中使用的区域提议方法则是在CPU上实现的,因此这种运行时比较是不公平的。加速提议计算的一种明显方法是为GPU重新实现。这可能是一种有效的工程解决方案,但是重新实现会忽略下游检测网络,因此会丢失共享计算的重要机会。
In this paper, we show that an algorithmic change— computing proposals with a deep convolutional neural network—leads to an elegant and effective solution where proposal computation is nearly cost-free given the detection network’s computation. To this end, we introduce novel Region Proposal Networks (RPNs) that share convolutional layers with state-of-the-art object detection networks [1], [2]. By sharing convolutions at test-time, the marginal cost for computing proposals is small (e.g., 10ms per image).
在本文中,我们证明了算法的变化(使用深度卷积神经网络计算提议)导致了一种优雅而有效的解决方案,考虑到检测网络的计算,提议计算几乎是免费的。为此,我们介绍了与最新的对象检测网络[1],[2]共享卷积层的新颖的区域提议网络(RPN)。通过在测试时共享卷积,计算提议的边际成本很小(例如,每张图片10毫秒)。
Our observation is that the convolutional feature maps used by region-based detectors, like Fast R-CNN, can also be used for generating region proposals. On top of these convolutional features, we construct an RPN by adding a few additional convolutional layers that simultaneously regress region bounds and objectness scores at each location on a regular grid. The RPN is thus a kind of fully convolutional network (FCN) [7] and can be trained end-to-end specifically for the task of generating detection proposals.
我们的观察结果是,基于区域的检测器(如Fast RCNN)使用的卷积特征图也可用于生成区域建议。在这些卷积特征之上,我们通过添加一些其他卷积层来构造RPN,这些卷积层同时回归规则网格上每个位置的区域边界和客观性得分。因此,RPN是一种全卷积网络(FCN)[7],可以专门针对生成检测建议的任务进行端到端训练。
RPNs are designed to efficiently predict region proposals with a wide range of scales and aspect ratios. In contrast to prevalent methods [8], [9], [1], [2] that use pyramids of images (Figure 1, a) or pyramids of filters (Figure 1, b), we introduce novel “anchor” boxes that serve as references at multiple scales and aspect ratios. Our scheme can be thought of as a pyramid of regression references (Figure 1, c), which avoids enumerating images or filters of multiple scales or aspect ratios. This model performs well when trained and tested using single-scale images and thus benefits running speed.
RPN旨在高效地预测各种尺度和纵横比的区域提议。与使用图像金字塔(图1,a)或滤波器金字塔(图1,b)的流行方法[8],[9],[1],[2]不同,我们引入了新颖的“锚”框,作为多种尺度和纵横比下的参考。我们的方案可以看作是回归参考的金字塔(图1,c),它避免了枚举多种尺度或纵横比的图像或滤波器。当使用单尺度图像进行训练和测试时,该模型表现良好,从而有利于运行速度。
图1:处理多种尺度和尺寸的不同方案。(a)构建图像和特征图的金字塔,并在所有尺度上运行分类器。(b)在特征图上运行具有多种比例/尺寸的滤波器金字塔。(c)我们在回归函数中使用参考框的金字塔。
To unify RPNs with Fast R-CNN [2] object detection networks, we propose a training scheme that alternates between fine-tuning for the region proposal task and then fine-tuning for object detection, while keeping the proposals fixed. This scheme converges quickly and produces a unified network with convolutional features that are shared between both tasks.
为了将RPN与Fast R-CNN [2]对象检测网络统一起来,我们提出了一种训练方案,该方案在对区域建议任务进行微调与对对象检测进行微调之间交替,同时保持建议不变。这种方案可以快速收敛,并生成具有两个任务之间共享的卷积特征的统一网络。
We comprehensively evaluate our method on the PASCAL VOC detection benchmarks [11] where RPNs with Fast R-CNNs produce detection accuracy better than the strong baseline of Selective Search with Fast R-CNNs. Meanwhile, our method waives nearly all computational burdens of Selective Search at test-time—the effective running time for proposals is just 10 milliseconds. Using the expensive very deep models of [3], our detection method still has a frame rate of 5fps (including all steps) on a GPU, and thus is a practical object detection system in terms of both speed and accuracy. We also report results on the MS COCO dataset [12] and investigate the improvements on PASCAL VOC using the COCO data. Code has been made publicly available at https://github.com/shaoqingren/faster_rcnn (in MATLAB) and https://github.com/rbgirshick/py-faster-rcnn (in Python).
我们在PASCAL VOC检测基准[11]上全面评估了我们的方法,其中使用Fast R-CNN的RPN产生的检测精度优于使用Fast R-CNN的选择性搜索这一强基准。同时,我们的方法在测试时几乎免除了选择性搜索的所有计算负担:提议的有效运行时间仅为10毫秒。使用昂贵的非常深的模型[3],我们的检测方法在GPU上的帧率仍为5fps(包括所有步骤),因此在速度和准确性方面都是实用的目标检测系统。我们还报告了MS COCO数据集上的结果[12],并研究了使用COCO数据对PASCAL VOC的改进。代码已在https://github.com/shaoqingren/faster_rcnn(MATLAB)和https://github.com/rbgirshick/py-faster-rcnn(Python)公开。
A preliminary version of this manuscript was published previously [10]. Since then, the frameworks of RPN and Faster R-CNN have been adopted and generalized to other methods, such as 3D object detection [13], part-based detection [14], instance segmentation [15], and image captioning [16]. Our fast and effective object detection system has also been built in commercial systems such as at Pinterests [17], with user engagement improvements reported.
该手稿的初步版本先前已发布[10]。从那时起,RPN和Faster R-CNN的框架已被采用并推广到其他方法,例如3D对象检测[13],基于零件的检测[14],实例分割[15]和图像字幕[16]。我们的快速有效的物体检测系统也已经建立在商业系统中,例如Pinterests [17],据报道用户参与度有所提高。
In ILSVRC and COCO 2015 competitions, Faster R-CNN and RPN are the basis of several 1st-place entries [18] in the tracks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation. RPNs completely learn to propose regions from data, and thus can easily benefit from deeper and more expressive features (such as the 101-layer residual nets adopted in [18]). Faster R-CNN and RPN are also used by several other leading entries in these competitions. These results suggest that our method is not only a cost-efficient solution for practical usage, but also an effective way of improving object detection accuracy.
在ILSVRC和COCO 2015竞赛中,Faster R-CNN和RPN是ImageNet检测、ImageNet定位、COCO检测和COCO分割赛道中多个第一名作品的基础[18]。RPN完全从数据中学习提议区域,因此可以轻松地从更深、更具表现力的特征(例如[18]中采用的101层残差网络)中受益。在这些比赛中,其他一些领先的参赛作品也使用了Faster R-CNN和RPN。这些结果表明,我们的方法不仅是实用的高性价比解决方案,而且是提高目标检测精度的有效途径。
Object Proposals. There is a large literature on object proposal methods. Comprehensive surveys and comparisons of object proposal methods can be found in [19], [20], [21]. Widely used object proposal methods include those based on grouping super-pixels (e.g., Selective Search [4], CPMC [22], MCG [23]) and those based on sliding windows (e.g., objectness in windows [24], EdgeBoxes [6]). Object proposal methods were adopted as external modules independent of the detectors (e.g., Selective Search [4] object detectors, RCNN [5], and Fast R-CNN [2]).
对象提议。关于对象提议方法的文献很多。可以在[19],[20],[21]中找到对象提议方法的综合调查和比较。广泛使用的对象提议方法包括基于超像素分组的方法(例如,选择性搜索[4],CPMC [22],MCG [23])和基于滑动窗口的方法(例如,窗口中的对象[24],EdgeBoxes [ 6])。对象提议方法被用作独立于检测器的外部模块(例如,选择性搜索[4]对象检测器,RCNN [5]和Fast R-CNN [2])。
Deep Networks for Object Detection. The R-CNN method [5] trains CNNs end-to-end to classify the proposal regions into object categories or background. R-CNN mainly plays as a classifier, and it does not predict object bounds (except for refining by bounding box regression). Its accuracy depends on the performance of the region proposal module (see comparisons in [20]). Several papers have proposed ways of using deep networks for predicting object bounding boxes [25], [9], [26], [27]. In the OverFeat method [9], a fully-connected layer is trained to predict the box coordinates for the localization task that assumes a single object. The fully-connected layer is then turned into a convolutional layer for detecting multiple class-specific objects. The MultiBox methods [26], [27] generate region proposals from a network whose last fully-connected layer simultaneously predicts multiple class-agnostic boxes, generalizing the “single-box” fashion of OverFeat. These class-agnostic boxes are used as proposals for R-CNN [5]. The MultiBox proposal network is applied on a single image crop or multiple large image crops (e.g., 224×224), in contrast to our fully convolutional scheme. MultiBox does not share features between the proposal and detection networks. We discuss OverFeat and MultiBox in more depth later in context with our method. Concurrent with our work, the DeepMask method [28] is developed for learning segmentation proposals.
用于对象检测的深度网络。 R-CNN方法[5]端到端训练CNN将提议区域分类为对象类别或背景。 R-CNN主要充当分类器,它不预测对象边界(通过边界框回归进行精炼除外)。它的准确性取决于区域提议模块的性能(请参见[20]中的比较)。几篇论文提出了使用深度网络预测对象边界框的方法[25],[9],[26],[27]。在OverFeat方法[9]中,训练了一个全连接层来预测假设单个对象的定位任务的框坐标。然后将全连接层转换为卷积层,以检测多个特定于类的对象。MultiBox方法[26],[27]从网络中生成区域提议,该网络的最后一个全连接层同时预测多个与类无关的盒子,从而概括了OverFeat的“单个盒子”方式。这些与类无关的框用作R-CNN的提议[5]。与我们的全卷积方案相比,MultiBox提案网络适用于单个图片裁剪或多个大图片裁剪(例如224×224)。MultiBox在提议和检测网络之间不共享特征。我们稍后将在我们的方法中更深入地讨论OverFeat和MultiBox。与我们的工作同时,开发了DeepMask方法[28]以学习分割提议。
Shared computation of convolutions [9], [1], [29], [7], [2] has been attracting increasing attention for efficient, yet accurate, visual recognition. The OverFeat paper [9] computes convolutional features from an image pyramid for classification, localization, and detection. Adaptively-sized pooling (SPP) [1] on shared convolutional feature maps is developed for efficient region-based object detection [1], [30] and semantic segmentation [29]. Fast R-CNN [2] enables end-to-end detector training on shared convolutional features and shows compelling accuracy and speed.
卷积[9],[1],[29],[7],[2]的共享计算已吸引了越来越多的关注,以进行有效而准确的视觉识别。OverFeat论文[9]从图像金字塔计算卷积特征,以进行分类,定位和检测。共享卷积特征图上的自适应大小池化(SPP)[1]被开发用于有效的基于区域的对象检测[1],[30]和语义分割[29]。Fast R-CNN [2]可以对共享卷积特征进行端到端检测器训练,并显示出令人信服的准确性和速度。
Our object detection system, called Faster R-CNN, is composed of two modules. The first module is a deep fully convolutional network that proposes regions, and the second module is the Fast R-CNN detector [2] that uses the proposed regions. The entire system is a single, unified network for object detection (Figure 2). Using the recently popular terminology of neural networks with ‘attention’ [31] mechanisms, the RPN module tells the Fast R-CNN module where to look. In Section 3.1 we introduce the designs and properties of the network for region proposal. In Section 3.2 we develop algorithms for training both modules with features shared.
我们的物体检测系统称为Faster R-CNN,它由两个模块组成。第一个模块是提议区域的深层全卷积网络,第二个模块是使用提议区域的Fast R-CNN检测器[2]。整个系统是用于对象检测的单个统一网络(图2)。RPN模块使用最近流行的带有“注意力” [31]机制的神经网络术语,告诉Fast R-CNN模块应该在哪里查看。在第3.1节中,我们介绍了用于区域提议的网络的设计和属性。在3.2节中,我们开发了用于训练具有共享特征的两个模块的算法。
图2:Faster R-CNN是用于对象检测的单个统一网络。RPN模块充当此统一网络的“注意”。
A Region Proposal Network (RPN) takes an image (of any size) as input and outputs a set of rectangular object proposals, each with an objectness score. We model this process with a fully convolutional network [7], which we describe in this section. Because our ultimate goal is to share computation with a Fast R-CNN object detection network [2], we assume that both nets share a common set of convolutional layers. In our experiments, we investigate the Zeiler and Fergus model [32] (ZF), which has 5 shareable convolutional layers and the Simonyan and Zisserman model [3] (VGG-16), which has 13 shareable convolutional layers.
区域提议网络(RPN)接收(任意大小的)图像作为输入,并输出一组矩形的目标提议,每个提议都带有一个目标性得分。我们使用全卷积网络[7]对该过程建模,并在本节中进行描述。因为我们的最终目标是与Fast R-CNN目标检测网络共享计算[2],所以我们假设两个网络共享一组相同的卷积层。在我们的实验中,我们研究了具有5个可共享卷积层的Zeiler和Fergus模型[32](ZF),以及具有13个可共享卷积层的Simonyan和Zisserman模型[3](VGG-16)。
To generate region proposals, we slide a small network over the convolutional feature map output by the last shared convolutional layer. This small network takes as input an n × n spatial window of the input convolutional feature map. Each sliding window is mapped to a lower-dimensional feature (256-d for ZF and 512-d for VGG, with ReLU [33] following). This feature is fed into two sibling fully-connected layers—a box-regression layer (reg) and a box-classification layer (cls). We use n = 3 in this paper, noting that the effective receptive field on the input image is large (171 and 228 pixels for ZF and VGG, respectively). This mini-network is illustrated at a single position in Figure 3 (left). Note that because the mini-network operates in a sliding-window fashion, the fully-connected layers are shared across all spatial locations. This architecture is naturally implemented with an n×n convolutional layer followed by two sibling 1×1 convolutional layers (for reg and cls, respectively).
为了生成区域提议,我们在最后一个共享卷积层输出的卷积特征图上滑动一个小网络。这个小网络以输入卷积特征图的n×n空间窗口作为输入。每个滑动窗口都映射到一个低维特征(ZF为256维,VGG为512维,后接ReLU [33])。该特征被输入两个同级的全连接层:框回归层(reg)和框分类层(cls)。在本文中我们使用n = 3,注意输入图像上的有效感受野很大(ZF和VGG分别为171和228像素)。图3(左)在单个位置展示了这个小网络。请注意,由于小网络以滑动窗口的方式运行,全连接层在所有空间位置上共享。该结构自然地由一个n×n卷积层和两个同级的1×1卷积层(分别用于reg和cls)实现。
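下面给出这一"3×3卷积后接两个同级1×1卷积"结构的一个最小示意(以PyTorch写成,仅作说明,并非论文发布的Caffe/MATLAB或py-faster-rcnn实现;k取默认的9):

```python
import torch
import torch.nn as nn

class RPNHead(nn.Module):
    """RPN头部示意:n=3的滑动窗口用3x3卷积实现,后接两个同级的1x1卷积。"""
    def __init__(self, in_channels=512, mid_channels=512, num_anchors=9):
        super().__init__()
        # 3x3卷积对应论文中n=3的滑动窗口,把每个位置映射为一个低维特征(ZF为256维,VGG为512维)
        self.conv = nn.Conv2d(in_channels, mid_channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)
        # cls层:每个锚输出2个分数(目标 / 非目标)
        self.cls = nn.Conv2d(mid_channels, num_anchors * 2, kernel_size=1)
        # reg层:每个锚输出4个框坐标参数
        self.reg = nn.Conv2d(mid_channels, num_anchors * 4, kernel_size=1)

    def forward(self, feature_map):
        x = self.relu(self.conv(feature_map))
        return self.cls(x), self.reg(x)

# 用法示意:VGG-16最后一个共享卷积层输出512通道特征图
head = RPNHead(in_channels=512, mid_channels=512, num_anchors=9)
scores, deltas = head(torch.randn(1, 512, 40, 60))   # 约对应1000x600输入、总步长16
print(scores.shape, deltas.shape)                     # (1, 18, 40, 60) (1, 36, 40, 60)
```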
图3:左图:区域提议网络(RPN)。右:在PASCAL VOC 2007测试中使用RPN提议的检测示例。我们的方法可以检测各种比例和纵横比的物体。
At each sliding-window location, we simultaneously predict multiple region proposals, where the number of maximum possible proposals for each location is denoted as k. So the reg layer has 4k outputs encoding the coordinates of k boxes, and the cls layer outputs 2k scores that estimate probability of object or not object for each proposal. The k proposals are parameterized relative to k reference boxes, which we call anchors. An anchor is centered at the sliding window in question, and is associated with a scale and aspect ratio (Figure 3, left). By default we use 3 scales and 3 aspect ratios, yielding k = 9 anchors at each sliding position. For a convolutional feature map of size W × H (typically ∼2,400), there are WHk anchors in total.
在每个滑动窗口位置,我们同时预测多个区域提议,其中每个位置的最大可能提议数目表示为k。因此,reg层有4k个输出,对k个框的坐标进行编码,而cls层则输出2k个分数,这些分数估计每个提议的目标或非目标的概率。相对于k个参考框(我们称为锚点(anchors)),对k个提议进行了参数化。锚点位于相关滑动窗口的中心,并与比例和长宽比相关(图3,左)。默认情况下,我们使用3个比例和3个纵横比,在每个滑动位置产生k = 9个锚点。对于大小为W×H(通常约为2400)的卷积特征图,总共有 WHk 个锚点。
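锚框的生成过程可以用如下NumPy草图直观理解(尺度与纵横比取论文默认值;具体生成细节与论文发布代码可能略有出入,仅作示意):

```python
import numpy as np

def generate_anchors(feat_h, feat_w, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """在特征图每个位置生成k=len(scales)*len(ratios)个锚框,返回(H*W*k, 4)的(x1,y1,x2,y2)。"""
    # 每种(尺度, 纵横比)组合对应一个宽和高:约束 w*h = scale^2 且 h/w = ratio
    ws, hs = [], []
    for s in scales:
        for r in ratios:
            w = np.sqrt(float(s) ** 2 / r)
            ws.append(w)
            hs.append(w * r)
    ws, hs = np.array(ws), np.array(hs)                 # (k,)

    # 特征图每个位置映射回原图上的中心点
    shift_x = (np.arange(feat_w) + 0.5) * stride
    shift_y = (np.arange(feat_h) + 0.5) * stride
    cx, cy = np.meshgrid(shift_x, shift_y)              # (H, W)
    cx, cy = cx.ravel(), cy.ravel()                     # (H*W,)

    # 广播得到 (H*W, k, 4) 的角点坐标
    x1 = cx[:, None] - ws[None, :] / 2
    y1 = cy[:, None] - hs[None, :] / 2
    x2 = cx[:, None] + ws[None, :] / 2
    y2 = cy[:, None] + hs[None, :] / 2
    return np.stack([x1, y1, x2, y2], axis=-1).reshape(-1, 4)

anchors = generate_anchors(40, 60)   # 1000x600输入、步长16时约为60x40的特征图
print(anchors.shape)                 # (21600, 4),即 WHk 个锚框
```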
Translation-Invariant Anchors
An important property of our approach is that it is translation invariant, both in terms of the anchors and the functions that compute proposals relative to the anchors. If one translates an object in an image, the proposal should translate and the same function should be able to predict the proposal in either location. This translation-invariant property is guaranteed by our method. As a comparison, the MultiBox method [27] uses k-means to generate 800 anchors, which are not translation invariant. So MultiBox does not guarantee that the same proposal is generated if an object is translated.
我们方法的一个重要特性是平移不变性,无论是锚框本身,还是相对于锚框计算提议的函数,都具有这一性质。如果在图像中平移一个目标,提议也应随之平移,并且同一函数应能在任一位置预测出该提议。我们的方法保证了这种平移不变性。作为比较,MultiBox方法[27]使用k-means生成800个锚框,它们不具有平移不变性。因此,当目标发生平移时,MultiBox不能保证生成相同的提议。
The translation-invariant property also reduces the model size. MultiBox has a (4 + 1)×800-dimensional fully-connected output layer, whereas our method has a (4 + 2)×9-dimensional convolutional output layer in the case of k = 9 anchors. As a result, our output layer has 2.8×10^4 parameters (512×(4 + 2)×9 for VGG-16), two orders of magnitude fewer than MultiBox’s output layer that has 6.1×10^6 parameters (1536×(4 + 1)×800 for GoogleNet [34] in MultiBox [27]). If considering the feature projection layers, our proposal layers still have an order of magnitude fewer parameters than MultiBox. We expect our method to have less risk of overfitting on small datasets, like PASCAL VOC.
平移不变性还减小了模型大小。MultiBox具有(4 + 1)×800维的全连接输出层,而在k = 9个锚框的情况下,我们的方法具有(4 + 2)×9维的卷积输出层。因此,我们的输出层具有 $2.8\times10^4$ 个参数(VGG-16为512×(4 + 2)×9),比具有 $6.1\times10^6$ 个参数的MultiBox输出层(MultiBox [27]中的GoogleNet [34]为1536×(4 + 1)×800)少两个数量级。如果考虑特征投影层,我们的提议层的参数仍然比MultiBox少一个数量级。我们预计我们的方法在较小的数据集(如PASCAL VOC)上过拟合的风险较小。
Multi-Scale Anchors as Regression References
Our design of anchors presents a novel scheme for addressing multiple scales (and aspect ratios). As shown in Figure 1, there have been two popular ways for multi-scale predictions. The first way is based on image/feature pyramids, e.g., in DPM [8] and CNN-based methods [9], [1], [2]. The images are resized at multiple scales, and feature maps (HOG [8] or deep convolutional features [9], [1], [2]) are computed for each scale (Figure 1(a)). This way is often useful but is time-consuming. The second way is to use sliding windows of multiple scales (and/or aspect ratios) on the feature maps. For example, in DPM [8], models of different aspect ratios are trained separately using different filter sizes (such as 5×7 and 7×5). If this way is used to address multiple scales, it can be thought of as a “pyramid of filters” (Figure 1(b)). The second way is usually adopted jointly with the first way [8].
我们的锚盒设计提出了一种解决多种尺度(和纵横比)的新方案。如图1所示,有两种流行的多尺度预测方法。第一种方法是基于图像/特征金字塔的,例如在DPM [8]和基于CNN的方法[9],[1],[2]中。图像会在多个尺度上调整大小,并针对每个尺度计算特征图(HOG [8]或深度卷积特征[9],[1],[2])(图1(a))。这种方法通常有用但很费时。第二种方法是在特征图上使用多个尺度(和/或纵横比)的滑动窗口。例如,在DPM [8]中,使用不同的滤波器大小(例如5×7和7×5)分别训练不同长宽比的模型。如果使用这种方法处理多个尺度,则可以将其视为“滤波器金字塔”(图1(b))。第二种方法通常与第一种方法一起使用[8]。
As a comparison, our anchor-based method is built on a pyramid of anchors, which is more cost-efficient. Our method classifies and regresses bounding boxes with reference to anchor boxes of multiple scales and aspect ratios. It only relies on images and feature maps of a single scale, and uses filters (sliding windows on the feature map) of a single size. We show by experiments the effects of this scheme for addressing multiple scales and sizes (Table 8).
相比之下,我们基于锚框的方法建立在锚框金字塔之上,更具成本效益。我们的方法参照多种尺度和纵横比的锚框对边界框进行分类和回归。它仅依赖于单一尺度的图像和特征图,并使用单一尺寸的滤波器(特征图上的滑动窗口)。我们通过实验展示了该方案在处理多种尺度和尺寸方面的效果(表8)。
Because of this multi-scale design based on anchors, we can simply use the convolutional features computed on a single-scale image, as is also done by the Fast R-CNN detector [2]. The design of multiscale anchors is a key component for sharing features without extra cost for addressing scales.
由于基于锚盒的这种多尺度设计,我们可以简单地使用在单尺度图像上计算出的卷积特征,正如Fast R-CNN检测器所做的那样[2]。多尺度锚盒的设计是共享特征而无需花费额外成本解决尺度的关键组成部分。
For training RPNs, we assign a binary class label (of being an object or not) to each anchor. We assign a positive label to two kinds of anchors: (i) the anchor/anchors with the highest Intersection-over-Union (IoU) overlap with a ground-truth box, or (ii) an anchor that has an IoU overlap higher than 0.7 with any ground-truth box. Note that a single ground-truth box may assign positive labels to multiple anchors. Usually the second condition is sufficient to determine the positive samples; but we still adopt the first condition for the reason that in some rare cases the second condition may find no positive sample. We assign a negative label to a non-positive anchor if its IoU ratio is lower than 0.3 for all ground-truth boxes. Anchors that are neither positive nor negative do not contribute to the training objective.
为了训练RPN,我们为每个锚框分配一个二分类标签(是否为目标)。我们为两类锚框分配正标签:(i)与某个真实标注框具有最高交并比(Intersection-over-Union,IoU)重叠的锚框,或(ii)与任意真实标注框的IoU重叠高于0.7的锚框。请注意,单个真实标注框可能为多个锚框分配正标签。通常第二个条件足以确定正样本;但我们仍然采用第一个条件,因为在极少数情况下,第二个条件可能找不到正样本。如果某个非正锚框与所有真实标注框的IoU均低于0.3,我们为其分配负标签。既非正也非负的锚框对训练目标没有贡献。
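上述正负样本分配规则可以写成如下草图(假设已经得到锚框与真实标注框两两之间的IoU矩阵,阈值取论文中的0.7 / 0.3,仅作示意):

```python
import numpy as np

def assign_anchor_labels(iou, pos_thresh=0.7, neg_thresh=0.3):
    """iou: (num_anchors, num_gt) 的IoU矩阵。返回每个锚框的标签:1正,0负,-1忽略。"""
    labels = np.full(iou.shape[0], -1, dtype=np.int64)

    max_iou_per_anchor = iou.max(axis=1)
    # 条件(ii):与任一真实框的IoU高于0.7的锚框为正
    labels[max_iou_per_anchor >= pos_thresh] = 1
    # 条件(i):对每个真实框,与其IoU最高的锚框也标为正(防止没有满足0.7的锚框)
    best_anchor_per_gt = iou.argmax(axis=0)
    labels[best_anchor_per_gt] = 1
    # 与所有真实框的IoU都低于0.3的非正锚框为负,其余保持-1(不参与训练)
    labels[(max_iou_per_anchor < neg_thresh) & (labels != 1)] = 0
    return labels
```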
With these definitions, we minimize an objective function following the multi-task loss in Fast R-CNN [2]. Our loss function for an image is defined as:
利用这些定义,我们按照Fast R-CNN [2]中的多任务损失来最小化目标函数。我们对一张图像的损失函数定义为:
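$$L(\{p_i\},\{t_i\}) = \frac{1}{N_{cls}}\sum_i L_{cls}(p_i, p_i^*) + \lambda\,\frac{1}{N_{reg}}\sum_i p_i^*\,L_{reg}(t_i, t_i^*) \tag{1}$$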
Here, $i$ is the index of an anchor in a mini-batch and $p_i$ is the predicted probability of anchor $i$ being an object. The ground-truth label $p_i^*$ is 1 if the anchor is positive, and is 0 if the anchor is negative. $t_i$ is a vector representing the 4 parameterized coordinates of the predicted bounding box, and $t_i^*$ is that of the ground-truth box associated with a positive anchor. The classification loss $L_{cls}$ is log loss over two classes (object vs. not object). For the regression loss, we use $L_{reg}(t_i, t_i^*) = R(t_i - t_i^*)$ where $R$ is the robust loss function (smooth $L_1$) defined in [2]. The term $p_i^* L_{reg}$ means the regression loss is activated only for positive anchors ($p_i^* = 1$) and is disabled otherwise ($p_i^* = 0$). The outputs of the cls and reg layers consist of $\{p_i\}$ and $\{t_i\}$ respectively.
在此,$i$ 是小批量中锚框的索引,$p_i$ 是锚框 $i$ 为目标的预测概率。如果锚框为正,则真实标签 $p_i^*$ 为1;如果锚框为负,则为0。$t_i$ 是表示预测边界框4个参数化坐标的向量,而 $t_i^*$ 是与正锚框关联的真实标注框的参数化坐标。分类损失 $L_{cls}$ 是两个类别(目标与非目标)上的对数损失。对于回归损失,我们使用 $L_{reg}(t_i, t_i^*) = R(t_i - t_i^*)$,其中 $R$ 是[2]中定义的鲁棒损失函数(平滑L1)。$p_i^* L_{reg}$ 这一项表示回归损失仅对正锚框($p_i^* = 1$)激活,否则($p_i^* = 0$)不生效。cls和reg层的输出分别由 $\{p_i\}$ 和 $\{t_i\}$ 组成。
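式(1)中分类项与回归项(平滑L1)的计算可以用下面的NumPy草图表示(仅作示意,并非论文发布的实现;归一化常数见下一段):

```python
import numpy as np

def smooth_l1(x):
    """[2]中定义的平滑L1(robust)损失,逐元素计算。"""
    absx = np.abs(x)
    return np.where(absx < 1.0, 0.5 * x ** 2, absx - 0.5)

def rpn_loss(p, p_star, t, t_star, lam=10.0, n_cls=256, n_reg=2400):
    """p: (N,) 预测为目标的概率;p_star: (N,) 的1/0标签(被忽略的锚框应事先剔除);
    t, t_star: (N, 4) 参数化坐标。返回式(1)的标量损失。"""
    eps = 1e-10
    # 分类损失:两类(目标 / 非目标)的对数损失
    cls_loss = -(p_star * np.log(p + eps) +
                 (1 - p_star) * np.log(1 - p + eps)).sum() / n_cls
    # 回归损失:仅对正样本(p_star = 1)激活
    reg_loss = (p_star[:, None] * smooth_l1(t - t_star)).sum() / n_reg
    return cls_loss + lam * reg_loss
```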
The two terms are normalized by $N_{cls}$ and $N_{reg}$ and weighted by a balancing parameter $\lambda$. In our current implementation (as in the released code), the cls term in Eqn.(1) is normalized by the mini-batch size (i.e., $N_{cls} = 256$) and the reg term is normalized by the number of anchor locations (i.e., $N_{reg} \sim 2{,}400$). By default we set $\lambda = 10$, and thus both cls and reg terms are roughly equally weighted. We show by experiments that the results are insensitive to the values of $\lambda$ in a wide range (Table 9). We also note that the normalization as above is not required and could be simplified. For bounding box regression, we adopt the parameterizations of the 4 coordinates following [5]:
这两项分别由 $N_{cls}$ 和 $N_{reg}$ 归一化,并通过平衡参数 $\lambda$ 加权。在我们当前的实现中(如发布的代码中),式(1)中的cls项通过小批量大小(即 $N_{cls} = 256$)进行归一化,而reg项通过锚框位置的数量(即 $N_{reg} \sim 2{,}400$)进行归一化。默认情况下,我们设置 $\lambda = 10$,因此cls和reg两项的权重大致相等。我们通过实验表明,结果在很宽的范围内对 $\lambda$ 的取值不敏感(表9)。我们还注意到,上述归一化并非必需,可以简化。对于边界框回归,我们采用[5]中的4个坐标的参数化:
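$$t_x = (x - x_a)/w_a,\quad t_y = (y - y_a)/h_a,\quad t_w = \log(w/w_a),\quad t_h = \log(h/h_a),$$
$$t_x^* = (x^* - x_a)/w_a,\quad t_y^* = (y^* - y_a)/h_a,\quad t_w^* = \log(w^*/w_a),\quad t_h^* = \log(h^*/h_a) \tag{2}$$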
where $x$, $y$, $w$, and $h$ denote the box’s center coordinates and its width and height. Variables $x$, $x_a$, and $x^*$ are for the predicted box, anchor box, and ground-truth box respectively (likewise for $y$, $w$, $h$). This can be thought of as bounding-box regression from an anchor box to a nearby ground-truth box.
其中 $x$、$y$、$w$ 和 $h$ 表示框的中心坐标及其宽度和高度。变量 $x$、$x_a$ 和 $x^*$ 分别对应预测框、锚框和真实标注框(对 $y$、$w$、$h$ 同理)。可以将其视为从锚框到附近真实标注框的边界框回归。
Nevertheless, our method achieves bounding-box regression by a different manner from previous RoI-based (Region of Interest) methods [1], [2]. In [1], [2], bounding-box regression is performed on features pooled from arbitrarily sized RoIs, and the regression weights are shared by all region sizes. In our formulation, the features used for regression are of the same spatial size (3 × 3) on the feature maps. To account for varying sizes, a set of k bounding-box regressors are learned. Each regressor is responsible for one scale and one aspect ratio, and the k regressors do not share weights. As such, it is still possible to predict boxes of various sizes even though the features are of a fixed size/scale, thanks to the design of anchors.
然而,我们的方法以不同于以往基于RoI(感兴趣区域)的方法[1],[2]的方式实现边界框回归。在[1],[2]中,边界框回归是在从任意大小的RoI池化得到的特征上进行的,并且回归权重由所有区域尺寸共享。在我们的公式中,用于回归的特征在特征图上具有相同的空间大小(3×3)。为了处理不同的尺寸,我们学习了一组 $k$ 个边界框回归器。每个回归器负责一种尺度和一种纵横比,这 $k$ 个回归器不共享权重。因此,得益于锚框的设计,即使特征具有固定的大小/尺度,仍然可以预测各种尺寸的框。
The RPN can be trained end-to-end by backpropagation and stochastic gradient descent (SGD) [35]. We follow the “image-centric” sampling strategy from [2] to train this network. Each mini-batch arises from a single image that contains many positive and negative example anchors. It is possible to optimize for the loss functions of all anchors, but this will bias towards negative samples as they dominate. Instead, we randomly sample 256 anchors in an image to compute the loss function of a mini-batch, where the sampled positive and negative anchors have a ratio of up to 1:1. If there are fewer than 128 positive samples in an image, we pad the mini-batch with negative ones.
可以通过反向传播和随机梯度下降(SGD)[35]端到端地训练RPN。我们遵循[2]中"以图像为中心"的采样策略来训练该网络。每个小批量来自包含许多正、负示例锚框的单张图像。可以针对所有锚框的损失函数进行优化,但这会偏向负样本,因为它们占主导地位。取而代之的是,我们在图像中随机采样256个锚框来计算一个小批量的损失函数,其中采样的正负锚框比例至多为1:1。如果图像中的正样本少于128个,则用负样本填充小批量。
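随机采样256个锚框、正负比例至多1:1的策略可写成如下草图(仅作示意):

```python
import numpy as np

def sample_rpn_minibatch(labels, batch_size=256, pos_fraction=0.5, rng=np.random):
    """labels: 1正 / 0负 / -1忽略。返回被选入小批量的锚框下标;正样本不足128时用负样本补齐。"""
    pos_idx = np.where(labels == 1)[0]
    neg_idx = np.where(labels == 0)[0]

    num_pos = min(len(pos_idx), int(batch_size * pos_fraction))   # 至多128个正样本
    num_neg = batch_size - num_pos                                 # 其余名额用负样本填充

    pos_sel = rng.choice(pos_idx, size=num_pos, replace=False)
    neg_sel = rng.choice(neg_idx, size=min(num_neg, len(neg_idx)), replace=False)
    return np.concatenate([pos_sel, neg_sel])
```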
We randomly initialize all new layers by drawing weights from a zero-mean Gaussian distribution with standard deviation 0.01. All other layers (i.e., the shared convolutional layers) are initialized by pretraining a model for ImageNet classification [36], as is standard practice [5]. We tune all layers of the ZF net, and conv3_1 and up for the VGG net to conserve memory [2]. We use a learning rate of 0.001 for 60k mini-batches, and 0.0001 for the next 20k mini-batches on the PASCAL VOC dataset. We use a momentum of 0.9 and a weight decay of 0.0005 [37]. Our implementation uses Caffe [38].
我们通过从标准差为0.01的零均值高斯分布中抽取权重来随机初始化所有新层。所有其他层(即共享卷积层)通过ImageNet分类[36]的预训练模型进行初始化,这是标准做法[5]。我们微调ZF网络的所有层,而对VGG网络只微调conv3_1及以上的层以节省内存[2]。在PASCAL VOC数据集上,我们对前60k个小批量使用0.001的学习率,对接下来的20k个小批量使用0.0001的学习率。我们使用0.9的动量和0.0005的权重衰减[37]。我们的实现使用Caffe [38]。
Thus far we have described how to train a network for region proposal generation, without considering the region-based object detection CNN that will utilize these proposals. For the detection network, we adopt Fast R-CNN [2]. Next we describe algorithms that learn a unified network composed of RPN and Fast R-CNN with shared convolutional layers (Figure 2).
到目前为止,我们已经描述了如何训练网络以生成区域提议,而没有考虑将利用这些提议的基于区域的对象检测CNN。对于检测网络,我们采用Fast R-CNN[2]。接下来,我们描述学习具有RPN和Fast R-CNN并具有共享卷积层的统一网络的算法(图2)。
Both RPN and Fast R-CNN, trained independently, will modify their convolutional layers in different ways. We therefore need to develop a technique that allows for sharing convolutional layers between the two networks, rather than learning two separate networks. We discuss three ways for training networks with features shared:
经过独立训练的RPN和Fast R-CNN都将以不同的方式修改其卷积层。因此,我们需要开发一种技术,允许在两个网络之间共享卷积层,而不是学习两个单独的网络。我们讨论了三种共享特征的网络训练方法:
(i) Alternating training. In this solution, we first train RPN, and use the proposals to train Fast R-CNN. The network tuned by Fast R-CNN is then used to initialize RPN, and this process is iterated. This is the solution that is used in all experiments in this paper.
(i)交替训练。在此解决方案中,我们首先训练RPN,然后使用这些提议来训练Fast R-CNN。然后,将使用经过调参的Fast R-CNN网络初始化RPN,然后重复此过程。这是本文所有实验中使用的解决方案。
(ii) Approximate joint training. In this solution, the RPN and Fast R-CNN networks are merged into one network during training as in Figure 2. In each SGD iteration, the forward pass generates region proposals which are treated just like fixed, pre-computed proposals when training a Fast R-CNN detector. The backward propagation takes place as usual, where for the shared layers the backward propagated signals from both the RPN loss and the Fast R-CNN loss are combined. This solution is easy to implement. But this solution ignores the derivative w.r.t. the proposal boxes’ coordinates that are also network responses, so is approximate. In our experiments, we have empirically found this solver produces close results, yet reduces the training time by about 25-50% comparing with alternating training. This solver is included in our released Python code.
(ii)近似联合训练。在此解决方案中,如图2所示,训练期间将RPN和Fast R-CNN网络合并为一个网络。在每次SGD迭代中,前向传播生成区域提议,在训练Fast R-CNN检测器时,这些提议被当作固定的、预先计算好的提议来对待。反向传播照常进行,对于共享层,来自RPN损失和Fast R-CNN损失的反向传播信号被组合在一起。该解决方案易于实现。但是该方案忽略了关于提议框坐标的导数(提议框坐标同样是网络的输出),因此是近似的。在我们的实验中,我们根据经验发现该求解器产生的结果很接近,而与交替训练相比,训练时间减少了约25-50%。此求解器包含在我们发布的Python代码中。
(iii) Non-approximate joint training. As discussed above, the bounding boxes predicted by RPN are also functions of the input. The RoI pooling layer [2] in Fast R-CNN accepts the convolutional features and also the predicted bounding boxes as input, so a theoretically valid backpropagation solver should also involve gradients w.r.t. the box coordinates. These gradients are ignored in the above approximate joint training. In a non-approximate joint training solution, we need an RoI pooling layer that is differentiable w.r.t. the box coordinates. This is a nontrivial problem and a solution can be given by an “RoI warping” layer as developed in [15], which is beyond the scope of this paper.
(iii)非近似联合训练。如上所述,RPN预测的边界框也是输入的函数。Fast R-CNN中的RoI池化层[2]接受卷积特征以及预测的边界框作为输入,因此理论上有效的反向传播求解器也应包含关于框坐标的梯度。这些梯度在上面的近似联合训练中被忽略了。在非近似联合训练方案中,我们需要一个关于框坐标可微分的RoI池化层。这是一个不平凡的问题,可以由[15]中开发的"RoI warping"层给出解决方案,但这超出了本文的范围。
4-Step Alternating Training. In this paper, we adopt a pragmatic 4-step training algorithm to learn shared features via alternating optimization. In the first step, we train the RPN as described in Section 3.1.3. This network is initialized with an ImageNet-pre-trained model and fine-tuned end-to-end for the region proposal task. In the second step, we train a separate detection network by Fast R-CNN using the proposals generated by the step-1 RPN. This detection network is also initialized by the ImageNet-pre-trained model. At this point the two networks do not share convolutional layers. In the third step, we use the detector network to initialize RPN training, but we fix the shared convolutional layers and only fine-tune the layers unique to RPN. Now the two networks share convolutional layers. Finally, keeping the shared convolutional layers fixed, we fine-tune the unique layers of Fast R-CNN. As such, both networks share the same convolutional layers and form a unified network. A similar alternating training can be run for more iterations, but we have observed negligible improvements.
4步交替训练。在本文中,我们采用务实的4步训练算法,通过交替优化来学习共享特征。第一步,我们按照3.1.3节所述训练RPN。该网络使用ImageNet预训练模型进行初始化,并针对区域提议任务进行端到端微调。第二步,我们使用第一步RPN生成的提议,由Fast R-CNN训练一个单独的检测网络。该检测网络同样由ImageNet预训练模型初始化。此时这两个网络尚不共享卷积层。第三步,我们使用检测网络来初始化RPN训练,但固定共享卷积层,只微调RPN特有的层。现在这两个网络共享卷积层。最后,保持共享卷积层固定,我们微调Fast R-CNN特有的层。这样,两个网络共享相同的卷积层,形成一个统一的网络。类似的交替训练可以进行更多次迭代,但我们观察到的改进微不足道。
We train and test both region proposal and object detection networks on images of a single scale [1], [2]. We re-scale the images such that their shorter side is s = 600 pixels [2]. Multi-scale feature extraction (using an image pyramid) may improve accuracy but does not exhibit a good speed-accuracy trade-off [2]. On the re-scaled images, the total stride for both ZF and VGG nets on the last convolutional layer is 16 pixels, and thus is ∼10 pixels on a typical PASCAL image before resizing (∼500×375). Even such a large stride provides good results, though accuracy may be further improved with a smaller stride.
我们在单一尺度的图像上训练和测试区域提议网络和目标检测网络[1],[2]。我们重新缩放图像,使其短边为s = 600像素[2]。多尺度特征提取(使用图像金字塔)可能会提高精度,但没有表现出良好的速度-精度权衡[2]。在重新缩放后的图像上,ZF和VGG网络在最后一个卷积层上的总步长为16像素,因此在缩放前的典型PASCAL图像(约500×375)上约为10像素。即使步长这么大也能得到良好的结果,尽管使用更小的步长可能进一步提高精度。
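以一张典型的约500×375的PASCAL图像为例,短边缩放到600后,特征图上16像素的总步长折算回原图约为10像素,可以用下面几行代码验证这一算术:

```python
def effective_stride(orig_w, orig_h, short_side=600, total_stride=16):
    """短边缩放到short_side后,特征图步长折算回原图上的等效步长。"""
    scale = short_side / min(orig_w, orig_h)   # 500x375 -> scale = 1.6
    return total_stride / scale

print(effective_stride(500, 375))  # 10.0,即原图上约10像素
```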
For anchors, we use 3 scales with box areas of 128², 256², and 512² pixels, and 3 aspect ratios of 1:1, 1:2, and 2:1. These hyper-parameters are not carefully chosen for a particular dataset, and we provide ablation experiments on their effects in the next section. As discussed, our solution does not need an image pyramid or filter pyramid to predict regions of multiple scales, saving considerable running time. Figure 3 (right) shows the capability of our method for a wide range of scales and aspect ratios. Table 1 shows the learned average proposal size for each anchor using the ZF net. We note that our algorithm allows predictions that are larger than the underlying receptive field. Such predictions are not impossible—one may still roughly infer the extent of an object if only the middle of the object is visible.
对于锚框,我们使用3种尺度,框面积分别为 $128^2$、$256^2$ 和 $512^2$ 像素,以及1:1、1:2和2:1三种纵横比。这些超参数并非针对特定数据集精心选择的,我们将在下一节中提供关于其影响的消融实验。如前所述,我们的解决方案不需要图像金字塔或滤波器金字塔即可预测多种尺度的区域,从而节省了可观的运行时间。图3(右)展示了我们的方法在各种尺度和纵横比下的能力。表1显示了使用ZF网络为每种锚框学习到的平均提议尺寸。我们注意到,我们的算法允许预测比其下层感受野更大的框。这样的预测并非不可能:如果只有目标的中间部分可见,仍然可以大致推断出目标的范围。
The anchor boxes that cross image boundaries need to be handled with care. During training, we ignore all cross-boundary anchors so they do not contribute to the loss. For a typical 1000 × 600 image, there will be roughly 20000 (≈ 60 × 40 × 9) anchors in total. With the cross-boundary anchors ignored, there are about 6000 anchors per image for training. If the boundary-crossing outliers are not ignored in training, they introduce large, difficult to correct error terms in the objective, and training does not converge. During testing, however, we still apply the fully convolutional RPN to the entire image. This may generate crossboundary proposal boxes, which we clip to the image boundary.
跨图像边界的锚盒框需要小心处理。在训练期间,我们将忽略所有跨边界锚盒,因此它们不会造成损失。对于典型的1000×600图像,总共将有大约20000(≈60×40×9)个锚盒。忽略跨边界锚盒,每个图像大约有6000个锚盒用于训练。如果在训练中不忽略跨边界的异常值,则会在目标中引入较大且难以校正的误差项,并且训练不会收敛。但是,在测试期间,我们仍将全卷积RPN应用于整个图像。这可能会生成跨边界提议框,我们会将其裁剪到图像边界。
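训练时忽略越界锚框、测试时将越界提议框裁剪到图像边界,这两步可写成如下草图(仅作示意):

```python
import numpy as np

def inside_image_mask(anchors, img_w, img_h):
    """训练时使用:返回完全落在图像内部的锚框掩码(约20000个中剩约6000个)。"""
    return ((anchors[:, 0] >= 0) & (anchors[:, 1] >= 0) &
            (anchors[:, 2] <= img_w) & (anchors[:, 3] <= img_h))

def clip_boxes(boxes, img_w, img_h):
    """测试时使用:将越界的提议框裁剪到图像边界。"""
    boxes = boxes.copy()
    boxes[:, [0, 2]] = np.clip(boxes[:, [0, 2]], 0, img_w)
    boxes[:, [1, 3]] = np.clip(boxes[:, [1, 3]], 0, img_h)
    return boxes
```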
Some RPN proposals highly overlap with each other. To reduce redundancy, we adopt non-maximum suppression (NMS) on the proposal regions based on their cls scores. We fix the IoU threshold for NMS at 0.7, which leaves us about 2000 proposal regions per image. As we will show, NMS does not harm the ultimate detection accuracy, but substantially reduces the number of proposals. After NMS, we use the top-N ranked proposal regions for detection. In the following, we train Fast R-CNN using 2000 RPN proposals, but evaluate different numbers of proposals at test-time.
一些RPN提议彼此高度重叠。为了减少冗余,我们根据提议区域的cls分数对提议区域采用非最大抑制(NMS)。我们将NMS的IoU阈值固定为0.7,这使得每个图像大约有2000个建议区域。正如我们将显示的那样,NMS不会损害最终的检测准确性,但是会大大减少提议的数量。在NMS之后,我们使用排名前N位的提议区域进行检测。在下文中,我们使用2000 RPN提议训练Fast R-CNN,但在测试时评估不同数量的提议。
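基于cls得分、IoU阈值取0.7的非极大值抑制(NMS)可以用如下NumPy草图实现(仅作示意):

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.7):
    """boxes: (N,4) 的(x1,y1,x2,y2);scores: (N,) 的cls得分。返回保留框的下标(按得分从高到低)。"""
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # 当前得分最高的框与其余框的IoU
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        # 只保留与当前框重叠低于阈值的框,继续下一轮
        order = order[1:][iou <= iou_thresh]
    return np.array(keep)
```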
We comprehensively evaluate our method on the PASCAL VOC 2007 detection benchmark [11]. This dataset consists of about 5k trainval images and 5k test images over 20 object categories. We also provide results on the PASCAL VOC 2012 benchmark for a few models. For the ImageNet pre-trained network, we use the “fast” version of ZF net [32] that has 5 convolutional layers and 3 fully-connected layers, and the public VGG-16 model [3] that has 13 convolutional layers and 3 fully-connected layers. We primarily evaluate detection mean Average Precision (mAP), because this is the actual metric for object detection (rather than focusing on object proposal proxy metrics).
我们在PASCAL VOC 2007检测基准[11]上全面评估了我们的方法。该数据集包含约5k张trainval图像和5k张测试图像,涵盖20个目标类别。我们还提供了若干模型在PASCAL VOC 2012基准上的结果。对于ImageNet预训练网络,我们使用具有5个卷积层和3个全连接层的"快速"版ZF网络[32],以及具有13个卷积层和3个全连接层的公开VGG-16模型[3]。我们主要评估检测的平均精度均值(mAP),因为这是目标检测的实际指标(而不是关注目标提议的代理指标)。
Table 2 (top) shows Fast R-CNN results when trained and tested using various region proposal methods. These results use the ZF net. For Selective Search (SS) [4], we generate about 2000 proposals by the “fast” mode. For EdgeBoxes (EB) [6], we generate the proposals by the default EB setting tuned for 0.7 IoU. SS has an mAP of 58.7% and EB has an mAP of 58.6% under the Fast R-CNN framework. RPN with Fast R-CNN achieves competitive results, with an mAP of 59.9% while using up to 300 proposals. Using RPN yields a much faster detection system than using either SS or EB because of shared convolutional computations; the fewer proposals also reduce the region-wise fully-connected layers’ cost (Table 5).
表2(顶部)显示了使用各种区域建议方法进行训练和测试时的Fast R-CNN结果。这些结果使用ZF网络。对于选择性搜索(SS)[4],我们通过“快速”模式生成了大约2000个提议。对于EdgeBoxes(EB)[6],我们通过调整为0.7 IoU的默认EB设置生成提议。在Fast R-CNN框架下,SS的mAP为58.7%,EB的mAP为58.6%。具有Fast R-CNN的RPN获得了竞争性的结果,mAP达到59.9%,同时使用了300个提议。由于共享卷积计算,使用RPN产生的检测系统比使用SS或EB的检测系统快得多。较少的提议也降低了区域层面的全连接层的成本(表5)。
Ablation Experiments on RPN. To investigate the behavior of RPNs as a proposal method, we conducted several ablation studies. First, we show the effect of sharing convolutional layers between the RPN and Fast R-CNN detection network. To do this, we stop after the second step in the 4-step training process. Using separate networks reduces the result slightly to 58.7% (RPN+ZF, unshared, Table 2). We observe that this is because in the third step when the detector-tuned features are used to fine-tune the RPN, the proposal quality is improved.
RPN上的消融实验。为了研究RPN作为提议方法的行为,我们进行了若干消融研究。首先,我们展示在RPN和Fast R-CNN检测网络之间共享卷积层的效果。为此,我们在4步训练过程的第二步之后停止。使用单独的网络会使结果略降至58.7%(RPN+ZF,未共享,表2)。我们观察到,这是因为在第三步中使用经检测器调优的特征来微调RPN时,提议质量得到了改善。
Next, we disentangle the RPN’s influence on training the Fast R-CNN detection network. For this purpose, we train a Fast R-CNN model by using the 2000 SS proposals and ZF net. We fix this detector and evaluate the detection mAP by changing the proposal regions used at test-time. In these ablation experiments, the RPN does not share features with the detector.
接下来,我们剖析RPN对训练Fast R-CNN检测网络的影响。为此,我们使用2000个SS提议和ZF网络训练了一个Fast R-CNN模型。我们固定该检测器,并通过更改测试时使用的提议区域来评估检测mAP。在这些消融实验中,RPN不与检测器共享特征。
Replacing SS with 300 RPN proposals at test-time leads to an mAP of 56.8%. The loss in mAP is because of the inconsistency between the training/testing proposals. This result serves as the baseline for the following comparisons.
在测试时用300个RPN提议替换SS导致的mAP为56.8%。 mAP的损失是由于训练/测试提议之间的不一致。该结果用作以下比较的基准。
Somewhat surprisingly, the RPN still leads to a competitive result (55.1%) when using the top-ranked 100 proposals at test-time, indicating that the top-ranked RPN proposals are accurate. On the other extreme, using the top-ranked 6000 RPN proposals (without NMS) has a comparable mAP (55.2%), suggesting NMS does not harm the detection mAP and may reduce false alarms.
令人惊讶的是,当在测试时使用排名靠前的100个提议时,RPN仍可产生竞争性结果(55.1%),这表明排名靠前的RPN提议是准确的。另一方面,使用排名靠前的6000 RPN提议(不使用NMS)具有可比的mAP(55.2%),这表明NMS不会损害检测mAP,并且可以减少误报。
Next, we separately investigate the roles of RPN’s cls and reg outputs by turning off either of them at test-time. When the cls layer is removed at test-time (thus no NMS/ranking is used), we randomly sample N proposals from the unscored regions. The mAP is nearly unchanged with N = 1000 (55.8%), but degrades considerably to 44.6% when N = 100. This shows that the cls scores account for the accuracy of the highest ranked proposals.
接下来,我们通过在测试时关闭RPN的cls和reg输出中的任何一个来分别研究它们的作用。当在测试时删除cls层时(因此不使用NMS /排名),我们从未计分的区域中随机抽取了N个提议。当N = 1000(55.8%)时,mAP几乎没有变化,但是当N = 100时,mAP会大幅下降至44.6%。这表明cls分数说明了排名最高的提议的准确性。
On the other hand, when the reg layer is removed at test-time (so the proposals become anchor boxes), the mAP drops to 52.1%. This suggests that the high-quality proposals are mainly due to the regressed box bounds. The anchor boxes, though having multiple scales and aspect ratios, are not sufficient for accurate detection.
另一方面,当在测试时移除reg层(此时提议退化为锚框)时,mAP下降至52.1%。这表明高质量的提议主要归功于回归得到的框边界。锚框虽然具有多种尺度和纵横比,但不足以实现精确检测。
图5:使用Faster R-CNN系统在PASCAL VOC 2007测试集上检测到的对象检测结果的示例。该模型为VGG-16,训练数据为07 + 12 trainval(2007年测试集的mAP为73.2%)。我们的方法可以检测各种尺度和宽高比的对象。每个输出框都与类别标签和[0,1]中的softmax得分相关联。得分阈值0.6用于显示这些图像。包括所有步骤,获得这些结果的运行时间为每张图像198ms。
We also evaluate the effects of more powerful networks on the proposal quality of RPN alone. We use VGG-16 to train the RPN, and still use the above detector of SS+ZF. The mAP improves from 56.8% (using RPN+ZF) to 59.2% (using RPN+VGG). This is a promising result, because it suggests that the proposal quality of RPN+VGG is better than that of RPN+ZF. Because proposals of RPN+ZF are competitive with SS (both are 58.7% when consistently used for training and testing), we may expect RPN+VGG to be better than SS. The following experiments justify this hypothesis.
我们还单独评估了更强大的网络对RPN提议质量的影响。我们使用VGG-16训练RPN,检测器仍使用上述的SS+ZF。mAP从56.8%(使用RPN+ZF)提高到59.2%(使用RPN+VGG)。这是一个令人鼓舞的结果,因为它表明RPN+VGG的提议质量优于RPN+ZF。由于RPN+ZF的提议与SS相当(在训练和测试中一致使用时,两者的mAP均为58.7%),我们可以预期RPN+VGG会比SS更好。以下实验证明了这一假设。
Performance of VGG-16. Table 3 shows the results of VGG-16 for both proposal and detection. Using RPN+VGG, the result is 68.5% for unshared features, slightly higher than the SS baseline. As shown above, this is because the proposals generated by RPN+VGG are more accurate than SS. Unlike SS that is predefined, the RPN is actively trained and benefits from better networks. For the feature-shared variant, the result is 69.9%—better than the strong SS baseline, yet with nearly cost-free proposals. We further train the RPN and detection network on the union set of PASCAL VOC 2007 trainval and 2012 trainval. The mAP is 73.2%. Figure 5 shows some results on the PASCAL VOC 2007 test set. On the PASCAL VOC 2012 test set (Table 4), our method has an mAP of 70.4% trained on the union set of VOC 2007 trainval+test and VOC 2012 trainval. Table 6 and Table 7 show the detailed numbers.
VGG-16的性能。表3给出了VGG-16在提议和检测上的结果。使用RPN+VGG,未共享特征时的结果为68.5%,略高于SS基准。如上所示,这是因为RPN+VGG生成的提议比SS更准确。与预先定义的SS不同,RPN得到了主动训练,并从更好的网络中受益。对于特征共享的变体,结果为69.9%,优于强大的SS基准,而提议几乎不增加成本。我们进一步在PASCAL VOC 2007 trainval和2012 trainval的并集上训练RPN和检测网络,mAP为73.2%。图5显示了PASCAL VOC 2007测试集上的一些结果。在PASCAL VOC 2012测试集上(表4),我们的方法在VOC 2007 trainval+test和VOC 2012 trainval的并集上训练,mAP为70.4%。表6和表7给出了详细数字。
In Table 5 we summarize the running time of the entire object detection system. SS takes 1-2 seconds depending on content (on average about 1.5s), and Fast R-CNN with VGG-16 takes 320ms on 2000 SS proposals (or 223ms if using SVD on fully-connected layers [2]). Our system with VGG-16 takes in total 198ms for both proposal and detection. With the convolutional features shared, the RPN alone only takes 10ms computing the additional layers. Our regionwise computation is also lower, thanks to fewer proposals (300 per image). Our system has a frame-rate of 17 fps with the ZF net.
在表5中,我们总结了整个目标检测系统的运行时间。SS根据图像内容需要1-2秒(平均约1.5秒),而使用VGG-16的Fast R-CNN处理2000个SS提议需要320毫秒(若在全连接层上使用SVD则为223毫秒[2])。我们使用VGG-16的系统在提议和检测上总共需要198毫秒。在共享卷积特征的情况下,RPN单独计算附加层仅需10毫秒。由于提议更少(每张图像300个),我们逐区域的计算量也更低。使用ZF网络时,我们的系统帧率为17 fps。
Sensitivities to Hyper-parameters. In Table 8 we investigate the settings of anchors. By default we use 3 scales and 3 aspect ratios (69.9% mAP in Table 8). If using just one anchor at each position, the mAP drops by a considerable margin of 3-4%. The mAP is higher if using 3 scales (with 1 aspect ratio) or 3 aspect ratios (with 1 scale), demonstrating that using anchors of multiple sizes as the regression references is an effective solution. Using just 3 scales with 1 aspect ratio (69.8%) is as good as using 3 scales with 3 aspect ratios on this dataset, suggesting that scales and aspect ratios are not disentangled dimensions for the detection accuracy. But we still adopt these two dimensions in our designs to keep our system flexible.
对超参数的敏感性。在表8中,我们研究了锚框的设置。默认情况下,我们使用3种尺度和3种纵横比(表8中为69.9% mAP)。如果在每个位置只使用一个锚框,mAP会大幅下降3-4%。如果使用3种尺度(1种纵横比)或3种纵横比(1种尺度),mAP会更高,这表明使用多种尺寸的锚框作为回归参考是有效的解决方案。在该数据集上,仅使用3种尺度配1种纵横比(69.8%)与使用3种尺度配3种纵横比效果相当,这表明对检测精度而言,尺度和纵横比并不是相互独立的维度。但我们仍在设计中采用这两个维度,以保持系统的灵活性。
In Table 9 we compare different values of λ in Equation (1). By default we use λ = 10 which makes the two terms in Equation (1) roughly equally weighted after normalization. Table 9 shows that our result is impacted just marginally (by ∼ 1%) when λ is within a scale of about two orders of magnitude (1 to 100). This demonstrates that the result is insensitive to λ in a wide range.
在表9中,我们比较了式(1)中不同的λ值。默认情况下,我们使用λ= 10,这使等式(1)中的两项在归一化后具有大致相等的加权。表9表明,当λ在大约两个数量级(1到100)的范围内时,我们的结果仅受到很小的影响(约1%)。这表明结果在很宽的范围内对λ不敏感。
Analysis of Recall-to-IoU. Next we compute the recall of proposals at different IoU ratios with ground-truth boxes. It is noteworthy that the Recall-to-IoU metric is just loosely [19], [20], [21] related to the ultimate detection accuracy. It is more appropriate to use this metric to diagnose the proposal method than to evaluate it.
召回率与IoU的分析。接下来,我们计算提议在不同IoU阈值下相对真实标注框的召回率。值得注意的是,召回率-IoU指标与最终检测精度只是松散相关[19],[20],[21]。用该指标诊断提议方法比用它评估提议方法更合适。
In Figure 4, we show the results of using 300, 1000, and 2000 proposals. We compare with SS and EB, and the N proposals are the top-N ranked ones based on the confidence generated by these methods. The plots show that the RPN method behaves gracefully when the number of proposals drops from 2000 to 300. This explains why the RPN has a good ultimate detection mAP when using as few as 300 proposals. As we analyzed before, this property is mainly attributed to the cls term of the RPN. The recall of SS and EB drops more quickly than RPN when the proposals are fewer.
在图4中,我们显示了使用300、1000和2000个提议的结果。我们将它们与SS和EB进行比较,根据这些方法所产生的置信度,N个提议是排名前N位的提议。这些图表明,当提议数量从2000个下降到300个时,RPN方法表现得很不错。这解释了为什么当使用300个提议时RPN具有良好的最终检测mAP。正如我们之前分析的那样,此属性主要归因于RPN的cls项。当提议减少时,SS和EB的召回率比RPN下降得更快。
One-Stage Detection vs. Two-Stage Proposal + Detection. The OverFeat paper [9] proposes a detection method that uses regressors and classifiers on sliding windows over convolutional feature maps. OverFeat is a one-stage, class-specific detection pipeline, and ours is a two-stage cascade consisting of class-agnostic proposals and class-specific detections. In OverFeat, the region-wise features come from a sliding window of one aspect ratio over a scale pyramid. These features are used to simultaneously determine the location and category of objects. In RPN, the features are from square (3×3) sliding windows and predict proposals relative to anchors with different scales and aspect ratios. Though both methods use sliding windows, the region proposal task is only the first stage of Faster R-CNN—the downstream Fast R-CNN detector attends to the proposals to refine them. In the second stage of our cascade, the region-wise features are adaptively pooled [1], [2] from proposal boxes that more faithfully cover the features of the regions. We believe these features lead to more accurate detections.
一阶段检测与两阶段建议+检测。 OverFeat论文[9]提出了一种在卷积特征图上的滑动窗口上使用回归器和分类器的检测方法。 OverFeat是一阶段的,特定于类的检测管道,而我们的是一个两阶段的级联,包括与类无关的提议和特定于类的检测。在OverFeat中,区域特征来自尺度金字塔上一个长宽比的滑动窗口。这些特征用于同时确定对象的位置和类别。在RPN中,特征来自正方形(3×3)滑动窗口,并预测相对于具有不同比例和纵横比的锚盒的提议。尽管两种方法都使用滑动窗口,但是区域提议任务只是Faster RCNN的第一阶段-下游Fast R-CNN检测器会处理提议以对其进行完善。在级联的第二阶段,从提议框中自适应地池化区域范围内的特征[1],[2],该提议框中将更真实地覆盖区域特征。我们相信这些特征可导致更准确的检测。
To compare the one-stage and two-stage systems, we emulate the OverFeat system (and thus also circumvent other differences of implementation details) by one-stage Fast R-CNN. In this system, the “proposals” are dense sliding windows of 3 scales (128, 256, 512) and 3 aspect ratios (1:1, 1:2, 2:1). Fast R-CNN is trained to predict class-specific scores and regress box locations from these sliding windows. Because the OverFeat system adopts an image pyramid, we also evaluate using convolutional features extracted from 5 scales. We use those 5 scales as in [1], [2].
为了比较一阶段系统和两阶段系统,我们通过一阶段Fast R-CNN模拟OverFeat系统(从而也避免了实现细节的其他差异)。在该系统中,“提议”是3个尺度(128、256、512)和3个纵横比(1:1、1:2、2:1)的密集滑动窗口。Fast R-CNN经过训练可以从这些滑动窗口预测特定类的分数和回归框的位置。由于OverFeat系统采用图像金字塔,因此我们还使用从5个尺度提取的卷积特征进行评估。我们在[1],[2]中使用这5个尺度。
Table 10 compares the two-stage system and two variants of the one-stage system. Using the ZF model, the one-stage system has an mAP of 53.9%. This is lower than the two-stage system (58.7%) by 4.8%. This experiment justifies the effectiveness of cascaded region proposals and object detection. Similar observations are reported in [2], [39], where replacing SS region proposals with sliding windows leads to ∼6% degradation in both papers. We also note that the one-stage system is slower as it has considerably more proposals to process.
表10比较了两阶段系统和一阶段系统的两个变体。使用ZF模型,一阶段系统的mAP为53.9%,比两阶段系统(58.7%)低4.8%。该实验证明了级联区域提议与目标检测的有效性。[2],[39]中报告了类似的观察结果:在这两篇论文中,用滑动窗口替换SS区域提议均导致约6%的下降。我们还注意到,一阶段系统速度更慢,因为它要处理的提议多得多。
We present more results on the Microsoft COCO object detection dataset [12]. This dataset involves 80 object categories. We experiment with the 80k images on the training set, 40k images on the validation set, and 20k images on the test-dev set. We evaluate the mAP averaged for IoU ∈ [0.5 : 0.05 : 0.95] (COCO’s standard metric, simply denoted as mAP@[.5, .95]) and mAP@0.5 (PASCAL VOC’s metric).
我们在Microsoft COCO目标检测数据集[12]上给出更多结果。该数据集包含80个目标类别。我们使用训练集中的80k张图像、验证集中的40k张图像以及test-dev集中的20k张图像进行实验。我们评估在IoU∈[0.5:0.05:0.95]上平均的mAP(COCO的标准指标,简记为mAP@[.5, .95])以及mAP@0.5(PASCAL VOC的指标)。
There are a few minor changes of our system made for this dataset. We train our models on an 8-GPU implementation, and the effective mini-batch size becomes 8 for RPN (1 per GPU) and 16 for Fast R-CNN (2 per GPU). The RPN step and Fast R-CNN step are both trained for 240k iterations with a learning rate of 0.003 and then for 80k iterations with 0.0003. We modify the learning rates (starting with 0.003 instead of 0.001) because the mini-batch size is changed. For the anchors, we use 3 aspect ratios and 4 scales (adding 64²), mainly motivated by handling small objects on this dataset. In addition, in our Fast R-CNN step, the negative samples are defined as those with a maximum IoU with ground truth in the interval of [0, 0.5), instead of [0.1, 0.5) used in [1], [2]. We note that in the SPPnet system [1], the negative samples in [0.1, 0.5) are used for network fine-tuning, but the negative samples in [0, 0.5) are still visited in the SVM step with hard-negative mining. But the Fast R-CNN system [2] abandons the SVM step, so the negative samples in [0, 0.1) are never visited. Including these [0, 0.1) samples improves mAP@0.5 on the COCO dataset for both Fast R-CNN and Faster R-CNN systems (but the impact is negligible on PASCAL VOC).
我们针对该数据集对系统做了一些细微的修改。我们在8块GPU的实现上训练模型,RPN的有效小批量大小为8(每块GPU 1张图),Fast R-CNN为16(每块GPU 2张图)。RPN步骤和Fast R-CNN步骤都先以0.003的学习率训练240k次迭代,再以0.0003训练80k次迭代。由于小批量大小发生了变化,我们相应修改了学习率(从0.003而不是0.001开始)。对于锚框,我们使用3种纵横比和4种尺度(增加了 $64^2$),主要是为了处理该数据集中的小目标。此外,在我们的Fast R-CNN步骤中,负样本被定义为与真实标注框的最大IoU落在[0, 0.5)区间内的样本,而不是[1],[2]中使用的[0.1, 0.5)。我们注意到,在SPPnet系统[1]中,[0.1, 0.5)中的负样本用于网络微调,但[0, 0.5)中的负样本在带难负例挖掘的SVM步骤中仍会被使用。而Fast R-CNN系统[2]去掉了SVM步骤,因此[0, 0.1)中的负样本从未被使用。包含这些[0, 0.1)的样本可以提高Fast R-CNN和Faster R-CNN系统在COCO数据集上的mAP@0.5(但对PASCAL VOC的影响可以忽略不计)。
The rest of the implementation details are the same as on PASCAL VOC. In particular, we keep using 300 proposals and single-scale (s = 600) testing. The testing time is still about 200ms per image on the COCO dataset.
其余的实现细节与PASCAL VOC上的相同。特别是,我们一直使用300个提议和单尺度(s = 600)测试。在COCO数据集上,每个图像的测试时间仍然约为200毫秒。
In Table 11 we first report the results of the Fast R-CNN system [2] using the implementation in this paper. Our Fast R-CNN baseline has 39.3% mAP@0.5 on the test-dev set, higher than that reported in [2]. We conjecture that the reason for this gap is mainly due to the definition of the negative samples and also the changes of the mini-batch sizes. We also note that the mAP@[.5, .95] is just comparable.
在表11中,我们首先报告使用本文实现的Fast R-CNN系统[2]的结果。我们的Fast R-CNN基线在test-dev集上的mAP@0.5为39.3%,高于[2]中报告的结果。我们推测造成这一差距的原因主要在于负样本的定义以及小批量大小的变化。我们还注意到,两者的mAP@[.5, .95]大致相当。
Next we evaluate our Faster R-CNN system. Using the COCO training set to train, Faster R-CNN has 42.1% mAP@0.5 and 21.5% mAP@[.5, .95] on the COCO test-dev set. This is 2.8% higher for mAP@0.5 and 2.2% higher for mAP@[.5, .95] than the Fast R-CNN counterpart under the same protocol (Table 11). This indicates that RPN performs excellent for improving the localization accuracy at higher IoU thresholds. Using the COCO trainval set to train, Faster R-CNN has 42.7% mAP@0.5 and 21.9% mAP@[.5, .95] on the COCO test-dev set. Figure 6 shows some results on the MS COCO test-dev set.
接下来,我们评估我们的Faster R-CNN系统。使用COCO训练集训练,Faster R-CNN在COCO test-dev集上取得42.1%的mAP@0.5和21.5%的mAP@[.5, .95]。在相同协议下,与对应的Fast R-CNN相比,mAP@0.5高出2.8%,mAP@[.5, .95]高出2.2%(表11)。这表明RPN在提高更高IoU阈值下的定位精度方面表现出色。使用COCO trainval集训练,Faster R-CNN在COCO test-dev集上取得42.7%的mAP@0.5和21.9%的mAP@[.5, .95]。图6显示了MS COCO test-dev集上的一些结果。
Faster R-CNN in ILSVRC & COCO 2015 competitions We have demonstrated that Faster R-CNN benefits more from better features, thanks to the fact that the RPN completely learns to propose regions by neural networks. This observation is still valid even when one increases the depth substantially to over 100 layers [18]. Only by replacing VGG-16 with a 101layer residual net (ResNet-101) [18], the Faster R-CNN system increases the mAP from 41.5%/21.2% (VGG16) to 48.4%/27.2% (ResNet-101) on the COCO val set. With other improvements orthogonal to Faster RCNN, He et al. [18] obtained a single-model result of 55.7%/34.9% and an ensemble result of 59.0%/37.4% on the COCO test-dev set, which won the 1st place in the COCO 2015 object detection competition. The same system [18] also won the 1st place in the ILSVRC 2015 object detection competition, surpassing the second place by absolute 8.5%. RPN is also a building block of the 1st-place winning entries in ILSVRC 2015 localization and COCO 2015 segmentation competitions, for which the details are available in [18] and [15] respectively
Faster R-CNN在ILSVRC和COCO 2015竞赛中。我们已经证明,由于RPN完全通过神经网络学习提议区域,Faster R-CNN能从更好的特征中获益更多。即使将深度大幅增加到100层以上,这一观察仍然成立[18]。仅通过用101层残差网络(ResNet-101)[18]替换VGG-16,Faster R-CNN系统在COCO val集上就将mAP从41.5%/21.2%(VGG-16)提高到48.4%/27.2%(ResNet-101)。结合其他与Faster R-CNN正交的改进,He等人[18]在COCO test-dev集上获得了55.7%/34.9%的单模型结果和59.0%/37.4%的集成结果,在COCO 2015目标检测比赛中获得第一名。同一系统[18]还在ILSVRC 2015目标检测比赛中获得第一名,以绝对值8.5%的优势超过第二名。RPN也是ILSVRC 2015定位和COCO 2015分割比赛第一名获奖作品的组成部分,详细信息分别见[18]和[15]。
Large-scale data is of crucial importance for improving deep neural networks. Next, we investigate how the MS COCO dataset can help with the detection performance on PASCAL VOC.
大规模数据对于改善深度神经网络至关重要。接下来,我们研究MS COCO数据集如何帮助提高PASCAL VOC的检测性能。
As a simple baseline, we directly evaluate the COCO detection model on the PASCAL VOC dataset, without fine-tuning on any PASCAL VOC data. This evaluation is possible because the categories on COCO are a superset of those on PASCAL VOC. The categories that are exclusive on COCO are ignored in this experiment, and the softmax layer is performed only on the 20 categories plus background. The mAP under this setting is 76.1% on the PASCAL VOC 2007 test set (Table 12). This result is better than that trained on VOC07+12 (73.2%) by a good margin, even though the PASCAL VOC data are not exploited.
作为一个简单的基准,我们直接在PASCAL VOC数据集上评估COCO检测模型,而无需对任何PASCAL VOC数据进行微调。该评估是可行的,因为COCO上的类别是PASCAL VOC上的类别的超集(superset)。在此实验中,COCO上排他的类别将被忽略,并且仅在20个类别以及背景上执行softmax层。在PASCAL VOC 2007测试集上,此设置下的mAP为76.1%(表12)。即使未使用PASCAL VOC数据,此结果也比在VOC07 + 12上训练的结果(73.2%)更好。
Then we fine-tune the COCO detection model on the VOC dataset. In this experiment, the COCO model is in place of the ImageNet-pre-trained model (that is used to initialize the network weights), and the Faster R-CNN system is fine-tuned as described in Section 3.2. Doing so leads to 78.8% mAP on the PASCAL VOC 2007 test set. The extra data from the COCO set increases the mAP by 5.6%. Table 6 shows that the model trained on COCO+VOC has the best AP for every individual category on PASCAL VOC 2007. Similar improvements are observed on the PASCAL VOC 2012 test set (Table 12 and Table 7). We note that the test-time speed of obtaining these strong results is still about 200ms per image.
然后,我们在VOC数据集上微调COCO检测模型。在此实验中,COCO模型代替了ImageNet预训练的模型(该模型用于初始化网络权重),并且如3.2节中所述对Faster R-CNN系统进行了微调。这样做会导致PASCAL VOC 2007测试集的mAP达到78.8%。来自COCO集的额外数据将mAP提高了5.6%。表6显示,在PASCAL VOC 2007上,使用COCO + VOC训练的模型具有针对每个类别的最佳AP。在PASCAL VOC 2012测试集上也观察到了类似的改进(表12和表7)。我们注意到,获得这些强大结果的测试时间速度仍然约为每张图像200毫秒。
We have presented RPNs for efficient and accurate region proposal generation. By sharing convolutional features with the down-stream detection network, the region proposal step is nearly cost-free. Our method enables a unified, deep-learning-based object detection system to run at near real-time frame rates. The learned RPN also improves region proposal quality and thus the overall object detection accuracy.
我们已经提出了有效而准确的区域提议生成的RPN。通过与下游检测网络共享卷积特征,区域提议步骤几乎是免费的。我们的方法使统一的,基于深度学习的对象检测系统能够以接近实时的帧速率运行。所学习的RPN还提高了区域提议质量,从而提高了总体目标检测精度。