Yongqiang Cheng

Learning RoI Transformer for Detecting Oriented Objects in Aerial Images

Jian Ding, Nan Xue, Yang Long, Gui-Song Xia, Qikai Lu

Wuhan University，WHU：武汉大学，武大
Computational and Photogrammetric Vision Team，CAPTAIN：计算与摄影测量视觉研究组
State Key Laboratory for Information Engineering in Surveying, Mapping and Remote Sensing，LIESMARS：测绘遥感信息工程国家重点实验室
Computer Science，CS：计算机科学
Computer Vision，CV：计算机视觉
aerial ['eərɪəl]：adj. 空中的，航空的，空气的，空想的 n. 天线
orient ['ɔːrɪənt; 'ɒr-]：v. 朝向，确定方位，使适应 n. 东方国家 adj. 东方 (国家) 的，(太阳等) 冉冉升起的，(宝石) 光彩夺目的
transformer [træns'fɔː(r)mə(r)]：n.变压器，转换器
corresponding author：通讯作者

arXiv (archive - the X represents the Greek letter chi [χ]) is a repository of electronic preprints approved for posting after moderation, but not full peer review.

Abstract

Object detection in aerial images is an active yet challenging task in computer vision because of the birdview perspective, the highly complex backgrounds, and the variant appearances of objects. Especially when detecting densely packed objects in aerial images, methods that rely on the horizontal proposals used for common object detection often introduce mismatches between the Regions of Interest (RoIs) and the objects. This leads to the common misalignment between the final object classification confidence and the localization accuracy. Although rotated anchors have been used to tackle this problem, their design always multiplies the number of anchors and dramatically increases the computational complexity. In this paper, we propose a RoI Transformer to address these problems. More precisely, to improve the quality of region proposals, we first design a Rotated RoI (RRoI) learner to transform a Horizontal Region of Interest (HRoI) into a Rotated Region of Interest (RRoI). Based on the RRoIs, we then propose a Rotated Position Sensitive RoI Align (RPS-RoI-Align) module to extract rotation-invariant features from them for boosting subsequent classification and regression. Our RoI Transformer is lightweight and can be easily embedded into detectors for oriented object detection. A simple implementation of the RoI Transformer achieves state-of-the-art performance on two common and challenging aerial datasets, i.e., DOTA and HRSC2016, with a neglectable reduction in detection speed. Our RoI Transformer outperforms deformable Position Sensitive RoI pooling when oriented bounding-box annotations are available. Extensive experiments also validate the flexibility and effectiveness of our RoI Transformer. The results demonstrate that it can be easily integrated with other detector architectures and significantly improves their performance.
由于鸟瞰视角、高度复杂的背景以及物体多样的外观，航拍图像中的物体检测是计算机视觉中的一项活跃且具有挑战性的任务。特别是当在航拍图像中检测密集的物体时，依赖于用于普通物体检测的水平候选区域的方法经常引入感兴趣区域 (RoI) 和物体之间的不匹配。这导致最终物体分类置信度和定位精度之间的常见错位。尽管已经使用旋转 anchor 来解决这个问题，但是它们的设计总是使 anchor 的数量倍增并且显著增加了计算复杂性。在本文中，我们提出了一个 RoI Transformer来解决这些问题。更准确地说，为了提高候选区域的质量，我们首先设计了一个 Rotated RoI (RRoI) learner，将水平感兴趣区域 (HRoI) 转换为旋转感兴趣区域 (RRoI)。基于 RRoI，我们提出了 Rotated Position Sensitive RoI Align (RPS-RoI-Align) 模块，从中提取旋转不变特征，以促进后续分类和回归。我们的轻量级 RoI Transformer 可以轻松嵌入检测器中，用于有向边框物体检测。RoI Transformer 的简单实现在两个常见且具有挑战性的航空数据集 (DOTA 和 HRSC2016) 上实现了最先进的性能，降低的检测速度可忽略。当有向边界框标注可用时，我们的 RoI Transformer 超出了 deformable Position Sensitive RoI pooling。广泛的实验也验证了我们的 RoI Transformer 的灵活性和有效性。结果表明，它可以很容易地与其他检测器架构集成，并显著提高性能。

boost [buːst]：vt. 促进，增加，支援 vi. 宣扬，偷窃 n. 推动，帮助，宣扬
variant ['veərɪənt]：n. 变体，转化 adj. 不同的，多样的
appearance [ə'pɪər(ə)ns]：n. 外貌，外观，出现，露面
bird view：鸟瞰图
perspective [pə'spektɪv]：n. 观点，远景，透视图 adj. 透视的
pack [pæk]：n. 包装，一群，背包，包裹，一副 vt. 包装，压紧，捆扎，挑选，塞满 vi. 挤，包装货物，被包装，群集
horizontal [hɒrɪ'zɒnt(ə)l]：adj. 水平的，地平线的，同一阶层的 n. 水平线，水平面，水平位置
region of interest，ROI：感兴趣区域
tackle ['tæk(ə)l]：v. 应付，处理，与某人交涉，抢球，擒抱摔倒，抓获，对付，打
dramatically [drə'mætɪkəlɪ]：adv. 戏剧地，引人注目地 adv. 显著地，剧烈地
Object Detection in Aerial Images，ODAI：遥感图像目标检测
neglectable [nɪ'ɡlektəbl]：adj. 可忽略不计的
deformable [,di'fɔ:məbl]：adj. 可变形的

基于水平正边框的目标检测 (Horizontal Task) 和基于有向边框的目标检测 (Oriented Task)

1 Introduction

Object detection in aerial images aims at locating objects of interest (e.g., vehicles, airplanes) on the ground and identifying their categories. With more and more aerial images becoming available, object detection in aerial images has become a specific but active topic in computer vision [1-4]. However, unlike natural images, which are often taken from horizontal perspectives, aerial images are typically taken from a bird's-eye view, which implies that objects in aerial images are always arbitrarily oriented. Moreover, the highly complex backgrounds and variant appearances of objects further increase the difficulty of object detection in aerial images. These problems are often approached as an oriented and densely packed object detection task [5-7], which is new yet well grounded and has attracted much attention in the past decade [8-12].
航拍图像中的物体检测旨在定位地面上的感兴趣物体 (e.g., vehicles, airplanes) 并识别它们的类别。随着越来越多可用的航拍图像，航拍图像中的物体检测已经成为计算机视觉中的一个特定但活跃的主题 [1-4]。然而，与通常从水平视角拍摄的自然图像不同，航拍图像通常采用鸟瞰图拍摄，这意味着航拍图像中的物体始终是任意方向的。此外，高度复杂的背景和物体的多样外观进一步增加了航拍图像中物体检测的难度。这些问题经常被一个有向边框且密集的物体检测任务所处理 [5-7]，这是一个新的，虽然有良好的基础，并在过去十年引起了很多关注 [8-12]。

imply [ɪm'plaɪ]：vt. 意味，暗示，隐含
decade ['dekeɪd; dɪ'keɪd]：n. 十年，十年期，十

Much of the recent progress on object detection in aerial images has benefited greatly from the R-CNN frameworks [2, 4, 7, 13-18]. These methods have reported promising detection performance by using horizontal bounding boxes as regions of interest (RoIs) and then relying on region-based features for category identification [2, 4, 16]. However, as observed in [5, 19], these horizontal RoIs (HRoIs) typically lead to misalignments between the bounding boxes and the objects. For instance, as shown in Fig. 1, due to the oriented and densely distributed nature of objects in aerial images, several object instances are often crowded together inside one HRoI. As a result, it is usually difficult to train a detector to extract object features and identify each object's accurate location.
最近关于航空图像中物体检测的许多进展已经从 RCNN 框架 [2, 4, 7, 13-18] 中受益匪浅。这些方法已经报道了有希望的检测性能，通过使用水平边界框作为感兴趣区域 (RoI)，然后依靠基于区域的特征进行类别识别 [2, 4, 16]。然而，如 [5, 19] 中所观察到的，这些水平 RoI (HROI) 通常导致边界框和物体之间的错位。例如，如图 1 所示，由于航拍图像中物体的具备方向和密集分布特性，一些物体实例经常拥挤并由一个 HRoI 包含。通常变得难以训练用于提取物体特征的检测器并识别物体的精确定位。

promise ['prɒmɪs]：n. 许诺，允诺，希望 vt. 允诺，许诺，给人以...的指望或希望 vi. 许诺，有指望，有前途

Figure 1: Horizontal (top) vs. rotated RoI warping (bottom) illustrated in an image with many densely packed objects. One horizontal RoI often contains several instances, which introduces ambiguity into the subsequent classification and localization tasks. By contrast, rotated RoI warping usually provides more accurate regions for instances and makes it possible to extract more discriminative features for object detection.
图 1：Horizontal (top) v.s. Rotated RoI warping (bottom) 在具有许多密集物体的图像中的示例。一个水平 RoI 通常包含多个实例，这导致后续分类和定位任务的模糊性。相比之下，旋转的 RoI 扭曲通常为实例提供更准确的区域，并且能够更好地提取用于物体检测的辨别特征。

discriminative [dɪs'krɪmɪnətɪv]：adj. 区别的，歧视的，有识别力的
warp [wɔːp]：n. 弯曲，歪曲，偏见，乖戾 vt. 使变形，使有偏见，曲解 vi. 变歪，变弯，曲解

Instead of horizontal bounding boxes, oriented bounding boxes have been employed to eliminate the mismatch between RRoIs and the corresponding objects [5, 19, 20]. To achieve high recall at the RRoI generation stage, a large number of anchors with different angles, scales and aspect ratios are required. These methods have demonstrated promising potential for detecting sparsely distributed objects [8-10, 21]. However, because objects in aerial images are oriented in highly diverse directions, it is often intractable to obtain accurate RRoIs that pair with all the objects in an aerial image when only RRoIs with a limited set of directions are used. Consequently, an elaborate design of RRoIs with as many directions and scales as possible usually suffers from high computational complexity in the region classification and localization phases.
不使用水平边界框，而是采用有向的边界框来消除 RRoI 与相应物体之间的不匹配 [5, 19, 20]。为了在 RRoI 生成阶段实现高召回率，需要大量的锚具有不同的角度、尺度和宽高比。这些方法已经证明在检测稀疏分布的物体方面具有很大潜力 [8-10, 21]。然而，由于航拍图像中物体的方向极其不同，通过使用方向有限的 RRoI 获取准确的 RRoI 以与航拍图像中的所有物体配对通常是难以处理的。因此，具有尽可能多的方向和尺度的 RRoI 的精心设计通常会受到其在区域分类和定位阶段的高计算复杂性的影响。

consequently ['kɒnsɪkw(ə)ntlɪ]：adv. 因此，结果，所以
sparsely ['spɑrsli]：adv. 稀疏地，贫乏地
diverse [daɪ'vɜːs; 'daɪvɜːs]：adj. 不同的，相异的，多种多样的，形形色色的
intractable [ɪn'træktəb(ə)l]：adj. 棘手的，难治的，倔强的，不听话的
elaborate [ɪ'læb(ə)rət]：adj. 精心制作的，详尽的，煞费苦心的 vt. 精心制作，详细阐述，从简单成分合成 vi. 详细描述，变复杂
ideal [aɪ'dɪəl; aɪ'diːəl]：adj. 理想的，完美的，想象的，不切实际的 n. 理想，典范

Because the regular operations in conventional object detection networks [14] generalize poorly to rotation and scale variations, the design of RoIs and their extracted features requires a degree of orientation and scale invariance. To this end, the Spatial Transformer [22] and deformable convolution and RoI pooling [23] layers have been proposed to model geometric variations. However, they are mainly designed for general geometric deformation and do not use oriented bounding box annotations. In aerial images there is only rigid deformation, and oriented bounding box annotations are available. It is therefore natural to argue that extracting rotation-invariant region features and eliminating the misalignment between region features and objects is important, especially for densely packed objects.
由于传统的物体检测网络中的常规操作 [14] 对旋转和尺度变化的泛化能力有限，因此在 RoI 和相应的提取特征的设计中需要一些方向和尺度不变。为此，已经提出 Spatial Transformer [22] 和可变形卷积和 RoI pooling [23] 层来模拟几何变化。但是，它们主要是针对一般几何变形而不使用有向的边界框标注而设计的。在航拍图像领域，只有刚性变形，并且可以使用有向的边界框标注。因此，很自然地认为提取旋转不变区域特征并消除区域特征和物体之间的错位是很重要的，特别是对于密集的区域特征和物体。

regular ['regjʊlə]：adj. 定期的，有规律的，合格的，整齐的，普通的 n. 常客，正式队员，中坚分子 adv. 定期地，经常地
rigid ['rɪdʒɪd]：adj. 严格的，僵硬的，死板的，坚硬的，精确的

In this paper, we propose a module called RoI Transformer that targets the detection of oriented and densely packed objects through supervised RRoI learning and feature extraction based on position-sensitive alignment within a two-stage framework [13-15, 24, 25]. It consists of two parts. The first is the RRoI Learner, which learns the transformation from HRoIs to RRoIs. The second is the Rotated Position Sensitive RoI Align, which extracts rotation-invariant features from the RRoI for subsequent object classification and location regression. To further improve efficiency, we adopt a light-head structure for all RoI-wise operations. We extensively test and evaluate the proposed RoI Transformer on two public datasets for object detection in aerial images, i.e., DOTA [5] and HRSC2016 [19], and compare it with state-of-the-art approaches such as deformable PS RoI pooling [23]. In summary, our contributions are threefold:
在本文中，我们提出了一个名为 RoI Transformer 的模块，旨在通过监督的 RRoI 学习和基于位置敏感对齐的特征提取，通过两阶段框架实现对有向且密集物体的检测 [13-15, 24, 25]。它由两部分组成。第一部分是 RRoI 学习器，它学习从 HRoI 到 RRoI 的转变。第二部分是 Rotated Position Sensitive RoI Align，它从 RRoI 中提取旋转不变特征提取，用于后续物体分类和位置回归。为了进一步提高效率，我们采用 light head 结构进行所有 RoI-wise 操作。我们在两个公共数据集上广泛测试和评估所提出的 RoI Transformer，用于航空图像中的物体检测，即 DOTA [5] 和 HRSC2016 [19]，并将其与最先进的方法进行比较，例如 deformable PS RoI pooling [23]。总之，我们的贡献有三方面：

We propose a supervised rotated RoI learner, a learnable module that transforms horizontal RoIs into RRoIs. This design not only effectively alleviates the misalignment between RoIs and objects, but also avoids designing a large number of RRoIs for oriented object detection.
我们提出了一种受监督的旋转 RoI 学习器，这是一个可以将水平 RoI 转换为 RRoI 的可学习模块。这种设计不仅可以有效地减轻 RoI 和物体之间的错位，还可以避免为有向物体检测设计的大量 RRoI。
We design a Rotated Position Sensitive RoI Alignment module for spatially invariant feature extraction, which effectively boosts object classification and location regression. The module is a crucial design when using light-head RoI-wise operations, which guarantee efficiency and low complexity.
我们设计了 Rotated Position Sensitive RoI Alignment 模块，用于空间不变特征提取，可以有效地提升物体分类和位置回归。当使用 light-head RoI-wise 操作时，该模块是一个至关重要的设计，它提高了效率和低复杂性。
We achieve state-of-the-art performance on several public large-scale datasets for oriented object detection in aerial images. Experiments also show that the proposed RoI Transformer can be easily embedded into other detector architectures with significant improvements in detection performance.
我们在几个公共大型数据集上实现了最先进的性能，用于航拍图像中的有向物体检测。实验还表明，所提出的 RoI Transformer 可以很容易地嵌入到其他检测器架构中，并且具有显著的检测性能改进。

alleviate [ə'liːvɪeɪt]：vt. 减轻，缓和
grantee [ɡrɑːn'tiː]：n. 受让人，被授与者

2 Related Work

2.1 Oriented Bounding Box Regression

Detecting oriented objects is an extension of general horizontal object detection. The objective is to locate and classify objects together with their orientation, which is mainly tackled by methods based on region proposals. HRoI-based methods [5, 26] usually use normal RoI warping to extract features from an HRoI and regress position offsets relative to the ground truths. HRoI-based methods suffer from misalignment between region features and instances. RRoI-based methods [9, 10] usually use rotated RoI warping to extract features from an RRoI and regress position offsets relative to the RRoI, which avoids the misalignment problem to a certain extent.
检测有向物体是一般水平物体检测的扩展。此问题的目标是使用方向信息定位和分类物体，主要使用基于候选区域的方法进行处理。基于 HRoI 的方法 [5, 26] 通常使用正常的 RoI Warping 从 HRoI 中提取特征，并回归相对于 ground truth 的位置。基于 HRoI 的方法存在区域特征和实例之间错位的问题。基于 RRoI 的方法 [9, 10] 通常使用 Rotated RoI Warping 从 RRoI 中提取特征，并回归相对于 RRoI 的位置，这可以避免某些特定的错位问题。

misalignment [mɪsə'laɪnmənt]：n. 不重合，未对准

However, RRoI-based methods involve generating a large number of rotated proposals. [10] adopted the method of [8] for rotated proposals. SRBBS [8] is difficult to embed in a neural network, so it costs extra time to generate the rotated proposals. [9, 12, 21, 27] used rotated anchors in the RPN [15]. However, this design is still time-consuming because of the dramatic increase in the number of anchors (num_scales $\times$ num_aspect_ratios $\times$ num_angles); for example, 3 $\times$ 5 $\times$ 6 = 90 anchors per location. A large number of anchors increases the parameters and computation of the network and also degrades the efficiency of matching between proposals and ground truths. Furthermore, directly matching oriented bounding boxes (OBBs) is harder than matching horizontal bounding boxes (HBBs) because of the many redundant rotated anchors. Therefore, in the design of rotated anchors, both [9] and [28] used a relaxed matching strategy: some anchors that do not achieve an IoU above 0.5 with any ground truth are still assigned as True Positive samples, which can still cause misalignment. In this work, we still use horizontal anchors. The difference is that once the HRoIs are generated, we transform them into RRoIs with a light fully connected layer. With this strategy, there is no need to increase the number of anchors, and many precise RRoIs can be acquired, which benefits the matching process. We therefore directly use the IoU between OBBs as the matching criterion, which effectively avoids the misalignment problem.
但是，基于 RRoI 的方法涉及生成大量旋转的候选区域。[10] 采用 [8] 中的方法生成旋转的候选区域。SRBBS [8] 很难嵌入到神经网络中，这会花费额外的时间来生成旋转的候选区域。[9, 12, 21, 27] 在 RPN [15] 中使用了旋转 anchor 的设计。然而，由于 anchor 的数量急剧增加 (num_scales $\times$ num_aspect_ratios $\times$ num_angles)，设计仍然很耗时。例如，在一个位置处有 3 $\times$ 5 $\times$ 6 = 90 个 anchor。大量的 anchor 增加了网络中参数的计算，同时也降低了候选区域与 ground truth 之间匹配的效率。此外，由于存在大量冗余的旋转 anchor，因此有向边界框 (OBB) 之间的直接匹配比水平边界框 (HBB) 之间的直接匹配更难。因此，在旋转 anchor 的设计中，[9, 28] 都使用了松弛的匹配策略。有些 anchor 在任何 ground truth 上都没有达到 0.5 以上的 IoU，但它们仍被指定为真正的正样本，这仍然可能导致错位问题。在这项工作中，我们仍然使用水平 anchor。不同之处在于，当生成 HRoI 时，我们通过轻量的全连接层将它们转换为 RRoI。基于这种策略，没有必要增加 anchor 的数量。并且可以获得许多精确的 RRoI，这将促进匹配过程。因此我们直接使用 OBB 之间的 IoU 作为匹配标准，这可以有效地避免错位问题。

dramatic [drə'mætɪk]：adj. 戏剧的，急剧的，引人注目的，激动人心的
degrade [dɪ'greɪd]：vt. 贬低，使...丢脸，使...降级，使...降解 vi. 降级，降低，退化
criterion [kraɪ'tɪərɪən]：n. 标准，准则，规范，准据
oriented bounding box，OBB
horizontal bounding box，HBB
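The anchor arithmetic above can be checked with a few lines of Python. This is only an illustration: the helper names and the 64 $\times$ 64 feature-map size are hypothetical and do not come from the paper.

```python
def anchors_per_location(num_scales, num_aspect_ratios, num_angles=1):
    """Anchors placed at each feature-map location (num_angles=1 means horizontal anchors)."""
    return num_scales * num_aspect_ratios * num_angles


def total_anchors(feat_h, feat_w, num_scales, num_aspect_ratios, num_angles=1):
    """Total anchors over a feat_h x feat_w feature map."""
    return feat_h * feat_w * anchors_per_location(num_scales, num_aspect_ratios, num_angles)


if __name__ == "__main__":
    # Example from the text: 3 scales, 5 aspect ratios, 6 angles.
    print(anchors_per_location(3, 5, 6))   # 90
    print(anchors_per_location(3, 5))      # 15 horizontal anchors
    # Hypothetical 64 x 64 feature map (an assumption, not from the paper).
    print(total_anchors(64, 64, 3, 5, 6))  # 368640
    print(total_anchors(64, 64, 3, 5))     # 61440
```

Every extra angle multiplies the anchor count, and thus the RPN outputs and the matching work, by the same factor, which is the cost the RRoI learner is designed to avoid.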

2.2 Spatial-invariant Feature Extraction

CNN frameworks generalize well to translation but perform poorly under rotation and scale variations. For image feature extraction, the Spatial Transformer [22] and deformable convolution [23] were proposed to model arbitrary deformations; they are learned from the target tasks without extra supervision. For region feature extraction, deformable RoI pooling [23] was proposed, which learns offsets for the sampling grid of RoI pooling. It models instance-level deformation better than regular RoI warping [14, 24, 25]. STN and deformable modules are widely used for recognition in scene text and aerial images [29-33]. Object detection in aerial images involves more rotation and scale variation but hardly any nonrigid deformation. Therefore, our RoI Transformer models only rigid spatial transformations, learned in the format $(d_{x}, d_{y}, d_{w}, d_{h}, d_{\theta})$. Unlike deformable RoI pooling, however, our RoI Transformer learns the offsets under the supervision of the ground truth. The RRoIs can also be used for further rotated bounding box regression, which also contributes to object localization performance.
CNN 框架对于平移不变特征的泛化具有良好的性能，但是在旋转和尺度变化上表现出差的性能。对于图像特征提取，Spatial Transformer [22] and deformable convolution [23] 被提出用于任意变形的建模。他们从目标任务中学习而无需额外的监督。对于区域特征提取，提出了 deformable RoI pooling [23]，这是通过对 RoI pooling 的采样网格进行偏移量学习来实现的。与常规的 RoI warping 相比，它可以更好地模拟实例级的变形 [14, 24, 25]。STN 和可变形模块广泛用于场景文本和航空图像领域的识别 [29-33]。对于航拍图像中的物体检测，存在更多的旋转和尺度变化，但几乎没有非刚性变形。因此，我们的 RoI Transformer 仅模拟刚性空间变换，其以 $(d_{x}, d_{y}, d_{w}, d_{h}, d_{\theta})$ 的格式学习。然而，与 deformable RoI pooling 不同，我们的 RoI Transformer 通过对 ground truth 的监督来学习偏移量。并且 RRoI 还可以用于进一步旋转的边界框回归，这也可以有助于物体定位性能。

nonrigid [nɒn'rɪdʒɪd]：adj. 非刚性的

2.3 Light RoI-wise Operations

RoI-wise operations are the efficiency bottleneck of two-stage algorithms because their computation is not shared. Light-head R-CNN [34] addresses this problem by using a large separable convolution to produce a thin feature map. It also employs PS RoI pooling [24] to further reduce the dimensionality of the feature maps. A single fully connected layer is applied to the pooled features, which have a dimensionality of only 10, and this significantly improves the speed of two-stage algorithms. Aerial images contain scenes with a large number of instances; for example, over 800 instances can be densely packed in a single 1024 $\times$ 1024 image. Our approach is similar to deformable RoI pooling [23] in that the RoI-wise operations are conducted twice, so the light-head design is also employed to guarantee efficiency.
RoI-wise 操作是两阶段算法效率的瓶颈，因为计算不是共享的。提出了 Light-head R-CNN [34] 通过使用更大的可分离卷积来获得窄特征来解决这个问题。它还采用 PS RoI pooling [24] 来进一步降低特征图的维数。在合并的特征上应用单个全连接的层，其维数为 10，这可以显著提高两阶段算法的速度。在航拍图像中，存在实例数量大的场景。例如，超过 800 个实例密集地分布在单个 1024 $\times$ 1024 图像上。我们的方法类似于 Deformable RoI pooling [23]，其中 RoI-wise 操作进行两次。light-head 设计也用于提高效率。

bottleneck ['bɒt(ə)lnek]：n. 瓶颈，障碍物
guarantee [gær(ə)n'tiː]：n. 保证，担保，保证人，保证书，抵押品 vt. 保证，担保

3 RoI Transformer

In this section, we present the details of our proposed RoI Transformer. It contains a trainable fully connected layer, termed the RRoI Learner, which learns rotated RoIs from the estimated horizontal RoIs, and an RRoI warping layer, which then warps the feature maps to maintain the rotation invariance of deep features. Both layers are differentiable, so the module can be trained end to end. The architecture is shown in Fig. 2.
在本节中，我们将详细介绍我们提出的 ROI Transformer，其中包含一个可训练的全连接层，称为 RRoI Learner，一个 RRoI 变形层，用于从估计的水平 RoI 中学习旋转的 RoI，然后扭曲特征图以保持深度特征的旋转不变性。这两个层都是可微的适用于端到端的训练。架构如图 2 所示。

differentiable [,dɪfə'renʃɪəb(ə)l]：adj.可微的，可辨的，可区分的

Figure 2: The architecture of RoI Transformer. Each HRoI is passed to an RRoI learner. The RRoI learner in our network is a PS RoI Align followed by a fully connected layer of dimension 5, which regresses the offsets of the RGT (rotated ground truth) relative to the HRoI. The box decoder at the end of the RRoI learner takes the HRoI and the offsets as input and outputs the decoded RRoIs. The feature map and the RRoIs are then passed to the RRoI warping for geometry-robust feature extraction. The combination of RRoI Learner and RRoI warping forms a RoI Transformer (RT). The geometry-robust pooled features from the RoI Transformer are then used for classification and RRoI regression.
图 2：RoI Transformer 的架构。对于每个 HRoI，它将传递给 RRoI 学习器。我们网络中的 RRoI 学习器是 PS RoI Align，后面跟一个维度为 5 的全连接层，它回归相对于 HRoI 的 RGT 的偏移量。Box 解码器位于 RRoI Learner 的末尾，它将 HRoI 和偏移量作为输入并输出解码的 RRoI。然后将特征图和 RRoI 传递给 RRoI 变形以进行几何鲁棒特征提取。RRoI Learner 和 RRoI 变形的组合构成了 RoI Transformer (RT)。然后使用来自 RoI Transformer 的几何鲁棒合并的特征进行分类和 RRoI 回归。

3.1 RRoI Learner

The RRoI learner aims at learning rotated RoIs from the feature maps of horizontal RoIs. Suppose we have obtained $n$ horizontal RoIs, denoted by $\{\mathcal{H}_{i}\}$, in the format $(x, y, w, h)$ giving the predicted 2D location, width and height of each HRoI; the corresponding feature maps are denoted by $\{\mathcal{F}_{i}\}$ with the same index. Since in ideal scenarios every HRoI is the circumscribed (axis-aligned bounding) rectangle of an RRoI, we try to infer the geometry of the RRoIs from every feature map $\mathcal{F}_{i}$ using fully connected layers. Following the offset learning used in object detection, we devise the regression target as
RRoI 学习器的目标是从水平 RoI 的特征图中学习旋转的 RoI。假设我们已经获得了由 $\{\mathcal{H}_{i}\}$ 表示的 n 个水平 RoI，其格式为 $(x, y, w, h)$ ，用于预测 HRoI 的 2D 位置、宽度和高度，相应的特征图可以用相同的索引表示为 $\{\mathcal{F}_{i}\}$ 。由于在理想情况下每个 HRoI 都是 RRoI 的外部矩形，因此我们尝试使用全连接的层从每个特征图推断出 RRoI 的几何结构。我们遵循用于物体检测的偏移量学习来设计回归目标

$\begin{aligned} t_{x}^{\ast} &= \frac{1}{w_{r}}\left( (x^{\ast} - x_r)\cos\theta_{r} + (y^{\ast} - y_r)\sin\theta_{r} \right),\\ t_{y}^{\ast} &= \frac{1}{h_{r}}\left( (y^{\ast} - y_r)\cos\theta_{r} - (x^{\ast} - x_r)\sin\theta_{r} \right),\\ t_{w}^{\ast} &= \log\frac{w^{\ast}}{w_{r}}, \, t_{h}^{\ast} = \log\frac{h^{\ast}}{h_{r}},\\ t_{\theta}^{\ast} &= \frac{1}{2\pi} \left( (\theta^{*} - \theta_{r}) \, \mod 2\pi \right),\\ \end{aligned} \tag{1}$

where $(x_{r}, y_{r}, w_{r}, h_{r}, \theta_{r})$ is a stacked vector representing the location, width, height and orientation of an RRoI, respectively, and $(x^{\ast}, y^{\ast}, w^{\ast}, h^{\ast}, \theta^{\ast})$ are the ground truth parameters of an oriented bounding box. The modulo operation adjusts the angle offset target $t_{\theta}^{\ast}$ so that the angle difference falls in $[0, 2\pi)$ for the convenience of computation; consequently $t_{\theta}^{\ast} \in [0, 1)$. Indeed, the target for HRoI regression is a special case of Eq. (1) if ${\theta}^{\ast} = \frac{3\pi}{2}$. The relative offsets are illustrated in Fig. 3. Mathematically, the fully connected layer outputs a vector $(t_{x}, t_{y}, t_{w}, t_{h}, t_{\theta})$ for every feature map $\mathcal{F}_{i}$ by
其中 $(x_{r}, y_{r}, w_{r}, h_{r}, \theta_{r})$ 是用于分别表示 RRoI 的位置、宽度、高度和方向的堆叠矢量。 $(x^{\ast}, y^{\ast}, w^{\ast}, h^{\ast}, \theta^{\ast})$ 是有向边界框的 ground truth 参数。模操作用于调整角度偏移目标 $t_{\theta}^{\ast}$ ，使角度差落在 $[0, 2\pi)$ 中以便于计算。实际上，如果 ${\theta}^{\ast} = \frac{3\pi}{2}$ ，HRoI 回归的目标是 Eq. (1) 的特例。相对偏移量在图 3 中进行说明。在数学上，全连接的层为每个特征图 $\mathcal{F}_{i}$ 输出一个向量 $(t_{x}, t_{y}, t_{w}, t_{h}, t_{\theta})$

devise [dɪ'vaɪz]：vt. 设计，想出，发明，图谋，遗赠给 n. 遗赠
modular ['mɒdjʊlə]：adj. 模块化的，模数的，有标准组件的

$(t_{x}, t_{y}, t_{w}, t_{h}, t_{\theta}) = \mathcal{G}(\mathcal{F}; \Theta), \tag{2}$

where $\mathcal{G}$ represents the fully connected layer, $\Theta$ denotes its weight parameters, and $\mathcal{F}$ is the feature map of each HRoI.
其中 $\mathcal{G}$ 代表全连接的层， $\Theta$ 是 $\mathcal{G}$ 的权重参数， $\mathcal{F}$ 是每个 HRoI 的特征图。
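Eq. (1) can be sketched in plain Python. This is a minimal illustration, not the authors' code: it assumes boxes are `(x, y, w, h, theta)` tuples with `theta` in radians, and the function name `encode_rroi_target` is hypothetical. The key step is projecting the centre displacement onto the RRoI's own axes, which is what makes the targets relative.

```python
import math


def encode_rroi_target(rroi, gt):
    """Relative regression targets of Eq. (1).

    rroi -- reference RRoI (x_r, y_r, w_r, h_r, theta_r)
    gt   -- oriented ground truth (x*, y*, w*, h*, theta*)
    Angles are in radians. Returns (t_x, t_y, t_w, t_h, t_theta).
    """
    xr, yr, wr, hr, tr = rroi
    xg, yg, wg, hg, tg = gt
    dx, dy = xg - xr, yg - yr
    # Express the centre displacement in the RRoI's own frame, then normalise
    # by the RRoI's width/height: this is what makes the targets relative.
    tx = (dx * math.cos(tr) + dy * math.sin(tr)) / wr
    ty = (dy * math.cos(tr) - dx * math.sin(tr)) / hr
    tw = math.log(wg / wr)
    th = math.log(hg / hr)
    # Python's % with a positive modulus already lands in [0, 2*pi),
    # so t_theta lies in [0, 1).
    ttheta = ((tg - tr) % (2 * math.pi)) / (2 * math.pi)
    return tx, ty, tw, th, ttheta


if __name__ == "__main__":
    # Axis-aligned reference box, ground truth shifted half a width to the right.
    print(encode_rroi_target((0, 0, 10, 10, 0), (5, 0, 10, 10, 0)))
    # (0.5, 0.0, 0.0, 0.0, 0.0)
```

Because the displacement is expressed in the RRoI frame, rotating and translating an RRoI together with its ground truth leaves the five targets unchanged (up to floating-point error), which is the situation of the two box pairs in Fig. 3.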

Figure 3: An example explaining the relative offset. There are three coordinate systems. $XOY$ is bound to the image; $x_{1}O_{1}y_{1}$ and $x_{2}O_{2}y_{2}$ are bound to the two RRoIs (blue rectangles), respectively. The yellow rectangle represents the RGT. The two rectangles on the right are obtained from the two on the left by translation and rotation, keeping their relative position unchanged. $(\Delta x_{1}, \Delta y_{1})$ is not equal to $(\Delta x_{2}, \Delta y_{2})$ if both are expressed in $XOY$. They are the same if $(\Delta x_{1}, \Delta y_{1})$ is expressed in $x_{1}O_{1}y_{1}$ and $(\Delta x_{2}, \Delta y_{2})$ in $x_{2}O_{2}y_{2}$. $\alpha_{1}$ and $\alpha_{2}$ denote the angles of the two RRoIs, respectively.
图 3：解释相对偏移的示例。有三个坐标系。 $X O Y$ 与图像绑定。 $x_{1}O_{1}y_{1}$ 和 $x_{2}O_{2}y_{2}$ 分别绑定到两个 RRoI (蓝色矩形)。黄色矩形表示 RGT。通过平移和旋转从左侧两个矩形获得右侧两个矩形，同时保持相对位置不变。如果它们都在 $X O Y$ 中，则 $(\Delta x_{1}, \Delta y_{1})$ 不等于 $(\Delta x_{2}, \Delta y_{2})$ 。如果 $(\Delta x_{1}, \Delta y_{1})$ 落在 $x_{1}O_{1}y_{1}$ 中并且 $(\Delta x_{2}, \Delta y_{2})$ 落入 $x_{2}O_{2}y_{2}$ 中，它们是相同的。 $\alpha_{1}$ 和 $\alpha_{2}$ 分别表示两个 RRoI 的角度。

When training layer $\mathcal{G}$, we need to match the input HRoIs with the ground-truth oriented bounding boxes (OBBs). For computational efficiency, the matching is performed between the HRoIs and the axis-aligned bounding boxes of the original ground truths. Once an HRoI is matched, we set $t_{\theta}^{\ast}$ directly by the definition in Eq. (1). We use the Smooth L1 loss [13] for optimization. For the predicted $t$ in every forward pass, we decode it from the offsets into the parameters of an RRoI. That is to say, our proposed RRoI learner can learn the parameters of RRoIs from the HRoI feature map $\mathcal{F}$.
在训练 $\mathcal{G}$ 层时，我们需要匹配输入 HRoI 和有向边界框 (OBB) 的 ground truth。出于计算效率的考虑，匹配是在 HRoI 和原始 ground truth 的轴对齐边界框之间进行的。一旦 HRoI 匹配，我们直接通过方程式 (1) 中的定义设置 $t_{\theta}^{\ast}$ 。优化使用平滑 L1 损失 [13]。对于每个前向传递中的预测 $t$ ，我们将其从偏移量解码为 RRoI 的参数。也就是说，我们提出的 RRoI 学习器可以从 HRoI 特征图 $\mathcal{F}$ 中学习 RRoI 的参数。
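The decoding step, i.e. the box decoder that inverts Eq. (1), can be sketched the same way. This is an illustrative assumption rather than the authors' implementation: `decode_rroi` is a hypothetical name, and the reference box is passed as `(x, y, w, h, theta)`, so an HRoI must be given whatever angle the chosen convention assigns to axis-aligned boxes.

```python
import math


def decode_rroi(ref, deltas):
    """Inverse of Eq. (1): turn predicted offsets into a box (x, y, w, h, theta).

    ref    -- reference box (x, y, w, h, theta); for the RRoI learner, the HRoI
    deltas -- predicted offsets (t_x, t_y, t_w, t_h, t_theta)
    """
    xr, yr, wr, hr, tr = ref
    tx, ty, tw, th, tt = deltas
    # Offsets live in the reference box's frame: undo the normalisation,
    # then rotate by theta_r back into the image frame.
    ox, oy = tx * wr, ty * hr
    x = xr + ox * math.cos(tr) - oy * math.sin(tr)
    y = yr + ox * math.sin(tr) + oy * math.cos(tr)
    w = wr * math.exp(tw)
    h = hr * math.exp(th)
    # t_theta is a fraction of a full turn; keep the angle in [0, 2*pi).
    theta = (tr + 2 * math.pi * tt) % (2 * math.pi)
    return x, y, w, h, theta
```

Decoding the targets produced by Eq. (1) recovers the ground-truth box (with the angle reduced modulo $2\pi$), which makes a convenient unit test for any encoder and decoder pair.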

3.2 Rotated Position Sensitive RoI Align

Once the parameters of RRoI are obtained, we are able to extract the rotation-invariant deep features for Oriented Object Detection. Here, we propose the module of Rotated Position Sensitive (RPS) RoI Align to extract the rotation-invariant features within a network.
一旦获得了 RRoI 的参数，我们就能够为有向目标检测提取旋转不变的深度特征。在这里，我们提出了旋转位置敏感 (RPS) RoI Align 模块，以提取网络中的旋转不变特征。

Given the input feature map $\mathcal{D}$ of size $H \times W \times C$ and an RRoI $(x_{r}, y_{r}, w_{r}, h_{r}, \theta_{r})$, where $(x_{r}, y_{r})$ denotes the center of the RRoI, $(w_{r}, h_{r})$ its width and height, and $\theta_{r}$ its orientation, the RPS RoI pooling divides the rotated RoI into $K \times K$ bins and outputs a feature map $\mathcal{Y}$ of shape $K \times K \times C$. For the bin with index $(i, j)$, $0 \leq i, j < K$, of output channel $c$, $0 \leq c < C$, we have
给定具有 $H \times W \times C$ 通道的输入特征图 $\mathcal{D}$ 和 RRoI $(x_{r}, y_{r}, w_{r}, h_{r}, \theta_{r})$ ，其中 $(x_{r}, y_{r})$ 表示 RRoI 的中心， $(w_{r}, h_{r})$ 表示 RRoI 的宽度和高度。 $\theta_{r}$ 给出了 RRoI 的方向。RPS RoI pooling 将 Rotated RoI 分成 $K \times K$ 个 bin，并输出形状为 $K \times K \times C$ 的特征图 $\mathcal{Y}$ 。对于输出通道 $c$ $(0 \leq c < C)$ 的索引为 $(i, j)$ $(0 \leq i, j < K)$ 的 bin，我们有

$\mathcal{Y}_{c}(i,j) = \sum_{(x,y) \in bin(i,j)} D_{i,j,c}(\mathcal{T}_{\theta}(x,y))/n_{i,j}, \tag{3}$

where $D_{i,j,c}$ is one feature map out of the $K \times K \times C$ feature maps; the channel mapping is the same as in the original Position Sensitive RoI pooling [24]. $n_{i,j}$ is the number of sampling locations in the bin (here $n_{i,j} = n^{2}$). $bin(i, j)$ denotes the coordinate set $\{ i \frac{w_{r}}{K} + (s_{x} + 0.5) \frac{w_{r}}{K \times n};\ s_{x} = 0, 1, \ldots, n-1\} \times \{ j \frac{h_{r}}{K} + (s_{y} + 0.5) \frac{h_{r}}{K \times n};\ s_{y} = 0, 1, \ldots, n-1 \}$. Each $(x, y) \in bin(i,j)$ is converted to $(x', y')$ by $\mathcal{T}_{\theta}$, where
其中 $D_{i,j,c}$ 是 $K \times K \times C$ 特征图中的一个特征图。通道映射与原始的 Position Sensitive RoI pooling [24] 相同。 $n_{i,j}$ 是 bin 中的采样位置数目。 $bin(i, j)$ 表示坐标集 $\{ i \frac{w_{r}}{K} + (s_{x} + 0.5) \frac{w_{r}}{K \times n};\ s_{x} = 0, 1, \ldots, n-1\} \times \{ j \frac{h_{r}}{K} + (s_{y} + 0.5) \frac{h_{r}}{K \times n};\ s_{y} = 0, 1, \ldots, n-1 \}$ 。对于每个 $(x, y) \in bin(i,j)$ ，它由 $\mathcal{T}_{\theta}$ 转换为 $(x', y')$ ，其中

$\left( \begin{array}{c} x' \\ y' \end{array} \right) = \left( \begin{array}{cc} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{array} \right) \left( \begin{array}{c} x - w_{r}/2 \\ y - h_{r}/2 \end{array} \right) + \left( \begin{array}{c} x_{r} \\ y_{r} \end{array} \right), \tag{4}$

Typically, Eq. (3) is implemented by bilinear interpolation.
通常，Eq. (3) 通过双线性插值实现。

interpolation [ɪn,tɜːpəʊ'leɪʃən]：n. 插入，篡改，填写，插值
geometry [dʒɪ'ɒmɪtrɪ]：n. 几何学，几何结构
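Eqs. (3) and (4) translate into a short NumPy sketch for a single RRoI. Several details are assumptions made for illustration, not taken from the paper: the channel layout `(i*K + j)*C + c` for the position-sensitive score maps, pixel centres at integer coordinates, zero padding outside the feature map, and the helper names `bilinear` and `rps_roi_align`. A practical implementation would be a batched GPU kernel rather than Python loops.

```python
import math

import numpy as np


def bilinear(fmap, x, y):
    """Bilinearly sample a 2-D array at continuous (x, y); pixels outside count as 0."""
    h, w = fmap.shape
    x0, y0 = math.floor(x), math.floor(y)
    val = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= xx < w and 0 <= yy < h:
                val += (1 - abs(x - xx)) * (1 - abs(y - yy)) * fmap[yy, xx]
    return val


def rps_roi_align(features, rroi, K, C, n=2):
    """Rotated position-sensitive RoI Align for one RRoI (sketch of Eqs. 3-4).

    features -- array (K*K*C, H, W); channel (i*K + j)*C + c is assumed to hold
                the score map for bin (i, j) and output channel c
    rroi     -- (x_r, y_r, w_r, h_r, theta_r) in feature-map coordinates
    Returns Y with shape (C, K, K), Y[c, i, j] as in Eq. (3).
    """
    xr, yr, wr, hr, theta = rroi
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = np.zeros((C, K, K))
    for i in range(K):
        for j in range(K):
            # n x n regularly spaced points of bin(i, j) in the RRoI-local frame.
            pts = [(i * wr / K + (sx + 0.5) * wr / (K * n),
                    j * hr / K + (sy + 0.5) * hr / (K * n))
                   for sx in range(n) for sy in range(n)]
            for c in range(C):
                fmap = features[(i * K + j) * C + c]
                total = 0.0
                for x, y in pts:
                    # Eq. (4): rotate about the RRoI centre, then move to (x_r, y_r).
                    xp = cos_t * (x - wr / 2) - sin_t * (y - hr / 2) + xr
                    yp = sin_t * (x - wr / 2) + cos_t * (y - hr / 2) + yr
                    total += bilinear(fmap, xp, yp)
                out[c, i, j] = total / len(pts)  # divide by n_{i,j} = n * n
    return out
```

With a constant score map every bin returns that constant, and with a horizontal intensity ramp and $\theta_r = 0$ the bins further along the RRoI's width return larger averages; both are quick checks of the sampling grid of Eq. (4).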

Figure 4: Rotated RoI warping. The shape of the warped feature is a horizontal rectangle (3 $\times$ 3 here, as an example). The sampling grid for RoI warping is determined by the RRoI $(x_{r}, y_{r}, w_{r}, h_{r}, \theta_{r})$. For clarity, we show the image instead of the feature map. After RRoI warping, the extracted features are geometry robust (the orientations of all the vehicles are the same).
Figure 4: Rotated RoI warping. 扭曲特征的形状是水平矩形 (例如，我们在这里以 3 $\times$ 3 为例)。用于 RoI 扭曲的采样网格由 RRoI $(x_{r}, y_{r}, w_{r}, h_{r}, \theta_{r})$ 确定。我们使用图像而不是特征图来更好地解释。在 RRoI 变形之后，提取的特征是几何稳健的。(所有车辆的方向都相同)。

3.3 RoI Transformer for Oriented Object Detection

The combination of the RRoI Learner and RPS RoI Align forms a RoI Transformer (RT) module, which can replace the normal RoI warping operation. The pooled features from RT are rotation-invariant. Moreover, the RRoIs provide a better initialization for later regression, because a matched RRoI is closer to the RGT than a matched HRoI. As mentioned before, an RRoI is a 5-tuple $(x_{r}, y_{r}, w_{r}, h_{r}, \theta_{r})$. To eliminate ambiguity, we use $h$ to denote the short side and $w$ the long side of an RRoI. The orientation perpendicular to $h$ and falling in $[0, \pi]$ is chosen as the final direction of an RRoI. These conventions effectively avoid the ambiguity, and they are also needed to reduce rotation variations.
RRoI Learner 和 RPS RoI Align 的组合形成了 RoI Transformer (RT) 模块。它可以用来代替正常的 RoI 变形操作。 RT 的池化的特征是旋转不变的。并且 RRoI 为后来的回归提供了更好的初始化，因为匹配的 RRoI 与匹配的 HRoI 相比更接近 RGT。如前所述，RRoI 是一个包含 5 个元素的元组 $(x_{r}, y_{r}, w_{r}, h_{r}, \theta_{r})$ 。为了消除歧义，我们使用 $h$ 来表示 RRoI 的短边和 $w$ 来表示 RRoI 的长边。垂直于 $h$ 并且落在 $[0, \pi]$ 中的方向被选择为 RRoI 的最终方向。在所有这些操作之后，可以有效地避免模糊。并且需要操作来减少旋转变化。

ambiguity [æmbɪ'gjuːɪtɪ]：n. 含糊，不明确，暧昧，模棱两可的话
vertical ['vɜːtɪk(ə)l]：adj. 垂直的，直立的，头顶的，顶点的，纵长的，直上的 n. 垂直线，垂直面，垂直位置
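The side-and-angle convention above can be written as a small normalisation function. This sketch assumes `theta` (in radians) is the direction of the side of length `w`; the function name is hypothetical. It returns angles in $[0, \pi)$; the closed interval $[0, \pi]$ in the text differs only at the single value $\pi$, which describes the same rectangle as $0$.

```python
import math


def canonicalize_rroi(box):
    """Return an equivalent (x, y, w, h, theta) with w >= h and theta in [0, pi).

    Assumes theta is the direction (radians) of the side whose length is w.
    """
    x, y, w, h, theta = box
    if w < h:
        # Swapping the side labels turns the long side's direction by 90 degrees.
        w, h = h, w
        theta += math.pi / 2
    # A rectangle is unchanged by a half-turn, so its direction is only defined mod pi.
    return x, y, w, h, theta % math.pi
```

For example, `(x, y, 10, 20, 0)` and `(x, y, 20, 10, pi/2)` describe the same rectangle and normalise to the same tuple, which removes the duplicate parameterisations that would otherwise make the regression targets ambiguous.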

IoU between OBBs. In common deep-learning-based detectors, there are two cases in which IoU must be computed: the first is the matching process and the second is Non-Maximum Suppression (NMS). The IoU between two OBBs can be calculated by Eq. (5):
IoU between OBBs 在常见的基于深度学习的检测器中，有两种情况需要进行 IoU 计算。第一个在于匹配过程，而第二个是执行 (非极大值抑制) NMS 需要的。两个 OBB 之间的 IoU 可以通过公式 5 计算：

$\mathrm{IoU}(B_{1}, B_{2}) = \frac{area(B_{1} \cap B_{2})}{area(B_{1} \cup B_{2})}, \tag{5}$

where $B_1$ and $B_2$ represent two OBBs, e.g., an RRoI and an RGT. The IoU calculation between OBBs is similar to that between horizontal bounding boxes (HBBs); the only difference is that for OBBs the intersection is computed between polygons, as illustrated in Fig. 5. In our model, during the matching process, an RRoI is assigned as True Positive if its IoU with any RGT is over 0.5. It is worth noting that although the RRoI and RGT are both quadrilaterals, their intersection may be one of several kinds of polygon, e.g., a hexagon as shown in Fig. 5(a). For long and thin bounding boxes, a slight jitter in the angle may make the IoU of two predicted OBBs very low, which makes NMS difficult, as can be seen in Fig. 5(b).
其中 $B_1$ and $B_2$ 代表两个 OBB，比如一个 RRoI 和一个 RGT。OBB 之间的 IoU 计算与水平边界框 (HBB) 之间的计算类似。唯一不同的是，OBB 的 IoU 计算是在多边形内执行的，如图 5 所示。在我们的模型中，在匹配过程中，如果与任何 RGT 的 IoU 超过 0.5，则这个 RRoI 被指定为 True Positive。值得注意的是，尽管 RRoI 和 RGT 都是四边形，但它们的交叉点可以是不同的多边形，例如，如图 5(a) 所示的六边形。对于长而窄的边界框，角度的轻微抖动可能导致两个预测的 OBB 的 IoU 非常低，这将使 NMS 变得非常复杂，如图 5(b) 所示。

quadrilateral [,kwɒdrɪ'læt(ə)r(ə)l]：n. 四边形 adj. 四边形的
diverse [daɪ'vɜːs; 'daɪvɜːs]：adj. 不同的，相异的，多种多样的，形形色色的
hexagon ['heksəg(ə)n]：n. 六角形，六边形 adj. 成六角的，成六边的
jitter ['dʒɪtə]：n. 紧张不安，晃动 v. 紧张不安，晃动

Figure 5: Examples of IoU between oriented bounding boxes (OBBs). (a) IoU between an RRoI and a matched RGT; the red hexagon is the intersection area of the RRoI and the RGT. (b) The intersection of two long and thin bounding boxes; for such boxes, a slight jitter in angle can lead to a very low IoU. The red quadrilateral is the intersection area. In this case, the predicted OBB with a score of 0.53 cannot be suppressed because the IoU is very low.
Figure 5: Examples of IoU between oriented bounding boxes(OBBs). (a) RRoI 和匹配的 RGT 之间的 IoU。红色六边形表示 RRoI 和 RGT 之间的交叉区域。(b) 两个长而窄的边界框之间的交叉点。对于长而窄的边界框，角度的轻微抖动可能导致两个盒子的 IoU 非常低。红色四边形是交叉区域。在这种情况下，由于 IoU 非常低，因此无法抑制得分为 0.53 的预测 OBB。

Targets Calculation. After RRoI warping, rotation-invariant features can be obtained. Correspondingly, the offsets also need to be rotation-invariant. To achieve this, we use relative offsets, as explained in Fig. 3. The main idea is to compute offsets in a coordinate system bound to the RRoI rather than to the image. Eq. (1) gives the derived formulation of the relative offsets.
Targets Calculation. RRoI 变形后，可以获得旋转不变的特征。相应地，偏移量也需要是旋转不变的。为实现这一目标，我们使用相对偏移量，如图 3 所示。主要思想是在绑定到 RRoI 的坐标系 (而不是图像坐标系) 中计算偏移量。Eq. (1) 是相对偏移量的推导公式。
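Eq. (1) itself is given earlier in the paper. As a hedged sketch, the code below assumes a common rotated-box parameterization:

- The center offset is rotated into the RRoI's frame and normalized by $w_r$ and $h_r$.
- Sizes use log ratios.
- The angle offset is taken modulo $2\pi$ and divided by $2\pi$.

`encode_targets` and `decode_targets` are illustrative names, not the authors' functions.

```python
import math

TWO_PI = 2 * math.pi


def encode_targets(rroi, gt):
    """Relative offsets of a ground-truth box w.r.t. an RRoI, both (x, y, w, h, theta).

    Assumed parameterization: the center offset is expressed in the RRoI's own
    (rotated) axes and normalized by its size; sizes use log ratios; the angle
    offset is taken modulo 2*pi and scaled to [0, 1).
    """
    xr, yr, wr, hr, tr = rroi
    xg, yg, wg, hg, tg = gt
    c, s = math.cos(tr), math.sin(tr)
    dx, dy = xg - xr, yg - yr
    tx = (dx * c + dy * s) / wr     # offset along the RRoI's w axis
    ty = (-dx * s + dy * c) / hr    # offset along the RRoI's h axis
    tw = math.log(wg / wr)
    th = math.log(hg / hr)
    tt = ((tg - tr) % TWO_PI) / TWO_PI
    return tx, ty, tw, th, tt


def decode_targets(rroi, t):
    """Inverse of encode_targets: rotate the local offset back to image axes."""
    xr, yr, wr, hr, tr = rroi
    tx, ty, tw, th, tt = t
    c, s = math.cos(tr), math.sin(tr)
    u, v = tx * wr, ty * hr
    return (xr + u * c - v * s,
            yr + u * s + v * c,
            wr * math.exp(tw),
            hr * math.exp(th),
            (tr + tt * TWO_PI) % TWO_PI)


print(encode_targets((50.0, 40.0, 30.0, 10.0, 0.4), (53.0, 42.0, 33.0, 9.0, 0.5)))
```

Rotating the whole scene (both boxes) about any point leaves these targets unchanged. This is the rotation invariance the paragraph asks for.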

consistently [kən'sɪstəntli]：adv. 一贯地，一致地，坚实地

4 Experiments and Analysis

4.1 Datasets

For experiments, we choose two datasets, known as DOTA [5] and HRSC2016 [19], for oriented object detection in aerial images.
对于实验，我们选择两个数据集，名为 DOTA [5] 和 HRSC2016 [19]，用于航拍图像中的有向物体检测。

DOTA [5]. This is the largest dataset for object detection in aerial images with oriented bounding box annotations. It contains 2806 large images with objects from 15 categories: Plane (PL), Baseball diamond (BD), Bridge (BR), Ground track field (GTF), Small vehicle (SV), Large vehicle (LV), Ship (SH), Tennis court (TC), Basketball court (BC), Storage tank (ST), Soccer-ball field (SBF), Roundabout (RA), Harbor (HA), Swimming pool (SP), and Helicopter (HC). The fully annotated DOTA images contain 188,282 instances, which vary greatly in scale, orientation, and aspect ratio. As shown in [5], algorithms designed for regular horizontal object detection achieve only modest performance on it. Like PASCAL VOC [35] and COCO [36], DOTA provides an evaluation server¹.
DOTA [5]。这是最大的带有向边界框标注的航拍图像物体检测数据集。它包含 2806 幅大尺寸图像，涵盖 15 个类别的物体：飞机 (PL)、棒球场 (BD)、桥梁 (BR)、田径场 (GTF)、小型车辆 (SV)、大型车辆 (LV)、船舶 (SH)、网球场 (TC)、篮球场 (BC)、储油罐 (ST)、足球场 (SBF)、环形交叉口 (RA)、港口 (HA)、游泳池 (SP) 和直升机 (HC)。完全标注的 DOTA 图像包含 188,282 个实例。此数据集中的实例在尺度、方向和宽高比方面差异很大。如 [5] 所示，为常规水平物体检测而设计的算法在其上只能取得一般的性能。与 PASCAL VOC [35] 和 COCO [36] 一样，DOTA 提供评估服务器¹。

baseball ['beɪsbɔːl]：n. 棒球，棒球运动
diamond ['daɪəmənd]：n. 钻石，金刚石，菱形，方块牌 adj. 菱形的，金刚钻的
Baseball Diamond：棒球内场，棒球场
ground track field：n. 田径场
tennis court：网球场
court [kɔːt]：n. 法院，球场，朝廷，奉承 vt. 招致，向...献殷勤，设法获得 vi. 求爱
tennis ['tenɪs]：n. 网球
soccer ['sɒkə]：n. 英式足球，足球
roundabout ['raʊndəbaʊt]：n. 环岛，环状交叉路口，旋转平台，旋转木马，转椅，迂回路线 adj. 迂回的，绕道的，圆形的
helicopter ['helɪkɒptə]：n. 直升飞机vi. 乘直升飞机 vt. 由直升机运送
modest ['mɒdɪst]：adj. 谦虚的，谦逊的，适度的，端庄的，羞怯的

¹http://captain.whu.edu.cn/DOTAweb/

We use both the training and validation sets for training and the testing set for testing. We apply only limited data augmentation. Specifically, we resize the images at two scales (1.0 and 0.5) for both training and testing. After rescaling, we crop a series of 1024 $\times$ 1024 patches from the images with a stride of 824. For categories with few samples, we apply rotation augmentation with an angle chosen randomly from (0, 90, 180, 270) degrees, a simple way to reduce the effect of class imbalance. These steps yield 37,373 patches, far fewer than the 150,342 patches in the official baseline implementation [5]. For testing, 1024 $\times$ 1024 patches are also used, and no other tricks are applied except that the sampling stride is set to 512.
我们使用训练集和验证集进行训练，测试集进行测试。我们只进行有限的数据扩充。具体而言，我们以两个尺度 (1.0 和 0.5) 缩放图像以进行训练和测试。在图像重新缩放之后，我们以 824 的步幅从图像中裁剪出一系列 1024 $\times$ 1024 图像块。对于样本数量较少的类别，我们从 4 个角度 (0, 90, 180, 270) 中随机选择一个进行旋转增强，以简单地缓解不同类别之间不平衡的影响。通过所有这些处理，我们获得了 37373 个图像块，这比官方 baseline 实现 [5] 中的图像块 (150,342 个) 少得多。对于测试实验，同样使用 1024 $\times$ 1024 的图像块。除了将图像采样的步幅设置为 512 之外，没有使用任何其他技巧。
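The cropping step can be sketched as a sliding window along each image axis. The text gives only:

- the patch size (1024),
- the strides (824 for training, 512 for testing),
- the scales (1.0, 0.5).

How the last window is placed at the image border is not stated. As an assumption, it is shifted back here so that it ends flush with the border. `patch_origins` and `crop_boxes` are hypothetical helpers.

```python
def patch_origins(length, patch=1024, stride=824):
    """Start offsets of sliding windows along one image axis (assumed border handling)."""
    if length <= patch:
        return [0]  # the whole axis fits in one patch
    starts = list(range(0, length - patch, stride))
    starts.append(length - patch)  # last window ends exactly at the border
    return starts


def crop_boxes(width, height, scale=1.0, patch=1024, stride=824):
    """(x1, y1, x2, y2) windows on the image after resizing it by `scale`."""
    w, h = round(width * scale), round(height * scale)
    return [(x, y, min(x + patch, w), min(y + patch, h))
            for y in patch_origins(h, patch, stride)
            for x in patch_origins(w, patch, stride)]


# Training: both scales, stride 824; testing would pass stride=512 instead.
for scale in (1.0, 0.5):
    print(scale, len(crop_boxes(4000, 4000, scale=scale)))
```

With a 1024 window and a 824 stride, neighboring training patches overlap by 200 pixels. This overlap reduces the chance that an object is cut by every patch that contains part of it.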

HRSC2016 [19]. HRSC2016 [19] is a challenging dataset for ship detection in aerial images, collected from Google Earth. It contains 1061 images and more than 20 categories of ships with various appearances. Image sizes range from 300 $\times$ 300 to 1500 $\times$ 900. The training, validation, and test sets contain 436, 181, and 444 images, respectively. For data augmentation, we adopt only horizontal flipping. The images are resized to (512, 800), where 512 is the length of the short side and 800 is the maximum length of the long side.
HRSC2016 [19]. HRSC2016 [19] 是一个具有挑战性的航拍图像船舶检测数据集。图像是从 Google Earth 收集的。它包含 1061 幅图像和 20 多种外观各异的船舶。图像尺寸范围从 300 $\times$ 300 到 1500 $\times$ 900。训练、验证和测试集分别包括 436 幅、181 幅和 444 幅图像。对于数据增强，我们只采用水平翻转。图像被缩放为 (512, 800)，其中 512 表示短边的长度，800 表示长边的最大长度。
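The (512, 800) resize rule can be sketched under the usual Faster R-CNN convention, which is an assumption here. The short side is scaled to 512, unless that would push the long side past 800; in that case the long side is capped at 800 instead. `resize_shape` is a hypothetical helper name.

```python
def resize_shape(height, width, short_side=512, max_side=800):
    """Target (height, width): short side -> short_side unless the long side would exceed max_side."""
    scale = short_side / min(height, width)
    if max(height, width) * scale > max_side:
        scale = max_side / max(height, width)  # cap the long side instead
    return round(height * scale), round(width * scale)


print(resize_shape(300, 300))
print(resize_shape(900, 1500))
```

For example, a 900 × 1500 image is resized to 480 × 800, so under this rule the short side can end up below 512.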

Google Earth：Google地球

4.2 Implementation details

Baseline Framework. For the experiments, we build a baseline network inspired by Light-Head R-CNN [34] with a ResNet-101 [39] backbone. Our final detection results are based on FPN [40], which is omitted from the ablation experiments for simplicity.
Baseline Framework. 对于实验，我们构建了基于 LightHead R-CNN 的 baseline network [34] (backbone ResNet101 [39])。我们的最终检测性能基于 FPN [40] 网络，而为简单起见，它不用于消融实验。

simplicity [sɪm'plɪsɪtɪ]：n. 朴素，简易，天真，愚蠢
ablation [ə'bleɪʃ(ə)n]：n. 消融，切除
ablation experiment：消融实验
physiological [,fɪzɪə'lɒdʒɪkəl]：adj. 生理学的，生理的
psychology [saɪ'kɒlədʒɪ]：n. 心理学，心理状态
nervous ['nɜːvəs]：adj. 神经的，紧张不安的，强健有力的
surgical ['sɜːdʒɪk(ə)l]：adj. 外科的，手术上的 n. 外科手术，外科病房
removal [rɪ'muːv(ə)l]：n. 免职，移动，排除，搬迁
pioneer [paɪə'nɪə]：n. 先锋，拓荒者 vt. 开辟，倡导，提倡 vi. 作先驱
physiologist [,fɪzɪ'ɑlədʒɪst]：n. 生理学家，生理学者
lesion ['liːʒ(ə)n]：n. 损害，身体上的伤害，机能障碍

A basic research method of physiological psychology based on ablation, especially during the first three-quarters of the 20th century, in which an attempt is made to determine the functions of a specific region of the nervous system by examining the behavioural effects of its surgical removal. It was pioneered in 1824 by the French physiologist Marie Jean Pierre Flourens (1794-1867) and is also called a lesion experiment.
基于消融的生理心理学的基本研究方法，特别是在 20 世纪前四分之三期间，其中通过检查其手术切除的行为影响来尝试确定神经系统的特定区域的功能。它于1824年由法国生理学家 Marie Jean Pierre Flourens (1794-1867) 开创，也被称为病变实验。

Light-Head R-CNN OBB: Similar to the work in DOTA [5], we modified the regression of the fully connected layer in the second stage so that it predicts OBBs. The only difference is that we replace $((x_{i}, y_{i}), i = 1, 2, 3, 4)$ with $(x, y, w, h, \theta)$ as the representation of an OBB. Since there is an additional parameter $\theta$, we do not double the regression loss as the original Light-Head R-CNN [34] does. We set the hyperparameters of the large separable convolution to $k = 15$, $C_{mid} = 56$, $C_{out} = 490$. OHEM [41] is not used for sampling during training. For RPN, we use the same 15 anchors as the original Light-Head R-CNN [34], and the batch size of RPN [15] is set to 512. RPN produces 6000 RoIs before Non-Maximum Suppression (NMS) and keeps 800 RoIs after NMS, from which 512 RoIs are sampled to train the R-CNN. The learning rate is set to 0.0005 for the first 14 epochs and divided by 10 for the last 4 epochs. For testing, we keep 6000 RoIs before NMS and 1000 RoIs after NMS.
Light-Head R-CNN OBB: 我们修改了第二阶段全连接层的回归，使其能够预测 OBB，类似于在 DOTA 中的工作 [5]。唯一的不同之处在于我们用 $(x, y, w, h, \theta)$ 代替 $((x_{i}, y_{i}), i = 1, 2, 3, 4)$ 来表示 OBB。由于还有一个额外的参数 $\theta$，我们不会像原始的 Light-Head R-CNN [34] 那样使回归损失加倍。我们设置的大的可分离卷积的超参数是 $k = 15$, $C_{mid} = 56$, $C_{out} = 490$。OHEM [41] 未在训练阶段用于抽样。对于 RPN，我们使用了与原始 Light-Head R-CNN 相同的 15 个 anchor [34]。并且 RPN [15] 的批量大小设置为 512。最后，在非极大值抑制 (NMS) 之前有来自 RPN 的 6000 个 RoI，在使用 NMS 之后有 800 个 RoI。然后对 512 个 RoI 进行采样以用于 R-CNN 的训练。学习率在前 14 epoch 设置为 0.0005，然后在最后 4 epoch 除以 10。对于测试，我们在 NMS 之前采用 6000 个 RoI，在 NMS 处理之后采用 1000 个 RoI。
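The switch from four vertices (8 numbers) to $(x, y, w, h, \theta)$ (5 numbers) can be illustrated with a small converter. It assumes the four corners already form a rectangle and are listed in order around the box. Real DOTA annotations are arbitrary quadrilaterals, which are typically fitted first with a minimum-area rectangle (e.g., OpenCV's `cv2.minAreaRect`). The long-side/short-side convention and the $[0, \pi)$ angle range are assumptions carried over from the RRoI definition. `quad_to_rbox` is a hypothetical name.

```python
import math


def quad_to_rbox(quad):
    """Convert a rectangle given by four ordered (x, y) corners into (x, y, w, h, theta).

    w is the long side, h the short side, theta the long side's direction in [0, pi).
    Assumes the corners really form a rectangle.
    """
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = quad
    cx = (x1 + x2 + x3 + x4) / 4
    cy = (y1 + y2 + y3 + y4) / 4
    e1 = math.hypot(x2 - x1, y2 - y1)  # length of edge 1->2
    e2 = math.hypot(x3 - x2, y3 - y2)  # length of edge 2->3
    if e1 >= e2:
        w, h, theta = e1, e2, math.atan2(y2 - y1, x2 - x1)
    else:
        w, h, theta = e2, e1, math.atan2(y3 - y2, x3 - x2)
    theta %= math.pi
    if theta >= math.pi:  # floating-point guard
        theta -= math.pi
    return cx, cy, w, h, theta


# 8 numbers in, 5 numbers out.
print(quad_to_rbox([(0, 0), (4, 0), (4, 2), (0, 2)]))
```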

Light-Head R-CNN OBB with FPN: This variant uses FPN [40] as the backbone network. Since no source code for an FPN-based Light-Head R-CNN was publicly available, our implementation details may differ from other implementations. We simply add the large separable convolution on the feature of every level $P_{2}, P_{3}, P_{4}, P_{5}$, with hyperparameters $k = 15$, $C_{mid} = 64$, $C_{out} = 490$. The batch size of RPN is set to 512. RPN produces 6000 RoIs before NMS and keeps 600 RoIs after NMS, from which 512 RoIs are sampled to train the R-CNN. The learning rate is set to 0.005 for the first 5 epochs and divided by 10 for the last 2 epochs.
Light-Head R-CNN OBB with FPN: 带有 FPN 的 Light-Head R-CNN OBB 使用 FPN [40] 作为骨干网络。由于基于 FPN 的 Light-Head R-CNN 没有公开的源代码，我们的实现细节可能不同。我们简单地在每个级别的特征 $P_{2}, P_{3}, P_{4}, P_{5}$ 上添加了大的可分离卷积。我们设置的大可分离卷积的超参数是 $k = 15$, $C_{mid} = 64$, $C_{out} = 490$。RPN 的批量大小设置为 512。在 NMS 之前有来自 RPN 的 6000 个 RoI，在 NMS 处理之后有 600 个 RoI。然后对 512 个 RoI 进行采样以用于 R-CNN 的训练。学习率在前 5 epoch 设置为 0.005，在最后 2 epoch 除以因子 10。
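A rough weight count shows why the large separable convolution stays cheap even with $k = 15$. The sketch makes three assumptions:

- It follows the Light-Head R-CNN structure of two parallel branches ($k \times 1$ followed by $1 \times k$, and $1 \times k$ followed by $k \times 1$) whose outputs are summed.
- It ignores biases.
- It assumes 256 input channels per FPN level and 2048 input channels (ResNet-101 C5) for the non-FPN model. Neither channel count is stated in the text.

```python
def large_separable_conv_params(c_in, k=15, c_mid=64, c_out=490):
    """Weights (no biases) of a large separable conv with two summed branches:
    (k x 1 -> 1 x k) and (1 x k -> k x 1), each c_in -> c_mid -> c_out."""
    per_branch = k * c_in * c_mid + k * c_mid * c_out
    return 2 * per_branch


def dense_conv_params(c_in, k=15, c_out=490):
    """Weights of a plain k x k convolution, for comparison."""
    return k * k * c_in * c_out


# Assumed input widths: 256 channels per FPN level, 2048 for ResNet-101 C5.
print(large_separable_conv_params(256, c_mid=64), dense_conv_params(256))
print(large_separable_conv_params(2048, c_mid=56), dense_conv_params(2048))
```

Under these assumptions the FPN setting needs about 1.43M weights per level, versus about 28.2M for a dense 15 × 15 convolution with the same input and output widths.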

4.3 Comparison with Deformable PS RoI Pooling

To verify that the performance gain does not simply come from extra computation, we compare our method with deformable PS RoI pooling, since both employ RoI warping operations to model geometric variations. We use Light-Head R-CNN OBB as the baseline. We then replace the PS RoI Align in Light-Head R-CNN [34] with either deformable PS RoI pooling or the RoI Transformer.
为了验证性能不是源于额外的计算，我们与 deformable PS RoI pooling 的性能进行了比较，因为它们都使用 RoI warping 操作来模拟几何变化。对于实验，我们使用 Light-Head R-CNN OBB 作为我们的基线。deformable PS RoI pooling and RoI Transformer 用于替换 LightHead R-CNN 中的 PS RoI Align [34]。

Complexity. Both RoI Transformer and deformable RoI pooling have a light localization network: a fully connected layer applied to the normally pooled features. Our RoI Transformer learns only 5 parameters $(t_{x}, t_{y}, t_{w}, t_{h}, t_{\theta})$. Deformable PS RoI pooling learns offsets for each bin, i.e., 7 $\times$ 7 $\times$ 2 = 98 parameters. Our module is therefore lighter than deformable PS RoI pooling. As shown in Tab. 4, our RoI Transformer model uses less memory (273MB vs. 273.2MB) and runs faster at inference (0.17s vs. 0.206s per image). Because of the light-head design, the memory savings over deformable PS RoI pooling are small. However, RoI Transformer is slower than deformable PS RoI pooling during training (0.475s vs. 0.445s), since training requires an extra matching process between RRoIs and RGTs.
Complexity. RoI Transformer 和 deformable RoI pooling 都有一个轻量的定位网络，即在常规池化特征之后接一个全连接层。在我们的 RoI Transformer 中，只学习 5 个参数 $(t_{x}, t_{y}, t_{w}, t_{h}, t_{\theta})$。deformable PS RoI pooling 为每个 bin 学习偏移量，参数数量是 7 $\times$ 7 $\times$ 2。所以我们的模块比 deformable PS RoI pooling 更轻量。如 Tab. 4 所示，我们的 RoI Transformer 模型使用更少的内存 (273MB 对比 273.2MB)，并且在推理阶段运行得更快 (每幅图像 0.17s 对比 0.206s)。由于我们使用了 light-head 设计，与 deformable PS RoI pooling 相比，节省的内存并不明显。然而，由于训练时 RRoI 和 RGT 之间存在额外的匹配过程，RoI Transformer 在训练时比 deformable PS RoI pooling 更慢 (0.475s 对比 0.445s)。
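The parameter comparison can be made concrete with a quick count of the localization layer. As an assumption, the pooled feature is taken to be $10 \times 7 \times 7 = 490$-dimensional, to match $C_{out} = 490$; the text does not state the input size. Both layers share the same input, so their size ratio is simply 98/5 = 19.6, whatever that input size is.

```python
def fc_params(in_dim, out_dim):
    """Weights plus biases of a fully connected layer."""
    return in_dim * out_dim + out_dim


POOLED_DIM = 10 * 7 * 7  # assumption: 490-d pooled feature per RoI

rt_params = fc_params(POOLED_DIM, 5)              # (t_x, t_y, t_w, t_h, t_theta)
dpsrp_params = fc_params(POOLED_DIM, 7 * 7 * 2)   # one (dx, dy) offset per 7 x 7 bin
print(rt_params, dpsrp_params, dpsrp_params / rt_params)
```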

localization [,lokəlɪ'zeʃən]：n. 本土化，定位
normal ['nɔːm(ə)l]：adj. 正常的，正规的，标准的 n. 正常，标准，常态，法线

Detection Accuracy. The comparison results are shown in Tab. 4. Deformable PS RoI pooling outperforms the Light-Head R-CNN OBB baseline by 5.6 points. By contrast, [23] reports only a 1.4-point improvement for R-FCN [24] on PASCAL VOC [35]. This suggests that geometry modeling is more important for object detection in aerial images. However, deformable PS RoI pooling is still 3.85 points lower than our RoI Transformer. We attribute this to two reasons:

1) Our RoI Transformer better models the geometry variations in aerial images.
2) The regression targets of deformable PS RoI pooling are still relative to the HRoI rather than using the boundary of the offsets. Our regression targets are relative to the RRoI, which gives a better initialization for regression.

Detection results of the Light-Head R-CNN OBB baseline, deformable PS RoI pooling, and RoI Transformer are visualized in Fig. 7, Fig. 8, and Fig. 9. The results in Fig. 7 and the first column of Fig. 8 are taken from the same large image. They show that RoI Transformer can precisely locate instances in densely packed scenes. The Light-Head R-CNN OBB baseline and deformable RoI pooling localize instances less accurately. Note that all three methods misclassify the head of a truck as a small vehicle (the blue bounding box), as shown in Fig. 7 and Fig. 8, but our RoI Transformer has the fewest misclassified instances. The second column of Fig. 8 is a complex scene containing long and thin instances. There, both the Light-Head R-CNN OBB baseline and deformable PS RoI pooling generate many false positives. These false positives are hard to suppress with NMS, for the reason explained in Fig. 5(b). Benefiting from the consistency between region features and instances, RoI Transformer produces far fewer false positives.
Detection Accuracy. 比较结果显示在 Tab. 4 中。deformable PS RoI pooling 比 Light-Head R-CNN OBB 基线高 5.6 个点，而如 [23] 所指出的，R-FCN [24] 在 Pascal VOC [35] 上只有 1.4 个点的提升。这表明几何建模对于航拍图像中的物体检测更为重要。但 deformable PS RoI pooling 比我们的 RoI Transformer 低 3.85 个点。我们认为有两个原因：1) 我们的 RoI Transformer 可以更好地建模航拍图像中的几何变化。2) deformable PS RoI pooling 的回归目标仍然相对于 HRoI，而不是使用偏移量的边界。我们的回归目标是相对于 RRoI 的，它为回归提供了更好的初始化。Detection results of the Light-Head R-CNN OBB baseline, deformable PS RoI pooling, and RoI Transformer are visualized in Fig. 7, Fig. 8, and Fig. 9. 图 7 中的结果和图 8 的第一列取自同一幅大尺寸图像。它表明 RoI Transformer 可以在实例密集分布的场景中精确地定位实例，而 Light-Head R-CNN OBB baseline 和 deformable RoI pooling 在实例定位上的准确性更差。值得注意的是，如图 7 和图 8 所示，三种方法都将卡车的头部错误分类为小型车辆 (蓝色边界框)，但我们提出的 RoI Transformer 错误分类的实例最少。图 8 中的第二列是包含细长实例的复杂场景，其中 Light-Head R-CNN OBB baseline 和 deformable PS RoI pooling 都产生许多 False Positive。由于图 5(b) 所解释的原因，这些 False Positive 难以被 NMS 抑制。受益于区域特征和实例之间的一致性，基于 RoI Transformer 的检测结果产生的 False Positive 少得多。

Figure 6: Visualization of detection results from RoI Transformer in DOTA.

Figure 7: Visualization of detections in a scene with many densely packed instances. We select the predicted bounding boxes with scores above 0.1, and an NMS with threshold 0.1 is applied to remove duplicates.
图 7：存在许多密集排列实例的场景检测结果可视化。我们选择具有高于 0.1 的分数的预测边界框，并且应用具有阈值 0.1 的 NMS 用于重复去除。
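The post-processing used for these visualizations (score threshold 0.1, NMS threshold 0.1) can be sketched as greedy NMS with a pluggable IoU function. For brevity, the example uses an axis-aligned IoU on `(x1, y1, x2, y2)` boxes as a stand-in; for OBBs one would pass a polygon-based IoU instead. `greedy_nms` and `hbb_iou` are hypothetical names, not taken from the authors' code.

```python
def hbb_iou(a, b):
    """IoU of two axis-aligned boxes (x1, y1, x2, y2); stand-in for an OBB IoU."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_fn, score_thr=0.1, iou_thr=0.1):
    """Indices kept after dropping low scores and greedy NMS (highest score first)."""
    order = sorted((i for i, s in enumerate(scores) if s > score_thr),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it overlaps every already-kept box by at most iou_thr.
        if all(iou_fn(boxes[i], boxes[j]) <= iou_thr for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 10, 10)]
scores = [0.9, 0.8, 0.7, 0.05]
print(greedy_nms(boxes, scores, hbb_iou))
```

A duplicate survives only if its IoU with the kept box stays at or below the threshold. This is exactly what happens to the long, thin boxes in Fig. 5(b).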

Figure 8: Visualization of detection results on DOTA. The first row shows the results of RoI Transformer, the second row those of the Light-Head R-CNN OBB baseline, and the last row those of deformable PS RoI pooling. In the visualization, we select the predicted bounding boxes with scores above 0.1, and an NMS with threshold 0.1 is applied to remove duplicates.
图 8：DOTA 中检测结果的可视化。第一行显示 RoI Transformer 的结果。第二行显示了 Light-Head R-CNN OBB baseline 的结果。最后一行显示了 deformable PS RoI pooling 的结果。在可视化中，我们选择具有高于 0.1 的分数的预测边界框，并且应用具有阈值 0.1 的 NMS 用于重复去除。

Figure 9: Visualization of detection results on DOTA. The first row shows the results of RoI Transformer, the second row those of the Light-Head R-CNN OBB baseline, and the last row those of deformable PS RoI pooling. In the visualization, we select the predicted bounding boxes with scores above 0.1, and an NMS with threshold 0.1 is applied to remove duplicates.
图 9：DOTA 中检测结果的可视化。第一行显示 RoI Transformer 的结果。第二行显示了 Light-Head R-CNN OBB baseline 的结果。最后一行显示了 deformable PS RoI pooling 的结果。在可视化中，我们选择具有高于 0.1 的分数的预测边界框，并且应用具有阈值 0.1 的 NMS 用于重复去除。

Table 1: Results of ablation studies. We use the Light-Head R-CNN OBB detector as our baseline. The leftmost column lists the optional settings for the RoI Transformer; the four experiments on the right explore the appropriate settings for the RoI Transformer.
表 1：消融研究的结果。我们使用 Light-Head R-CNN OBB 检测器作为基线。最左边的列表示 RoI Transformer 的可选设置。在右边的四个实验中，我们探索了 RoI Transformer 的适当设置。

leftmost ['lɛftmost]：adj. 最左边的
enlarge [ɪn'lɑːdʒ; en-]：vi. 扩大，放大，详述 vt. 扩大，使增大，扩展

Table 2: Comparisons with the state-of-the-art methods on HRSC2016.
表 2：与 HRSC2016 上最先进的方法进行比较。

Table 3: Comparisons with state-of-the-art detectors on DOTA [5]. The short names for each category can be found in Section 4.1. FR-O denotes the Faster R-CNN OBB detector, the official baseline provided by DOTA [5]. RRPN denotes Rotation Region Proposal Networks, which use rotated anchors. R2CNN denotes Rotational Region CNN, an HRoI-based method that does not use the RRoI warping operation. R-DFPN denotes Rotation Dense Feature Pyramid Networks, which also use rotated anchors together with a variant of FPN. The work of Yang et al. [38] is an extension of R-DFPN.
表 3：在 DOTA [5] 上与最先进的检测器进行比较。每个类别的简称可以在 4.1 节中找到。FR-O 表示 Faster R-CNN OBB 检测器，它是 DOTA [5] 提供的官方基线。RRPN 表示 Rotation Region Proposal Networks，其使用旋转 anchor 的设计。R2CNN 表示 Rotational Region CNN，它是不使用 RRoI 变形操作的基于 HRoI 的方法。R-DFPN 表示 Rotation Dense Feature Pyramid Networks，它同样使用了旋转 anchor 的设计，并使用了 FPN 的变体。Yang et al. [38] 的工作是 R-DFPN 的扩展。

Table 4: Comparison of our RoI Transformer with deformable PS RoI pooling and Light-Head R-CNN OBB in accuracy, speed, and memory. All speeds are measured on 1024 $\times$ 1024 images on a single TITAN X (Pascal), excluding post-processing time (i.e., NMS). LR-O, DPSRP, and RT denote Light-Head R-CNN OBB, deformable Position Sensitive RoI pooling, and RoI Transformer, respectively.
表 4：我们的 RoI Transformer 与 deformable PS RoI pooling 和 Light-Head R-CNN OBB 在精度、速度和内存方面的比较。所有速度都是在单个 TITAN X (Pascal) 上对大小为 1024 $\times$ 1024 的图像测试的，不包括后处理时间 (即 NMS)。The LR-O, DPSRP and RT denote the Light-Head R-CNN OBB, deformable Position Sensitive RoI pooling and RoI Transformer, respectively.

4.4 Ablation Studies

We conduct a series of ablation experiments on DOTA to analyze the accuracy of the proposed RoI Transformer. Starting from the Light-Head R-CNN OBB baseline, we change the settings step by step. Simply adding the RoI Transformer improves mAP by 4.87 points. The other settings are discussed below.
我们在 DOTA 上进行了一系列消融实验，以分析我们提出的 RoI Transformer 的准确性。我们使用 Light-Head R-CNN OBB 作为基线，然后逐渐更改设置。当简单地添加 RoI Transformer 时，mAP 有 4.87 点的改进。其他设置将在下面讨论。

Light RRoI Learner. For efficiency, we directly apply a fully connected layer with an output dimension of 5 to the pooled features from HRoI warping. For comparison, we also tried more fully connected layers for the RRoI learner, as shown in the first and second columns of Tab. 1. Adding one more fully connected layer with an output dimension of 2048 to the RRoI learner causes a small drop (0.22 points) in mAP. This slight degradation is likely because the additional higher-dimensional fully connected layer needs more time to converge.
Light RRoI Learner. 为了保证效率，我们直接在 HRoI 变形后的池化特征上应用输出维数为 5 的全连接层。作为比较，我们还为 RRoI 学习器尝试了更多的全连接层，如 Tab. 1 中的第一列和第二列所示。我们发现，当为 RRoI 学习器再添加一个输出维数为 2048 的全连接层时，mAP 略有下降 (0.22 点)。精度的轻微下降应归因于维度较高的附加全连接层需要更长的收敛时间。
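Here is a toy version of the light RRoI learner: a single fully connected layer maps the flattened pooled HRoI feature to five offsets, which are then decoded into an RRoI. This is a sketch under stated assumptions, not the authors' implementation. The HRoI is treated as an RRoI with $\theta = 0$. The offsets are decoded with a log-ratio size and $2\pi$-normalized angle parameterization, which is also assumed. Plain Python lists stand in for tensors, and the weights and bias are hypothetical values.

```python
import math


def rroi_learner(feature, weight, bias, hroi):
    """One FC layer (5 outputs) on a flattened pooled HRoI feature, decoded into an RRoI.

    feature: list of floats; weight: 5 rows, each len(feature) long; bias: 5 floats;
    hroi: (x, y, w, h), treated as an RRoI with theta = 0 (assumption).
    """
    tx, ty, tw, th, tt = (sum(wi * fi for wi, fi in zip(row, feature)) + b
                          for row, b in zip(weight, bias))
    x, y, w, h = hroi
    # With theta = 0 the RRoI frame coincides with the image axes.
    return (x + tx * w,
            y + ty * h,
            w * math.exp(tw),
            h * math.exp(th),
            (tt * 2 * math.pi) % (2 * math.pi))


feature = [0.5, -1.0, 2.0]
weight = [[0.0] * len(feature) for _ in range(5)]
bias = [0.1, 0.0, 0.0, 0.0, 0.25]  # hypothetical values: shift along w, turn by 90 degrees
print(rroi_learner(feature, weight, bias, (10.0, 10.0, 20.0, 10.0)))
```

The whole learner is one matrix-vector product plus a closed-form decode, which is why it adds little cost on top of HRoI warping.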

Contextual RRoI. As pointed out in [9, 42], appropriately enlarging the RoI improves performance. A horizontal RoI may contain much background, whereas a precise RRoI contains hardly any redundant background, as illustrated in Fig. 10. Completely discarding contextual information makes it difficult to classify and locate an instance, even for humans. It is therefore necessary to enlarge the feature region to an appropriate degree. Here, we enlarge the long side of the RRoI by a factor of 1.2 and the short side by a factor of 1.4. This enlargement improves AP by 2.86 points, as shown in Tab. 1.
Contextual RRoI. 正如 [9, 42] 所指出的，适当扩大 RoI 将提升性能。水平 RoI 可能包含大量背景，而精确的 RRoI 几乎不包含冗余背景，如图 10 所示。完全舍弃上下文信息会使实例的分类和定位变得困难，即使对人类而言也是如此。因此，需要以适当的程度扩大特征区域。在这里，我们将 RRoI 的长边扩大 1.2 倍，将短边扩大 1.4 倍。RRoI 的扩大使 AP 提高了 2.86 点，如 Tab. 1 所示。
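Once the long and short sides are known, the contextual enlargement is a one-line operation. This sketch (hypothetical `enlarge_rroi`) scales the long side by 1.2 and the short side by 1.4 and keeps the center and angle fixed.

```python
def enlarge_rroi(rroi, long_factor=1.2, short_factor=1.4):
    """Enlarge an RRoI (x, y, w, h, theta) around its center to add context.

    The long side is scaled by long_factor and the short side by short_factor;
    center and angle are unchanged.
    """
    x, y, w, h, theta = rroi
    if w >= h:
        return x, y, w * long_factor, h * short_factor, theta
    return x, y, w * short_factor, h * long_factor, theta


print(enlarge_rroi((10.0, 20.0, 100.0, 20.0, 0.3)))
```

Using a larger factor for the short side adds relatively more context across thin objects, where a tight RRoI would otherwise contain almost no background.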

contextual [kɒn'tekstjʊəl]：adj. 上下文的，前后关系的
degradation [,degrə'deɪʃ(ə)n]：n. 退化，降格，降级，堕落
convergence [kən'vɜːdʒəns]：n. 收敛，会聚，集合
enlargement [ɪn'lɑːdʒm(ə)nt; en-]：n. 放大，放大的照片，增补物

NMS on RRoIs. Since the obtained RoIs are rotated, we are free to decide whether to apply another NMS to the RRoIs transformed from the HRoIs. This comparison is shown in the last two columns of Tab. 1. Removing this NMS improves mAP by about 1.5 points. This is reasonable: without the additional NMS, more RoIs are kept, which can increase recall.
NMS on RRoIs. 由于获得的 RoI 是旋转的，我们可以灵活地决定是否在由 HRoI 转换得到的 RRoI 上再进行一次 NMS。此比较显示在 Tab. 1 的最后两列中。我们发现，如果去掉该 NMS，mAP 大约提高 1.5 个点。这是合理的，因为不进行额外的 NMS 会保留更多的 RoI，从而可能提高召回率。

duplicate [ˈdjuːplɪkeɪt]：vt. 复制，使加倍 n. 副本，复制品 adj. 复制的，二重的 vi. 复制，重复

Figure 10: Comparison of 3 kinds of region for feature extraction. (a) The Horizontal Region. (b) The rectified Region after RRoI Warping. (c) The rectified Region with appropriate context after RRoI warping.

4.5 Comparisons with the State-of-the-art

We compare the performance of our proposed RoI Transformer with state-of-the-art algorithms on two datasets, DOTA [5] and HRSC2016 [19]. The settings are described in Sec. 4.2; we simply replace the Position Sensitive RoI Align with our proposed RoI Transformer. Our baseline and RoI Transformer results are obtained without OHEM [41] during training.
我们将我们提出的 RoI Transformer 的性能与两个数据集 DOTA [5] 和 HRSC2016 [19] 上的最新算法进行了比较。The settings are described in Sec. 4.2，我们只需用我们提出的 RoI Transformer 替换 Position Sensitive RoI Align。我们的 baseline and RoI Transformer 结果是在训练阶段不使用 ohem [41] 获得的。

Results on DOTA. We compare our results with state-of-the-art methods on DOTA. Note that RRPN [9] and R2CNN [26] were originally designed for scene text detection; their results here come from a third-party re-implementation for DOTA². As shown in Tab. 3, our RoI Transformer achieves an mAP of 67.74 on DOTA. This outperforms the previous state of the art without FPN (61.01) by 6.71 points, and even the previous FPN-based method by 5.45 points. With FPN, the Light-Head OBB baseline achieves an mAP of 66.95, which outperforms the previous state-of-the-art detectors but is still slightly lower than RoI Transformer. Adding RoI Transformer to the Light-Head OBB FPN baseline improves mAP by 2.6 points, reaching 69.56. This indicates that the proposed RoI Transformer can be easily embedded into other frameworks and significantly improves detection performance. The improvement is especially large for densely packed small instances (e.g., small vehicles, large vehicles, and ships). For example, the ship category improves by 26.34 points over the previous best result (57.25), achieved by R2CNN [26]. Some qualitative results of RoI Transformer on DOTA are shown in Fig. 6.
Results on DOTA. 我们将结果与 DOTA 中的最新技术进行了比较。注意 RRPN [9] 和 R2CNN [26] 最初用于文本场景检测。结果是第三方为 DOTA 重新实现的版本。As can be seen in Tab. 3, our RoI Transformer achieved the mAP of 67.74 for DOTA , it outperforms the previous the state-of-the-art without FPN (61.01) by 6.71 points. 它甚至比先前的基于 FPN 的方法高 5.45 个点。凭借 FPN，Light-Head OBB Baseline 的 mAP 达到 66.95，优于之前最先进的探测器，但仍略低于 RoI Transformer。当在 Light-Head OBB FPN Baseline 上添加 RoI Transformer 时，mAP 的改善达到 2.6 点，达到峰值 69.56。这表明所提出的 RoI Transformer 可以很容易地嵌入到其他框架中，并显著提高检测性能。此外，在密集的小型实例中有显著的改进 (例如小型车辆、大型车辆和船舶)。例如，与 R2CNN [26] 取得的先前最佳结果 (57.25) 相比，船舶类别的检测性能提高了 26.34 点。图 6 给出了 RoI Transformer 在 DOTA 上的一些定性结果。

²https://github.com/DetectionTeamUCAS/RRPN_Faster-RCNN_Tensorflow

Results on HRSC2016. HRSC2016 contains many long and thin ship instances with arbitrary orientations. We use 4 scales $\{64^{2}, 128^{2}, 256^{2}, 512^{2}\}$ and 5 aspect ratios $\{1/3, 1/2, 1, 2, 3\}$, yielding $k = 20$ anchors for RPN initialization. This choice reflects that HRSC2016 has more aspect-ratio variation but relatively few scale changes. The other settings are the same as in Sec. 4.2. Even without FPN, our method achieves the best mAP. Specifically, the mAP reaches 86.16, 1.86 higher than that of RRD [37]. Note that RRD is built on SSD [43] for oriented object detection. It uses multiple layers for feature extraction, with 13 box aspect ratios $\{1, 2, 3, 5, 7, 9, 15, 1/2, 1/3, 1/5, 1/7, 1/9, 1/15\}$. Our framework, in contrast, uses only the final output features with 5 aspect ratios. Fig. 11 visualizes some detection results on HRSC2016. The ship orientations are evenly distributed over $2\pi$. The last row contains closely arranged ships that are difficult to separate with horizontal rectangles, yet our RoI Transformer handles them effectively. The incomplete ship detected in the third picture of the last row further illustrates the stability of the proposed RoI Transformer detection method.
Results on HRSC2016. HRSC2016 包含许多具有任意方向的细长船舶实例。我们使用 4 个尺度 $\{64^{2}, 128^{2}, 256^{2}, 512^{2}\}$ 和 5 个宽高比 $\{1/3, 1/2, 1, 2, 3\}$，得到用于 RPN 初始化的 $k = 20$ 个 anchor。这是因为 HRSC2016 中的宽高比变化更多，但尺度变化相对较少。其他设置与 4.2 节中的设置相同。我们在不使用 FPN 的情况下进行实验，仍然取得了最佳的 mAP。具体而言，基于我们提出的方法，mAP 可以达到 86.16，比 RRD [37] 高 1.86。注意，RRD 基于 SSD [43] 设计，用于有向物体检测，它利用多层进行特征提取，具有 13 个不同的 anchor 宽高比 $\{1, 2, 3, 5, 7, 9, 15, 1/2, 1/3, 1/5, 1/7, 1/9, 1/15\}$，而我们提出的框架只采用最终输出的特征，仅有 5 个宽高比的 anchor。在图 11 中，我们可视化了 HRSC2016 中的一些检测结果。船的方向均匀分布在 $2\pi$ 角度范围内。在最后一行中，有紧密排列的船只，难以用水平矩形区分，而我们提出的 RoI Transformer 可以有效地处理上述问题。在最后一行的第三张图片中检测到的不完整船舶进一步说明了我们提出的 RoI Transformer 检测方法的稳定性。
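The 20 RPN anchor shapes for HRSC2016 can be enumerated as follows. The sketch assumes the common Faster R-CNN convention that each anchor keeps area $s^2$ for scale $s$, and that the aspect ratio means $h / w$; the text does not spell out either convention. `make_anchors` is a hypothetical name.

```python
import math


def make_anchors(scales=(64, 128, 256, 512), ratios=(1 / 3, 1 / 2, 1, 2, 3)):
    """(w, h) of each anchor: area scale**2 and h / w == ratio (assumed conventions)."""
    anchors = []
    for s in scales:
        for r in ratios:
            anchors.append((s / math.sqrt(r), s * math.sqrt(r)))
    return anchors


anchors = make_anchors()
print(len(anchors))
```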

Figure 11: Visualization of detection results from RoI Transformer on HRSC2016. We select the predicted bounding boxes with scores above 0.1, and an NMS with threshold 0.1 is applied to remove duplicates.
图 11：HRSC2016 中 RoI Transformer 检测结果的可视化。我们选择分数大于 0.1 的预测边界框，并应用具有阈值 0.1 的 NMS 进行重复移除。

5 Conclusion

In this paper, we proposed a module called RoI Transformer to model geometric transformations and solve the misalignment between region features and objects. The design brings significant improvements for oriented object detection on the challenging DOTA and HRSC2016 datasets, with a negligible increase in computation. Deformable modules are well-designed structures for modeling geometric transformations and are widely used for oriented object detection. Comprehensive comparisons with deformable RoI pooling verify that our model is more reasonable when oriented bounding box annotations are available. We can therefore infer that our module is a viable alternative to deformable RoI pooling for oriented object detection.
在本文中，我们提出了一个名为 RoI Transformer 的模块来建模几何变换，并解决区域特征和物体之间的不对齐问题。该设计为具有挑战性的 DOTA 和 HRSC2016 数据集上的有向物体检测带来了显著的改进，而计算成本的增加可以忽略不计。可变形模块是一种精心设计的用于建模几何变换的结构，被广泛用于有向物体检测。与 deformable RoI pooling 的全面比较有力地验证了：在有向边界框标注可用时，我们的模型更为合理。因此，可以推断我们的模块可以作为 deformable RoI pooling 的一种可选替代，用于有向物体检测。

substitution [sʌbstɪ'tjuːʃn]：n. 代替，置换，代替物
aspect ratio：宽高比
stability [stə'bɪlɪtɪ]：n. 稳定性，坚定，恒心

References

[5] DOTA: A Large-scale Dataset for Object Detection in Aerial Images
[8] Ship Rotated Bounding Box Space for Ship Extraction From High-Resolution Optical Satellite Images With Complex Backgrounds
[9] Arbitrary-Oriented Scene Text Detection via Rotation Proposals
[10] Rotated region based CNN for ship detection
[12] Towards Multi-class Object Detection in Unconstrained Remote Sensing Imagery
[13] Rich feature hierarchies for accurate object detection and semantic segmentation
[15] Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
[19] A High Resolution Optical Satellite Image Dataset for Ship Recognition and Some New Baselines
[21] Toward Arbitrary-Oriented Ship Detection With Rotated Region Proposal and Discrimination Networks
[22] Spatial Transformer Networks
[23] Deformable Convolutional Networks
[24] R-FCN: Object Detection via Region-based Fully Convolutional Networks
[26] R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection
[27] Automatic Ship Detection of Remote Sensing Images from Google Earth in Complex Scenes Based on Multi-Scale Rotation Dense Feature Pyramid Networks
[28] Learning a Rotation Invariant Detector with Rotatable Bounding Box
[34] Light-Head R-CNN: In Defense of Two-Stage Object Detector
[37] Rotation-Sensitive Regression for Oriented Scene Text Detection

WORDBOOK

University of Chinese Academy of Sciences：中国科学院大学，国科大

KEY POINTS

DetectionTeamUCAS
https://github.com/DetectionTeamUCAS

Rotation-sensitive Regression for Oriented Scene Text Detection
https://github.com/MhLiao/RRD

Learning RoI Transformer for Detecting Oriented Objects in Aerial Images
https://github.com/dingjiansw101/RoITransformer_DOTA

DOTA: A Large-scale Dataset for Object Detection in Aerial Images
https://github.com/dingjiansw101/Faster_RCNN_for_DOTA

TextBoxes++: A Single-Shot Oriented Scene Text Detector
https://github.com/MhLiao/TextBoxes_plusplus

R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection
https://github.com/DetectionTeamUCAS/R2CNN_Faster-RCNN_Tensorflow

目录为什么需要Domain层清晰的三层架构核心概念：Entity/ValueObject/UseCase/RepositorySwift代码实战测试策略在旧项目中落地的步骤结语1为什么需要Domain层在传统MVC/MVVM中，我们往往把业务规则写进ViewController或ViewModel。问题随规模放大而爆发：痛点具体表现可测试性差单元测试必须启动UIKit，跑真机或模拟器业务难复用同样
vue动态页面快照截图 html2canvas 懒大王、 vue.js javascript 前端
安装依赖npminstallhtml2canvas新建组件SnapshotPage.vueimporthtml2canvasfrom"html2canvas";exportdefault{name:"SnapshotPage",props:{//你可以通过props传递动态内容数据//data:Object},mounted(){this.$nextTick(()=>{this.capture()
C++中对象传参的几种方式递归书房 c++
在C++中传递对象作为函数参数有多种方式，每种方式都有不同的语义、性能特点和适用场景。以下是全面的分析和最佳实践指南：1.按值传递(PassbyValue)voidprocessObject(MyClassobj){//操作obj的副本}MyClassoriginal;processObject(original);//复制构造新对象特点：创建对象的完整副本函数内修改不影响原始对象调用时发生复制构
道路交通标志检测数据集-智能地图与导航交通监控与执法智慧城市交通管理-2,000 张图像 cver123 数据集智慧城市人工智能目标跟踪计算机视觉目标检测
道路交通标志检测数据集已发布目标检测数据集合集（持续更新）道路交通标志检测数据集介绍数据集概览包含类别应用场景数据样本展示YOLOv8训练实战1.环境配置安装YOLOv8官方库ultralytics2.数据准备2.1数据标注格式（YOLO）2.2文件结构示例2.3创建data.yaml配置文件3.模型训练关键参数补充说明：4.模型验证与测试4.1验证模型性能关键参数详解常用可选参数典型输出指标4.
用 C++ 获取显示器信息：深入 WMI 与 COM 接口
在Windows系统中，获取显示器信息（如制造商、序列号和产品代码）是一项常见任务。本文将展示如何使用C++通过WindowsManagementInstrumentation(WMI)和ComponentObjectModel(COM)接口实现这一功能。我们将以WmiMonitorID类为例，逐步构建一个健壮的程序，并分享实现过程中的关键注意事项。背景显示器信息通常存储在硬件的EDID(Exte
Excel VBA属性、方法、事件大全——Part13（Complete List of Excel VBA attribute/method and event）预见未来to50
对象/属性/方法/事件（Object/Attribute/Method/Event）描述（Description）Save保存指定工作簿所做的更改本示例保存当前活动工作簿。ActiveWorkbook.Save本示例保存所有打开的工作簿，然后关闭MicrosoftExcel。ForEachwInApplication.Workbooksw.SaveNextwApplication.QuitSave
钉钉小程序摸索二：钉钉小程序开发过程中错误解决过程
钉钉小程序开过程中作为小白，很容易遇上各种问题，今天我就以自己开发过程的遇到的问题总结下解决过程或者思路，有小白的同学可以做下参考，发布文章不易，请点赞一下鼓励下，谢谢。目录：TypeError:my.requestisnotafunctionatObject.onSubmit1、钉钉开发过程中接口请求返回TypeError:my.requestisnotafunctionatObject.onS
Densenet模型花卉图像分类深度学习乐园分类数据挖掘人工智能
项目源码获取方式见文章末尾！600多个深度学习项目资料，快来加入社群一起学习吧。《------往期经典推荐------》项目名称1.【基于CNN-RNN的影像报告生成】2.【卫星图像道路检测DeepLabV3Plus模型】3.【GAN模型实现二次元头像生成】4.【CNN模型实现mnist手写数字识别】5.【fasterRCNN模型实现飞机类目标检测】6.【CNN-LSTM住宅用电量预测】7.【VG
【.net core】【sqlsugar】在where条件查询时使用原生SQL MoFe1 .netcore sql 数据库
//初始化查询varquery=repository.IQueryable();//添加原生SQLWHERE条件query=query.Where("fieldAWhere(stringwhereString,objectparameters=null);
Objective-C面向对象编程：类、对象、方法详解（保姆级教程）帅次 iOS Obj-C objective-c ios iphone safari swift macos flutter
目录一、核心概念二、类的定义（分.h和.m文件）1.头文件（.h）——公开声明2.实现文件（.m）——具体实现3.属性特性解析原子性所有权语义(ARC环境下)读写控制三、对象创建与内存管理1.创建对象的两种方式2.关键步骤解析3.instancetype四、方法调用（消息传递机制）1.基本语法2.关键概念五、self与super关键字六、动手实践：完整工作流1.创建Person对象并调用方法2.项
ref() 与 reactive() 前端岳大宝前端框架Vue javascript 前端 vue.js
下面，我们来系统的梳理关于ref()与reactive()的基本知识点：一、响应式编程核心概念1.1什么是响应式编程？响应式编程是一种声明式编程范式，它使数据变化能够自动传播到依赖它的代码部分。在Vue中，响应式系统实现了：数据驱动视图：数据变化自动更新DOM依赖追踪：自动跟踪数据依赖关系高效更新：最小化不必要的DOM操作1.2Vue响应式系统演进版本响应式实现特点Vue2Object.defin
ResNet（Residual Network）不想秃头的程序神经网络语音识别人工智能深度学习网络残差网络神经网络
ResNet（ResidualNetwork）是深度学习中一种经典的卷积神经网络（CNN）架构，由微软研究院的KaimingHe等人在2015年提出。它通过引入残差连接（SkipConnection）解决了深度神经网络中的梯度消失问题，使得网络可以训练极深的模型（如上百层），并在图像分类、目标检测、语义分割等任务中取得了突破性成果。以下是ResNet的详细介绍：一、核心思想ResNet的核心创新是
继之前的线程循环加到窗口中运行 3213213333332132 java thread JFrame JPanel
之前写了有关java线程的循环执行和结束，因为想制作成exe文件，想把执行的效果加到窗口上，所以就结合了JFrame和JPanel写了这个程序，这里直接贴出代码，在窗口上运行的效果下面有附图。 package thread; import java.awt.Graphics; import java.text.SimpleDateFormat; import java.util
linux 常用命令 BlueSkator linux 命令
1.grep 相信这个命令可以说是大家最常用的命令之一了。尤其是查询生产环境的日志，这个命令绝对是必不可少的。但之前总是习惯于使用（grep -n 关键字文件名）查出关键字以及该关键字所在的行数，然后再用（sed -n '100,200p' 文件名），去查出该关键字之后的日志内容。但其实还有更简便的办法，就是用（grep -B n、-A n、-C n 关键
php heredoc原文档和nowdoc语法 dcj3sjt126com PHP heredoc nowdoc
<!doctype html> <html lang="en"> <head> <meta charset="utf-8"> <title>Current To-Do List</title> </head> <body> <?
overflow的属性周华华 JavaScript
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml&q
《我所了解的Java》——总体目录 g21121 java
准备用一年左右时间写一个系列的文章《我所了解的Java》，目录及内容会不断完善及调整。在编写相关内容时难免出现笔误、代码无法执行、名词理解错误等，请大家及时指出，我会第一时间更正。 &n
[简单]docx4j常用方法小结 53873039oycg docx
本代码基于docx4j-3.2.0，在office word 2007上测试通过。代码如下: import java.io.File; import java.io.FileInputStream; import ja
Spring配置学习云端月影 spring配置
首先来看一个标准的Spring配置文件 applicationContext.xml <?xml version="1.0" encoding="UTF-8"?> <beans xmlns="http://www.springframework.org/schema/beans" xmlns:xsi=&q
Java新手入门的30个基本概念三 aijuans java 新手 java 入门
17.Java中的每一个类都是从Object类扩展而来的。　　18.object类中的equal和toString方法。　　equal用于测试一个对象是否同另一个对象相等。　　toString返回一个代表该对象的字符串,几乎每一个类都会重载该方法,以便返回当前状态的正确表示.(toString 方法是一个很重要的方法)　　 19.通用编程:任何类类型的所有值都可以同object类性的变量来代替。　
《2008 IBM Rational 软件开发高峰论坛会议》小记 antonyup_2006 软件测试敏捷开发项目管理 IBM 活动
我一直想写些总结,用于交流和备忘,然都没提笔,今以一篇参加活动的感受小记开个头,呵呵! 其实参加《2008 IBM Rational 软件开发高峰论坛会议》是9月4号,那天刚好调休.但接着项目颇为忙,所以今天在中秋佳节的假期里整理了下. 参加这次活动是一个朋友给的一个邀请书,才知道有这样的一个活动,虽然现在项目暂时没用到IBM的解决方案,但觉的参与这样一个活动可以拓宽下视野和相关知识.
PL/SQL的过程编程,异常,声明变量,PL/SQL块百合不是茶 PL/SQL的过程编程异常 PL/SQL块声明变量
PL/SQL; 过程; 符号; 变量; PL/SQL块; 输出; 异常; PL/SQL 是过程语言(Procedural Language)与结构化查询语言(SQL)结合而成的编程语言PL/SQL 是对 SQL 的扩展,sql的执行时每次都要写操作
Mockito(三)--完整功能介绍 bijian1013 持续集成 mockito 单元测试
mockito官网：http://code.google.com/p/mockito/，打开documentation可以看到官方最新的文档资料。一.使用mockito验证行为 //首先要import Mockito import static org.mockito.Mockito.*; //mo
精通Oracle10编程SQL(8)使用复合数据类型 bijian1013 oracle 数据库 plsql
/* *使用复合数据类型 */ --PL/SQL记录 --定义PL/SQL记录 --自定义PL/SQL记录 DECLARE TYPE emp_record_type IS RECORD( name emp.ename%TYPE, salary emp.sal%TYPE, dno emp.deptno%TYPE ); emp_
【Linux常用命令一】grep命令 bit1129 Linux常用命令
grep命令格式 grep [option] pattern [file-list] grep命令用于在指定的文件(一个或者多个,file-list)中查找包含模式串(pattern)的行,[option]用于控制grep命令的查找方式。 pattern可以是普通字符串，也可以是正则表达式，当查找的字符串包含正则表达式字符或者特
mybatis3入门学习笔记白糖_ sql ibatis qq jdbc 配置管理
MyBatis 的前身就是iBatis，是一个数据持久层(ORM)框架。 MyBatis 是支持普通 SQL 查询，存储过程和高级映射的优秀持久层框架。MyBatis对JDBC进行了一次很浅的封装。以前也学过iBatis，因为MyBatis是iBatis的升级版本，最初以为改动应该不大，实际结果是MyBatis对配置文件进行了一些大的改动，使整个框架更加方便人性化。
Linux 命令神器：lsof 入门 ronin47 lsof
lsof是系统管理/安全的尤伯工具。我大多数时候用它来从系统获得与网络连接相关的信息，但那只是这个强大而又鲜为人知的应用的第一步。将这个工具称之为lsof真实名副其实，因为它是指“列出打开文件（lists openfiles）”。而有一点要切记，在Unix中一切（包括网络套接口）都是文件。有趣的是，lsof也是有着最多
java实现两个大数相加，可能存在溢出。 bylijinnan java实现
import java.math.BigInteger; import java.util.regex.Matcher; import java.util.regex.Pattern; public class BigIntegerAddition { /** * 题目：java实现两个大数相加，可能存在溢出。 * 如123456789 + 987654321
Kettle学习资料分享，附大神用Kettle的一套流程完成对整个数据库迁移方法 Kai_Ge Kettle
Kettle学习资料分享 Kettle 3.2 使用说明书目录概述..........................................................................................................................................7 1.Kettle 资源库管
[货币与金融]钢之炼金术士 comsci 金融
自古以来,都有一些人在从事炼金术的工作.........但是很少有成功的那么随着人类在理论物理和工程物理上面取得的一些突破性进展...... 炼金术这个古老
Toast原来也可以多样化 dai_lm android toast
Style 1：默认 Toast def = Toast.makeText(this, "default", Toast.LENGTH_SHORT); def.show(); Style 2：顶部显示 Toast top = Toast.makeText(this, "top", Toast.LENGTH_SHORT); t
java数据计算的几种解决方法3 datamachine java hadoop ibatis r-langue r
4、iBatis 简单敏捷因此强大的数据计算层。和Hibernate不同，它鼓励写SQL，所以学习成本最低。同时它用最小的代价实现了计算脚本和JAVA代码的解耦，只用20%的代价就实现了hibernate 80%的功能,没实现的20%是计算脚本和数据库的解耦。复杂计算环境是它的弱项，比如：分布式计算、复杂计算、非数据
向网页中插入透明Flash的方法和技巧 dcj3sjt126com html Web Flash
将 Flash 作品插入网页的时候，我们有时候会需要将它设为透明，有时候我们需要在Flash的背面插入一些漂亮的图片，搭配出漂亮的效果……下面我们介绍一些将Flash插入网页中的一些透明的设置技巧。　　一、Swf透明、无坐标控制　　首先教大家最简单的插入Flash的代码，透明，无坐标控制：　　注意wmode="transparent"是控制Flash是否透明
ios UICollectionView的使用 dcj3sjt126com
UICollectionView的使用有两种方法，一种是继承UICollectionViewController，这个Controller会自带一个UICollectionView；另外一种是作为一个视图放在普通的UIViewController里面。个人更喜欢第二种。下面采用第二种方式简单介绍一下UICollectionView的使用。 1.UIViewController实现委托，代码如
Eos平台java公共逻辑蕃薯耀 Eos平台java公共逻辑 Eos平台 java公共逻辑
Eos平台java公共逻辑 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 蕃薯耀 2015年6月1日 17:20:4
SpringMVC4零配置--Web上下文配置【MvcConfig】 hanqunfeng springmvc4
与SpringSecurity的配置类似，spring同样为我们提供了一个实现类WebMvcConfigurationSupport和一个注解@EnableWebMvc以帮助我们减少bean的声明。 applicationContext-MvcConfig.xml  <
解决ie和其他浏览器poi下载excel文件名乱码 jackyrong Excel
使用poi,做传统的excel导出，然后想在浏览器中，让用户选择另存为，保存用户下载的xls文件，这个时候，可能的是在ie下出现乱码（ie,9,10,11),但在firefox,chrome下没乱码，因此必须综合判断，编写一个工具类： /** * * @Title: pro
挥洒泪水的青春 lampcy 编程生活程序员
2015年2月28日，我辞职了，离开了相处一年的触控，转过身--挥洒掉泪水，毅然来到了兄弟连，背负着许多的不解、质疑——”你一个零基础、脑子又不聪明的人，还敢跨行业，选择Unity3D？“，”真是不自量力••••••“，”真是初生牛犊不怕虎•••••“，••••••我只是淡淡一笑，拎着行李----坐上了通向挥洒泪水的青春之地——兄弟连！这就是我青春的分割线，不后悔，只会去用泪水浇灌——已经来到
稳增长之中国股市两点意见-----严控做空，建立涨跌停版停牌重组机制 nannan408
对于股市，我们国家的监管还是有点拼的，但始终拼不过飞流直下的恐慌，为什么呢？笔者首先支持股市的监管。对于股市越管越荡的现象，笔者认为首先是做空力量超过了股市自身的升力，并且对于跌停停牌重组的快速反应还没建立好，上市公司对于股价下跌没有很好的利好支撑。我们来看美国和香港是怎么应对股灾的。美国是靠禁止重要股票做空，在
动态设置iframe高度(iframe高度自适应) Rainbow702 JavaScript iframe contentDocument 高度自适应局部刷新
如果需要对画面中的部分区域作局部刷新，大家可能都会想到使用ajax。但有些情况下，须使用在页面中嵌入一个iframe来作局部刷新。对于使用iframe的情况，发现有一个问题，就是iframe中的页面的高度可能会很高，但是外面页面并不会被iframe内部页面给撑开，如下面的结构： <div id="content"> <div id=&quo
用Rapael做图表 tntxia rap
function drawReport(paper,attr,data){ var width = attr.width; var height = attr.height; var max = 0; &nbs
HTML5 bootstrap2网页兼容（支持IE10以下） xiaoluode html5 bootstrap
<!DOCTYPE html> <html> <head lang="zh-CN"> <meta charset="UTF-8"> <meta http-equiv="X-UA-Compatible" content="IE=edge">