Paper: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8124924
Synthetic aperture radar (SAR) imaging works in all weather, day and night, with wide coverage. SAR images are therefore increasingly used for ship detection to support marine surveillance and transportation security. Deep learning has achieved enormous success in object detection because of its capability for representation learning. This paper proposes combining the single shot multibox detector (SSD) with transfer learning to detect ships in complex surroundings that include both open ocean and islands. SSD is chosen because it keeps detection accuracy high at a relatively fast speed. Transfer learning is chosen because it performs well even with small training datasets. Two SSD models integrated with transfer learning are applied to ship detection: SSD300 and SSD512, with input sizes of 300 × 300 and 512 × 512 pixels respectively. The approach is evaluated on a dataset of SAR images acquired by Sentinel-1. Experimental results show that, compared with SSD300, SSD512 achieves a lower false alarm rate at a slightly lower detection accuracy. These results demonstrate the effectiveness of the method.

Index Terms: Sentinel-1 images, ship detection, single shot multibox detector, transfer learning
SAR images offer all-weather, day-and-night operation and wide coverage, so they are increasingly used for ship detection to support marine surveillance and transportation security. There are three typical approaches to ship detection: statistical models, feature extraction, and deep learning. The first two rely heavily on statistics or handcrafted features. Vessels appear in varied surroundings, such as the open ocean or enclosed harbors, and at different orientations relative to the sensor. Both factors make it difficult to describe their scattering mechanisms and to distill features, which limits these two kinds of methods. Deep learning methods have therefore been adopted for ship detection in SAR images because of their capability for representation learning [1]. [2] first used land-ocean segmentation to obtain candidate ship locations and then used convolutional networks to classify the ships. [3] used Faster R-CNN to generate bounding boxes containing ships and then applied CFAR to obtain the final results. [4] combined low-level and high-level features to improve detection accuracy and used contextual features to reject false alarms. [5] focused on recognizing ships near icebergs. [6] introduced a ship classification dataset in which all ships are at sea, and showed that a highway network helps with the recognition problem. Apart from [4], these methods mainly consider ships navigating in the open ocean. They lack an analysis of the complex surroundings of ships and do not take both speed and detection accuracy into account.
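This paper adopts SSD, which is fast, accurate, and capable of representation learning. Unlike [4], this work considers the complex surroundings in which ships navigate. In addition, transfer learning is used to learn the weights from a limited, small dataset. To evaluate the approach, SAR images acquired by Sentinel-1 are used.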
The Single Shot MultiBox Detector (SSD) [7], shown in Fig. 1, is built on VGG16. It has two variants, SSD300 and SSD512, which differ in input image size. The red line in Fig. 1 indicates that SSD300 removes the fully connected layers of VGG16. The structure has two components, as Fig. 1 shows. The first component, called the base network, is a convolutional network originally designed for image classification. Two typical base networks for SSD300 are VGG16 [8] and ResNet [9].
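The Python sketch below makes the difference between the two variants concrete by counting the default boxes each one evaluates. It assumes the standard prediction-layer sizes and boxes-per-location of the original SSD configurations [7]. It also uses the linear scale rule from [7] with s_min = 0.2 and s_max = 0.9. The names `ssd_scales` and `count_default_boxes` are illustrative and do not come from any SSD codebase.

```python
# Illustrative sketch: count SSD default boxes, assuming the standard
# SSD300 / SSD512 prediction-layer configurations described in [7].


def ssd_scales(m, s_min=0.2, s_max=0.9):
    """Linearly spaced default-box scales s_k, k = 1..m (scale rule from [7])."""
    return [s_min + (s_max - s_min) * (k - 1) / (m - 1) for k in range(1, m + 1)]


def count_default_boxes(feature_map_sizes, boxes_per_location):
    """Sum over prediction layers of (map side)^2 * (default boxes per cell)."""
    return sum(size * size * n for size, n in zip(feature_map_sizes, boxes_per_location))


# Assumed (feature-map side lengths, default boxes per location) for each variant.
SSD300 = ([38, 19, 10, 5, 3, 1], [4, 6, 6, 6, 4, 4])
SSD512 = ([64, 32, 16, 8, 4, 2, 1], [4, 6, 6, 6, 6, 4, 4])

if __name__ == "__main__":
    print(count_default_boxes(*SSD300))  # 8732
    print(count_default_boxes(*SSD512))  # 24564
    print([round(s, 2) for s in ssd_scales(6)])  # [0.2, 0.34, 0.48, 0.62, 0.76, 0.9]
```

Under these assumptions, SSD512 scores roughly 2.8 times as many candidate boxes per image as SSD300 (24564 versus 8732).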
This paper uses VGG16; ResNet will be investigated in future studies. The second component produces the detections. It fuses multi-scale feature maps through convolutional layers to generate bounding boxes, each with the probability that it contains an object of interest. Non-maximum suppression (NMS) is then applied to produce the final detection results.
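The NMS step can be illustrated with a minimal greedy implementation in plain Python. This is a sketch, not the authors' code or the SSD release. Boxes are assumed to be `(xmin, ymin, xmax, ymax)` pixel tuples, and the default IoU threshold of 0.45 is the value used in [7]. Ships are the only object class here, so a single class-agnostic pass suffices.

```python
# Minimal greedy non-maximum suppression sketch (illustrative, single class).


def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.45):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)  # highest-scoring remaining box
        keep.append(best)
        # Drop every remaining box that overlaps the kept one too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


if __name__ == "__main__":
    boxes = [(10, 10, 50, 30), (12, 11, 52, 31), (100, 100, 130, 120)]
    print(nms(boxes, [0.9, 0.8, 0.7]))  # [0, 2]: box 1 duplicates box 0
```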
Transfer learning [10, 11] benefits tasks whose data are too few to train a good model, when the same task in a different domain has abundant data. Applied to deep learning models, transfer learning allows features in multiple layers to be learned from less data. A common routine starts from a model pretrained on a public dataset, modifies its final layers, and keeps the weights of the earlier layers. Only the weights of the modified layers are then learned.

Transfer learning has improved accuracy in SAR classification, and for this reason it is also adopted to cope with the limited number of ship training samples. For example, [12] classifies TerraSAR-X data by transferring parameters from a model trained on CIFAR-10. [13] exploits transfer learning for SAR target classification. [14] classifies polarimetric SAR images by transferring weights pretrained on PASCAL VOC to models trained on Gaofen-3 SAR images. As in [14], the weights of the layers directly involved in producing the output are relearned here, as indicated in red in Fig. 2.
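The "keep early weights, relearn output layers" routine can be sketched as follows. Parameters are modeled as plain lists keyed by hypothetical layer names (`conv1`, `conf`, ...). A layer is reinitialised and marked trainable when it is listed in `relearn` or when its pretrained shape no longer fits. An example of a shape mismatch is a 21-class PASCAL VOC classifier replaced by a ship/background classifier. The paper does not state its actual layer names or initialisation, so this is an assumption-laden illustration only.

```python
# Hypothetical sketch of weight transfer: copy pretrained weights by layer name,
# but re-initialise (and train) the layers that feed the output.
import random


def transfer_weights(pretrained, target_shapes, relearn, seed=0):
    """pretrained: name -> flat list of floats; target_shapes: name -> parameter count.

    Returns (weights, trainable): new weights per layer and the set of layers to learn.
    """
    rng = random.Random(seed)
    weights, trainable = {}, set()
    for name, size in target_shapes.items():
        if name in relearn or name not in pretrained or len(pretrained[name]) != size:
            # Fresh small random initialisation for layers that must be relearned.
            weights[name] = [rng.gauss(0.0, 0.01) for _ in range(size)]
            trainable.add(name)
        else:
            weights[name] = list(pretrained[name])  # keep (and freeze) pretrained values
    return weights, trainable


if __name__ == "__main__":
    pretrained = {"conv1": [0.1, 0.2], "conv2": [0.3], "conf": [0.5] * 21}
    w, t = transfer_weights(pretrained, {"conv1": 2, "conv2": 1, "conf": 2}, relearn={"conf"})
    print(w["conv1"], sorted(t))  # [0.1, 0.2] ['conf']
```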
Our experimental workflow is shown in Fig. 3.
Sentinel-1 acquired three dual-polarization (VH-VV) images of Shanghai Harbor in Interferometric Wide Swath (IW) mode; their details are listed in Table 1. The first two images are used to train and validate the models. The third is used to evaluate the robustness of the trained models.
To build the training data, we first visually select five candidate regions containing ships in the first two images. These five regions are then split into sub-images of 256 × 256 pixels. Finally, 269 sub-images containing 514 small ships are labeled with the open-source tool LabelImg. An example shows a sub-image next to its labeled version, with red rectangles marking the ship locations.
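The sketch below shows one way to cut a scene into the 256 × 256 sub-images described above. The paper does not say how tiles at the image border or ships crossing tile boundaries are handled. As an assumption, partial border tiles are dropped and only boxes lying fully inside a tile are kept. `tile_origins` and `boxes_in_tile` are hypothetical helper names.

```python
# Sketch: split a large scene into non-overlapping square tiles and move ship
# boxes into tile coordinates. Border handling here is an assumption.


def tile_origins(height, width, tile=256):
    """(row, col) of the top-left corner of every full tile; partial tiles are dropped."""
    return [(r, c)
            for r in range(0, height - tile + 1, tile)
            for c in range(0, width - tile + 1, tile)]


def boxes_in_tile(boxes, origin, tile=256):
    """Shift (xmin, ymin, xmax, ymax) boxes into tile coordinates, keeping only
    boxes that lie fully inside the tile."""
    r0, c0 = origin
    out = []
    for xmin, ymin, xmax, ymax in boxes:
        if c0 <= xmin and xmax <= c0 + tile and r0 <= ymin and ymax <= r0 + tile:
            out.append((xmin - c0, ymin - r0, xmax - c0, ymax - r0))
    return out


if __name__ == "__main__":
    print(len(tile_origins(600, 800)))  # 6 full tiles (2 rows x 3 columns)
    print(boxes_in_tile([(300, 20, 320, 40)], (0, 256)))  # [(44, 20, 64, 40)]
```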
After the ship SAR images and their labels are constructed, 80% of the sub-images, 215 in total, are used to train the model. Because few ships are available for training, both transfer learning (Section 2.2) and data augmentation are adopted. [7] reports that data augmentation methods such as horizontal flip, random crop, color distortion, and random expansion increase detection accuracy by 11.7%. These methods are used in this experiment. In total, 11 augmentation methods are used to expand the dataset. Only 232 images are used, yet the number of training samples increases to 2552.
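Horizontal flipping, one of the augmentations listed above, must move the ship boxes along with the pixels. The sketch below assumes continuous pixel coordinates, where a box edge at x maps to width - x, so xmin and xmax swap roles. Images are stored as nested lists to stay dependency-free. For the counts quoted above, 2552 = 232 × 11, i.e. each of the 232 images contributes 11 samples.

```python
# Sketch of one augmentation: horizontal flip of an image tile and its ship boxes.


def hflip(image, boxes):
    """image: list of rows; boxes: list of (xmin, ymin, xmax, ymax).

    Returns the mirrored image and boxes; continuous coordinates are assumed,
    so an edge at x moves to width - x.
    """
    width = len(image[0])
    flipped = [row[::-1] for row in image]
    new_boxes = [(width - xmax, ymin, width - xmin, ymax)
                 for xmin, ymin, xmax, ymax in boxes]
    return flipped, new_boxes


if __name__ == "__main__":
    img, bx = hflip([[1, 2, 3, 4]], [(0, 0, 1, 1)])
    print(img, bx)  # [[4, 3, 2, 1]] [(3, 0, 4, 1)]
    print(232 * 11)  # 2552
```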
The learning policy follows [7, 15, 16]. The main hyperparameters used in this paper are listed below; all other parameters are the same as in [7]. The base learning rate is 0.000001 (10^-6), an empirical value. If it is set larger than 10^-6, or to the value used in [7], underflow occurs and the model fails to train. The batch size is 6, chosen to make full use of our GPU, an NVIDIA GTX 1070 with 8 GB of memory. Momentum is set to 0.99 for fast convergence. Weight decay is 0.0005, and the number of iterations is 240,000.
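The sketch below shows one step of SGD with momentum and L2 weight decay, the update these hyperparameters drive. It is written in the form used by Caffe's SGD solver: v = μv + α(∇L + λw), then w = w - v. The paper does not name its training framework. Treat this as an illustration of the roles of the base learning rate, momentum, and weight decay, not as the authors' code.

```python
# Sketch of an SGD step with momentum and L2 weight decay (Caffe-style form),
# using the hyperparameter values listed above as defaults.

BASE_LR = 1e-6
MOMENTUM = 0.99
WEIGHT_DECAY = 5e-4


def sgd_step(weights, grads, velocity, lr=BASE_LR, momentum=MOMENTUM, decay=WEIGHT_DECAY):
    """Update `weights` and `velocity` in place and return them.

    v <- momentum * v + lr * (g + decay * w);  w <- w - v
    """
    for i, (w, g) in enumerate(zip(weights, grads)):
        velocity[i] = momentum * velocity[i] + lr * (g + decay * w)
        weights[i] = w - velocity[i]
    return weights, velocity


if __name__ == "__main__":
    w, v = sgd_step([1.0], [0.0], [0.0])
    print(w, v)  # weight shrinks by lr * decay * w = 5e-10 from weight decay alone
```

With momentum 0.99, a constant gradient g produces steps that grow toward lr · g / (1 - 0.99) = 100 · lr · g. This is one reason a high momentum can still make progress with a learning rate as small as 10^-6.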
Two evaluations are performed. Test images are used to validate the trained models, and typical images from various contexts are used to evaluate their robustness. Ship detection performance is measured by the probability of false alarm Fa, the probability of detection Pd, and F1, as defined in formulas (1)-(3).
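Formulas (1)-(3) are not reproduced in this text, so the sketch below assumes common definitions, and the paper's exact formulas may differ:

- Pd is the number of correctly detected ships divided by the number of ground-truth ships.
- Fa is the number of false detections divided by the total number of detections.
- F1 is the harmonic mean of precision (1 - Fa) and Pd.

```python
# Sketch of detection metrics under ASSUMED definitions (see lead-in):
#   Pd = true detections / ground-truth ships
#   Fa = false detections / all detections
#   F1 = harmonic mean of precision (= 1 - Fa) and Pd


def detection_metrics(true_det, false_det, n_ground_truth):
    """Return (Pd, Fa, F1) from detection counts; empty cases yield 0.0."""
    n_det = true_det + false_det
    pd = true_det / n_ground_truth if n_ground_truth else 0.0
    precision = true_det / n_det if n_det else 0.0
    fa = false_det / n_det if n_det else 0.0
    f1 = 2 * precision * pd / (precision + pd) if precision + pd > 0 else 0.0
    return pd, fa, f1


if __name__ == "__main__":
    print(detection_metrics(9, 1, 10))  # roughly (0.9, 0.1, 0.9)
```

As a made-up example (not a result from the paper), 9 correct and 1 false detection against 10 true ships give Pd = 0.9, Fa = 0.1, and F1 = 0.9.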
After training, the models are evaluated on 26 test images containing 37 ships. The results are shown in Table 2.
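Counting true and false detections on the test images requires a rule for when a detection hits a labeled ship. The paper does not state its matching rule, so the sketch below assumes a common one: greedy matching in descending score order with an IoU threshold of 0.5. Each ground-truth ship can be matched once, and any extra detection counts as a false alarm. `match_detections` is a hypothetical helper name.

```python
# Sketch: greedily match detections to ground-truth ships by IoU (assumed rule).


def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_detections(detections, ground_truth, iou_threshold=0.5):
    """detections: list of (score, box). Returns (true_det, false_det, missed)."""
    matched = set()
    tp = fp = 0
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_j, best_iou = None, iou_threshold
        for j, gt in enumerate(ground_truth):
            if j in matched:
                continue  # each ship may be matched only once
            overlap = iou(box, gt)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j is None:
            fp += 1  # no unmatched ship overlaps enough: false alarm
        else:
            matched.add(best_j)
            tp += 1
    return tp, fp, len(ground_truth) - len(matched)


if __name__ == "__main__":
    gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
    dets = [(0.9, (0, 0, 10, 10)), (0.8, (1, 1, 11, 11)), (0.7, (50, 50, 60, 60))]
    print(match_detections(dets, gt))  # (1, 2, 1): a duplicate and a stray box are false alarms
```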
SSD512 has the lower false alarm probability, whereas SSD300 performs slightly better than SSD512 in ship detection. SSD512 also achieves a higher F1 than SSD300. In short, SSD512 performs better in Fa and SSD300 performs better in Pd.

To evaluate the robustness of our approach, SAR images containing ships in three typical areas are tested: the open ocean, near islands, and near land (harbors). The results are shown in Table 3 and Figs. 5-7. Across all three surroundings, including the complex ones near islands and land, SSD300 achieves a detection probability above 95%. SSD512 keeps the false alarm probability below 15.11% in the same surroundings. Specifically:
1) On the ocean, both SSD300 and SSD512 achieve detection accuracy above 97% and false alarm below 4%.
2) Near the island, SSD300's false alarm is 15% higher than SSD512's.
3) Near land, the false alarms of both SSD300 and SSD512 are more than 12% higher than on the ocean. This may be caused by scattering from nearby buildings in the harbor and on land.

From these results, SSD512 performs better than SSD300 in terms of F1. Moreover, both SSD300 and SSD512 perform well even though the dataset used for ship detection is quite small, demonstrating the effectiveness of our approach.
SSD is combined with transfer learning to perform ship detection. SSD balances computational efficiency and detection accuracy. This makes it practical to use Interferometric Wide Swath SAR images to monitor marine traffic and ensure the security of ships. Transfer learning addresses the limited number of ships available for training. Two models, SSD300 and SSD512, are used to assess ship detection. Experimental results show that, even in complex navigation surroundings, SSD300 achieves a detection probability above 0.95. SSD512 keeps the false alarm probability below 0.1511 in the same surroundings.