C-UNet: Complement UNet for Remote Sensing Road Extraction

Abstract: Roads are an important mode of transportation and are very convenient for people's daily work and life. However, it is challenging to accurately extract road information from a high-resolution remote sensing image. This paper presents a road extraction method for remote sensing images with a complement UNet (C-UNet). C-UNet contains four modules. Firstly, the standard UNet is used to roughly extract road information from remote sensing images, getting the first segmentation result; secondly, a fixed threshold is utilized to erase partial extracted information; thirdly, a multi-scale dense dilated convolution UNet (MD-UNet) is introduced to discover the complement road areas in the erased masks, obtaining the second segmentation result; and, finally, we fuse the extraction results of the first and the third modules, getting the final segmentation results. Experimental results on the Massachusetts Road dataset indicate that our C-UNet achieves higher results than the state-of-the-art methods, demonstrating its effectiveness.
1. Introduction
Road, as a vital special feature in remote sensing images, includes highways, urban-rural roads, byways, and so on. Road extraction has important significance in many fields, such as automatic road navigation, disaster relief, urban planning, and geographic information updating [1]. However, it is a challenging task because of the noise, occlusions, and complexity of the structure of roads in remote sensing images [2]. There are mainly two types of images, i.e., aerial infrared thermography and remote sensing images, used for road segmentation. Aerial infrared thermography can be monitored 24 h a day, without being affected by strong light. However, the contrast of infrared images is low, lacking image details. This disadvantage makes it difficult to extract road segmentation [3]. Remote sensing images are less limited by ground conditions, allow real-time transmission, and their detection range is large. All these advantages make them more suitable for extracting roads. Scholars have studied road extraction from remote sensing images and put forward a variety of methods. These methods can be roughly divided into four types: data-based methods, graph cut methods, semi-automatic methods, and automatic methods [4]. Data-based methods generally extract roads from remote sensing images with the information of data. For example, Wegner et al. [5] suggested getting the road segmentation results with conditional random fields. They significantly improved both the per-pixel accuracy and the topological correctness of the extracted roads on two different datasets. Maurya et al. [6] proposed a clustering method for road segmentation. The method extracts roads very rapidly and gives satisfactory results with a small number of simple images. Mattyus et al. [7] utilized Markov random fields to accomplish road segmentation. They demonstrated that their approach outperformed the state-of-the-art on the two datasets they collected. These
methods have certain limitations, such as poor generalization ability for different types of roads and an inability to handle multi-scale roads.
The graph cut method belongs to unsupervised learning. It relies on color features to extract road information. For example, Cheng et al. [8] proposed a graph-cut-based probability propagation approach to extract roads from complex remote sensing images. They achieved better performance in both qualitative and quantitative comparisons on the two datasets they collected. Cheng et al. [9] introduced a graph cut method with multiple features. They got better performance than other methods on 25 images. Yuan et al. [10] presented a novel graph cut method and obtained higher results than other methods. Although these methods alleviate the traditional data-based problems to a certain extent, they cannot achieve better results for images with multiple colors on the road [4].
In semi-automatic road extraction, man-machine interaction is used in the process of
road feature extraction and recognition [11]. The main idea is as follows: firstly, the initial
seed point of the road is set manually, and initial direction is set if necessary; then judgment
and recognition is conducted by the computer according to the corresponding rules, and
at the same time man-machine interaction is appropriately used to ensure the accuracy
of recognition. Commonly-used methods include dynamic programming [12,13], models
based on snakes [14,15] or active contour [16,17], models based on active testing [18], template matching [19,20], etc. Constant manual intervention is needed in the semi-automatic
road extraction, increasing the workload of remote sensing image interpreters [21]. Additionally, artificial auxiliary information is required in the formation and repair of road
segments and in the continuous stage of road segments [22]. Semi-automatic road extraction objectively improves the accuracy rate while reducing work efficiency. Therefore, it is not conducive to widespread adoption.

As for automatic road feature extraction methods, roads are automatically interpreted
and recognized by extracting and understanding road image features [23]. Specifically, the
features of the roads in the image are firstly analyzed, and then the roads are automatically identified by pattern recognition methods [24]. Among them, convolutional neural network (CNN)-based methods are the most representative [25–28]. Zhong et al. [29]
proposed a CNN model that combines low-level fine-grained features and high-level semantic features to extract road and building targets in satellite images. Alshehhi et al. [30]
proposed a patch-based CNN model for extracting road and building parts simultaneously
from remote sensing imagery. Subsequently, a road extraction method based on the fully
convolutional network (FCN) model appeared. Varia et al. [31] applied a deep learning
technique FCN-32 for extracting road parts from extremely high-resolution Unmanned
Aerial Vehicle (UAV) imagery. Kestur et al. [32] presented U-shaped FCN based on the FCN
to extract roads from UAV images. Panboonyuen et al. [33] presented a technique based
on landscape metrics and the exponential linear unit function to extract road objects from
remote sensing imagery. Hong et al. [34] applied a block based on richer convolutional features for road segmentation from high-resolution remote sensing imagery. Cheng et al. [35] proposed a cascaded end-to-end CNN model to extract road centerlines from remote sensing imagery. CNN-based models can automatically explore road characteristics by virtue of their strong generalization ability, arbitrary function-fitting ability, and high stability, and then predict pixel-level road probabilities through a discriminant function [36,37]. They achieved better performance than the other three types of methods.
However, they rely heavily on abundant images, and the number of remote sensing images is generally limited.
To segment roads with limited remote sensing images, Ronneberger et al. proposed UNet based on FCN, deepening the number of network layers and adding cross-layer connections between corresponding layers [38]. UNet obtained the highest mean crossover ratio and the smallest warping error in the International Symposium on Biomedical Imaging (ISBI) Cell Tracking Challenge [39] and the Electron Microscopy (EM) Segmentation Challenge [40], in 2014 and 2015, respectively.
At present, most of the mainstream remote sensing image road extraction models are based on UNet. For example, by adding residual modules [41] to the original UNet network, Zhengxin et al. got a deep ResUNet, which obtained the highest recall rate on the Massachusetts dataset [42]. Furthermore, Oktay et al. added the Attention Gate to the decoder of UNet and proposed an attention UNet model, which highlighted the segmented targets by suppressing the feature responses of unrelated background regions without introducing many model parameters [37]. Zhou et al. proposed the DinkNet34 model [43]. Based on the UNet model, it expanded the receptive field while maintaining the resolution by using dilated convolution modules [44], and won the championship of the DeepGlobe 2018 Challenge [42].
All of the above models conduct remote sensing image road extraction by means of a single network. However, a single network limits the performance of road network extraction because it cannot handle roads of various shapes, lengths, and widths. Therefore, a method of remote sensing image road extraction based on a complement UNet (C-UNet) is proposed in this paper. The model has two characteristics: on the one hand, it sets a fixed threshold to erase the pixels in the first segmentation result; on the other hand, it introduces the multi-scale dilated convolution UNet (MD-UNet) to extract the more difficult road information.
There are mainly two differences between UNet and C-UNet. Firstly, UNet segments the roads from remote sensing images with a single network, without considering the diversity of road widths and lengths, while C-UNet utilizes two UNet variations, i.e., UNet and MD-UNet, to successively segment the roads. The former is used to extract the easier road information, and the latter is used to extract the complementary and more difficult road information. Secondly, UNet cannot obtain larger receptive fields, while C-UNet is armed with the dilated convolution operation to obtain larger receptive fields, making it more suitable for high-resolution remote sensing images.
The main contributions of the study are summarized as follows:
(1) To improve the accuracy of remote sensing image road extraction, we propose a complement UNet model, called C-UNet, for high-resolution remote sensing image road extraction. The model uses the standard UNet and MD-UNet to extract road information from remote sensing images successively, then fuses the segmentation results, and, lastly, obtains the final segmentation result, which is better than those of the state-of-the-art methods.
(2) An erasing method for a fixed significant area is proposed. By using a fixed threshold, it erases part of the road area extracted from the remote sensing image by the standard UNet, so that the network can extract finer and weaker road areas a second time.
(3) By comparing our model with the UNet-series models proposed in recent years, the experimental results show that our model achieves better results than the previous state-of-the-art models, verifying its effectiveness. In addition, some ablation studies are conducted to verify the overall structure and major modules.
The rest of the study is organized as follows: UNet is briefly introduced in Section 2. The model is introduced in detail in Section 3. The experimental results are shown in Section 4. The discussion is presented in Section 5. The summary and conclusions are given in Section 6.
2. UNet
UNet is an improved fully convolutional network model, and its structure is similar to the shape of the letter U [38]. The detailed architecture of UNet is shown in Figure 1. Compared with other convolutional neural networks, UNet requires fewer training samples and has higher segmentation accuracy. As seen from Figure 1, it is composed of an encoder and a decoder, which are symmetrical about the axis of the intermediate layer. The encoder extracts image features through convolutional layers and down-sampling (also known as pooling) layers. By comparison, the decoder conducts up-sampling of the feature images, and there are cross-layer connections between the corresponding encoder and decoder layers, which can help the up-sampling layers recover the details of the image.
Specifically, each block through which the encoder extracts image features is composed of 3 × 3 convolutional layers, a ReLU function, and a 2 × 2 max-pooling layer. Down-sampling is conducted four times. The size of the feature images decreases and the number of channels doubles after each pooling operation. The decoder performs up-sampling with 2 × 2 deconvolution layers (or transposed convolutions) and gradually recovers the image information. Corresponding to the encoder part, the decoder part completes up-sampling four times. The size of the feature images increases and the number of channels is halved after each up-sampling. The detailed location information, which is preserved more effectively in the shallow layers, assists segmentation through the concatenation of the corresponding feature maps of the encoder and decoder. UNet contains a total of 23 convolutional layers.
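To make the description concrete, the following is a minimal PyTorch sketch of one encoder stage and one decoder stage with a cross-layer connection. It follows the 3 × 3 convolution / 2 × 2 pooling scheme above, but the depth and channel widths are illustrative, not the exact 23-layer configuration of the paper.

import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    # Two 3x3 convolutions, each followed by ReLU, as in a standard UNet block.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    )

class MiniUNet(nn.Module):
    # One encoder stage and one decoder stage of UNet (sketch, not the full network).
    def __init__(self, in_ch=3, base=64):
        super().__init__()
        self.enc1 = double_conv(in_ch, base)       # encoder: feature extraction
        self.pool = nn.MaxPool2d(2)                # 2x2 max-pooling halves the size
        self.bottom = double_conv(base, base * 2)  # channels double after pooling
        self.up = nn.ConvTranspose2d(base * 2, base, kernel_size=2, stride=2)  # 2x2 deconvolution
        self.dec1 = double_conv(base * 2, base)    # processes the concatenated features
        self.head = nn.Conv2d(base, 1, kernel_size=1)  # 1-channel road mask logits

    def forward(self, x):
        e1 = self.enc1(x)
        b = self.bottom(self.pool(e1))
        d1 = self.up(b)
        d1 = torch.cat([e1, d1], dim=1)            # cross-layer connection recovers detail
        return self.head(self.dec1(d1))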
3. C-UNet
In this section, we first introduce the overall architecture of C-UNet. Then, we describe the erasing module, the multi-scale dilated convolution UNet, and the fusion module, in turn. Finally, the loss function used to train our model is given.
3.1. Overall Network Architecture
To improve the performance of road extraction from high-resolution remote sensing images, we propose C-UNet with four modules. First of all, remote sensing images are input into the standard UNet for road extraction in the first module, and the first segmentation results are obtained. Secondly, a fixed threshold is set to erase the pixels that exceed the threshold, and the segmentation result after erasing is obtained. Then, the erased segmentation results are input into the multi-scale dilated convolution UNet for road segmentation in the third module. Lastly, the segmentation results of the first module and the third module are fused, obtaining the final segmentation results. As for its advantage, the model completes road segmentation sequentially through the standard UNet and the multi-scale dilated convolution UNet. The former is used to extract road information that is simpler to segment, while the latter is used to extract the road information not extracted by the former, namely the road information that is more difficult to segment. The flowchart and overall architecture of C-UNet are shown in Figures 2 and 3, respectively.
To be specific, let X_{H×W×C} denote a remote sensing image and F_unet denote the output of the standard UNet; the process can be expressed with the following equation:

F_unet = UNet(X_{H×W×C}). (1)

The first segmentation result Pre_1 can be obtained by putting F_unet into the sigmoid function and applying a binary operation, which can be expressed in the following form:

Pre_unet = σ(F_unet), (2)

Pre_1 = binarized(Pre_unet), (3)
where σ(·) denotes the sigmoid function, and binarized(·) is the binary operation.
Secondly, a fixed threshold is set to erase the pixels in the feature pattern F_unet that are larger than the threshold δ, and the erased feature pattern is obtained, which can be expressed in the following way:
Figure 2. The flowchart of the complement UNet (C-UNet).

F′_unet = E(F_unet, δ), (4)

where E(·) stands for the erasing operation. Later on, the erased feature pattern is input into the MD-UNet, and the second segmentation result is obtained. The process can be expressed with the following equations:

Pre_md-unet = MD-UNet(F′_unet), (5)

Pre_2 = binarized(Pre_md-unet), (6)

where Pre_2 stands for the segmentation result of the third module, and binarized(·) denotes the binary operation.
Finally, the segmentation results of the first module and the third module are fused, and the fused result Pre_final is obtained as the final segmentation result. It can be expressed as follows:

Pre_final = Fusion(Pre_1, Pre_2). (7)
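Reading Equations (1)–(7) together, the four modules form a single forward pass. A minimal sketch is given below; unet, md_unet, erase, and fusion are placeholders for the components detailed in Sections 3.2–3.4, and the 0.5 binarization threshold is taken from Section 5.1.

import torch

def c_unet_forward(x, unet, md_unet, erase, fusion, delta=0.7):
    # Sketch of the four-module C-UNet pipeline, Equations (1)-(7).
    f_unet = unet(x)                              # Eq. (1): coarse extraction
    pre1 = (torch.sigmoid(f_unet) > 0.5).float()  # Eqs. (2)-(3): sigmoid + binarization
    f_erased = erase(f_unet, delta)               # Eq. (4): threshold erasing
    pre2 = (torch.sigmoid(md_unet(f_erased)) > 0.5).float()  # Eqs. (5)-(6)
    return fusion(pre1, pre2)                     # Eq. (7): final result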
Figure 3. The architecture of C-UNet.
3.2. Erasing Methods
The partial road areas that have already been segmented by the standard UNet are erased using the threshold erasing method, so that the model in the third module, namely the multi-scale dilated convolution UNet, can segment the road areas that are difficult to segment. Specifically, let δ denote the threshold value, F_unet(i, j) represent the value at row i and column j of the output feature images of the standard UNet, and F′(i, j) indicate the value at row i and column j of the feature images after erasing. The threshold erasing can be expressed as follows:

F′(i, j) = { 0, if F_unet(i, j) > δ; F_unet(i, j), otherwise. (8)

Threshold erasing not only removes wrongly segmented regions, but also drives MD-UNet to segment the road areas that are difficult to segment.
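A minimal PyTorch sketch of Equation (8) follows. Whether the comparison with δ is made on raw logits or on sigmoid probabilities is not stated explicitly, so applying the sigmoid first is an assumption (δ = 0.7, the value chosen in Section 4.3).

import torch

def threshold_erase(f_unet: torch.Tensor, delta: float = 0.7) -> torch.Tensor:
    # Equation (8): zero out positions whose response exceeds delta, hiding the
    # already-confident road pixels from MD-UNet.
    prob = torch.sigmoid(f_unet)  # assumption: compare in probability space
    return torch.where(prob > delta, torch.zeros_like(f_unet), f_unet)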
3.3. Multi-Scale Dilated Convolution UNet
Dilated convolution is frequently applied in semantic segmentation [45,46], target detection, and other fields [47,48]. It can enlarge the receptive field and capture multi-scale context information. Considering that the roads in remote sensing images have different shapes, widths, and lengths, multi-scale information is quite crucial in road network extraction from remote sensing images. The dilation rate, as one parameter of dilated convolution, refers to the spacing inserted between the elements of the standard convolution kernel. By means of this parameter, dilated convolution expands the receptive field without introducing additional parameters. In order to access more abundant multi-scale features, a multi-scale dilated convolution module with different dilation rates is used, with the specific structure shown in Figure 4. As a multi-scale dilated convolution module, the dilated block can enlarge the receptive field of the feature images and obtain more detailed local information. The architecture of the multi-scale dilated convolution UNet is shown in Figure 5.
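As an illustration of how such a dilated block might be built, the sketch below runs parallel 3 × 3 convolutions with different dilation rates and sums their responses. The specific rates (1, 2, 4) and the summation are assumptions, since the exact configuration is only given in Figure 4.

import torch
import torch.nn as nn

class DilatedBlock(nn.Module):
    # Multi-scale dilated convolution block (sketch). Parallel 3x3 convolutions with
    # increasing dilation rates enlarge the receptive field; rates (1, 2, 4) are assumed.
    def __init__(self, channels, rates=(1, 2, 4)):
        super().__init__()
        # padding = dilation keeps the spatial resolution unchanged for 3x3 kernels
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=3, padding=r, dilation=r)
            for r in rates
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # Sum the responses to fuse context from different receptive fields.
        return self.relu(sum(branch(x) for branch in self.branches))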
Figure 5. The architecture of multi-scale dilated convolution UNet.
3.4. Fusion Process
The road information extracted in the first module by the standard UNet is relatively easy to extract, while the road information extracted in the third module by the multi-scale dilated convolution UNet is relatively thin and weak. The results of the two modules are complementary to each other. In order to obtain complete and accurate segmentation results, it is necessary to fuse the two segmentation results after obtaining the results of UNet and the multi-scale dilated convolution UNet. The fusion process is shown in the following formula.
Fusion(Pre_1, Pre_2) = { 0, if Pre_1(i, j) = Pre_2(i, j) = 0; 1, otherwise, (9)
where Pre_1 and Pre_2 represent the two segmentation images to be fused, and Pre_1(i, j) and Pre_2(i, j) represent the pixel values of the two feature images at the position (i, j). As seen from the above formula, after complementary fusion of the unsegmented regions in the two segmentation results of the first and the third modules, we can achieve more complete and accurate road information, further improving the segmentation performance.
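Equation (9) amounts to a pixel-wise logical OR of the two binary masks; a one-line sketch:

import torch

def fusion(pre1: torch.Tensor, pre2: torch.Tensor) -> torch.Tensor:
    # Equation (9): 0 only where both masks are 0, i.e., a pixel-wise logical OR.
    return ((pre1 + pre2) > 0).float()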

3.5. Loss Function
Binary cross-entropy was used as the objective function in this study. Let Pre^p_final represent the p-th image predicted by the model, GT^p denote the ground truth of image p, gt^p_(i,j) be the pixel value of the ground truth at position (i, j), pre^p_(i,j) indicate the pixel value of the image predicted by the model at position (i, j), N denote the number of training samples, and W and H denote the width and height of the image, respectively. Then, the binary cross-entropy loss can be expressed in the following form:

L_bce = BCELoss(Pre^p_final, GT^p) = − Σ_{p=1}^{N} Σ_{i=1}^{W} Σ_{j=1}^{H} [ gt^p_(i,j) × log(pre^p_(i,j)) + (1 − gt^p_(i,j)) × log(1 − pre^p_(i,j)) ]. (10)
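In PyTorch, Equation (10) corresponds to nn.BCELoss applied to the sigmoid outputs; a minimal sketch follows (reduction='sum' matches the triple summation, while the default 'mean' would only rescale the loss).

import torch
import torch.nn as nn

# Equation (10): binary cross-entropy between predicted road probabilities and
# the binary ground-truth mask.
criterion = nn.BCELoss(reduction='sum')

pred = torch.sigmoid(torch.randn(1, 1, 512, 512))   # predicted probabilities in (0, 1)
gt = torch.randint(0, 2, (1, 1, 512, 512)).float()  # binary ground truth
loss = criterion(pred, gt)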
4. Experimental Results
In this section, we first introduce the public dataset used in the experiments, the Massachusetts Road dataset [42]. Then, we give the implementation details and evaluation indexes. Next, we perform ablation studies to verify the effectiveness of the model and its submodules. Finally, we compare our model with the state-of-the-art models to prove the superiority of our method. All the experiments were implemented with the PyTorch (version 1.3.0) framework and were conducted on an Nvidia Tesla K40c GPU server, with a memory size of 11 GB, an Intel Xeon E5-2643 CPU, and the Windows 7 operating system.
4.1. Dataset
The Massachusetts Road dataset is commonly used for road extraction from remote sensing images [42]. It contains 1171 images with a resolution of 1500 × 1500 and binary segmentation labels (black represents non-road areas, and white represents road areas). It covers a wide range of areas, involving more than 2600 km2 of urban, suburban, and rural areas in the United States. Figure 6 displays the geo-referenced map of Massachusetts.
Figure 6. The geo-referenced map of Massachusetts.
Due to the limited video memory of the server used in the experiment, remote sensing images with a resolution of 1500 × 1500 could not be directly used for training. Thus, the images in the dataset were pretreated by dividing each image with a resolution of 1500 × 1500 and its corresponding ground truth image into 9 images with a resolution of 512 × 512. The specific division steps were as follows: first, a 512 × 512 bounding box template was taken and slid over the 1500 × 1500 image (base image); the images at the four corners and in the very middle were cut out; then the images in the middle of each pair of adjacent templates among the four bounding boxes at the four corners were taken, in turn. The
specific dividing method is shown in Figure 7. The regions of the 9 images overlapped each other. Furthermore, through analysis, it was found that the ground truth of some images was wrongly marked, so such images were excluded. In the end, 8960 images with a resolution of 512 × 512 were obtained. Following the division of the original dataset into training, validation, and test sets, 8361 training images, 126 validation images, and 433 test images were obtained, respectively. Figure 8 shows an example image from the Massachusetts Road dataset and its corresponding ground truth.
Figure 7. The dividing method. For the left image, we divide the image with five boxes, as denoted in the image by the red boxes and the green box, getting 5 different sub-images. For the right image, we divide the image with another five boxes, as denoted in the image by the red boxes, the blue boxes, and the green box, achieving another 5 sub-images.
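A sketch of the nine-crop division, assuming the crops sit at the four corners, the center, and the midpoints between adjacent corner boxes, which is one reading of Figure 7:

import numpy as np

def nine_crops(image: np.ndarray, size: int = 512) -> list:
    # Divide a 1500x1500 image into 9 overlapping 512x512 crops: four corners,
    # the center, and the midpoint between each pair of adjacent corner boxes.
    h, w = image.shape[:2]
    lo, mid, hi = 0, (h - size) // 2, h - size  # 0, 494, 988 for h = 1500
    offsets = [(lo, lo), (lo, hi), (hi, lo), (hi, hi),      # corners
               (mid, mid),                                  # center
               (lo, mid), (hi, mid), (mid, lo), (mid, hi)]  # edge midpoints
    return [image[y:y + size, x:x + size] for y, x in offsets]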
Figure 8. Examples of images in the Massachusetts dataset.
4.2. Implementation Details and Evaluation Indicators
4.2.1. Implementation Details
We used mini-batch stochastic gradient descent to optimize the parameters of our model. During training, the parameters were optimized with the Adam method in all experiments of the study. The number of training epochs was 15, and the initial learning rate was set to 2 × 10−4. After 8 epochs, the learning rate was changed to 2 × 10−5. The size of the input images was 512 × 512, and the mini-batch size was set to 1.
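A sketch of this schedule is given below; model, train_loader, and criterion are placeholders for the C-UNet instance, the dataset loader, and the loss of Section 3.5.

import torch

def train(model, train_loader, criterion):
    optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)
    # Multiply the learning rate by 0.1 once after epoch 8: 2e-4 for epochs 1-8,
    # 2e-5 for epochs 9-15.
    scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[8], gamma=0.1)
    for epoch in range(15):
        for image, gt in train_loader:  # mini-batch size 1, 512x512 inputs
            optimizer.zero_grad()
            loss = criterion(torch.sigmoid(model(image)), gt)
            loss.backward()
            optimizer.step()
        scheduler.step()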
4.2.2. Evaluation Indicators
Two evaluation indexes, namely the Mean Intersection over Union (mIOU) [49] and the Mean Dice Coefficient (mDC) [50], were used in the experiments to assist the evaluation of the quality of different models.
mIOU refers to the overlap rate between the generated candidate boxes and the original marker boxes, that is, the ratio between the intersection and the union. A greater mIOU means a better segmentation result. Let p_ii denote the number of correctly predicted elements, p_ij represent the number of elements with the true value i and the predicted value j, p_ji be the number of elements with the true value j and the predicted value i, and k denote the number of categories to be classified. Then, the mIOU can be expressed as follows:
mIOU = (1 / (k + 1)) Σ_{i=0}^{k} [ p_ii / ( Σ_{j=0}^{k} p_ij + Σ_{j=0}^{k} p_ji − p_ii ) ]. (11)

As a measurement function of set similarity, the Dice coefficient can be used to calculate the similarity between the segmentation images and the ground truth. A larger mDC stands for a better segmentation result. Let Pre^p_final represent the segmentation result of image p, GT^p denote the ground truth of image p, and N represent the number of training samples. Then, the mDC can be expressed in the following form:

mDC = Dice(Pre^p_final, GT^p) = Σ_{p=1}^{N} 2|Pre^p_final ∩ GT^p| / Σ_{p=1}^{N} (|Pre^p_final| + |GT^p|). (12)
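For binary road masks, the per-image computations behind Equations (11) and (12) can be sketched as follows; averaging these values over the test images then gives mIOU and mDC.

import numpy as np

def iou_and_dice(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-8):
    # IOU and Dice for one binary mask pair (Equations (11)-(12) with k = 1).
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    iou = inter / (union + eps)
    dice = 2 * inter / (pred.sum() + gt.sum() + eps)
    return iou, dice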

4.3. Ablation Study
In this subsection, we first explore the effect of different erasing methods and erasing
thresholds on network performance. Then, we verify the effectiveness of multi-scale dilated
convolution UNet and the fusion process. Finally, we compare our model with the previous
state-of-the-art models to demonstrate the superiority of our model.
4.3.1. Ablation Study on the Method of Erasing
Firstly, we discuss the influence and the necessity of the erasing method on the performance of C-UNet. We used two erasing methods after the first module of UNet obtained the first segmentation result. The first one was threshold erasing, represented as C-UNet-threshold (i.e., C-UNet). In this method, we set the threshold to 0.7 in advance, and then erased the pixels in the segmentation result that were greater than the threshold to get the segmentation result after erasing. The other one was random bounding box erasing [51], expressed as C-UNet-random. In this method, we used a rectangular box of random size with all pixels set to 0 to randomly block a certain region in the segmentation result, so as to achieve the purpose of erasing. Besides, we also omitted the erasing method after the first module and represented this model as C-UNet_no_erase. The specific experimental results of these three methods are shown in Table 1, Figures 9 and 10.
Table 1. Results of different erasing methods.

Figure 9. Line chart of C-UNet with different erasing methods.
Figure 10. The segmentation results of three different methods.
From Table 1, Figures 9 and 10, we can find that:
(1) The values of mIOU and mDC obtained by C-UNet-random were 0.613 and 0.739, while the values of mIOU and mDC obtained by C-UNet-threshold were 0.635 and 0.758, respectively. Compared to the results of C-UNet-random, the results of C-UNet-threshold were improved by 0.021 and 0.017, respectively. This is possibly because the more obvious segmentation regions in the segmentation results of the first module were erased by the threshold erasing method, making the UNet in the third module, i.e., the multi-scale dilated convolution UNet, pay more attention to those targeted regions that were difficult to segment.
(2) The rectangular box random erasing method used different rectangular boxes to randomly erase the segmentation results of the first module. In this case, the layout of the erased area was not targeted, directly making the segmentation of the UNet in the third module purposeless. Therefore, fixed erasing could help improve the segmentation performance of C-UNet.
(3) The values of mIOU and mDC obtained by C-UNet were 0.635 and 0.758, respectively, which were improved by 0.006 over the results of C-UNet_no_erase (0.629 and 0.752 for mIOU and mDC, respectively). This indicates that better segmentation results were obtained by C-UNet with the erasing process than without it, i.e., the erasing process was necessary to improve the performance of C-UNet.
4.3.2. Ablation Study on the Threshold of Erasing
Secondly, we explore the influence of different erasing thresholds on the segmentation performance of C-UNet. Erasing thresholds of 0.5, 0.7, and 0.9 were selected, in turn, and the corresponding segmentation results of C-UNet were obtained. The specific experimental results are shown in Table 2, Figures 11 and 12.
Table 2. Performance of C-UNet with different erasing thresholds.
Figure 11. Line chart of C-UNet with different erasing thresholds.
Based on Table 2, Figures 11 and 12, under the thresholds of 0.5, 0.7, and 0.9, the mIOU values obtained by C-UNet were 0.632, 0.635, and 0.636, respectively, and the mDC values were 0.755, 0.758, and 0.756, respectively. With the threshold of 0.7, the mIOU and mDC obtained by C-UNet were 0.635 and 0.758, respectively, higher than the results when the threshold was 0.5. Relative to the results under the threshold of 0.9, mIOU decreased by 0.001, while mDC increased by 0.002. Therefore, the threshold of 0.7 was finally selected as the erasing threshold in the study after comprehensive consideration of the results.

Figure 12. The final results of C-UNet with different erasing thresholds.
4.3.3. Ablation Study on the Dilated UNet
Later on, we discuss the effectiveness of MD-UNet. For the selection of the UNet in the third module, UNet [38], Non-local Block [52], FCN [29], and MD-UNet [43] were used in experiments. Therefore, we got four different models, represented as UNet_UNet, UNet_Non-local, UNet_FCN, and UNet_MD-UNet (i.e., C-UNet), respectively. Table 3, Figures 13 and 14 show their results.
Table 3. Segmentation results of C-UNet with different models in the third module.

As shown in Table 3, Figures 13 and 14, the values of mIOU obtained by UNet_UNet, UNet_Non-local, UNet_FCN, and UNet_MD-UNet (C-UNet) were 0.622, 0.615, 0.606, and 0.635, respectively, and the mDC results obtained by them were 0.744, 0.739, 0.730, and 0.758, in turn. UNet_MD-UNet (C-UNet) obtained the highest mIOU and mDC, possibly because the multi-scale dilated convolution could fuse feature images from different scales and obtain more detailed segmentation results in the decoder.

Figure 13. Bar graph of segmentation results of C-UNet with different models in the second stage.

Figure 14. The final results of C-UNet with different models in the second stage.
4.3.4. Ablation Study on the Fusion
Finally, we illustrate the effectiveness of the fusion module of C-UNet. We compare the segmentation results of the first module (i.e., UNet) of C-UNet, the third module (i.e., MD-UNet) of C-UNet, and the segmentation results after fusing the results of the two modules. Table 4, Figures 15 and 16 show the corresponding results.
Table 4. Comparison of results before and after fusion.

As seen from Table 4, Figures 15 and 16, the values of mIOU and mDC corresponding to the UNet in the first module were 0.614 and 0.738, respectively, and those corresponding to the MD-UNet in the third module were 0.618 and 0.743, respectively. Finally, the fusion module got an mIOU and mDC of 0.635 and 0.758, 2.1% higher than that of UNet and 1.7% higher than that of MD-UNet. This indicates that it is necessary to fuse the results of the two different modules.

Figure 15. Line chart of results before and after fusion.
4.4. Comparison of C-UNet with Other Models in Remote Sensing Image Road Extraction
In this subsection, we verify the effectiveness of the C-UNet method by comparing it with existing models. The state-of-the-art models in previous remote sensing road segmentation tasks, namely UNet [38], ResUNet [36], AttUNet [37], and DinkNet34 [43], were selected for comparison. The corresponding results are shown in Table 5, and the corresponding segmentation examples are listed in Figures 17 and 18.
As seen from Table 5 and Figures 17 and 18, the values of mIOU obtained by UNet [38], ResUNet [36], AttUNet [37], DinkNet34 [43], and C-UNet were 0.599, 0.600, 0.616, 0.607, and 0.635, respectively, and the corresponding values of mDC were 0.725, 0.721, 0.740, 0.733, and 0.758, respectively. C-UNet obtained the highest mIOU and mDC. Compared with the results of the other 4 models, the mIOU obtained by C-UNet was improved by 0.036, 0.035, 0.019, and 0.028, respectively, and the mDC was improved by 0.033, 0.037, 0.018, and 0.025, respectively. Meanwhile, C-UNet obviously got better segmentation results than the other four models, especially for small roads. Therefore, C-UNet obtained better results than the other models and achieved state-of-the-art results.
Table 5. Performance of different models.

Figure 16. Image of results before and after fusion.
Figure 17. Line chart of different model performance.

Figure 18. Segmentation results of different models.
5. Discussion
This section first describes the simulation of our work and the organization of this paper, and then discusses the open research lines generated by this work, establishing a roadmap for future works and improvements. These can be summarized in four aspects: the simulation of our work, the structure of this paper, further research, and the application of our work.
5.1. Simulation of Our Work
In order to implement the model, we use Python and the open source neural network library PyTorch.
For the image processing, we first split each remote sensing image into 9 sub-images, which saves memory space while training the model; then, we shuffle the images to finish the image processing.
For building the C-UNet model, we first employ the UNet model to get the first segmentation result. Secondly, we use the fixed threshold erasing method to erase the obvious segmentation regions. Thirdly, we construct the MD-UNet to do the second segmentation. Finally, we fuse the results of the first and the second segmentation.
For the loss function, we use the traditional binary cross-entropy loss; the road part in the segmentation results is 1, and the non-road part is 0. For the parameters of C-UNet, we use the Adam optimizer, the learning rate is set to 0.0002, the number of epochs is set to 15, and from the 9th epoch the learning rate becomes one tenth of the original. Our code runs on a Tesla K40. In order to visualize the results (after the network was trained and tested), we use the matplotlib library. For the obtained segmentation feature map, we reset the pixels larger than 0.5 to 255, and we reset the other pixels to 0 for visualization.
5.2. The Structure of This Paper
This paper includes six parts: Introduction, UNet, C-UNet, Experimental Results, Discussion, and Conclusions.
Specifically, in the Introduction section, we first explained the significance of road extraction and pointed out its challenges. Then, we analyzed the methods proposed for road extraction from remote sensing images, summarized the existing problems, and presented our method.
Secondly, in the UNet section, we briefly introduced the overall architecture of UNet and then gave its specific parameter values. This is the basis of our model.
Thirdly, in the C-UNet section, we first gave an overall depiction of C-UNet, including its flowchart, the model framework, and its mathematical description. Later on, we described the four modules successively in detail. Finally, we introduced the loss function used in the training process.
Fourthly, in the Experimental Results section, we first described the public dataset used in our experiments. Then, we specified the implementation details and the evaluation indicators. Later on, we conducted a series of ablation studies to evaluate the effectiveness of each module of our model. Finally, we compared it with four other state-of-the-art methods for road segmentation from remote sensing images.
Fifthly, in the Discussion section, we explained the simulation of our model, pointed out future research, and showed the applications of our model.
Finally, in the Conclusions section, we give a conclusion of our work, analyze the reasons for its better performance, and point out future research directions.
5.3. Further Research
In the future, we will take C-UNet as a more general model and further improve its performance from two aspects. One is to combine it with an attention scheme to select important features for segmenting roads. Attention is the main means by which the brain's nervous system solves the problem of information overload. It is a resource allocation scheme for the case of limited computing power, which allocates computing resources to more important information. With the help of an attention scheme, we can guide C-UNet to pay more attention to roads, while ignoring other objects such as buildings, rivers, cars, etc. The other is to simplify its parameters with light-weight methods, for the purpose of real-time segmentation. To improve the representation ability of C-UNet, we need to build a deep enough architecture and design a complex enough network structure. This means a long model training cycle and more machine memory. Therefore, building a lightweight model and speeding up the convergence of the model is a problem worthy of further study.
5.4. Application of C-UNet
This work has many applications. First, it is often difficult to acquire ground information during disaster relief if the ground object targets in natural disasters (such as earthquakes, landslides, and torrential rain) are seriously damaged. This work can help to analyze the disaster situation quickly and conveniently. Secondly, roads play an important role in urban development, and this work can assist the government in road planning. Last but not least, considering the high similarity between roads and retinal vessels, this work can be directly employed to extract retinal vessels from medical images, helping doctors to better diagnose and treat diseases.
6. Conclusions
In this paper, we proposed a new model for road extraction in remote sensing images. Our proposed model includes four modules to extract road information successively, which is demonstrated to be more suitable for extracting road information from high-resolution remote sensing images. With the help of complementary learning, a standard UNet is utilized to extract relatively obvious roads, and then the MD-UNet is introduced to extract complementary and finer road information. Utilizing the multi-scale dilated convolution, MD-UNet can increase the receptive field without reducing the resolution of the feature map. This makes it better able to cope with different widths, lengths, and shapes of roads. Ablation studies indicated that the proposed model and its submodules are effective. Comparative experiments showed that the proposed model outperforms the state-of-the-art methods.
In the future, we will extend the proposed C-UNet to segment different types of roads, such as forest and gravel roads. Besides, we will take it as a more general model, combining it with attention-based methods and light-weight methods to make its performance better.
Author Contributions: Software, Writing—original draft, Y.H.; Writing—review, editing, Z.L.;
Conceptualization, Methodology, Writing—review, editing, T.Z.; Supervision, Y.L. All authors have
read and agreed to the published version of the manuscript.
Funding: This research was funded by the National Natural Science Foundation of China (Nos. 61876010, 61806013, 61906005) and the Scientific and Technology Program of the Municipal Education Commission (KM202110005028).
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable.
Acknowledgments: The authors wish to thank the editors and reviewers for their valuable advice.
Conflicts of Interest: The authors declare no conflict of interest.
