An Efficient Solution for Breast Tumor Segmentation and Classification in Ultrasound Images Using Deep Adversarial Learning (Paper Translation)
Abstract. This paper proposes an efficient solution for tumor segmentation and classification in breast ultrasound (BUS) images. We propose to add an atrous convolution layer to the conditional generative adversarial network (cGAN) segmentation model to learn tumor features at different resolutions of BUS images. To automatically re-balance the relative impact of each of the highest-level encoded features, we also propose to add a channel-wise weighting block to the network. In addition, SSIM and L1-norm losses are combined with the typical adversarial loss to form the loss function used to train the model. Our model outperforms state-of-the-art segmentation models in terms of the Dice and IoU metrics, achieving top scores of 93.76% and 88.82%, respectively. In the classification stage, we show that a few statistical features extracted from the shape of the boundaries of the predicted masks can properly discriminate between benign and malignant tumors with an accuracy of 85%.
1 Introduction
Breast cancer is one of the most commonly diagnosed causes of death in women worldwide [14]. Screening with mammography can recognize tumors in the early stages. However, some breast cancers may not be captured by mammography (e.g., in the case of dense breasts). Ultrasound has been recommended as a powerful adjunct screening tool for detecting breast cancers that may be occluded in mammograms [8]. Computer-aided diagnosis (CAD) systems are widely used to detect, segment, and classify masses in breast ultrasound (BUS) images. One of the main steps of BUS CAD systems is tumor segmentation.
Over the last two decades, several BUS image segmentation methods have been proposed, which can be categorized into semi-automated and fully automated according to the degree of human intervention. In [1], a region-growing algorithm was used to automatically extract the regions that contain the tumors, and image super-resolution and texture analysis methods were used to discriminate benign tumors from malignant ones. Recently, some deep learning based models have been proposed to improve the performance of breast tumor segmentation methods. In [16], two convolutional neural network (CNN) architectures were used to segment BUS images into skin, mass, fibro-glandular, and fatty tissues (an accuracy of 90%). Hu et al. [5] combined a dilated fully convolutional network with a phase-based active contour model to segment breast tumors, achieving a Dice score of 88.97%.
Although these methods and others proposed in the literature do provide useful techniques, there are still challenges due to the high degree of speckle noise present in the ultrasound images, as well as to the high variability of tumors in shape, size, appearance, texture, and location. In this paper, we propose an efficient solution for breast tumor segmentation and classification in BUS images using deep adversarial learning.
The main contribution of this paper is an efficient deep model for segmenting breast tumors in BUS images, which combines an atrous convolution (AC) block and a channel attention with channel weighting (CAW) block in a cGAN model in order to enhance the discriminative ability of feature representations at multiple scales. Besides, we demonstrate that the proposed segmentation model can be used to extract accurate shape features from the segmented mask to discriminate between benign and malignant tumors. The rest of this paper is structured as follows. Section 2 presents the proposed model. Section 3 includes the results. Section 4 concludes our study and provides some lines of future work.
2 Proposed Methodology
Generative adversarial network architecture: The proposed BUS image segmentation technique is based on generative adversarial training, which involves two interdependent networks: a generator G and a discriminator D (Fig. 1). The generator generates a fake example from input noise z, while the discriminator estimates the probability that the fake example comes from the training data rather than being produced by the generator.
Generator: The generator network incorporates an encoder section, made of seven convolutional layers (En1 to En7), and a decoder section, made of seven deconvolutional layers (Dn1 to Dn7). We have modified the plain encoder-decoder structure by inserting an atrous convolution block [18] between En3 and En4, in addition to a CAW block between En7 and Dn1. The CAW block is an aggregation of a channel attention module [3] with a channel weighting block [4]. In turn, the CAW block increases the representational power of the highest-level features of the generator network, which results in a clear improvement of the accuracy of breast tumor segmentation in ultrasound images (for more details, see suppl. A.1 and A.2).
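As an illustration, the following is a minimal PyTorch sketch of such an aggregation, assuming a non-local channel attention branch (in the spirit of [3]) and a squeeze-and-excitation style channel weighting branch (in the spirit of [4]); the branch designs and the way the branch outputs are fused are our assumptions, since the paper defers these details to suppl. A.1 and A.2.

```python
import torch
import torch.nn as nn

class CAWBlock(nn.Module):
    """Sketch of a CAW block: a channel attention branch aggregated with
    an SE-style channel weighting branch. Sizes are illustrative."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        # Channel weighting branch (squeeze-and-excitation style)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, h, w = x.shape
        # Channel attention branch: channel-to-channel affinity map
        flat = x.view(b, c, -1)                                    # (b, c, h*w)
        attn = torch.softmax(flat @ flat.transpose(1, 2), dim=-1)  # (b, c, c)
        attended = (attn @ flat).view(b, c, h, w)
        # Channel weighting branch: global pooling -> per-channel scale
        scale = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        # Aggregation of the two branches with the input (assumed fusion)
        return x + attended + x * scale
```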
By including the atrous convolution block between encoder layers En3 and En4, the generator network is able to characterize features at different scales and also to expand the actual receptive field of the filters. As a consequence, the network is more aware of contextual information without increasing the number of parameters or the amount of computation. We use dilation rates of 1, 6, and 9 with a kernel size of 3×3 and a stride of 2.
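A minimal PyTorch sketch of this block follows, assuming the three parallel dilated branches are merged by channel-wise concatenation followed by a 1×1 convolution (the fusion scheme is our assumption; the paper cites [18] for the block design):

```python
import torch
import torch.nn as nn

class AtrousBlock(nn.Module):
    """Sketch of the atrous convolution block: parallel 3x3 convolutions
    with dilation rates 1, 6 and 9 and stride 2. With padding equal to
    the dilation rate, all branches produce identical spatial sizes."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2,
                      padding=rate, dilation=rate)
            for rate in (1, 6, 9)
        ])
        # 1x1 convolution to merge the three parallel branches (assumed)
        self.merge = nn.Conv2d(3 * out_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.merge(torch.cat([b(x) for b in self.branches], dim=1))
```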
Each layer in the encoder section is followed by batch normalization (except for En1 and En7) and a LeakyReLU activation with slope 0.2, except for En7, where the regular ReLU activation function is used. The decoder section is a sequence of transposed-convolutional layers followed by batch normalization, dropout with rate 0.5 (only in Dn1, Dn2, and Dn3), and ReLU. The filters of the convolutional and deconvolutional layers are defined by a 4×4 kernel and shifted with a stride of 2. We add padding of 2 after En4, yielding a 4×4×512 output feature map. We also add skip connections between the corresponding layers of the encoder and decoder sections, which improve the features in the output image by merging deep, coarse, semantic information with simple, fine, appearance information. After the last decoding layer (Dn7), the tanh activation function is used as the non-linear output of the generator, which is trained to generate a binary mask of the breast tumor.
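The per-layer recipe can be sketched in PyTorch as below; a padding of 1 in each 4×4/stride-2 layer is assumed (the standard choice that halves the spatial size), and the special cases noted above (no batch norm in En1/En7, ReLU instead of LeakyReLU in En7, tanh after Dn7) would be applied when assembling the full network:

```python
import torch.nn as nn

def enc_layer(in_ch, out_ch, norm=True):
    """Encoder block: 4x4 conv, stride 2, BatchNorm, LeakyReLU(0.2)."""
    layers = [nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1)]
    if norm:
        layers.append(nn.BatchNorm2d(out_ch))
    layers.append(nn.LeakyReLU(0.2, inplace=True))
    return nn.Sequential(*layers)

def dec_layer(in_ch, out_ch, dropout=False):
    """Decoder block: 4x4 transposed conv, stride 2, BatchNorm,
    Dropout(0.5) only in the first three blocks (Dn1-Dn3), ReLU."""
    layers = [nn.ConvTranspose2d(in_ch, out_ch, kernel_size=4,
                                 stride=2, padding=1),
              nn.BatchNorm2d(out_ch)]
    if dropout:
        layers.append(nn.Dropout(0.5))
    layers.append(nn.ReLU(inplace=True))
    return nn.Sequential(*layers)
```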
Discriminator: It is a sequence of convolutional layers (Cn1 to Cn5) applying kernels of size 4×4 with a stride of 2, except for Cn4 and Cn5, where the stride is 1. Batch normalization is employed after Cn2 to Cn4. LeakyReLU with slope 0.2 is the non-linear activation function used after Cn1 to Cn4, while the sigmoid function is used after Cn5. The input of the discriminator is the concatenation of the BUS image and a binary mask marking the tumor area, where the mask can either be the ground truth or the one predicted by the generator network. The output of the discriminator is a 10×10 matrix with values varying from 0.0 (completely fake) to 1.0 (real).
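A sketch of the discriminator under these constraints is shown below; the channel widths and the padding of 1 per layer are assumptions, chosen so that a 96×96 input yields the stated 10×10 output map:

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Sketch of the PatchGAN-style discriminator Cn1-Cn5: 4x4 kernels,
    stride 2 (stride 1 for Cn4/Cn5), BatchNorm after Cn2-Cn4,
    LeakyReLU(0.2) after Cn1-Cn4 and sigmoid after Cn5.
    With these settings, a 96x96 input produces a 10x10 output."""
    def __init__(self, in_ch=2):  # BUS image concatenated with a mask
        super().__init__()
        self.model = nn.Sequential(
            nn.Conv2d(in_ch, 64, 4, stride=2, padding=1),   # Cn1
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 128, 4, stride=2, padding=1),     # Cn2
            nn.BatchNorm2d(128), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(128, 256, 4, stride=2, padding=1),    # Cn3
            nn.BatchNorm2d(256), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(256, 512, 4, stride=1, padding=1),    # Cn4
            nn.BatchNorm2d(512), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(512, 1, 4, stride=1, padding=1),      # Cn5
            nn.Sigmoid(),
        )

    def forward(self, image, mask):
        return self.model(torch.cat([image, mask], dim=1))
```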
Loss Functions: Assume x is a BUS image containing a breast tumor, y is the ground truth mask of that tumor within the image, and G(x,z) and D(x,G(x,z)) are the outputs of the generator and the discriminator, respectively. The loss function of the generator G comprises three terms: an adversarial loss (binary cross-entropy loss), an L1-norm loss to boost the learning process, and an SSIM loss [15] to improve the shape of the boundaries of the segmented masks:

$$\ell_{Gen}(G,D) = \mathbb{E}_{x,y,z}\big(-\log(D(x,G(x,z)))\big) + \lambda\,\mathbb{E}_{x,y,z}\big(\ell_{L1}(y,G(x,z))\big) + \alpha\,\mathbb{E}_{x,y,z}\big(\ell_{SSIM}(y,G(x,z))\big) \tag{1}$$
where z is a random variable and λ and α are empirical weighting factors. The variable z is introduced as dropout in the decoding layers Dn1, Dn2, and Dn3 at both training and testing phases, which helps to generalize the learning process and avoid overfitting. If the generator network is properly optimized, the values of D(x,G(x,z)) should approach 1.0, meaning that the discriminator cannot distinguish generated tumor masks from ground truth masks, while the L1 and SSIM losses should approach 0.0, indicating that every generated mask matches the corresponding ground truth both in overall pixel-to-pixel distance (L1) and in basic statistical descriptors (SSIM). For more details and analysis of the loss functions, the reader is referred to suppl. A.3. The loss function of the discriminator D can be formulated as follows:
$$\ell_{Dis}(G,D) = \mathbb{E}_{x,y,z}\big(-\log(D(x,y))\big) + \mathbb{E}_{x,y,z}\big(-\log(1-D(x,G(x,z)))\big) \tag{2}$$
The optimizer fits D to maximize the loss values for ground truth masks (by minimizing −log(D(x,y))) and to minimize the loss values for generated masks (by minimizing −log(1−D(x,G(x,z)))). These two terms compute the BCE loss using both masks, assuming that the expected classes for ground truth and generated masks are 1 and 0, respectively. The G and D networks are optimized concurrently: one optimization step for both networks at each iteration, where G tries to generate a valid tumor segmentation and D learns how to differentiate between synthetic and real segmentations.
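A sketch of both losses in PyTorch is shown below, assuming an `ssim` function that returns a similarity in [0,1] (e.g., from the third-party `pytorch_msssim` package), so that 1−SSIM is used as the loss term; the exact SSIM formulation of [15] may differ:

```python
import torch
import torch.nn.functional as F
from pytorch_msssim import ssim  # third-party SSIM implementation (assumed)

def generator_loss(d_fake, fake_mask, real_mask, lam=10.0, alpha=5.0):
    """Eq. (1): adversarial BCE + lambda * L1 + alpha * SSIM terms."""
    adv = F.binary_cross_entropy(d_fake, torch.ones_like(d_fake))
    l1 = F.l1_loss(fake_mask, real_mask)
    ssim_term = 1.0 - ssim(fake_mask, real_mask, data_range=1.0)
    return adv + lam * l1 + alpha * ssim_term

def discriminator_loss(d_real, d_fake):
    """Eq. (2): -log D(x,y) - log(1 - D(x,G(x,z))), via BCE with
    targets 1 for ground truth masks and 0 for generated masks."""
    return (F.binary_cross_entropy(d_real, torch.ones_like(d_real)) +
            F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake)))

# Concurrent optimization, one step per network per iteration (sketch):
#   fake = G(x)
#   loss_D = discriminator_loss(D(x, y), D(x, fake.detach()))  -> step D
#   loss_G = generator_loss(D(x, fake), fake, y)               -> step G
```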
Model training: In the preprocessing step, each BUS image is rescaled to 96×96 pixels, and pixel values are normalized to [0,1]. In the postprocessing step, morphological operations (3×3 closing, 2×2 erosion) are used to suppress most of the outlier predictions (speckled pixels). The hyperparameters of the model were tuned experimentally. We also explored several optimizers, such as SGD, AdaGrad, Adadelta, RMSProp, and Adam with different learning rates (see suppl. A.3). We achieved the best results with the Adam optimizer (β1 = 0.5, β2 = 0.999), a learning rate of 0.0002, and a batch size of 8. The loss weighting factors λ and α (Eq. 1) were set to 10 and 5, respectively. The best results were achieved by training both the generator and the discriminator from scratch for 40 epochs.
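As an illustration, a minimal sketch of the two image-processing steps with NumPy and scikit-image (the choice of library and the min-max normalization are assumptions):

```python
import numpy as np
from skimage.transform import resize
from skimage.morphology import binary_closing, binary_erosion, square

def preprocess(image):
    """Rescale a BUS image to 96x96 and normalize pixel values to [0, 1]."""
    image = resize(image, (96, 96), preserve_range=True)
    return (image - image.min()) / (image.max() - image.min() + 1e-8)

def postprocess(mask):
    """Suppress speckled outlier predictions with a 3x3 closing
    followed by a 2x2 erosion."""
    mask = binary_closing(mask > 0.5, square(3))
    return binary_erosion(mask, square(2))
```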
Breast Tumor Classification: To classify a BUS image as benign or malignant, we propose to rely on statistical features of the segmented tumor mask to discriminate between the two classes. Malignant breast tumors and benign lesions have different shape characteristics: a malignant lesion is usually irregular, spiculated, or microlobulated, whereas a benign lesion mainly has smooth boundaries and a round, oval, or macrolobulated shape [17].
In the classification method, each BUS image is fed into the trained generative network to obtain the boundary of the tumor, and then we compute 13 statistical features from that boundary: fractal dimension, lacunarity, convex hull, convexity, circularity, area, perimeter, centroid, minor and major axis length, smoothness, Hu moments (6), and central moments (order 3 and below). We implemented an Exhaustive Feature Selection (EFS) algorithm to select the best set of features. The EFS algorithm indicates that the fractal dimension, lacunarity, convex hull, and centroid are the 4 optimal features. The selected features are fed into a Random Forest classifier, which is then trained to discriminate between benign and malignant tumors.
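A sketch of the EFS step with scikit-learn is given below, assuming cross-validated Random Forest accuracy as the subset score (the paper does not specify the scoring criterion); for the 13 features above, this exhaustive search would return the subset reported in the text:

```python
from itertools import combinations
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def exhaustive_feature_selection(X, y, feature_names, max_size=4):
    """Sketch of EFS: score every feature subset up to max_size with
    cross-validated Random Forest accuracy and keep the best one."""
    best_score, best_subset = 0.0, None
    for k in range(1, max_size + 1):
        for subset in combinations(range(X.shape[1]), k):
            score = cross_val_score(RandomForestClassifier(random_state=0),
                                    X[:, list(subset)], y, cv=5).mean()
            if score > best_score:
                best_score, best_subset = score, subset
    return [feature_names[i] for i in best_subset], best_score
```

The selected features would then be used to fit a final `RandomForestClassifier` that separates benign from malignant tumors.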