Paper download: link
Abstract. Automatic brain tumor segmentation plays an important role in the diagnosis, surgical planning and treatment assessment of brain tumors. Deep convolutional neural networks (CNNs) have been widely used for this task. Due to the relatively small datasets available for training, data augmentation at training time has been commonly used to improve the performance of CNNs. Recent works have also demonstrated the usefulness of augmentation at test time, in addition to training time, for achieving more robust predictions. We investigate how test-time augmentation can improve CNNs' performance for brain tumor segmentation. We used different underpinning network structures and augmented the image by 3D rotation, flipping, scaling and adding random noise at both training and test time. Experiments with the BraTS 2018 training and validation sets show that test-time augmentation helps to improve brain tumor segmentation accuracy and to obtain uncertainty estimation of the segmentation results.
Gliomas are the most common primary brain tumors in adults, arising from the glial cells of the brain. They can be categorized according to their grade: Low-Grade Gliomas (LGG) exhibit benign tendencies and portend a better prognosis for the patient, while High-Grade Gliomas (HGG) are malignant and lead to a worse prognosis [22]. Medical imaging of brain tumors plays an important role in evaluating the progression of the disease before and after treatment. Currently the most widely used imaging modality for brain tumors is Magnetic Resonance Imaging (MRI) with different sequences, such as T1-weighted, contrast-enhanced T1-weighted (T1ce), T2-weighted and Fluid Attenuation Inversion Recovery (FLAIR) images. These sequences provide complementary information for different subregions of brain tumors [24]. For example, the tumor region and peritumoral edema are highlighted in FLAIR and T2 images, while the tumor core region without peritumoral edema is more visible in T1 and T1ce images.
Automatic segmentation of brain tumors and substructures from medical images has the potential to provide accurate and reproducible measurements of the tumors, which can support more efficient and accurate diagnosis, surgical planning and treatment assessment of brain tumors [24,5]. However, accurate automatic segmentation of brain tumors remains challenging for several reasons. First, the boundary between brain tumor and normal tissue is often ambiguous due to smooth intensity gradients, partial volume effects and bias field artifacts. Second, brain tumors vary greatly across patients in terms of size, shape and localization. This prohibits the use of strong priors on shape and localization that are commonly used for robust segmentation of many other anatomical structures, such as the heart [11] and the liver [31].
In recent years, deep Convolutional Neural Networks (CNNs) have achieved state-of-the-art performance for multi-modal brain tumor segmentation [29,15]. As a type of machine learning approach, they require a set of annotated training images for learning. Compared with traditional machine learning approaches such as support vector machines [20] and decision trees [34], they do not rely on hand-crafted features and can learn features automatically. In [12], a CNN was proposed to exploit both local and global features for robust brain tumor segmentation. It replaces the final fully connected layer used in traditional CNNs with a convolutional implementation that obtains a 40-fold speed-up. This approach employs a two-phase training procedure and a cascade architecture to tackle difficulties related to the imbalance of tumor labels. Despite performing better than traditional methods, this approach works on individual 2D slices without considering 3D contextual information. DeepMedic [16] uses a dual-pathway 3D CNN with 11 layers to make use of multi-scale features for brain tumor segmentation. For post-processing, it uses a 3D fully connected Conditional Random Field (CRF) [19] that helps to remove false positives. DeepMedic achieved better performance than 2D CNNs; however, it works on local image patches and therefore has a relatively low inference efficiency. In [29], a triple cascaded framework was proposed for brain tumor segmentation. The framework uses three networks to hierarchically segment the whole tumor, tumor core and enhancing tumor core sequentially. It uses a network structure with anisotropic convolution to deal with 3D images, taking advantage of dilated convolution [32], residual connections [6] and multi-scale fusion [30], and demonstrates an advantageous trade-off between receptive field, model complexity and memory consumption. This method also fuses the output of CNNs in three orthogonal views for more robust segmentation of brain tumors. In [15], an ensemble of multiple models and architectures, including DeepMedic [16], 3D Fully Convolutional Networks (FCN) [21] and U-Net [27,2], was used for robust brain tumor segmentation. The ensemble method reduces the influence of the meta-parameters of individual CNN models and the risk of overfitting the configuration to a specific training dataset; however, it requires much more computational resources to train and run a set of models.
Training with a large dataset plays an important role in the good performance of deep CNNs. For medical images, collecting a very large training set is usually time-consuming and challenging. Therefore, many works have used data augmentation to partially compensate for this problem. Data augmentation applies transformations to the samples in a training set to create new ones, so that a relatively small training set can be enlarged. Previous works have used different types of transformations, such as flipping, cropping, rotating and scaling training images [2]. In [33], a simple and data-agnostic data augmentation routine termed mixup was proposed for training neural networks. Recently, several studies have empirically found that the performance of deep learning-based image recognition methods can be improved by combining predictions on multiple transformed versions of a test image, such as in data distillation [26], pulmonary nodule detection [14] and skin lesion classification [23]. In [13], test images were augmented by mirroring for brain tumor segmentation. In [28], a mathematical formulation was proposed for test-time augmentation, where a distribution of the prediction was estimated by Monte Carlo simulation with prior distributions of parameters in an image acquisition model. That work also proposed a test-time augmentation-based aleatoric uncertainty estimation method that can help to reduce overconfident predictions. The framework in [28] has been validated with binary segmentation tasks, while its application to multi-class segmentation has yet to be demonstrated.
In this paper, we extend the works of [29] and [28] and apply test-time augmentation to automatic multi-class brain tumor segmentation. For a given input image, instead of obtaining a single inference, we augment the input image with different transformation parameters to obtain multiple predictions, using the same network and its trained weights. The multiple predictions help to obtain a more robust inference for the given image. We explore the use of different CNNs as the underpinning network structures. Experiments with the BraTS 2018 training and validation sets show that test-time augmentation improves segmentation accuracy and that our method can provide uncertainty estimation for the segmentation output.
We explore three network configurations as underpinning CNNs for the brain tumor segmentation task: 1) the 3D U-Net [2]; 2) the cascaded networks in [29], where a WNet, a TNet and an ENet are used to segment the whole tumor, tumor core and enhancing tumor core respectively; and 3) an adaptation of WNet [29] for one-pass multi-class prediction without cascaded prediction, which we refer to as multi-class WNet.
The 3D U-Net has a downsampling and an upsampling path, each with four resolution steps. In the downsampling path, each layer has two 3 × 3 × 3 convolutions, each followed by a Rectified Linear Unit (ReLU) activation function, and then a 2 × 2 × 2 max pooling layer for downsampling. In the upsampling path, each layer uses a deconvolution with kernel size 2 × 2 × 2, followed by two 3 × 3 × 3 convolutions with ReLU. The network has shortcut connections between corresponding layers with the same resolution in the downsampling and upsampling paths. In the last layer, a 1 × 1 × 1 convolution is used to reduce the number of output channels to the number of segmentation labels, i.e., 4 for the brain tumor segmentation task in the BraTS challenge.
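As a reference, a minimal Keras sketch of this layout follows. The input shape and base channel count are illustrative assumptions, not the exact feature numbers of [2]:

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, n_filters):
    # Two 3x3x3 convolutions, each followed by ReLU.
    x = layers.Conv3D(n_filters, 3, padding='same', activation='relu')(x)
    x = layers.Conv3D(n_filters, 3, padding='same', activation='relu')(x)
    return x

def unet3d(input_shape=(96, 96, 96, 4), n_classes=4, base_filters=16):
    inputs = layers.Input(input_shape)
    skips, x = [], inputs
    # Downsampling path: four resolution steps with 2x2x2 max pooling.
    for level in range(4):
        x = conv_block(x, base_filters * 2 ** level)
        skips.append(x)
        x = layers.MaxPooling3D(pool_size=2)(x)
    x = conv_block(x, base_filters * 16)
    # Upsampling path: 2x2x2 deconvolution, then concatenate the shortcut
    # connection from the downsampling path at the same resolution.
    for level in reversed(range(4)):
        x = layers.Conv3DTranspose(base_filters * 2 ** level, 2, strides=2)(x)
        x = layers.Concatenate()([x, skips[level]])
        x = conv_block(x, base_filters * 2 ** level)
    # Final 1x1x1 convolution maps channels to the segmentation labels.
    outputs = layers.Conv3D(n_classes, 1, activation='softmax')(x)
    return tf.keras.Model(inputs, outputs)
```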
The WNet proposed in [29] is an anisotropic network that considers a trade-off between receptive field, model complexity and memory consumption. It employs dilated convolution [32], residual connections [6] and multi-scale prediction [30] to improve segmentation performance. The network uses 20 intra-slice convolution layers and four inter-slice convolution layers, with two 2D downsampling layers. Since the anisotropic convolution has a small receptive field in the through-plane direction, multi-view fusion is used to take advantage of 3D contextual information, with the network applied in the axial, sagittal and coronal views respectively. For the multi-view fusion, the softmax outputs in these three views are averaged. In [29], WNet is used to segment the whole tumor. TNet for tumor core segmentation uses the same structure as WNet, and ENet for enhancing core segmentation is a variant of WNet that uses only one downsampling layer. Compared with one-pass multi-class prediction, the cascaded networks require more time for training and testing. To improve the training efficiency, we compare the cascaded networks [29] with the multi-class WNet, where a single WNet performs multi-class prediction without TNet and ENet. For this variant we change the number of output channels of WNet from 2 to 4. Multi-view fusion is also used for the multi-class WNet.
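As an illustration of the anisotropic design, the building blocks might look like the following sketch, assuming 3×3×1 intra-slice and 1×1×3 inter-slice kernels as in [29]; the dilation rates and channel handling here are illustrative, not the exact configuration:

```python
from tensorflow.keras import layers

def intra_slice_block(x, n_filters, dilation=1):
    # Two 3x3x1 convolutions with a residual connection; dilation enlarges
    # the in-plane receptive field without downsampling. Assumes the input
    # already has n_filters channels so the residual addition is valid.
    shortcut = x
    x = layers.Conv3D(n_filters, (3, 3, 1), padding='same',
                      dilation_rate=(dilation, dilation, 1),
                      activation='relu')(x)
    x = layers.Conv3D(n_filters, (3, 3, 1), padding='same',
                      dilation_rate=(dilation, dilation, 1))(x)
    return layers.Activation('relu')(layers.Add()([shortcut, x]))

def inter_slice_layer(x, n_filters):
    # A 1x1x3 convolution fuses information across neighbouring slices,
    # keeping the through-plane receptive field small.
    return layers.Conv3D(n_filters, (1, 1, 3), padding='same',
                         activation='relu')(x)
```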
From the point of view of image acquisition, an observed image is only one of many possible observations of the underlying anatomy, which could also be observed under different spatial transformations and noise. Direct inference with the observed image may therefore lead to a biased result affected by the specific transformation and noise associated with that image. To obtain a more robust prediction, we consider different transformations and noise at test time. Let $\beta$ and $e$ represent the parameters for spatial transformation and intensity noise respectively. We assume that $\beta$ is a combination of $fl$, $r$ and $s$, where $fl$ is a random variable for flipping along each 3D axis, $r$ is the rotation angle along each 3D axis, and $s$ is a scaling factor. These parameters follow the distributions $fl \sim Bern(0.5)$, $r \sim U(0, 2\pi)$ and $s \sim U(0.8, 1.2)$. For the intensity noise, we assume $e \sim N(0, 0.05)$, according to the reduced standard deviation of a median-filtered version of a normalized image [28].
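In code, drawing one Monte Carlo sample of these parameters could look like the following NumPy sketch; the spatial warp itself (e.g., with scipy.ndimage) is omitted:

```python
import numpy as np

def sample_transform(rng):
    """Draw one sample of the augmentation parameters beta = (fl, r, s)."""
    flip = rng.random(3) < 0.5                        # fl ~ Bern(0.5), per 3D axis
    rotation = rng.uniform(0.0, 2.0 * np.pi, size=3)  # r ~ U(0, 2*pi), per 3D axis
    scale = rng.uniform(0.8, 1.2)                     # s ~ U(0.8, 1.2)
    return flip, rotation, scale

def add_intensity_noise(image, rng, sigma=0.05):
    """Additive Gaussian intensity noise e ~ N(0, sigma) on a normalized image."""
    return image + rng.normal(0.0, sigma, size=image.shape)
```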
For data augmentation, we randomly sample $\beta$ and $e$ from the above distributions and use them to transform the image. We use the same distributions of augmentation parameters at both training and test time for a given CNN. For test-time augmentation, we obtain N samples from the distributions of $\beta$ and $e$ by Monte Carlo simulation, and each resulting transformed version of the input is fed into the CNN. The N predictions are mapped back to the original image space and combined by majority voting to obtain the final prediction.
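A simplified version of this test-time loop, restricted to axis flips, is sketched below. Flips are self-inverse, so each prediction can be mapped back by re-applying the same flip; rotation and scaling would additionally require an inverse warp of each prediction before voting. `predict_fn` is a hypothetical stand-in for the trained network:

```python
import numpy as np

def tta_predict(image, predict_fn, n_samples=20, rng=None):
    """Majority-vote segmentation from N flip-and-noise augmented predictions.

    image: (D, H, W, C) volume; predict_fn maps such a volume to a
    (D, H, W) integer label map.
    """
    rng = rng or np.random.default_rng()
    votes = []
    for _ in range(n_samples):
        flip_axes = tuple(ax for ax in range(3) if rng.random() < 0.5)
        augmented = np.flip(image, axis=flip_axes)
        augmented = augmented + rng.normal(0.0, 0.05, size=image.shape)
        prediction = predict_fn(augmented)
        votes.append(np.flip(prediction, axis=flip_axes))  # undo the flip
    votes = np.stack(votes).astype(np.int64)               # (N, D, H, W)
    # Majority voting: count the occurrences of each label per voxel.
    counts = np.stack([(votes == c).sum(axis=0)
                       for c in range(votes.max() + 1)])
    return counts.argmax(axis=0), votes
```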
Both model-based (epistemic) uncertainty and image-based (aleatoric) uncertainty have been investigated for deep CNNs in recent years [17]. The epistemic uncertainty is often obtained by Bayesian approximation-based methods such as test-time dropout [9]. In [28], test-time augmentation was used to estimate the aleatoric uncertainty of segmentation results in a consistent mathematical framework. In this paper, we use test-time augmentation to obtain segmentation results as well as the associated aleatoric uncertainty according to [28].
The uncertainty estimation is obtained by measuring the diversity of the predictions for a given image. Both the variance and the entropy of the distribution can be used to estimate uncertainty. Since variance is not sufficiently representative in the context of multi-modal distributions, we use entropy for the pixel-wise uncertainty estimation desired for segmentation tasks. Let X denote the input image and Y the output segmentation, and let $Y_i$ denote the predicted label for the i-th pixel. With the Monte Carlo simulation described in Section 2.2, a set of values for $Y_i$ is obtained: $\mathcal{Y}_i = \{y_i^1, y_i^2, \ldots, y_i^N\}$. The entropy of the distribution of $Y_i$ is therefore approximated as:

$$H(Y_i \mid X) \approx -\sum_{m=1}^{M} \hat{p}_i^m \ln \hat{p}_i^m$$
where $\hat{p}_i^m$ is the frequency of the m-th unique value in $\mathcal{Y}_i$ and M is the number of unique values.
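This voxel-wise entropy can be computed directly from the stacked TTA predictions; a sketch, where `votes` is the (N, D, H, W) array returned by the TTA loop above:

```python
import numpy as np

def voxelwise_entropy(votes):
    """Approximate H(Y_i) per voxel from N sampled label maps (N, D, H, W)."""
    n_samples = votes.shape[0]
    entropy = np.zeros(votes.shape[1:], dtype=np.float64)
    for label in np.unique(votes):
        p = (votes == label).sum(axis=0) / n_samples  # \hat{p}_i^m
        with np.errstate(divide='ignore', invalid='ignore'):
            entropy -= np.where(p > 0, p * np.log(p), 0.0)
    return entropy
```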
Data and Implementation Details. We used the BraTS 2018 [3,4,5,24] dataset for experiments. The training set contains images from 285 patients, including 210 cases of HGG and 75 cases of LGG. The BraTS 2018 validation and testing sets contain images from 66 and 191 patients with brain tumors of unknown grade, respectively. Each patient was scanned with four sequences: T1, T1ce, T2 and FLAIR. As a pre-processing step performed by the organizers, all the images were skull-stripped and re-sampled to an isotropic 1 mm³ resolution, and the four modalities of each patient had been co-registered. The ground truth was provided by the BraTS organizers. We uploaded the segmentation results obtained by our method to the BraTS 2018 server, which provided quantitative evaluations, including the Dice score and Hausdorff distance, against the ground truth.
Fig. 1. An example of brain tumor segmentation results obtained by different networks and test-time augmentation (TTA). The first row shows the four modalities of the same patient. The second and third rows show segmentation results. Green: edema; Red: non-enhancing tumor core; Yellow: enhancing tumor core.
We implemented the 3D U-Net [2], multi-class WNet and cascaded networks [29] in TensorFlow [1] using NiftyNet [10]. The Adaptive Moment Estimation (Adam) [18] strategy was used for training, with initial learning rate 10⁻³, weight decay 10⁻⁷ and 20k maximal iterations. The training patch size was 96×96×96 for the 3D U-Net and 96×96×19 for the multi-class WNet. The batch size was 2 and 4 for these two networks respectively. For the cascaded networks, we followed the configurations in [29]. Training was performed on an NVIDIA TITAN X GPU. As pre-processing, each image was normalized by its mean value and standard deviation. The Dice loss function [25,8] was used for training.
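For reference, a common formulation of the multi-class soft Dice loss is sketched below; the exact variant and smoothing term used in [25,8] may differ:

```python
import tensorflow as tf

def dice_loss(y_true, y_pred, epsilon=1e-5):
    """Soft multi-class Dice loss.

    y_true: one-hot ground truth, shape (batch, D, H, W, C).
    y_pred: softmax probabilities, same shape.
    """
    axes = (1, 2, 3)  # sum over the spatial dimensions
    intersection = tf.reduce_sum(y_true * y_pred, axis=axes)
    denominator = tf.reduce_sum(y_true + y_pred, axis=axes)
    dice = (2.0 * intersection + epsilon) / (denominator + epsilon)
    return 1.0 - tf.reduce_mean(dice)  # average over classes and batch
```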
Fig. 2. Another example of brain tumor segmentation results obtained by different networks and test-time augmentation (TTA). The first row shows the four modalities of the same patient. The second and third rows show segmentation results. Green: edema; Red: non-enhancing tumor core; Yellow: enhancing tumor core
At test time, the number of augmented predictions was set to N = 20 for all the network structures. The multi-class WNet and cascaded networks were trained in the axial, sagittal and coronal views respectively, and the predictions in these three views were fused by averaging at test time.
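The multi-view fusion step then reduces to averaging the softmax volumes of the three view-specific networks; a sketch, assuming the three arrays have already been resampled to a common orientation:

```python
import numpy as np

def fuse_multi_view(softmax_axial, softmax_sagittal, softmax_coronal):
    """Average class probabilities over views, then take the argmax label."""
    fused = (softmax_axial + softmax_sagittal + softmax_coronal) / 3.0
    return fused.argmax(axis=-1)  # (D, H, W, C) -> (D, H, W) labels
```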
Segmentation Results. Fig. 1 shows an example from the BraTS 2018 validation set. The first row shows the input images of the four modalities: FLAIR, T1, T1ce and T2. The second and third rows present the segmentation results of the 3D U-Net, multi-class WNet and cascaded networks, and their corresponding results with test-time augmentation. It can be observed that the initial output of the 3D U-Net is noisy, with some false positives of edema and non-enhancing tumor core. After using test-time augmentation, the result becomes more spatially consistent. The output of the multi-class WNet is also noisy for the non-enhancing tumor core, and a smoother segmentation is obtained by the multi-class WNet with test-time augmentation. For the cascaded networks, test-time augmentation also leads to visually better results for the tumor core.
Fig. 2 shows another example from the BraTS 2018 validation set. It can be observed that the 3D U-Net produces a hole in the tumor core, which appears to be an under-segmentation. The hole is filled after using test-time augmentation, and the result looks more consistent with the input images. The initial prediction by the multi-class WNet shows an over-segmentation of the non-enhancing tumor core. After using test-time augmentation, the over-segmented regions become smaller, leading to higher accuracy. Test-time augmentation also helps to improve the result of the cascaded networks. Fig. 3 shows a case from the BraTS 2018 testing set, where test-time augmentation obtains a better spatial consistency for the tumor core. In addition, it provides an uncertainty estimation of the segmentation output. It can be observed that most of the uncertain results are concentrated on the border of the tumor and in some potentially mis-segmented regions.
A quantitative evaluation of our different methods on the BraTS 2018 validation set is shown in Table 1. The initial output of the 3D U-Net achieved Dice scores of 73.44%, 86.38% and 76.58% for the enhancing tumor core, whole tumor and tumor core respectively. The 3D U-Net with test-time augmentation performed better than the 3D U-Net baseline, with Dice scores of 75.43%, 87.31% and 78.32% respectively. For the initial output of the multi-class WNet, the Dice scores were 75.70%, 88.98% and 72.53% for these three structures respectively; after using test-time augmentation, they improved to 77.70%, 89.56% and 73.04%. For the cascaded networks, test-time augmentation leads to higher accuracy for the enhancing tumor core and tumor core. Table 2 presents the performance of our cascaded networks with test-time augmentation on the BraTS 2018 testing set. The average Dice scores for the enhancing tumor core, whole tumor and tumor core are 74.66%, 87.78% and 79.64% respectively, and the corresponding Hausdorff distances are 4.16 mm, 5.97 mm and 6.71 mm.
For test-time augmentation, we only used flipping, rotation and scaling as spatial transformations. It is also possible to employ more complex transformations, such as the elastic deformations used in [2]; however, such deformations take longer at test time and are less efficient. The results show that test-time augmentation leads to an improvement of segmentation accuracy for different CNNs, including the 3D U-Net [2], multi-class WNet and cascaded networks [29]. Test-time augmentation can be applied to other CNN models as well. The uncertainty estimation obtained by our method can be used for downstream analysis, such as uncertainty-aware volume measurement [7] and guiding user interactions [30]. It would be of interest to assess the impact of test-time augmentation on CNNs trained with state-of-the-art policies such as in [13].
In conclusion, we explored the effect of test-time augmentation on CNN-based brain tumor segmentation. We used the 3D U-Net, 2.5D multi-class WNet and cascaded networks as the underpinning network structures. For training and testing, we augmented the image by 3D rotation, flipping, scaling and adding random noise. Experiments with the BraTS 2018 training and validation sets show that test-time augmentation helps to improve brain tumor segmentation accuracy for different CNN structures and to obtain uncertainty estimation of the segmentation results.