Semantic Segmentation

I had come across semantic segmentation before and felt it could enable a lot of applications, so I took some time to study it. These notes are here for future reference, and the article may keep being updated as I learn more.

Semantic Segmentation

  • Basic concepts
  • Several semantic segmentation algorithms
    • Fully Convolutional Networks (FCN)
      • Basics of fully convolutional networks (FCN)
      • Pros and cons of FCN
      • Semantic segmentation vs. image classification
      • From classification to segmentation
      • Upsampling methods
        • Upsampling: bilinear interpolation
        • Upsampling: un-pooling
        • Upsampling: transposed convolution
      • FCN architecture
      • FCN code implementation
    • U-Net
      • U-Net architecture
      • Skip-connection mechanism
      • U-Net output layer
      • U-Net code implementation
    • Pyramid Scene Parsing Network (PSPNet)
      • Three segmentation problems
      • Main contributions of PSPNet
      • Problems PSPNet targets
      • Understanding the role of the receptive field (RF)
      • RF -> PSP
      • Pyramid Pooling module
      • Adaptive Pool: output-size computation
      • PSPNet architecture
      • Dilated Convolution
      • PSPNet auxiliary module
      • PSPNet code implementation
    • DeepLab
      • DeepLab V1
        • DeepLab V1 architecture
        • DeepLab V1 code implementation
      • DeepLab V2
        • DeepLab V2 architecture
        • DeepLab V2 ASPP module
        • DeepLab V2 code implementation
      • DeepLab V3
        • DeepLab V3 architecture
        • DeepLab V3 upgraded ASPP module
        • DeepLab V3 Multi-Grid
        • DeepLab V3 code implementation
      • DeepLab V3+

Basic concepts

Image segmentation
Image segmentation partitions an image using features such as boundaries and color gradients. The popular algorithms here are Otsu, FCM, watershed, N-Cut, and the like. They are generally unsupervised, and the resulting regions carry no semantic labels; in other words, you don't know what the segmented pieces actually are.

Semantic segmentation
Semantic segmentation predicts a semantic class for every pixel of the input image. It emphasizes distinguishing categories: it carefully separates foreground vehicles from background houses, sky, and ground, but does not tell overlapping vehicles apart. Representative methods include FCN, DeepLab, and PSPNet.

Instance segmentation
Instance segmentation combines object detection with semantic segmentation: it detects the objects in the input image and assigns a class label to every pixel each object contains. It emphasizes separating the individual foreground objects, while background houses, sky, and ground all count as one class. Representative methods include DeepMask, Mask R-CNN, and PANet.

Panoptic segmentation

Panoptic segmentation unifies semantic and instance segmentation. It aims to segment both instance-level objects (things) and semantic-level background content (stuff), assigning every pixel a class label and an instance ID to produce a single, globally consistent segmentation map.

[Figure 1]

The difference between semantic segmentation and image segmentation

Instance and panoptic segmentation (slides)

自行科技: a one-article guide to panoptic segmentation

CNN-based semantic segmentation generally follows this recipe:
Downsample + upsample: convolution + deconvolution/resize
Multi-scale feature fusion: element-wise addition of features / concatenation along the channel dimension
Pixel-level segmentation map: predict a class for every single pixel

Segmentation networks fuse features in one of two ways (a minimal sketch follows):

FCN-style element-wise addition, corresponding to Caffe's EltwiseLayer and TensorFlow's tf.add()
U-Net-style concatenation along the channel dimension, corresponding to Caffe's ConcatLayer and TensorFlow's tf.concat()
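Both styles in PyTorch (the tensor shapes are made-up for illustration):

import torch

f_deep = torch.randn(1, 256, 32, 32)  # upsampled deep feature map
f_skip = torch.randn(1, 256, 32, 32)  # skip feature map from the encoder

fcn_style = f_deep + f_skip                      # element-wise add: channels stay 256
unet_style = torch.cat([f_deep, f_skip], dim=1)  # channel concat: channels become 512
print(fcn_style.shape, unet_style.shape)
# torch.Size([1, 256, 32, 32]) torch.Size([1, 512, 32, 32])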

A survey of image segmentation (deep learning methods)

Several semantic segmentation algorithms

Fully Convolutional Networks (FCN)

Fully Convolutional Networks for Semantic Segmentation, FCN for short, was the first paper to successfully apply deep learning to semantic image segmentation. It makes two main contributions:

It proposes the fully convolutional network: replacing the fully connected layers with convolutional ones lets the network accept images of any size and output a segmentation map the same size as the original image. Only then can every pixel be classified.

It uses deconvolution layers (Deconvolution). The feature maps of a classification network are usually only a fraction of the input size; mapping them back to the original resolution requires upsampling, which is what the deconvolution layer does. Despite the name, it is not the inverse of convolution; a more accurate name is transposed convolution (Transposed Convolution), and its job is to convolve a small feature map into a larger one.

Basics of fully convolutional networks (FCN)

[Figure 2]

[Figure 3]

Pros and cons of FCN

[Figure 4]

Semantic segmentation vs. image classification

[Figure 5]

[Figure 6]

From classification to segmentation

[Figure 7]

Upsampling methods

[Figure 8]

Upsampling: bilinear interpolation

[Figure 9]
[Figure 10]
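Bilinear upsampling has no learnable parameters; in PyTorch it is a one-liner (the sizes here are arbitrary):

import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 4, 4)
y = F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=False)
print(y.shape)  # torch.Size([1, 3, 8, 8])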

Upsampling: un-pooling

[Figure 11]
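Un-pooling restores pooled values to the positions recorded during max pooling and fills everything else with zeros. A small sketch with PyTorch's MaxUnpool2d:

import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)  # remember max positions
unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)

x = torch.randn(1, 1, 4, 4)
y, indices = pool(x)      # [1, 1, 2, 2]
out = unpool(y, indices)  # [1, 1, 4, 4]: max values restored, other positions zero
print(out.shape)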

Upsampling: transposed convolution

[Figure 12]
[Figure 13]
[Figure 14]
[Figure 15]
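Transposed convolution is learnable upsampling. The output size follows (in - 1) * stride - 2 * padding + kernel_size; a quick sketch (the channel count 21 is an arbitrary choice):

import torch
import torch.nn as nn

up = nn.ConvTranspose2d(in_channels=21, out_channels=21, kernel_size=4, stride=2, padding=1)
x = torch.randn(1, 21, 16, 16)
print(up(x).shape)  # (16 - 1) * 2 - 2 * 1 + 4 = 32 -> torch.Size([1, 21, 32, 32])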

FCN architecture

[Figure 16]
[Figure 17]
[Figure 18]
[Figure 19]

FCN code implementation

import torch
import torch.nn as nn


class FCN8s(nn.Module):
    def __init__(self, in_channels=1, out_channels=[64, 128, 256, 512, 512, 4096, 4096], n_class=21):
        super(FCN8s, self).__init__()

        # conv1
        self.conv1_1 = nn.Conv2d(in_channels=in_channels, out_channels=out_channels[0], kernel_size=3, padding=100)
        self.relu1_1 = nn.ReLU(inplace=True)
        self.conv1_2 = nn.Conv2d(in_channels=out_channels[0], out_channels=out_channels[0], kernel_size=3, padding='same')
        self.relu1_2 = nn.ReLU(inplace=True) # inplace=True overwrites the input tensor
        self.pool1 = nn.MaxPool2d(kernel_size=2, stride=2, ceil_mode=True) # ceil mode, 1/2

        # conv2
        self.conv2_1 = nn.Conv2d(in_channels=out_channels[0], out_channels=out_channels[1], kernel_size=3, padding='same')
        self.relu2_1 = nn.ReLU(inplace=True)
        self.conv2_2 = nn.Conv2d(in_channels=out_channels[1], out_channels=out_channels[1], kernel_size=3, padding='same')
        self.relu2_2 = nn.ReLU(inplace=True) # inplace=True overwrites the input tensor
        self.pool2 = nn.MaxPool2d(kernel_size=2, stride=2, ceil_mode=True) # ceil mode, 1/4

        # conv3
        self.conv3_1 = nn.Conv2d(in_channels=out_channels[1], out_channels=out_channels[2], kernel_size=3, padding='same')
        self.relu3_1 = nn.ReLU(inplace=True)
        self.conv3_2 = nn.Conv2d(in_channels=out_channels[2], out_channels=out_channels[2], kernel_size=3, padding='same')
        self.relu3_2 = nn.ReLU(inplace=True)
        self.conv3_3 = nn.Conv2d(in_channels=out_channels[2], out_channels=out_channels[2], kernel_size=3, padding='same')
        self.relu3_3 = nn.ReLU(inplace=True) # inplace=True overwrites the input tensor
        self.pool3 = nn.MaxPool2d(kernel_size=2, stride=2, ceil_mode=True) # ceil mode, 1/8

        # conv4
        self.conv4_1 = nn.Conv2d(in_channels=out_channels[2], out_channels=out_channels[3], kernel_size=3, padding='same')
        self.relu4_1 = nn.ReLU(inplace=True)
        self.conv4_2 = nn.Conv2d(in_channels=out_channels[3], out_channels=out_channels[3], kernel_size=3, padding='same')
        self.relu4_2 = nn.ReLU(inplace=True)
        self.conv4_3 = nn.Conv2d(in_channels=out_channels[3], out_channels=out_channels[3], kernel_size=3, padding='same')
        self.relu4_3 = nn.ReLU(inplace=True) # inplace=True overwrites the input tensor
        self.pool4 = nn.MaxPool2d(kernel_size=2, stride=2, ceil_mode=True) # ceil mode, 1/16

        # conv5
        self.conv5_1 = nn.Conv2d(in_channels=out_channels[3], out_channels=out_channels[4], kernel_size=3, padding='same')
        self.relu5_1 = nn.ReLU(inplace=True)
        self.conv5_2 = nn.Conv2d(in_channels=out_channels[4], out_channels=out_channels[4], kernel_size=3, padding='same')
        self.relu5_2 = nn.ReLU(inplace=True)
        self.conv5_3 = nn.Conv2d(in_channels=out_channels[4], out_channels=out_channels[4], kernel_size=3, padding='same')
        self.relu5_3 = nn.ReLU(inplace=True) # inplace=True overwrites the input tensor
        self.pool5 = nn.MaxPool2d(kernel_size=2, stride=2, ceil_mode=True) # ceil mode, 1/32

        # fc6
        self.fc6 = nn.Conv2d(in_channels=out_channels[4], out_channels=out_channels[5], kernel_size=7) # padding=100 above guarantees the feature map here is at least 7x7
        self.relu6 = nn.ReLU(inplace=True)
        self.drop6 = nn.Dropout2d()

        # fc7
        self.fc7 = nn.Conv2d(in_channels=out_channels[5], out_channels=out_channels[6], kernel_size=1)
        self.relu7 = nn.ReLU(inplace=True)
        self.drop7 = nn.Dropout2d()

        self.score_fr = nn.Conv2d(in_channels=out_channels[6], out_channels=n_class, kernel_size=1)
        self.score_pool3 = nn.Conv2d(in_channels=out_channels[2], out_channels=n_class, kernel_size=1)
        self.score_pool4 = nn.Conv2d(in_channels=out_channels[3], out_channels=n_class, kernel_size=1)

        self.upscore2 = nn.ConvTranspose2d(in_channels=n_class, out_channels=n_class, kernel_size=4, stride=2, bias=False)
        self.upscore8 = nn.ConvTranspose2d(in_channels=n_class, out_channels=n_class, kernel_size=16, stride=8, bias=False)
        self.upscore_pool4 = nn.ConvTranspose2d(in_channels=n_class, out_channels=n_class, kernel_size=4, stride=2, bias=False)


    def forward(self, x):
        shape = x.shape
        if len(shape) != 4: # allow [n, h, w] inputs
            x = torch.unsqueeze(x, 1)
        
        x = self.relu1_1(self.conv1_1(x)) # [n, c, h-2+200, w-2+200]
        x = self.relu1_2(self.conv1_2(x))
        x = self.pool1(x) # [n, 64, (h-2+200)/2, (w-2+200)/2]

        x = self.relu2_1(self.conv2_1(x))
        x = self.relu2_2(self.conv2_2(x))
        x = self.pool2(x) # [n, 128, (h-2+200)/4, (w-2+200)/4]

        x = self.relu3_1(self.conv3_1(x))
        x = self.relu3_2(self.conv3_2(x))
        x = self.relu3_3(self.conv3_3(x))
        x = self.pool3(x) # [n, 256, (h-2+200)/8, (w-2+200)/8]
        pool3 = x

        x = self.relu4_1(self.conv4_1(x))
        x = self.relu4_2(self.conv4_2(x))
        x = self.relu4_3(self.conv4_3(x))
        x = self.pool4(x) # [n, 512, (h-2+200)/16, (w-2+200)/16]
        pool4 = x

        x = self.relu5_1(self.conv5_1(x))
        x = self.relu5_2(self.conv5_2(x))
        x = self.relu5_3(self.conv5_3(x))
        x = self.pool5(x) # [n, 512, (h-2+200)/32, (w-2+200)/32]

        x = self.relu6(self.fc6(x)) # [n, 4096, (h-2+200)/32-6, (w-2+200)/32-6]
        x = self.drop6(x)

        x = self.relu7(self.fc7(x)) # [n, 4096, (h-2+200)/32-6, (w-2+200)/32-6]
        x = self.drop7(x)

        x = self.score_fr(x) # [n, n_class, (h-2+200)/32-6, (w-2+200)/32-6]
        x = self.upscore2(x) # [n, n_class, (h-2+200)/16-10, (w-2+200)/16-10]

        score_pool4 = self.score_pool4(pool4) # [n, n_class, (h-2+200)/16, (w-2+200)/16]
        score_pool4 = score_pool4[:, :, 5:5+x.size()[2], 5:5+x.size()[3]] # [n, n_class, (h-2+200)/16-10, (w-2+200)/16-10]
        
        x = x + score_pool4 # [n, n_class, (h-2+200)/16-10, (w-2+200)/16-10]
        x = self.upscore_pool4(x) # [n, n_class, (h-2+200)/8-18, (w-2+200)/8-18]

        score_pool3 = self.score_pool3(pool3) # [n, n_class, (h-2+200)/8, (w-2+200)/8]
        score_pool3 = score_pool3[:, :, 9:9+x.size()[2], 9:9+x.size()[3]] # [n, n_class, (h-2+200)/8-18, (w-2+200)/8-18]

        x = x + score_pool3 # [n, n_class, (h-2+200)/8-18, (w-2+200)/8-18]

        x = self.upscore8(x) # [n, n_class, (h-2+200)-17*8, (w-2+200)-17*8] -> [n, n_class, h+62, w+62]
        x = x[:, :, 31:31+shape[-2], 31:31+shape[-1]].contiguous() # [n, n_class, h, w]

        return x
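A quick sanity check of the sketch above (the input size is an arbitrary assumption; thanks to padding=100 and the final crop, the output matches the input size):

model = FCN8s(in_channels=1, n_class=21)
x = torch.rand(2, 1, 224, 224)
print(model(x).shape)  # expected: torch.Size([2, 21, 224, 224])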

FCN fully convolutional networks explained (slides)

U-Net

U-Net: Convolutional Networks for Biomedical Image Segmentation. U-Net is a segmentation network the authors proposed for the ISBI Challenge, and it copes with very small training sets (around 30 images). Like FCN, U-Net is a small segmentation network: it uses neither dilated convolutions nor a CRF post-processing stage, and its structure is simple.

U-Net is shaped like a large letter U: first Conv + Pooling downsamples; then deconvolution upsamples, and the earlier low-level feature map is cropped and fused in; the upsampling then repeats. This continues until the network produces a 388x388x2 feature map, which finally passes through softmax to give the output segmentation map. Overall the idea is very close to FCN.

U-Net architecture

[Figure 20]
[Figure 21]
[Figure 22]

Skip-connection mechanism

[Figure 23]
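In the original U-Net the encoder path uses unpadded convolutions, so the skip feature map is larger than the decoder one and must be center-cropped before concatenation (the implementation below pads instead). A minimal sketch of the crop, with made-up shapes:

import torch

def center_crop(feat, target_h, target_w):
    '''Center-crop a [n, c, h, w] feature map to the target spatial size.'''
    _, _, h, w = feat.shape
    top, left = (h - target_h) // 2, (w - target_w) // 2
    return feat[:, :, top:top + target_h, left:left + target_w]

enc = torch.randn(1, 64, 64, 64)  # encoder (skip) feature, larger
dec = torch.randn(1, 64, 56, 56)  # upsampled decoder feature
fused = torch.cat([center_crop(enc, 56, 56), dec], dim=1)
print(fused.shape)  # torch.Size([1, 128, 56, 56])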

U-Net output layer

[Figure 24]

U-Net code implementation

import torch
import torch.nn as nn
import torch.nn.functional as F



#%%
class DoubleConv(nn.Module):
    '''(convolution => [BN] => ReLU) * 2'''

    def __init__(self, in_channels, out_channels, mid_channels=None):
        super(DoubleConv, self).__init__()
        if not mid_channels:
            mid_channels = out_channels
        self.double_conv = nn.Sequential(
            nn.Conv2d(in_channels, mid_channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(mid_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, out_channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True)
        )

    def forward(self, x):
        return self.double_conv(x)



#%%
class DownSample(nn.Module):
    """Downscaling with maxpool then double conv"""

    def __init__(self, in_channels, out_channels):
        super(DownSample, self).__init__()
        
        self.doubleConv = DoubleConv(in_channels, out_channels)
        self.maxpool = nn.MaxPool2d(kernel_size=2, stride=2)

    def forward(self, x):
        res = self.doubleConv(x)
        out = self.maxpool(res)
        return res, out



#%%
class UpSample(nn.Module):
    """Upscaling then double conv"""

    def __init__(self, in_channels, out_channels, bilinear=False):
        super().__init__()

        # if bilinear, use the normal convolutions to reduce the number of channels
        if bilinear:
            self.up = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True)
            self.conv = DoubleConv(in_channels, out_channels, in_channels//2)
        else:
            self.up = nn.ConvTranspose2d(in_channels, in_channels//2, kernel_size=2, stride=2)
            self.conv = DoubleConv(in_channels, out_channels)

    def forward(self, x1, x2):
        x1 = self.up(x1)
        # input is CHW
        diffY = x2.size()[2] - x1.size()[2]
        diffX = x2.size()[3] - x1.size()[3]

        x1 = F.pad(x1, [diffX//2, diffX-diffX//2,
                        diffY//2, diffY-diffY//2]) # pad x1 to x2's spatial size
        x = torch.cat([x2, x1], dim=1)
        return self.conv(x)



#%%
class UNet(nn.Module):
    def __init__(self, in_channels=1, out_channels=[64, 128, 256, 512, 1024], n_classes=5, bilinear=False):
        super(UNet, self).__init__()
        
        self.down1 = DownSample(in_channels=in_channels, out_channels=out_channels[0])
        self.down2 = DownSample(in_channels=out_channels[0], out_channels=out_channels[1])
        self.down3 = DownSample(in_channels=out_channels[1], out_channels=out_channels[2])
        self.down4 = DownSample(in_channels=out_channels[2], out_channels=out_channels[3])

        factor = 2 if bilinear else 1
        self.center = DoubleConv(in_channels=out_channels[3], out_channels=out_channels[4]//factor)

        self.up1 = UpSample(in_channels=out_channels[4], out_channels=out_channels[3]//factor, bilinear=bilinear)
        self.up2 = UpSample(in_channels=out_channels[3], out_channels=out_channels[2]//factor, bilinear=bilinear)
        self.up3 = UpSample(in_channels=out_channels[2], out_channels=out_channels[1]//factor, bilinear=bilinear)
        self.up4 = UpSample(in_channels=out_channels[1], out_channels=out_channels[0], bilinear=bilinear)

        self.outConv = nn.Conv2d(in_channels=out_channels[0], out_channels=n_classes, kernel_size=1)

    def forward(self, x):
        if len(x.shape) != 4: # allow [n, h, w] inputs
            x = torch.unsqueeze(x, 1)

        res1, x = self.down1(x)
        res2, x = self.down2(x)
        res3, x = self.down3(x)
        res4, x = self.down4(x)

        x = self.center(x)

        x = self.up1(x, res4)
        x = self.up2(x, res3)
        x = self.up3(x, res2)
        x = self.up4(x, res1)

        x = self.outConv(x)

        return x
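Sanity check for the sketch above (the input size is an arbitrary assumption; with padded convolutions the output keeps the input's spatial size):

model = UNet(in_channels=1, n_classes=5)
x = torch.rand(2, 1, 256, 256)
print(model(x).shape)  # expected: torch.Size([2, 5, 256, 256])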

U-Net/PSPNet (slides)

Pyramid Scene Parsing Network (PSPNet)

Pyramid Scene Parsing Network. Its pyramid pooling module aggregates context from different regions, improving the network's ability to capture global information.

Three segmentation problems

[Figure 25]

CNN-based segmentation models have achieved a great deal, but they run into trouble on scene parsing. Scene parsing has two characteristics: a large number of object categories, and many objects overlapping one another. Both make segmentation harder and the results less satisfying. Three concrete problems follow:

Mismatched Relationship
Matching contextual relationships matters for understanding complex scenes. Objects tend to appear in characteristic locations: in the first row of the figure above, FCN misclassifies a "boat" as a "car", yet cars rarely appear on a river. FCN fails here because it lacks the ability to reason from context.

Confusion Categories

Targets with similar appearance get confused in the segmentation output: in the second row of the figure, FCN mixes up the similar classes "building" and "skyscraper". Many labels are correlated, and the relationships between labels can be used to compensate for this weakness.

Inconspicuous Classes

Very small targets are hard to find in segmentation, while very large targets exceed the network's receptive field and end up segmented discontinuously, as in the third row of the figure. The pillow has the same color and texture as the bed and lies inside it, so FCN misses the pillow entirely. To perform well on very small or very large objects, special attention should be paid to sub-regions containing inconspicuous (over- or under-sized) classes.

Main contributions of PSPNet

Many segmentation failures stem from FCN's inability to make effective use of the relationships within a scene and of global information. The paper proposes PSPNet, a deep network that captures the global scene context, fusing suitable global features together with local information. It also proposes an optimization strategy with a moderately supervised auxiliary loss, and performs strongly on multiple datasets.

Problems PSPNet targets

[Figure 26]

[Figure 27]

Understanding the role of the receptive field (RF)

[Figure 28]

RF -> PSP

[Figure 29]

Pyramid Pooling module

[Figure 30]

In an ordinary CNN, the receptive field can roughly be taken as the amount of context the network can use. The paper points out that many networks fail to exploit global information sufficiently, which is why they underperform. Common remedies:

  • Process the features with global average pooling. But this may lose spatial relationships and cause ambiguity.
  • Let pyramid pooling produce features at multiple levels, which are finally concatenated smoothly into an FC layer for classification. This removes the fixed input-size constraint of CNN-based classification and reduces the information loss between different regions.

[Figure 31]

Adaptive Pool: output-size computation

[Figure 32]
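nn.AdaptiveAvgPool2d only needs the desired output size and derives the pooling windows itself: for an s-bin output over h input rows, bin i covers rows floor(i*h/s) up to ceil((i+1)*h/s). A quick check with the PSPNet settings (the 24x24 input is a made-up backbone feature size):

import torch
import torch.nn as nn

x = torch.randn(1, 2048, 24, 24)
for s in [1, 2, 3, 6]:  # the four pyramid levels used by PSPNet
    print(s, nn.AdaptiveAvgPool2d(output_size=s)(x).shape)
# 1 torch.Size([1, 2048, 1, 1]) ... 6 torch.Size([1, 2048, 6, 6])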

PSPNet architecture

[Figure 33]

  • The input image passes through a pretrained backbone (ResNet101) with the dilated convolution strategy to extract a feature map at 1/8 of the input resolution.
  • The feature map goes through the Pyramid Pooling Module, which fuses in global context; the pooled features are concatenated with the pre-pooling feature map.
  • A final convolution layer produces the output.

Dilated Convolution

[Figure 34]
[Figure 35]
[Figure 36]
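A dilated 3x3 convolution has an effective kernel size of 3 + (3 - 1) * (dilation - 1), enlarging the receptive field without adding parameters; with stride 1 and padding equal to the dilation rate, the spatial size is preserved:

import torch
import torch.nn as nn

x = torch.randn(1, 64, 32, 32)
for d in [1, 2, 4]:
    conv = nn.Conv2d(64, 64, kernel_size=3, dilation=d, padding=d)  # padding=d keeps h, w
    print(d, conv(x).shape)  # always torch.Size([1, 64, 32, 32])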

PSPNet auxiliary module

[Figure 37]

PSPNet code implementation

import torch
import torch.nn as nn
from torchvision import models
import torch.nn.functional as F



#%%
def initialize_weights(*models):
    for model in models:
        for module in model.modules():
            if isinstance(module, nn.Conv2d) or isinstance(module, nn.Linear):
                nn.init.kaiming_normal_(module.weight)
                if module.bias is not None:
                    module.bias.data.zero_()
            elif isinstance(module, nn.BatchNorm2d):
                module.weight.data.fill_(1)
                module.bias.data.zero_()



#%%
class PyramidPoolingModule(nn.Module):
    def __init__(self, in_dim, reduction_dim, setting):
        super(PyramidPoolingModule, self).__init__()
        self.features = []
        for s in setting:
            self.features.append(nn.Sequential(
                nn.AdaptiveAvgPool2d(output_size=s),
                nn.Conv2d(in_channels=in_dim, out_channels=reduction_dim, kernel_size=1, bias=False),
                nn.BatchNorm2d(num_features=reduction_dim, momentum=.95),
                nn.ReLU(inplace=True)
            ))
        self.features = nn.ModuleList(self.features)

    def forward(self, x):
        x_size = x.size()
        out = [x]
        for f in self.features:
            out.append(F.interpolate(f(x), x_size[2:], mode='bilinear', align_corners=False))
        out = torch.cat(out, 1)
        return out



#%%
class PSPNet(nn.Module):
    def __init__(self, in_channels=1, n_classes=5, pretrained=False, use_aux=False):
        super(PSPNet, self).__init__()
        self.use_aux = use_aux
        resnet = models.resnet101()
        if pretrained:
            resnet = models.resnet101(pretrained=pretrained)
        self.layer0 = nn.Sequential(
            nn.Conv2d(in_channels=in_channels, out_channels=64, kernel_size=7, stride=2, padding=3, bias=False),
            nn.BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
        ) # out_channel=64
        self.layer1 = resnet.layer1 # out_channel=256
        self.layer2 = resnet.layer2 # out_channel=512
        self.layer3 = resnet.layer3 # out_channel=1024
        self.layer4 = resnet.layer4 # out_channel=2048

        for n, m in self.layer3.named_modules():
            if 'conv2' in n:
                m.dilation, m.padding, m.stride = (2, 2), (2, 2), (1, 1)
            elif 'downsample.0' in n:
                m.stride = (1, 1)
        for n, m in self.layer4.named_modules():
            if 'conv2' in n:
                m.dilation, m.padding, m.stride = (4, 4), (4, 4), (1, 1)
            elif 'downsample.0' in n:
                m.stride = (1, 1)

        self.ppm = PyramidPoolingModule(in_dim=2048, reduction_dim=512, setting=[1, 2, 3, 6])

        self.final = nn.Sequential(
            nn.Conv2d(in_channels=4096, out_channels=512, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(num_features=512, momentum=.95),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.1),
            nn.Conv2d(in_channels=512, out_channels=n_classes, kernel_size=1)
        )

        if use_aux: # auxiliary loss
            self.aux_logits = nn.Conv2d(in_channels=1024, out_channels=n_classes, kernel_size=1)
            initialize_weights(self.aux_logits)

        initialize_weights(self.ppm, self.final)

    def forward(self, x):
        x_size = x.size()
        if len(x_size) != 4:
            x = torch.unsqueeze(x, 1) # [n, c, h, w]

        x = self.layer0(x) # [n, 64, h//4, w//4]
        x = self.layer1(x) # [n, 256, h//4, w//4]
        x = self.layer2(x) # [n, 512, h//8, w//8]
        x = self.layer3(x) # [n, 1024, h//8, w//8]
        if self.training and self.use_aux:
            aux = self.aux_logits(x)
        x = self.layer4(x) # [n, 2048, h//8, w//8]
        x = self.ppm(x) # [n, 4096, h//8, w//8]
        x = self.final(x) # [n, n_classes, h//8, w//8]
        if self.training and self.use_aux:
            return (F.interpolate(x, x_size[-2:], mode='bilinear', align_corners=False),
                    F.interpolate(aux, x_size[-2:], mode='bilinear', align_corners=False))
        return F.interpolate(x, x_size[-2:], mode='bilinear', align_corners=False)
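With use_aux=True the network returns both the main and the auxiliary prediction in training mode. A sketch of how the two are typically combined (0.4 is the auxiliary-loss weight used in the PSPNet paper; input and label shapes are assumptions):

model = PSPNet(in_channels=1, n_classes=5, use_aux=True).train()
criterion = nn.CrossEntropyLoss()

x = torch.rand(2, 1, 224, 224)
target = torch.randint(0, 5, (2, 224, 224))  # dense class labels
main_out, aux_out = model(x)
loss = criterion(main_out, target) + 0.4 * criterion(aux_out, target)
loss.backward()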

U-Net/PSPNet (slides)

Image segmentation 2 (U-Net/V-Net/PSPNet)

DeepLab

[Figure 38]

DeepLab V1

Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs

DeepLab V1 architecture

[Figure 39]

DeepLab V1 code implementation

import torch
import torch.nn as nn
import torch.nn.functional as F



#%%
class classification(nn.Module):
    def __init__(self, in_channels, out_channels, stride, n_classes):
        super(classification, self).__init__()

        self.conv1 = nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=3, stride=stride, padding=1)
        self.relu1 = nn.ReLU(inplace=True)
        self.drop1 = nn.Dropout(p=0.3)
        self.conv2 = nn.Conv2d(in_channels=out_channels, out_channels=out_channels, kernel_size=1)
        self.relu2 = nn.ReLU(inplace=True)
        self.drop2 = nn.Dropout(p=0.3)
        self.conv3 = nn.Conv2d(in_channels=out_channels, out_channels=n_classes, kernel_size=1)

    def forward(self, x):
        x = self.conv1(x)
        x = self.relu1(x)
        x = self.drop1(x)
        x = self.conv2(x)
        x = self.relu2(x)
        x = self.drop2(x)
        x = self.conv3(x)

        return x


#%%
class DeepLab_V1(nn.Module):
    def __init__(self, in_channels=1, out_channels=[64, 128, 256, 512, 512, 512, 512], n_classes=5):
        super(DeepLab_V1, self).__init__()

        self.classification0 = classification(
            in_channels, out_channels[0], stride=8, n_classes=n_classes
        ) # takes the raw input, so the first conv uses stride=8 to reach 1/8 resolution
        
        self.vggLayer1 = nn.Sequential(
            nn.Conv2d(in_channels=in_channels, out_channels=out_channels[0], kernel_size=3, padding='same'),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels=out_channels[0], out_channels=out_channels[0], kernel_size=3, padding='same'),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2, ceil_mode=True)
        )

        self.classification1 = classification(
            out_channels[0], out_channels[1], stride=4, n_classes=n_classes
        ) # takes vggLayer1's output (already 1/2), so downsample by 4 more

        self.vggLayer2 = nn.Sequential(
            nn.Conv2d(in_channels=out_channels[0], out_channels=out_channels[1], kernel_size=3, padding='same'),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels=out_channels[1], out_channels=out_channels[1], kernel_size=3, padding='same'),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2, ceil_mode=True)
        )

        self.classification2 = classification(
            out_channels[1], out_channels[2], stride=2, n_classes=n_classes
        ) # takes vggLayer2's output (already 1/4), so downsample by 2 more

        self.vggLayer3 = nn.Sequential(
            nn.Conv2d(in_channels=out_channels[1], out_channels=out_channels[2], kernel_size=3, padding='same'),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels=out_channels[2], out_channels=out_channels[2], kernel_size=3, padding='same'),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels=out_channels[2], out_channels=out_channels[2], kernel_size=3, padding='same'),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2, ceil_mode=True)
        )

        self.classification3 = classification(
            out_channels[2], out_channels[3], stride=1, n_classes=n_classes
        ) # takes vggLayer3's output, already at 1/8, so no further downsampling

        self.vggLayer4 = nn.Sequential(
            nn.Conv2d(in_channels=out_channels[2], out_channels=out_channels[3], kernel_size=3, padding='same'),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels=out_channels[3], out_channels=out_channels[3], kernel_size=3, padding='same'),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels=out_channels[3], out_channels=out_channels[3], kernel_size=3, padding='same'),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1, ceil_mode=True)
        )

        self.classification4 = classification(
            out_channels[3], out_channels[4], stride=1, n_classes=n_classes
        ) # takes vggLayer4's output, still at 1/8, so no further downsampling

        self.vggLayer5 = nn.Sequential(
            nn.Conv2d(out_channels[3], out_channels[4], kernel_size=3, dilation=2, padding='same'),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels[4], out_channels[4], kernel_size=3, dilation=2, padding='same'),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels[4], out_channels[4], kernel_size=3, dilation=2, padding='same'),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1, ceil_mode=True)
        )

        self.fc6 = nn.Sequential(
            nn.Conv2d(out_channels[4], out_channels[5], kernel_size=3, dilation=4, padding='same'),
            nn.ReLU(inplace=True),
            nn.Dropout()
        )

        self.fc7 = nn.Sequential(
            nn.Conv2d(out_channels[5], out_channels[6], kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Dropout()
        )

        self.classification7 = classification(
            out_channels[6], out_channels[6], stride=1, n_classes=n_classes
        ) # takes fc7's output, still at 1/8, so no further downsampling

    def forward(self, x):
        x_size = x.size()
        if len(x_size) != 4:
            x = torch.unsqueeze(x, 1) # [n, c, h, w]

        cla0 = self.classification0(x) # [n, n_classes, h//8, w//8]

        x = self.vggLayer1(x) # [n, 64, h//2, w//2]
        cla1 = self.classification1(x) # [n, n_classes, h//8, w//8]

        x = self.vggLayer2(x) # [n, 128, h//4, w//4]
        cla2 = self.classification2(x) # [n, n_classes, h//8, w//8]

        x = self.vggLayer3(x) # [n, 256, h//8, w//8]
        cla3 = self.classification3(x) # [n, n_classes, h//8, w//8]

        x = self.vggLayer4(x) # [n, 512, h//8, w//8]
        cla4 = self.classification4(x) # [n, n_classes, h//8, w//8]

        x = self.vggLayer5(x) # [n, 512, h//8, w//8]
        x = self.fc6(x) # [n, 512, h//8, w//8]
        x = self.fc7(x) # [n, 512, h//8, w//8]
        cla7 = self.classification7(x) # [n, n_classes, h//8, w//8]

        x = cla0+cla1+cla2+cla3+cla4+cla7 # [n, n_classes, h//8, w//8]
        x = F.interpolate(x, size=x_size[-2:], mode='bilinear', align_corners=False)

        return x
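Shape check for the sketch above: every classification head emits a score map at 1/8 of the input resolution, the six maps are summed, and the result is upsampled back to the input size (the input size is an arbitrary assumption):

model = DeepLab_V1(in_channels=1, n_classes=5)
x = torch.rand(2, 1, 224, 224)
print(model(x).shape)  # expected: torch.Size([2, 5, 224, 224])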

DeepLab V2

DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs

DeepLab V2 architecture

[Figure 40]

[Figure 41]

DeepLab V2 ASPP module

[Figure 42]
[Figure 43]

DeepLab V2 code implementation

import torch
import torch.nn as nn
import torch.nn.functional as F



#%%
class ResBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride, padding, dilation):
        super(ResBlock, self).__init__()
        self.downsample = False
        self.mid_channels = out_channels//4

        self.reduce = nn.Sequential(
            nn.Conv2d(in_channels, self.mid_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(num_features=self.mid_channels)
        )
        self.conv3x3 = nn.Sequential(
            nn.Conv2d(self.mid_channels, self.mid_channels, 
                      kernel_size=3, stride=stride, padding=padding, dilation=dilation, bias=False),
            nn.BatchNorm2d(num_features=self.mid_channels)
        )
        self.increase = nn.Sequential(
            nn.Conv2d(self.mid_channels, out_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(num_features=out_channels),
            nn.ReLU(inplace=True)
        )

        if in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(num_features=out_channels)
            )
        else:
            self.shortcut = nn.Identity()

    def forward(self, x):
        res = x
        x = self.reduce(x)
        x = self.conv3x3(x)
        x = self.increase(x)
        res = self.shortcut(res)
        x = x+res

        return x
        


#%%
class ResLayer(nn.Module):
    def __init__(self, in_channels, out_channels, n_layers, stride=1, padding=1, dilation=1):
        super(ResLayer, self).__init__()
        resLayer = []
        for i in range(n_layers):
            resLayer.append(
                ResBlock(in_channels=(in_channels if i==0 else out_channels),
                         out_channels=out_channels,
                         stride=(stride if i==0 else 1),
                         padding=padding,
                         dilation=dilation)
            )
        self.resLayers = nn.Sequential(*resLayer)
    
    def forward(self, x):
        x = self.resLayers(x)
        return x



#%%
class ASPP(nn.Module):
    def __init__(self, in_channels, out_channels, dilations):
        super(ASPP, self).__init__()

        # four parallel 3x3 branches with different dilation rates;
        # padding=dilation keeps the spatial size unchanged
        self.aspp1 = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3,
                      dilation=dilations[0], padding=dilations[0]),
            nn.BatchNorm2d(num_features=out_channels),
            nn.ReLU(inplace=True)
        )
        self.aspp2 = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3,
                      dilation=dilations[1], padding=dilations[1]),
            nn.BatchNorm2d(num_features=out_channels),
            nn.ReLU(inplace=True)
        )
        self.aspp3 = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3,
                      dilation=dilations[2], padding=dilations[2]),
            nn.BatchNorm2d(num_features=out_channels),
            nn.ReLU(inplace=True)
        )
        self.aspp4 = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3,
                      dilation=dilations[3], padding=dilations[3]),
            nn.BatchNorm2d(num_features=out_channels),
            nn.ReLU(inplace=True)
        )

    def forward(self, x):
        aspp1 = self.aspp1(x)
        aspp2 = self.aspp2(x)
        aspp3 = self.aspp3(x)
        aspp4 = self.aspp4(x)
        out = aspp1+aspp2+aspp3+aspp4

        return out



#%%
class DeepLab_V2(nn.Module):
    def __init__(self, in_channels=1, out_channels=[64, 256, 512, 1024, 2048], n_layers=[3, 4, 23, 3], n_classes=5):
        super(DeepLab_V2, self).__init__()

        self.stem = nn.Sequential(
            nn.Conv2d(in_channels, out_channels[0], kernel_size=7, stride=2, padding=3, dilation=1),
            nn.BatchNorm2d(num_features=out_channels[0]),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1, ceil_mode=True)
        )

        self.res101Layer1 = ResLayer(out_channels[0], out_channels[1], n_layers[0], stride=2)
        self.res101Layer2 = ResLayer(out_channels[1], out_channels[2], n_layers[1], stride=2)
        self.res101Layer3 = ResLayer(out_channels[2], out_channels[3], n_layers[2], stride=1, padding=2, dilation=2)
        self.res101Layer4 = ResLayer(out_channels[3], out_channels[4], n_layers[3], stride=1, padding=4, dilation=4)

        self.aspp = ASPP(out_channels[4], n_classes, dilations=[6, 12, 18, 24])

    def forward(self, x):
        x_size = x.size()
        if len(x_size) != 4:
            x = torch.unsqueeze(x, 1) # [n, c, h, w]
        x = self.stem(x)
        x = self.res101Layer1(x)
        x = self.res101Layer2(x)
        x = self.res101Layer3(x)
        x = self.res101Layer4(x)
        x = self.aspp(x)
        x = F.interpolate(x, size=x_size[-2:], mode='bilinear', align_corners=False)

        return x

DeepLab V3

Rethinking Atrous Convolution for Semantic Image Segmentation

DeepLab V3 architecture

[Figure 44]

DeepLab V3 upgraded ASPP module

[Figure 45]

DeepLab V3 Multi-Grid

[Figure 46]
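Multi-Grid multiplies a stage's base dilation rate by a per-block grid factor, which is exactly what the multi_grids argument of the ResLayer below implements. For example:

# per-block dilation inside one stage = base dilation * grid factor
base_dilation = 2
multi_grids = [1, 2, 4]
print([base_dilation * g for g in multi_grids])  # [2, 4, 8]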

DeepLab V3 code implementation

import torch
import torch.nn as nn
import torch.nn.functional as F



#%%
class ResBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride, padding, dilation):
        super(ResBlock, self).__init__()
        self.downsample = False
        self.mid_channels = out_channels//4

        self.reduce = nn.Sequential(
            nn.Conv2d(in_channels, self.mid_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(num_features=self.mid_channels)
        )
        self.conv3x3 = nn.Sequential(
            nn.Conv2d(self.mid_channels, self.mid_channels, 
                      kernel_size=3, stride=stride, padding=padding, dilation=dilation, bias=False),
            nn.BatchNorm2d(num_features=self.mid_channels)
        )
        self.increase = nn.Sequential(
            nn.Conv2d(self.mid_channels, out_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(num_features=out_channels),
            nn.ReLU(inplace=True)
        )

        if in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(num_features=out_channels)
            )
        else:
            self.shortcut = nn.Identity()

    def forward(self, x):
        res = x
        x = self.reduce(x)
        x = self.conv3x3(x)
        x = self.increase(x)
        res = self.shortcut(res)
        x = x+res

        return x
        


#%%
class ResLayer(nn.Module):
    def __init__(self, in_channels, out_channels, n_layers, stride=1, padding=1, dilation=1, multi_grids=None):
        super(ResLayer, self).__init__()

        if multi_grids is None:
            multi_grids = [1 for _ in range(n_layers)]
        else:
            assert n_layers == len(multi_grids)
        
        resLayer = []
        for i in range(n_layers):
            resLayer.append(
                ResBlock(in_channels=(in_channels if i==0 else out_channels),
                         out_channels=out_channels,
                         stride=(stride if i==0 else 1),
                         padding=padding*multi_grids[i],
                         dilation=dilation*multi_grids[i])
            )
        self.resLayers = nn.Sequential(*resLayer)
    
    def forward(self, x):
        x = self.resLayers(x)
        return x
    


#%%
class ASPP_plus(nn.Module):
    def __init__(self, in_channels, out_channels, dilations=[6, 12, 18]):
        super(ASPP_plus, self).__init__()

        # 1x1 branch
        self.aspp0 = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(num_features=out_channels),
            nn.ReLU(inplace=True)
        )

        # three 3x3 branches with different dilation rates
        self.aspp1 = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3,
                      dilation=dilations[0], padding=dilations[0], bias=False),
            nn.BatchNorm2d(num_features=out_channels),
            nn.ReLU(inplace=True)
        )
        self.aspp2 = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3,
                      dilation=dilations[1], padding=dilations[1], bias=False),
            nn.BatchNorm2d(num_features=out_channels),
            nn.ReLU(inplace=True)
        )
        self.aspp3 = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3,
                      dilation=dilations[2], padding=dilations[2], bias=False),
            nn.BatchNorm2d(num_features=out_channels),
            nn.ReLU(inplace=True)
        )

        # image-level branch: global average pooling then 1x1 conv
        self.aspp4 = nn.Sequential(
            nn.AdaptiveAvgPool2d(output_size=1),
            nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(num_features=out_channels),
            nn.ReLU(inplace=True)
        )

    def forward(self, x):
        x_size = x.size()
        aspp0 = self.aspp0(x)
        aspp1 = self.aspp1(x)
        aspp2 = self.aspp2(x)
        aspp3 = self.aspp3(x)
        aspp4 = self.aspp4(x)
        aspp4 = F.interpolate(aspp4, x_size[2:], mode='bilinear', align_corners=False)
        # note: the DeepLab V3 paper concatenates the branches and applies a 1x1 conv;
        # this sketch fuses them by summation instead
        out = aspp0+aspp1+aspp2+aspp3+aspp4

        return out
    

#%%
class DeepLab_V3(nn.Module):
    def __init__(self, in_channels=1, out_channels=[64, 256, 512, 1024, 2048], 
                 n_layers=[3, 4, 6, 3], multi_grids=[1, 2, 4], n_classes=5):
        super(DeepLab_V3, self).__init__()

        self.stem = nn.Sequential(
            nn.Conv2d(in_channels, out_channels[0], kernel_size=7, stride=2, padding=3, dilation=1),
            nn.BatchNorm2d(num_features=out_channels[0]),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1, ceil_mode=True)
        )

        self.res50Layer1 = ResLayer(out_channels[0], out_channels[1], n_layers[0], stride=2)
        self.res50Layer2 = ResLayer(out_channels[1], out_channels[2], n_layers[1], stride=2)
        self.res50Layer3 = ResLayer(out_channels[2], out_channels[3], n_layers[2], stride=1, padding=2, dilation=2)
        self.res50Layer4 = ResLayer(out_channels[3], out_channels[4], n_layers[3], 
                                    stride=1, padding=2, dilation=2, multi_grids=multi_grids)
        self.res50Layer4_copy1 = ResLayer(out_channels[4], out_channels[4], n_layers[3],
                                          stride=1, padding=4, dilation=4, multi_grids=multi_grids)
        self.res50Layer4_copy2 = ResLayer(out_channels[4], out_channels[4], n_layers[3],
                                          stride=1, padding=8, dilation=8, multi_grids=multi_grids)
        self.res50Layer4_copy3 = ResLayer(out_channels[4], out_channels[4], n_layers[3],
                                          stride=1, padding=16, dilation=16, multi_grids=multi_grids)
        
        self.aspp = ASPP_plus(out_channels[4], n_classes, dilations=[6, 12, 18]) # three dilated branches

    def forward(self, x):
        x_size = x.size()
        if len(x_size) != 4:
            x = torch.unsqueeze(x, 1) # [n, c, h, w]
        x = self.stem(x)
        x = self.res50Layer1(x)
        x = self.res50Layer2(x)
        x = self.res50Layer3(x)
        x = self.res50Layer4(x)
        x = self.res50Layer4_copy1(x)
        x = self.res50Layer4_copy2(x)
        x = self.res50Layer4_copy3(x)
        x = self.aspp(x)
        x = F.interpolate(x, size=x_size[-2:], mode='bilinear', align_corners=False)

        return x

DeepLab V3+

Encoder-decoder with atrous separable convolution for semantic image segmentation

DeepLab (slides)
