Image Semantic Segmentation -- FCN

1: Image Semantic Segmentation
The easiest way to understand image semantic segmentation is to look at the example image below.
It is classification at the pixel level:
[Figure 1]

Suppose there are five pixel classes. Then the final output has the same height and width as the input, except that the number of channels equals the number of classes. Splitting the output into its individual channels looks like this:
[Figure 2]

Merging the channels back gives each pixel's class. Usually each class is drawn in a different color so we can see clearly which pixels belong to which class; this is also an example of supervised learning.
[Figure 3]
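As a minimal sketch (my own, not from the original post), the "merge" step can be written as an argmax over the class channel: the network outputs a tensor of shape [batch, num_classes, H, W], and taking the index of the largest score at each spatial position gives a [batch, H, W] label map.

import torch

# Hypothetical example: 5 classes, a 4x4 image, batch size 1.
num_classes, H, W = 5, 4, 4
scores = torch.randn(1, num_classes, H, W)   # per-pixel class scores from the network

probs = scores.softmax(dim=1)                # per-pixel probabilities, [1, 5, 4, 4]
label_map = probs.argmax(dim=1)              # per-pixel class index, [1, 4, 4]
print(label_map.shape)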

2: FCN
I'm too lazy to repeat the basics here, so for the concept and background of FCN please look them up yourself. The links below are recommended:
https://zhuanlan.zhihu.com/p/31428783
https://www.pianshen.com/article/1189524224/
I won't rewrite what those links already cover; if you know it, skip ahead. In short:
A CNN is used for image classification, i.e. it works at the image level. Shallow convolutional layers have small receptive fields and learn features of local regions; deeper convolutional layers have larger receptive fields and learn more abstract features.
However, a CNN loses image detail during convolution and pooling, i.e. the feature map size keeps shrinking, so it cannot precisely trace object contours or say which object each pixel belongs to; it cannot produce an accurate segmentation.
FCN replaces all of the CNN's final FC layers with convolutional layers, so the output is a 2D map that can be mapped back to a class for every pixel of the original image. A softmax on top gives per-pixel classification, which solves the segmentation problem.
[Figure 4]
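As a small illustrative sketch (assumptions mine, not from the post): a fully connected layer applied to a [batch, 512, 1, 1] feature can be replaced by a 1x1 convolution carrying the same weights, and the convolutional version then also accepts larger spatial inputs, yielding a score map instead of a single vector.

import torch
from torch import nn

fc = nn.Linear(512, 10)                    # an FC layer: 512 features -> 10 classes
conv = nn.Conv2d(512, 10, kernel_size=1)   # the equivalent 1x1 convolution

# Copy the FC weights into the conv kernel (same parameters, reshaped).
with torch.no_grad():
    conv.weight.copy_(fc.weight.view(10, 512, 1, 1))
    conv.bias.copy_(fc.bias)

x = torch.randn(1, 512, 1, 1)
print(torch.allclose(fc(x.flatten(1)), conv(x).flatten(1), atol=1e-6))  # True

# Unlike the FC layer, the conv version also works on larger feature maps:
print(conv(torch.randn(1, 512, 7, 7)).shape)   # [1, 10, 7, 7] spatial score map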

3: The Structure of FCN
As shown below:
[Figure 5]

A brief description of the structure:
1: The image passes through several conv layers and one max pooling to become the pool1 feature; width and height are reduced to 1/2.
2: The pool1 feature passes through several conv layers and one max pooling to become the pool2 feature; width and height are reduced to 1/4.
3: The pool2 feature passes through several conv layers and one max pooling to become the pool3 feature; width and height are reduced to 1/8.
4: The pool3 feature passes through several conv layers and one max pooling to become the pool4 feature; width and height are reduced to 1/16.
5: The pool4 feature passes through several conv layers and one max pooling to become the pool5 feature; width and height are reduced to 1/32.

At this point we have five feature maps. Doesn't this look familiar?
It is exactly the structure of VGG that we just studied. Besides the VGG16 from the previous post there are other VGG variants, all of which we will use below.

But that is only the feature-extraction stage. The next step, feature fusion and upsampling, is the real heart of FCN (a quick shape walkthrough follows this list):
1: For FCN-32s, the final output is directly upsampled by a factor of 32 with a transposed convolution (deconvolution).
2: For FCN-16s, pool5 is first upsampled 2x with a transposed convolution and added to pool4; the result is then upsampled by a factor of 16.
3: For FCN-8s, pool5 is first upsampled 2x and added to pool4; that sum is upsampled 2x again and added to pool3; the result is finally upsampled by a factor of 8.
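To check the arithmetic (numbers assumed, e.g. a 224x224 input): pool1 through pool5 have spatial sizes 112, 56, 28, 14 and 7. The sketch below traces the FCN-8s fusion with dummy score maps; the names here are placeholders, not the classes defined later in this post.

import torch
from torch import nn

num_classes = 21
# Dummy score maps for a 224x224 input (channels already reduced to num_classes).
pool3 = torch.randn(1, num_classes, 28, 28)   # 1/8 of the input size
pool4 = torch.randn(1, num_classes, 14, 14)   # 1/16
pool5 = torch.randn(1, num_classes, 7, 7)     # 1/32

up2 = nn.ConvTranspose2d(num_classes, num_classes, 2, stride=2, bias=False)
up8 = nn.ConvTranspose2d(num_classes, num_classes, 8, stride=8, bias=False)

fuse4 = pool4 + up2(pool5)   # 7 -> 14, add pool4
fuse3 = pool3 + up2(fuse4)   # 14 -> 28, add pool3
out = up8(fuse3)             # 28 -> 224, back to the input resolution
print(out.shape)             # torch.Size([1, 21, 224, 224])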

4: Code Implementation

The implementation reuses the VGG structure.
VGG comes in VGG11, VGG13, VGG16 and VGG19, the last two being the most commonly used. They all share the same layout:
the front part consists of five feature-extraction stages, followed by the shared FC layers; in FCN these FC layers are replaced by convolutional layers.
The input is [batch, 3, W, H]; the five stages produce [batch, 64, W/2, H/2], [batch, 128, W/4, H/4], [batch, 256, W/8, H/8], [batch, 512, W/16, H/16] and [batch, 512, W/32, H/32] respectively.
To see the structure of VGG11, VGG13, VGG16 and VGG19, you can run the following code:

import torch
import torchvision

if __name__ == '__main__':
    net = torchvision.models.vgg11(pretrained=True)   # load VGG11 with pretrained weights
    print(net)

    net = torchvision.models.vgg13(pretrained=True)   # load VGG13 with pretrained weights
    print(net)

    net = torchvision.models.vgg16(pretrained=True)   # load VGG16 with pretrained weights
    print(net)

    net = torchvision.models.vgg19(pretrained=True)   # load VGG19 with pretrained weights
    print(net)

To reduce the amount of computation, the channel dimension is reduced (with a 1x1 convolution) before the deconvolution; see the code for details.
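A rough illustration of why (numbers are my own, not from the post): deconvolving directly from the 4096-channel fc7 output with a 32x32 kernel would need 4096 * 4096 * 32 * 32 weights, while scoring down to num_classes channels first leaves only a tiny transposed convolution.

from torch import nn

num_classes = 21

# Hypothetical direct deconvolution from 4096 channels: about 17.2 billion weights,
# far too large to be practical (we only compute the count, we don't build it).
print(4096 * 4096 * 32 * 32)

# Reduce channels with a 1x1 "score" conv first, then upsample: ~0.54 million weights.
score = nn.Conv2d(4096, num_classes, 1)
up = nn.ConvTranspose2d(num_classes, num_classes, 32, stride=32, bias=False)
print(sum(p.numel() for p in score.parameters()) +
      sum(p.numel() for p in up.parameters()))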

The complete code is as follows:

from torch import nn
import torch
import torchvision

# `ranges` records, for each VGG variant, which slice of `features` produces each pooling stage's output
# (layer indices taken from the original torchvision VGG definitions)
ranges = {
    'vgg11': ((0, 3), (3, 6), (6, 11), (11, 16), (16, 21)),
    'vgg13': ((0, 5), (5, 10), (10, 15), (15, 20), (20, 25)),
    'vgg16': ((0, 5), (5, 10), (10, 17), (17, 24), (24, 31)),
    'vgg19': ((0, 5), (5, 10), (10, 19), (19, 28), (28, 37))
}


class VGGTest(nn.Module):
    def __init__(self, vgg='vgg16'):  # vgg: which VGG backbone to use
        super(VGGTest, self).__init__()
        net = {}
        if vgg == 'vgg11':
            net = torchvision.models.vgg11(pretrained=True)  # load VGG11 with pretrained weights
        elif vgg == 'vgg13':
            net = torchvision.models.vgg13(pretrained=True)  # load VGG13 with pretrained weights
        elif vgg == 'vgg16':
            net = torchvision.models.vgg16(pretrained=True)  # load VGG16 with pretrained weights
        elif vgg == 'vgg19':
            net = torchvision.models.vgg19(pretrained=True)  # load VGG19 with pretrained weights

        self.pool1 = net.features[ranges[vgg][0][0]: ranges[vgg][0][1]]  # -> [batch, 64,  W/2,  H/2]
        self.pool2 = net.features[ranges[vgg][1][0]: ranges[vgg][1][1]]  # -> [batch, 128, W/4,  H/4]
        self.pool3 = net.features[ranges[vgg][2][0]: ranges[vgg][2][1]]  # -> [batch, 256, W/8,  H/8]
        self.pool4 = net.features[ranges[vgg][3][0]: ranges[vgg][3][1]]  # -> [batch, 512, W/16, H/16]
        self.pool5 = net.features[ranges[vgg][4][0]: ranges[vgg][4][1]]  # -> [batch, 512, W/32, H/32]

    def forward(self, x):
        pool1_features = self.pool1(x)  # 1/2
        pool2_features = self.pool2(pool1_features)  # 1/4
        pool3_features = self.pool3(pool2_features)  # 1/8
        pool4_features = self.pool4(pool3_features)  # 1/16
        pool5_features = self.pool5(pool4_features)  # 1/32

        return pool1_features, pool2_features, pool3_features, pool4_features, pool5_features


class FCN32s(nn.Module):
    def __init__(self, vgg='vgg16', num_classes=21):  # num_classes: number of pixel classes (output channels)
        super(FCN32s, self).__init__()

        # the first five feature-extraction stages (the VGG backbone)
        self.net_features = VGGTest(vgg)

        # the last two FC layers, converted into two conv layers
        self.FC_Layers = nn.Sequential(
            # fc6
            nn.Conv2d(512, 4096, 1),
            nn.ReLU(inplace=True),
            nn.Dropout2d(),

            # fc7
            nn.Conv2d(4096, 4096, 1),
            nn.ReLU(inplace=True),
            nn.Dropout2d()
        )

        # reduce the channel dimension before the output layer
        self.score_pool5 = nn.Conv2d(4096, num_classes, 1)
        # finally upsample the reduced feature with a transposed convolution (deconvolution)
        self.final_x32 = nn.ConvTranspose2d(num_classes, num_classes, 32, stride=32, bias=False)

    def forward(self, x):
        # get the five stage features
        features = self.net_features(x)

        # the last feature map still goes through the convolutionalized FC layers
        last_features = self.FC_Layers(features[-1])

        # first reduce channels to cut computation, then upsample (transposed convolution)
        pool5 = self.score_pool5(last_features)

        out = self.final_x32(pool5)
        return out


class FCN16s(nn.Module):
    def __init__(self, vgg='vgg16', num_classes=21):  # num_classes: number of pixel classes (output channels)
        super(FCN16s, self).__init__()

        # the first five feature-extraction stages (the VGG backbone)
        self.net_features = VGGTest(vgg)

        # the last two FC layers, converted into two conv layers
        self.FC_Layers = nn.Sequential(
            # fc6
            nn.Conv2d(512, 4096, 1),
            nn.ReLU(inplace=True),
            nn.Dropout2d(),

            # fc7
            nn.Conv2d(4096, 4096, 1),
            nn.ReLU(inplace=True),
            nn.Dropout2d()
        )

        # reduce the channel dimension before the output layer
        self.score_pool5 = nn.Conv2d(4096, num_classes, 1)
        self.score_pool4 = nn.Conv2d(512, num_classes, 1)
        # finally upsample the reduced features with transposed convolutions (deconvolution)
        self.upscore_pool5_x2 = nn.Sequential(
            nn.ConvTranspose2d(num_classes, num_classes, 2, stride=2, bias=False),
            nn.ReLU(inplace=True)
        )
        self.final_x16 = nn.ConvTranspose2d(num_classes, num_classes, 16, stride=16, bias=False)

    def forward(self, x):
        # get the five stage features
        features = self.net_features(x)

        # the last feature map still goes through the convolutionalized FC layers
        last_features = self.FC_Layers(features[-1])

        # first reduce channels to cut computation, then upsample (transposed convolution)
        pool5 = self.score_pool5(last_features)
        pool4 = self.score_pool4(features[-2])

        out = pool4 + self.upscore_pool5_x2(pool5)

        out = self.final_x16(out)
        return out


class FCN8s(nn.Module):
    def __init__(self, vgg='vgg16', num_classes=21):  # num_classes: number of pixel classes (output channels)
        super(FCN8s, self).__init__()

        # the first five feature-extraction stages (the VGG backbone)
        self.net_features = VGGTest(vgg)

        # the last two FC layers, converted into two conv layers
        self.FC_Layers = nn.Sequential(
            # fc6
            nn.Conv2d(512, 4096, 1),
            nn.ReLU(inplace=True),
            nn.Dropout2d(),

            # fc7
            nn.Conv2d(4096, 4096, 1),
            nn.ReLU(inplace=True),
            nn.Dropout2d()
        )

        # reduce the channel dimension before the output layer
        self.score_pool5 = nn.Conv2d(4096, num_classes, 1)
        self.score_pool4 = nn.Conv2d(512, num_classes, 1)
        self.score_pool3 = nn.Conv2d(256, num_classes, 1)
        # finally upsample the reduced features with transposed convolutions (deconvolution)
        self.upscore_pool5_x2 = nn.Sequential(
            nn.ConvTranspose2d(num_classes, num_classes, 2, stride=2, bias=False),
            nn.ReLU(inplace=True)
        )
        self.upscore_pool4_x2 = nn.Sequential(
            nn.ConvTranspose2d(num_classes, num_classes, 2, stride=2, bias=False),
            nn.ReLU(inplace=True)
        )
        self.final_x8 = nn.ConvTranspose2d(num_classes, num_classes, 8, stride=8, bias=False)

    def forward(self, x):
        # get the five stage features
        features = self.net_features(x)

        # the last feature map still goes through the convolutionalized FC layers
        last_features = self.FC_Layers(features[-1])

        # first reduce channels to cut computation, then upsample (transposed convolution)
        pool5 = self.score_pool5(last_features)
        pool4 = self.score_pool4(features[-2])
        pool3 = self.score_pool3(features[-3])

        out = pool4 + self.upscore_pool5_x2(pool5)
        out = pool3 + self.upscore_pool4_x2(out)

        out = self.final_x8(out)
        return out


class FCN4s(nn.Module):
    def __init__(self, vgg='vgg16', num_classes=21):  # num_classes: number of pixel classes (output channels)
        super(FCN4s, self).__init__()

        # the first five feature-extraction stages (the VGG backbone)
        self.net_features = VGGTest(vgg)

        # the last two FC layers, converted into two conv layers
        self.FC_Layers = nn.Sequential(
            # fc6
            nn.Conv2d(512, 4096, 1),
            nn.ReLU(inplace=True),
            nn.Dropout2d(),

            # fc7
            nn.Conv2d(4096, 4096, 1),
            nn.ReLU(inplace=True),
            nn.Dropout2d()
        )

        # reduce the channel dimension before the output layer
        self.score_pool5 = nn.Conv2d(4096, num_classes, 1)
        self.score_pool4 = nn.Conv2d(512, num_classes, 1)
        self.score_pool3 = nn.Conv2d(256, num_classes, 1)
        self.score_pool2 = nn.Conv2d(128, num_classes, 1)
        # finally upsample the reduced features with transposed convolutions (deconvolution)
        self.upscore_pool5_x2 = nn.Sequential(
            nn.ConvTranspose2d(num_classes, num_classes, 2, stride=2, bias=False),
            nn.ReLU(inplace=True)
        )
        self.upscore_pool4_x2 = nn.Sequential(
            nn.ConvTranspose2d(num_classes, num_classes, 2, stride=2, bias=False),
            nn.ReLU(inplace=True)
        )
        self.upscore_pool3_x2 = nn.Sequential(
            nn.ConvTranspose2d(num_classes, num_classes, 2, stride=2, bias=False),
            nn.ReLU(inplace=True)
        )
        self.final_x4 = nn.ConvTranspose2d(num_classes, num_classes, 4, stride=4, bias=False)

    def forward(self, x):
        # get the five stage features
        features = self.net_features(x)

        # the last feature map still goes through the convolutionalized FC layers
        last_features = self.FC_Layers(features[-1])

        # first reduce channels to cut computation, then upsample (transposed convolution)
        pool5 = self.score_pool5(last_features)
        pool4 = self.score_pool4(features[-2])
        pool3 = self.score_pool3(features[-3])
        pool2 = self.score_pool2(features[-4])

        out = pool4 + self.upscore_pool5_x2(pool5)
        out = pool3 + self.upscore_pool4_x2(out)
        out = pool2 + self.upscore_pool3_x2(out)

        out = self.final_x4(out)
        return out


class FCN2s(nn.Module):
    def __init__(self, vgg='vgg16', num_classes=21):  # num_classes: number of pixel classes (output channels)
        super(FCN2s, self).__init__()

        # the first five feature-extraction stages (the VGG backbone)
        self.net_features = VGGTest(vgg)

        # the last two FC layers, converted into two conv layers
        self.FC_Layers = nn.Sequential(
            # fc6
            nn.Conv2d(512, 4096, 1),
            nn.ReLU(inplace=True),
            nn.Dropout2d(),

            # fc7
            nn.Conv2d(4096, 4096, 1),
            nn.ReLU(inplace=True),
            nn.Dropout2d()
        )

        # reduce the channel dimension before the output layer
        self.score_pool5 = nn.Conv2d(4096, num_classes, 1)
        self.score_pool4 = nn.Conv2d(512, num_classes, 1)
        self.score_pool3 = nn.Conv2d(256, num_classes, 1)
        self.score_pool2 = nn.Conv2d(128, num_classes, 1)
        self.score_pool1 = nn.Conv2d(64, num_classes, 1)
        # finally upsample the reduced features with transposed convolutions (deconvolution)
        self.upscore_pool5_x2 = nn.Sequential(
            nn.ConvTranspose2d(num_classes, num_classes, 2, stride=2, bias=False),
            nn.ReLU(inplace=True)
        )
        self.upscore_pool4_x2 = nn.Sequential(
            nn.ConvTranspose2d(num_classes, num_classes, 2, stride=2, bias=False),
            nn.ReLU(inplace=True)
        )
        self.upscore_pool3_x2 = nn.Sequential(
            nn.ConvTranspose2d(num_classes, num_classes, 2, stride=2, bias=False),
            nn.ReLU(inplace=True)
        )
        self.upscore_pool2_x2 = nn.Sequential(
            nn.ConvTranspose2d(num_classes, num_classes, 2, stride=2, bias=False),
            nn.ReLU(inplace=True)
        )
        self.final_x2 = nn.ConvTranspose2d(num_classes, num_classes, 2, stride=2, bias=False)

    def forward(self, x):
        # get the five stage features
        features = self.net_features(x)

        # the last feature map still goes through the convolutionalized FC layers
        last_features = self.FC_Layers(features[-1])

        # first reduce channels to cut computation, then upsample (transposed convolution)
        pool5 = self.score_pool5(last_features)
        pool4 = self.score_pool4(features[-2])
        pool3 = self.score_pool3(features[-3])
        pool2 = self.score_pool2(features[-4])
        pool1 = self.score_pool1(features[-5])

        out = pool4 + self.upscore_pool5_x2(pool5)
        out = pool3 + self.upscore_pool4_x2(out)
        out = pool2 + self.upscore_pool3_x2(out)
        out = pool1 + self.upscore_pool2_x2(out)

        out = self.final_x2(out)
        return out


if __name__ == '__main__':
    # model = VGGTest()
    x = torch.rand(64, 3, 224, 224)
    print(x.shape)
    num_classes = 21  # assume num_classes pixel classes, so the output has num_classes channels

    # model = FCN32s()
    # model = FCN16s()
    # model = FCN8s()
    # model = FCN4s()
    model = FCN2s(num_classes=num_classes)
    y = model(x)
    print(y.shape)
