UNet Building Recognition from Aerial Imagery: A Hands-On Implementation Walkthrough

This article is a repost.
Original project: 【AI达人特训营第二期】UNet地表建筑物识别 (AI Talent Training Camp, Phase 2: UNet building recognition)

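1. Project overview

Goal: use aerial imagery to recognize buildings on the ground by classifying every pixel of an aerial image as either building or non-building. A sample image:
[Figure 1: sample aerial image]

Use: aerial imagery makes it possible to monitor building changes and planning across the country, particularly in scenic areas, water-resource protection zones, and farmland. The project is therefore a useful aid to regulatory agencies.

Difficulty: is building recognition a classification problem or a detection problem? Which model should be used?

Motivation: why not use PaddleSeg or PaddleRS? For beginners they are hard going: it is difficult to understand how the parameters should be set, why they are set that way, and how everything is implemented behind the scenes. In The Legend of the Condor Heroes, Hong Qigong teaches Guo Jing the Eighteen Dragon-Subduing Palms. Each move looks like an ordinary palm strike, yet it hides endless variations, far too many for a novice to grasp. PaddleSeg is a bit like that.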

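2. Data analysis

(1) Dataset source

This project uses the aerial dataset Inria Aerial Image Labeling, available at https://aistudio.baidu.com/aistudio/datasetdetail/177948

(2) Moving and extracting the dataset

# Extract the outer archive, then rename the extracted folder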
!unzip -oq -d /home/aistudio/data/data177948/ /home/aistudio/data/data177948/地表建筑物识别.zip
!mv /home/aistudio/data/data177948/地表建筑物识别 /home/aistudio/data/data177948/dataset
# Extract the dataset archives
!unzip -oq -d /home/aistudio/data/data177948/dataset/ /home/aistudio/data/data177948/dataset/train.zip
!unzip -oq -d /home/aistudio/data/data177948/dataset/ /home/aistudio/data/data177948/dataset/train_mask.csv.zip
!unzip -oq -d /home/aistudio/data/data177948/dataset/ /home/aistudio/data/data177948/dataset/test_a.zip

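(3) Dataset overview

The dataset consists of a training set and a test set. The training set contains 30000 images, of which 24796 have labels; the model below is trained on those labelled images. Note: training on the full training set (unlabelled images included) produces messages such as "Found inf or nan, current scale is: 0.0, decrease to: 0.0*0.5".

The test set contains 2500 images whose labels (masks) are not provided, so no evaluation metric can be computed on it. (Reading train_mask.csv with pandas shows that the table has no header row, so read_csv() below is called with names=['name', 'mask'] to name the two columns.) The following code counts the images: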

import pandas as pd
import os

data_path = "/home/aistudio/data/data177948/dataset/"
df = pd.read_csv(os.path.join(data_path,'train_mask.csv'), sep='\t', names=['name', 'mask'])
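# Number of images in the original training set
print(len(df))
# Number of training images left after dropping the unlabelled rows
df2=df.dropna()
print(len(df2))
# Number of images in the test set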
df3 = pd.read_csv(os.path.join(data_path,'test_a_samplesubmit.csv'), sep='\t', names=['name', 'mask'])
print(len(df3))
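
To make it concrete what names=['name', 'mask'] and dropna() amount to, here is a minimal standard-library sketch that does the same counting without pandas. The three sample rows are made up for illustration, and count_rows is a hypothetical helper that is not part of the original project.

```python
import csv
import io

# Made-up sample in the same layout as train_mask.csv: no header row,
# tab-separated, image name first, RLE mask second (empty when unlabelled).
SAMPLE = "a.jpg\t1 3 10 2\nb.jpg\t\nc.jpg\t5 4\n"

def count_rows(text):
    """Return (total rows, rows with a non-empty mask)."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    # Name the two columns by hand, as names=['name', 'mask'] does in pandas
    rows = [dict(zip(["name", "mask"], r)) for r in reader]
    # Keep only rows that carry a mask, the analogue of dropna()
    labelled = [r for r in rows if r["mask"].strip()]
    return len(rows), len(labelled)

total, labelled = count_rows(SAMPLE)
print(total, labelled)
```

On the real file the two numbers would correspond to the 30000 total and 24796 labelled images quoted above.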

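Deleting the unlabelled images from the original training set by hand is a lot of work, so we process train_mask.csv directly instead and drop the records whose mask is NaN. The code that removes the unlabelled records from train_mask.csv is: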

import pandas as pd
import os

data_path = "/home/aistudio/data/data177948/dataset/"
df = pd.read_csv(os.path.join(data_path,'train_mask.csv'), sep='\t', names=['name', 'mask'])

df = df.dropna() # drop the unlabelled records (rows)

df = df.reset_index(drop=True) # after dropping rows the index is no longer contiguous, so reset it

If train_mask.csv is left unmodified, the unlabelled images can instead be deleted from the training folder itself:

import pandas as pd
import os

data_path = "/home/aistudio/data/data177948/dataset/"

df = pd.read_csv(os.path.join(data_path,'train_mask.csv'), sep='\t', names=['name', 'mask'])

# The derived columns do not depend on the loop index, so compute them once
df['mask'] = df['mask'].fillna('')   # .fillna(): replace NaN masks with an empty string
df['rle_len'] = df['mask'].map(len)  # .map(): apply len to every element of the column
df['image_path'] = df['name'].apply(lambda x: os.path.join(data_path, 'train', str(x)))
df['empty'] = (df.rle_len == 0)      # True when the image has no mask

total = 0
for idx, row in df[df['empty']].iterrows():
    # Skip files that are already gone so the cell can be re-run safely
    if os.path.exists(row['image_path']):
        os.remove(row['image_path'])
        total += 1
        print('image--%d--deleted successfully, total=%d' % (idx, total))

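(4) Building the image file paths

As the analysis above shows, the data read from train_mask.csv has two columns (the image name, name, and the image mask, mask) but no image path. The path therefore has to be built by hand:

df['image_path'] = df['name'].apply(lambda x: os.path.join(data_path, 'train', str(x)))

(5) Splitting the training set

Because the training set is not large, this project uses cross-validation to split the original training data into training and validation sets. The code is: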

kf = KFold(n_splits=CFG.n_fold, shuffle=True, random_state=CFG.seed)
# Initialise: create a column named fold in df
df.loc[:,'fold'] = -1
# Mark each row with the index of the fold in which it serves as validation data
for fold, (train_idx, val_idx) in enumerate(kf.split(X=df)):
    df.loc[val_idx, 'fold'] = fold
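
The fold column can be easier to picture on a tiny example. The sketch below is a simplified, standard-library stand-in for the idea (shuffle once, then give every sample exactly one validation fold). It is not sklearn's exact assignment, and kfold_assign is a hypothetical helper introduced only for illustration.

```python
import random

def kfold_assign(n_samples, n_splits, seed=42):
    """Return a list whose i-th entry is the validation fold of sample i."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)   # shuffle once, reproducibly
    fold_of = [-1] * n_samples             # same -1 initialisation as df['fold']
    for position, sample in enumerate(indices):
        fold_of[sample] = position % n_splits
    return fold_of

fold_of = kfold_assign(n_samples=10, n_splits=4)
for fold in range(4):
    # Rows with this fold index validate; all other rows train
    val_idx = [i for i, f in enumerate(fold_of) if f == fold]
    train_idx = [i for i, f in enumerate(fold_of) if f != fold]
    print(fold, val_idx, len(train_idx))
```

Every sample is used for validation exactly once and for training in the remaining folds, which is what the later query on the fold column relies on.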

(6) Reading and displaying images

Read the image with OpenCV:

# Read the first image
img = cv2.imread('/home/aistudio/data/data177948/dataset/train/'+ df['name'].iloc[0])
# OpenCV loads images in BGR channel order; cv2.cvtColor() converts them to RGB
img = cv2.cvtColor(img,cv2.COLOR_BGR2RGB)
# Display the image after reading it
plt.figure(figsize=(9,9))
plt.imshow(img)
plt.show()

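(7) Reading and displaying masks

The masks in train_mask.csv are stored in RLE (run-length encoding), so each RLE string has to be converted into an image. The encoding, decoding, and display code is:

# 1. Encode an image as an RLE string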
def rle_encode(im):
    '''
    im: numpy array, 1 - mask, 0 - background
    Returns run length as string formated
    '''
    pixels = im.flatten(order = 'F')
    pixels = np.concatenate([[0], pixels, [0]])
    runs = np.where(pixels[1:] != pixels[:-1])[0] + 1
    runs[1::2] -= runs[::2]
    return ' '.join(str(x) for x in runs)

# 2. Decode an RLE string into an image
def rle_decode(mask_rle, shape=(512, 512)):
    '''
    mask_rle: run-length as string formated (start length)
    shape: (height,width) of array to return 
    Returns numpy array, 1 - mask, 0 - background
    '''
    s = mask_rle.split()
    starts, lengths = [np.asarray(x, dtype=int) for x in (s[0:][::2], s[1:][::2])]
    starts -= 1
    ends = starts + lengths
    img = np.zeros(shape[0]*shape[1], dtype=np.uint8)
    for lo, hi in zip(starts, ends):
        img[lo:hi] = 1
    return img.reshape(shape, order='F')

# 3. Call rle_decode() to obtain the mask image
mask = rle_decode(df['mask'].iloc[1])
mask2 = np.array(mask) # mask as a single-channel array
masks = np.stack((mask2,mask2,mask2), axis=2).astype('float32') # stack the mask into three channels, [h, w, c]

# 4. Display the mask
plt.figure(figsize=(15,10))
plt.imshow(mask)
plt.show()
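
A small worked example may help with the RLE convention used above: pixels are read column by column (order='F'), and each run is stored as a 1-based start followed by its length. The pure-Python sketch below mirrors that convention on a 3x3 mask. The helper names are hypothetical and the functions are simplified stand-ins for rle_encode() and rle_decode(), not replacements for them.

```python
def column_major(mask):
    """Flatten a 2-D list column by column, like flatten(order='F')."""
    h, w = len(mask), len(mask[0])
    return [mask[r][c] for c in range(w) for r in range(h)]

def rle_encode_flat(pixels):
    """Encode a flat 0/1 list as 'start length' pairs with 1-based starts."""
    runs, start = [], None
    for i, p in enumerate(pixels + [0]):   # the trailing 0 closes a final run
        if p == 1 and start is None:
            start = i
        elif p != 1 and start is not None:
            runs += [start + 1, i - start]
            start = None
    return " ".join(str(x) for x in runs)

def rle_decode_flat(rle, size):
    """Inverse of rle_encode_flat for a flat list of the given size."""
    values = [int(x) for x in rle.split()]
    pixels = [0] * size
    for start, length in zip(values[0::2], values[1::2]):
        for i in range(start - 1, start - 1 + length):
            pixels[i] = 1
    return pixels

mask = [
    [0, 1, 1],
    [0, 1, 0],
    [1, 0, 0],
]
flat = column_major(mask)       # [0, 0, 1, 1, 1, 0, 1, 0, 0]
rle = rle_encode_flat(flat)
print(rle)                      # prints "3 3 7 1"
print(rle_decode_flat(rle, len(flat)) == flat)
```

The string "3 3 7 1" reads as: a run of 3 building pixels starting at position 3, then a run of 1 starting at position 7.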

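3. Problem analysis and model construction

The task is to determine whether an image contains buildings. Judging an image as a whole for the presence of buildings is a recognition problem. One approach is to treat the whole image as a single sample and classify it, which is clearly difficult. Given the form of the labels, the image-level classification can instead be turned into classification of individual pixels, i.e. a binary classification over the pixels of an image: decide for each pixel whether it is part of a building, coded as 1 or 0.

For a binary classification problem, binary cross-entropy is chosen as the loss function. Background material on this loss is easy to find online.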
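
As a rough illustration of the loss, the sketch below computes binary cross-entropy directly from raw logits for a handful of pixels, using the usual numerically stable form max(x, 0) - x*t + log(1 + exp(-|x|)). The logits and targets are made-up numbers, and both functions are hypothetical helpers written for this example rather than any library's implementation.

```python
import math

def bce_with_logits(logit, target):
    """Stable per-pixel binary cross-entropy computed from a raw logit."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

def bce_naive(logit, target):
    """Textbook form: sigmoid first, then cross-entropy (unstable for large |logit|)."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))

logits = [2.0, -1.0, 0.5, -3.0]   # made-up network outputs, one per pixel
targets = [1.0, 0.0, 0.0, 1.0]    # 1 = building, 0 = background

stable = sum(bce_with_logits(x, t) for x, t in zip(logits, targets)) / len(logits)
naive = sum(bce_naive(x, t) for x, t in zip(logits, targets)) / len(logits)
print(stable, naive)
```

Both forms agree on moderate logits. The stable form is the reason a loss that takes logits can be paired with a network whose last layer has no activation.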

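Pixel-level binary classification is known in computer vision as image segmentation, more precisely semantic segmentation: pixels of the same class are grouped together, so the image is understood at the pixel level. This project builds the model with the classic U-Net. The U-Net code is below (note that the output of U-Net's last layer does not pass through an activation function):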

######################################
### U-Net ###
######################################
class Encoder(nn.Layer):  # downsampling block: two convolutions, two batch norms, then pooling
    def __init__(self, num_channels, num_filters):
        super(Encoder,self).__init__()  # initialise the parent class
        self.conv1 = nn.Conv2D(in_channels=num_channels,
                              out_channels=num_filters,
                              kernel_size=3,  # 3x3 kernel, stride 1, padding 1: spatial size [H W] unchanged
                              stride=1,
                              padding=1)
        self.bn1   = nn.BatchNorm(num_filters,act="relu")  # batch norm with a fused ReLU activation
        
        self.conv2 = nn.Conv2D(in_channels=num_filters,
                              out_channels=num_filters,
                              kernel_size=3,
                              stride=1,
                              padding=1)
        self.bn2   = nn.BatchNorm(num_filters,act="relu")
        
        self.pool  = nn.MaxPool2D(kernel_size=2,stride=2,padding="SAME")  # pooling halves the spatial size [H/2 W/2]
        
    def forward(self,inputs):
        x = self.conv1(inputs)
        x = self.bn1(x)
        x = self.conv2(x)
        x = self.bn2(x)
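        x_conv = x            # first output: skip connection to the decoder (grey arrow in the U-Net diagram)
        x_pool = self.pool(x) # second output: pooled features for the next stage (red arrow)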
        return x_conv, x_pool
      
class Decoder(nn.Layer):  # upsampling block: one transposed convolution, two convolutions, two batch norms
    def __init__(self, num_channels, num_filters):
        super(Decoder,self).__init__()
        self.up = nn.Conv2DTranspose(in_channels=num_channels,
                                    out_channels=num_filters,
                                    kernel_size=2,
                                    stride=2,
                                    padding=0)  # doubles the spatial size [2*H 2*W]

        self.conv1 = nn.Conv2D(in_channels=num_filters*2,
                              out_channels=num_filters,
                              kernel_size=3,
                              stride=1,
                              padding=1)
        self.bn1   = nn.BatchNorm(num_filters,act="relu")
        
        self.conv2 = nn.Conv2D(in_channels=num_filters,
                              out_channels=num_filters,
                              kernel_size=3,
                              stride=1,
                              padding=1)
        self.bn2   = nn.BatchNorm(num_filters,act="relu")
        
    def forward(self,input_conv,input_pool):
        x = self.up(input_pool)
        h_diff = (input_conv.shape[2]-x.shape[2])
        w_diff = (input_conv.shape[3]-x.shape[3])
        pad = nn.Pad2D(padding=[h_diff//2, h_diff-h_diff//2, w_diff//2, w_diff-w_diff//2])
        x = pad(x)                                # pad the upsampled feature map to the size of the saved encoder feature map
        x = paddle.concat(x=[input_conv,x],axis=1)# concatenate the skip connection for context; the channel count doubles
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.conv2(x)
        x = self.bn2(x)
        return x
    
class UNet(nn.Layer):
    def __init__(self,num_classes=59):
        super(UNet,self).__init__()
        self.down1 = Encoder(num_channels=  3, num_filters=64) # downsampling path
        self.down2 = Encoder(num_channels= 64, num_filters=128)
        self.down3 = Encoder(num_channels=128, num_filters=256)
        self.down4 = Encoder(num_channels=256, num_filters=512)
        
        self.mid_conv1 = nn.Conv2D(512,1024,1)                 # bottleneck
        self.mid_bn1   = nn.BatchNorm(1024,act="relu")
        self.mid_conv2 = nn.Conv2D(1024,1024,1)
        self.mid_bn2   = nn.BatchNorm(1024,act="relu")

        self.up4 = Decoder(1024,512)                           # upsampling path
        self.up3 = Decoder(512,256)
        self.up2 = Decoder(256,128)
        self.up1 = Decoder(128,64)
        
        self.last_conv = nn.Conv2D(64,num_classes,1)           # 1x1 convolution producing per-pixel logits; no activation is applied here
        
    def forward(self,inputs):
        x1, x = self.down1(inputs)
        x2, x = self.down2(x)
        x3, x = self.down3(x)
        x4, x = self.down4(x)
        
        x = self.mid_conv1(x)
        x = self.mid_bn1(x)
        x = self.mid_conv2(x)
        x = self.mid_bn2(x)
        
        x = self.up4(x4, x)
        x = self.up3(x3, x)
        x = self.up2(x2, x)
        x = self.up1(x1, x)
        
        x = self.last_conv(x)
        
        return x
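
To check the network's bookkeeping, the sketch below traces channel counts and spatial sizes through the four encoder stages, the bottleneck, and the four decoder stages. It is plain arithmetic, not Paddle code. trace_unet is a hypothetical helper, and it assumes a square input whose side is divisible by 16, so that every pooling step halves the size exactly and no padding is needed in the decoder.

```python
def trace_unet(size, widths=(64, 128, 256, 512), mid=1024):
    """Return [(stage, output channels, output side length)] for a square input."""
    assert size % (2 ** len(widths)) == 0, "sketch assumes a side divisible by 16"
    trace = []
    for i, w in enumerate(widths, start=1):
        # 3x3 convs with padding 1 keep the size; the 2x2 pool halves it
        size //= 2
        trace.append((f"down{i}", w, size))
    # 1x1 bottleneck convs change the channel count only
    trace.append(("mid", mid, size))
    for i in range(len(widths), 0, -1):
        # The transposed conv doubles the size; after the concat (2*w channels)
        # the two 3x3 convs bring the channel count back to w
        size *= 2
        trace.append((f"up{i}", widths[i - 1], size))
    return trace

for stage, channels, side in trace_unet(224):
    print(f"{stage:6s} channels={channels:5d} size={side}x{side}")
```

For a 224x224 input the side shrinks to 14 at the bottleneck and returns to 224 after up1, where the final 1x1 convolution maps 64 channels to num_classes.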

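4. Evaluation metric

This project uses the Dice coefficient to measure how far a submitted result differs from the ground-truth labels; the Dice coefficient compares the two pixel by pixel. It is computed as: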

$$\frac{2 \, |X \cap Y|}{|X| + |Y|}$$

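Here X is the predicted result and Y is the ground-truth label. When X and Y are identical, the Dice coefficient is 1. The leaderboard ranks submissions by the mean Dice coefficient over all test images; the higher the score, the better.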
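
A tiny worked example of the formula, written in plain Python on flattened 0/1 masks. The masks are made up, dice is a hypothetical helper, and the eps smoothing term is an assumption added here so that two empty masks score 1 instead of dividing by zero.

```python
def dice(pred, truth, eps=1e-3):
    """Dice coefficient of two flat 0/1 lists, with a small smoothing term."""
    inter = sum(p * t for p, t in zip(pred, truth))   # |X ∩ Y|
    denom = sum(pred) + sum(truth)                    # |X| + |Y|
    return (2 * inter + eps) / (denom + eps)

pred = [1, 1, 0, 0, 1, 0]    # made-up prediction X
truth = [1, 0, 0, 0, 1, 1]   # made-up ground truth Y

# |X ∩ Y| = 2 and |X| + |Y| = 6, so the unsmoothed score is 4/6
print(dice(pred, truth, eps=0))
print(dice(truth, truth, eps=0))   # identical masks
print(dice([0, 0], [0, 0]))        # both empty: the smoothing term gives 1.0
```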

5. Model training

(1) Background

Using the GPU

import paddle
# Query the current device
device = paddle.device.get_device()
print(device)
# Select the GPU (set_device needs a device string such as 'gpu:0')
device = paddle.device.set_device('gpu:0')
print(device)

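Additional packages to install

  • PaddleSeg, used for its DiceLoss function. (Note: the training code below constructs a DiceLoss but never actually computes it, so in practice this project does not use PaddleSeg.) PaddleSeg has to be downloaded first, uploaded, and then unzipped and installed:
# Unzip and rename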
!unzip -o -d /home/aistudio/ /home/aistudio/PaddleSeg-release-2.6.zip > /dev/null
!mv /home/aistudio/PaddleSeg-release-2.6 /home/aistudio/PaddleSeg
# Install
!pip install -e /home/aistudio/PaddleSeg > /dev/null

# Update sys.path:

# sys.path may not be refreshed in time, so add the path manually
import sys
sys.path.append('/home/aistudio/PaddleSeg')

  • skimage, used for color.label2rgb() when displaying images

(2) Complete training procedure

###############################
# Load the packages
# General-purpose packages, no extra installation needed
import sys
sys.path.append('/home/aistudio/PaddleSeg')
import os
import pdb
import random
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold # Sklearn
import time
import cv2
from tqdm import tqdm
%matplotlib inline
from matplotlib import pyplot as plt
from skimage import color#label2rgb
# Paddle packages, no extra installation needed
import paddle
from paddle.io import Dataset
from paddle.io import DataLoader
from paddle import nn
from paddle.vision import transforms as A
# PaddleSeg package
import paddleseg
#####################################
# Set the random seeds so that training results are reproducible

#random.seed(SEED)
#np.random.seed(SEED)
#paddle.seed(SEED)
def set_seed(seed=42):
    ##### why 42? The Answer to the Ultimate Question of Life, the Universe, and Everything is 42.
    random.seed(seed) # python
    np.random.seed(seed) # numpy
    paddle.seed(seed) # paddle (the original line never called the function)

# Convert between masks and images

# Encode an image as an RLE string
def rle_encode(im):
    '''
    im: numpy array, 1 - mask, 0 - background
    Returns run length as string formated
    '''
    pixels = im.flatten(order = 'F')
    pixels = np.concatenate([[0], pixels, [0]])
    runs = np.where(pixels[1:] != pixels[:-1])[0] + 1
    runs[1::2] -= runs[::2]
    return ' '.join(str(x) for x in runs)

# Decode an RLE string into an image
def rle_decode(mask_rle, shape=(512, 512)):
    '''
    mask_rle: run-length as string formated (start length)
    shape: (height,width) of array to return 
    Returns numpy array, 1 - mask, 0 - background
    '''
    s = mask_rle.split()
    starts, lengths = [np.asarray(x, dtype=int) for x in (s[0:][::2], s[1:][::2])]
    starts -= 1
    ends = starts + lengths
    img = np.zeros(shape[0]*shape[1], dtype=np.uint8)
    for lo, hi in zip(starts, ends):
        img[lo:hi] = 1
    return img.reshape(shape, order='F')

# Data transforms: apart from Resize, this project applies no data transforms.

def build_transforms(CFG):
    data_transforms = {
        "train": A.Compose([
            A.Resize((CFG.img_size,CFG.img_size)), # resize to CFG.img_size x CFG.img_size (224x224 in this project)
            #ColorJitter(0.4, 0.4, 0.4, 0.4),
            #A.RandomHorizontalFlip(0.5),
            #A.RandomRotation((-5,5)),        
            #A.Normalize(mean=[127.5, 127.5, 127.5], std=[127.5, 127.5, 127.5], data_format='HWC'), # normalisation
            #A.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], data_format='HWC') # normalisation
            #Transpose(), # the raw data is in HWC layout; Transpose converts it to CHW
            ]),     
        "valid_test": A.Compose([
            #A.Normalize(mean=[127.5, 127.5, 127.5], std=[127.5, 127.5, 127.5], data_format='HWC')
            #A.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], data_format='HWC') # normalisation
            A.Resize((CFG.img_size,CFG.img_size)), # resize to CFG.img_size x CFG.img_size (224x224 in this project)
            ])
        }
    return data_transforms

# Data loading

class build_dataset(Dataset):
    def __init__(self, df, label=True, transforms=None):
        self.df = df
        self.label = label
        self.img_paths = df['image_path'].tolist() # image
        self.masks = df['mask'].tolist()
        #self.ids = df['id'].tolist()

        # if 'mask_path' in df.columns:
        #     self.mask_paths  = df['mask_path'].tolist() # mask
        # else:
        #     self.mask_paths = None

        self.transforms = transforms

    def __len__(self):
        return len(self.df)
    
    def __getitem__(self, index):
        #pdb.set_trace()
        #### load id
        #id       = self.ids[index]
        #### load image
        img_path  = self.img_paths[index]
        
        image = cv2.cvtColor(cv2.imread(img_path), cv2.COLOR_BGR2RGB).astype(np.float32)/255
        #image = cv2.resize(image, (CFG.img_size, CFG.img_size),interpolation=cv2.INTER_LINEAR)
        #mask  = cv2.resize(mask, dsize=(CFG.img_size, CFG.img_size),interpolation=cv2.INTER_LINEAR)
        
        if self.label: # train
            #### load mask
            mask = self.masks[index]  # self.df['mask'].iloc[index]
            mask = rle_decode(mask)
            mask = np.array(mask).astype('float32')
            
            if self.transforms:
                image = self.transforms(image)
                mask = self.transforms(mask)
                #image, mask = self.transforms(image, mask)
            #pdb.set_trace()
            image = np.transpose(image, (2, 0, 1)) # [h, w, c] => [c, h, w]
            mask = mask.reshape((CFG.img_size, CFG.img_size, 1))
            mask = np.transpose(mask, (2, 0, 1)) # [h, w, c] => [c, h, w]
            return paddle.to_tensor(image), paddle.to_tensor(mask)
        else:  # test
            ### augmentations
            if self.transforms:
                image = self.transforms(image)
            image = np.transpose(image, (2, 0, 1)) # [h, w, c] => [c, h, w]
            return paddle.to_tensor(image)       

def build_dataloader(df, fold, data_transforms, CFG):
    train_df = df.query("fold!=@fold").reset_index(drop=True)
    valid_df = df.query("fold==@fold").reset_index(drop=True)
    #pdb.set_trace()
    train_dataset = build_dataset(train_df, label=True, transforms=data_transforms['train'])
    valid_dataset = build_dataset(valid_df, label=True, transforms=data_transforms['valid_test'])
    train_loader = DataLoader(train_dataset, batch_size=CFG.train_bs, num_workers=CFG.num_worker, 
                              shuffle=True, use_shared_memory=False, drop_last=False)
    valid_loader = DataLoader(valid_dataset, batch_size=CFG.valid_bs, num_workers=CFG.num_worker, 
                              shuffle=False, use_shared_memory=False)
    return train_loader, valid_loader

# Build the model

######################################
### model ###
######################################
class Encoder(nn.Layer):  # downsampling block: two convolutions, two batch norms, then pooling
    def __init__(self, num_channels, num_filters):
        super(Encoder,self).__init__()  # initialise the parent class
        self.conv1 = nn.Conv2D(in_channels=num_channels,
                              out_channels=num_filters,
                              kernel_size=3,  # 3x3 kernel, stride 1, padding 1: spatial size [H W] unchanged
                              stride=1,
                              padding=1)
        self.bn1   = nn.BatchNorm(num_filters,act="relu")  # batch norm with a fused ReLU activation
        
        self.conv2 = nn.Conv2D(in_channels=num_filters,
                              out_channels=num_filters,
                              kernel_size=3,
                              stride=1,
                              padding=1)
        self.bn2   = nn.BatchNorm(num_filters,act="relu")
        
        self.pool  = nn.MaxPool2D(kernel_size=2,stride=2,padding="SAME")  # pooling halves the spatial size [H/2 W/2]
        
    def forward(self,inputs):
        x = self.conv1(inputs)
        x = self.bn1(x)
        x = self.conv2(x)
        x = self.bn2(x)
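        x_conv = x            # first output: skip connection to the decoder (grey arrow in the U-Net diagram)
        x_pool = self.pool(x) # second output: pooled features for the next stage (red arrow)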
        return x_conv, x_pool
    
    
class Decoder(nn.Layer):  # upsampling block: one transposed convolution, two convolutions, two batch norms
    def __init__(self, num_channels, num_filters):
        super(Decoder,self).__init__()
        self.up = nn.Conv2DTranspose(in_channels=num_channels,
                                    out_channels=num_filters,
                                    kernel_size=2,
                                    stride=2,
                                    padding=0)  # doubles the spatial size [2*H 2*W]

        self.conv1 = nn.Conv2D(in_channels=num_filters*2,
                              out_channels=num_filters,
                              kernel_size=3,
                              stride=1,
                              padding=1)
        self.bn1   = nn.BatchNorm(num_filters,act="relu")
        
        self.conv2 = nn.Conv2D(in_channels=num_filters,
                              out_channels=num_filters,
                              kernel_size=3,
                              stride=1,
                              padding=1)
        self.bn2   = nn.BatchNorm(num_filters,act="relu")
        
    def forward(self,input_conv,input_pool):
        x = self.up(input_pool)
        h_diff = (input_conv.shape[2]-x.shape[2])
        w_diff = (input_conv.shape[3]-x.shape[3])
        pad = nn.Pad2D(padding=[h_diff//2, h_diff-h_diff//2, w_diff//2, w_diff-w_diff//2])
        x = pad(x)                                # pad the upsampled feature map to the size of the saved encoder feature map
        x = paddle.concat(x=[input_conv,x],axis=1)# concatenate the skip connection for context; the channel count doubles
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.conv2(x)
        x = self.bn2(x)
        return x
    
class UNet(nn.Layer):
    def __init__(self,num_classes=59):
        super(UNet,self).__init__()
        self.down1 = Encoder(num_channels=  3, num_filters=64) # downsampling path
        self.down2 = Encoder(num_channels= 64, num_filters=128)
        self.down3 = Encoder(num_channels=128, num_filters=256)
        self.down4 = Encoder(num_channels=256, num_filters=512)
        
        self.mid_conv1 = nn.Conv2D(512,1024,1)                 # bottleneck
        self.mid_bn1   = nn.BatchNorm(1024,act="relu")
        self.mid_conv2 = nn.Conv2D(1024,1024,1)
        self.mid_bn2   = nn.BatchNorm(1024,act="relu")

        self.up4 = Decoder(1024,512)                           # upsampling path
        self.up3 = Decoder(512,256)
        self.up2 = Decoder(256,128)
        self.up1 = Decoder(128,64)
        
        self.last_conv = nn.Conv2D(64,num_classes,1)           # 1x1 convolution producing per-pixel logits; no activation is applied here
        
    def forward(self,inputs):
        x1, x = self.down1(inputs)
        x2, x = self.down2(x)
        x3, x = self.down3(x)
        x4, x = self.down4(x)
        
        x = self.mid_conv1(x)
        x = self.mid_bn1(x)
        x = self.mid_conv2(x)
        x = self.mid_bn2(x)
        
        x = self.up4(x4, x)
        x = self.up3(x3, x)
        x = self.up2(x2, x)
        x = self.up1(x1, x)
        
        x = self.last_conv(x)
        
        return x
def build_model(CFG, test_flag=False):
    if test_flag:
        pretrain_weights = None
    else:
        pretrain_weights = "imagenet"

    model = UNet(CFG.num_classes)

    #model.to(CFG.device)
    return model

# - Build the loss functions

def build_loss():
    BCELoss = paddle.nn.BCEWithLogitsLoss()
    DiceLoss    = paddleseg.models.losses.DiceLoss()
    return {"BCELoss":BCELoss, "DiceLoss":DiceLoss}

# - Build the evaluation metrics

def dice_coef(y_true, y_pred, thr=0.5, dim=(2,3), epsilon=0.001):
    y_true = y_true.astype(np.float32)
    y_pred = (y_pred>thr).astype(np.float32)
    inter = paddle.sum((y_true*y_pred), axis=dim)
    den = paddle.sum(y_true, axis=dim) + paddle.sum(y_pred, axis=dim)
    dice = paddle.mean((2*inter+epsilon)/(den+epsilon), axis=(1,0))
    return dice

def iou_coef(y_true, y_pred, thr=0.5, dim=(2,3), epsilon=0.001):
    y_true = y_true.astype(np.float32)
    y_pred = (y_pred>thr).astype(np.float32)
    inter = paddle.sum((y_true*y_pred), axis=dim)
    union = paddle.sum((y_true + y_pred - y_true*y_pred), axis=dim)
    iou = paddle.mean((inter+epsilon)/(union+epsilon), axis=(1,0))
    return iou

# - Build the training, validation, and test functions

def train_one_epoch(model, train_loader, optimizer, losses_dict, CFG, log, epoch):
    model.train()
    scaler = paddle.amp.GradScaler() 
    losses_all, bce_all, dice_all = 0, 0, 0
    log.write('---------epoch---%d---start----------' %epoch)
    log.write('\n')  
    pbar = tqdm(enumerate(train_loader), total=len(train_loader), desc='Train ')
    for _, (images, masks) in pbar:
        # batch: dict_keys(['index', 'id', 'organ', 'image', 'mask'])
        #optimizer.zero_grad()
        optimizer.clear_grad()
        #pdb.set_trace()
        with paddle.amp.auto_cast(enable=True):
            y_preds = model(images) # [b, c, w, h]
            # pdb.set_trace()
            # preds   = paddle.nn.Sigmoid()(y_preds)
            # pred = preds[0]
            # img = images[0]
            # mask = masks[0]
            # show_masked_img(img, pred, mask, title='')
            #visualizationShowFusion(images, y_preds, masks, "show", input_chennels=3, show=True)
            #pdb.set_trace()
            bce_loss = losses_dict["BCELoss"](y_preds, masks)
            # dice_loss = losses_dict["DiceLoss"](y_preds, masks)
            losses = bce_loss # + dice_loss
        
        scaler.scale(losses).backward()
        scaler.step(optimizer)
        scaler.update()
        
        losses_all += losses.item() / images.shape[0]
        bce_all += bce_loss.item() / images.shape[0]
        # dice_all += dice_loss.item() / images.shape[0]
        dice_all += 0
    #pdb.set_trace()
    current_lr = optimizer.get_lr()
    #current_lr = paddle.optimizer.get_lr()
    log.write('%0.5f  %d    | %+5.3f %5.3f |' % (\
                             current_lr, epoch, bce_all, dice_all))
    #log.write('\n')
    print("lr: {:.4f}".format(current_lr), flush=True)
    print("loss: {:.3f}, bce_all: {:.3f}, dice_all: {:.3f}".format(losses_all, bce_all, dice_all), flush=True)
@paddle.no_grad()
def valid_one_epoch(model, valid_loader, CFG, log):
    model.eval()
    val_scores = []
    # pdb.set_trace()
    pbar = tqdm(enumerate(valid_loader), total=len(valid_loader), desc='Valid ')
    for _, (images, masks) in pbar:
       
        y_preds = model(images) 
        y_preds   = paddle.nn.Sigmoid()(y_preds) # [b, c, w, h]
        
        # pred = y_preds[0]
        # img = images[0]
        # mask = masks[0]
        # show_masked_img(img, pred, mask, title='')
        val_dice = dice_coef(masks, y_preds).cpu().detach().numpy()
        val_jaccard = iou_coef(masks, y_preds).cpu().detach().numpy()
        val_scores.append([val_dice, val_jaccard])
        
    val_scores  = np.mean(val_scores, axis=0)
    val_dice, val_jaccard = val_scores
    #val_dice = val_scores[0].astype(np.float32)
    #val_jaccard = val_scores[1].astype(np.float32)
    log.write('| %+5.3f %5.3f |' % (val_dice[0], val_jaccard[0]))
    log.write('\n')
    print("val_dice: {:.4f}, val_jaccard: {:.4f}".format(val_dice[0], val_jaccard[0]), flush=True)
    
    return images, y_preds, masks, val_dice, val_jaccard

@paddle.no_grad()
def test_one_epoch(ckpt_paths, test_loader, CFG):
    pred_strings = []
    pred_ids = []
    pred_classes = []
    
    pbar = tqdm(enumerate(test_loader), total=len(test_loader), desc='Test: ')
    for _, (images) in pbar:

        size = images.shape
        masks = paddle.zeros((size[0], CFG.num_classes, size[2], size[3]), dtype=paddle.float32) # [b, c, w, h]
        ############################################
        ##### >>>>>>> cross validation infer <<<<<<
        ############################################
        for fold in range(0, CFG.n_fold):  # CFG defines n_fold; CFG.fold does not exist
            model = build_model(CFG, test_flag=True)
            optimizer = paddle.optimizer.AdamW(learning_rate=CFG.lr, parameters=model.parameters(), weight_decay=CFG.wd)#, apply_decay_param_fun=lambda x: x in CFG.decay_params)
            save_path_model = f"{CFG.ckpt_path}/best_fold_model{fold}.pdmodel"
            save_path_opt = f"{CFG.ckpt_path}/best_fold_opt{fold}.pdmodel"
            model.set_state_dict(paddle.load(save_path_model))
            optimizer.set_state_dict(paddle.load(save_path_opt))
            model.eval()
            y_preds = model(images) # [b, c, w, h]
            y_preds   = paddle.nn.Sigmoid()(y_preds) # [b, c, w, h]
            masks += y_preds/CFG.n_fold  # average the predictions over the folds
            
            val_dice_test = dice_coef(masks, y_preds)
            val_dice = dice_coef(masks, y_preds).cpu().detach().numpy()
            val_jaccard = iou_coef(masks, y_preds).cpu().detach().numpy()
        
        masks = (masks>CFG.thr).cpu().detach().numpy() # [n, c, h, w]

    return images, masks

# - Function for displaying the segmentation results

def plot_img(img, pred, mask='', img_path='', label=True, title =''):
    if label:
        #rescalse mask to 0-1 range regardless of min and max value
        img = np.transpose(img, (1,2,0))
        pred = np.transpose(pred, (1,2,0))
        pred = (pred - pred.min())/(pred.max()-pred.min())
        pred = np.nan_to_num(pred)
        pred = paddle.to_tensor(pred)
        pred = paddle.round(pred)
        pred = pred.reshape(img.shape[:2])

        mask = np.transpose(mask, (1,2,0))
        mask = (mask - mask.min())/(mask.max()-mask.min())
        mask = np.nan_to_num(mask)
        mask = paddle.to_tensor(mask)
        mask = paddle.round(mask)
        mask = mask.reshape(img.shape[:2])
        
        fig, ax = plt.subplots(1, 5, figsize=(15, 3))
        fig.suptitle(title, fontsize=16)
        img, pred, mask = img.numpy(), pred.numpy(), mask.numpy()
        
        ax[0].imshow(mask); ax[0].set_title('Mask')
        ax[1].imshow(pred); ax[1].set_title('Pred')
        ax[2].imshow(img); ax[2].set_title('Image')
        ax[3].imshow(color.label2rgb(mask, img, bg_label=0, bg_color=(1.,1.,1.), alpha=0.25))
        ax[3].set_title('Masked Image')
        ax[4].imshow(color.label2rgb(pred, img, bg_label=0, bg_color=(1.,1.,1.), alpha=0.25))
        ax[4].set_title('Preded Image')
        plt.savefig(img_path, bbox_inches='tight')
        #plt.show()
    else:
        #rescalse mask to 0-1 range regardless of min and max value
        img = np.transpose(img, (1,2,0))
        pred = np.transpose(pred, (1,2,0))
        pred = (pred - pred.min())/(pred.max() - pred.min())
        pred = np.nan_to_num(pred)
        pred = paddle.to_tensor(pred)
        pred = paddle.round(pred)
        pred = pred.reshape(img.shape[:2])
      
        fig, ax = plt.subplots(1, 3, figsize=(9, 3))
        fig.suptitle(title, fontsize=16)
        img, pred = img.numpy(), pred.numpy()
        
        ax[0].imshow(pred); ax[0].set_title('Pred')
        ax[1].imshow(img); ax[1].set_title('Image')
        ax[2].imshow(color.label2rgb(pred, img, bg_label=0, bg_color=(1.,1.,1.), alpha=0.25))
        ax[2].set_title('Preded Image')
        plt.savefig(img_path, bbox_inches='tight')
        #plt.show()

# - Class for saving the training log

class Logger(object):
    def __init__(self):
        self.terminal = sys.stdout  #stdout
        self.file = None

    def open(self, file, mode=None):
        if mode is None: mode ='w'
        self.file = open(file, mode)

    def write(self, message, is_terminal=1, is_file=1 ):
        if '\r' in message: is_file=0

        if is_terminal == 1:
            self.terminal.write(message)
            self.terminal.flush()
            #time.sleep(1)

        if is_file == 1:
            self.file.write(message)
            self.file.flush()

    def flush(self):
        # this flush method is needed for python 3 compatibility.
        # this handles the flush command by doing nothing.
        # you might want to specify some extra behavior here.
        pass

# - Main program

if __name__ == '__main__':
    ###############################################################
    ##### >>>>>>> config <<<<<<
    ###############################################################
    class CFG:
        # step1: hyper-parameter
        seed = 42 
        device = paddle.device.set_device('gpu:0')
        num_worker = 0 # 0 if debug. 16 if train by "htop" check
        data_path = "/home/aistudio/data/data177948/dataset/"
        ckpt_path = "/home/aistudio/work/ckpt_nonan" # for submit
        # step2: data
        n_fold = 4
        img_size = 224
        train_bs = 4
        valid_bs = train_bs * 2

        # step3: model
        #backbone = 'resnet18'
        num_classes = 1
        # step4: optimizer
        epoch = 5
        lr = 1e-5 # learning_rate 
        wd = 1e-6  # weight_decay
        #lr_drop = 8
        # step5: infer
        thr = 0.3
        resume = False


    set_seed(CFG.seed)
    #pdb.set_trace()
    if not os.path.exists(CFG.ckpt_path):
        os.makedirs(CFG.ckpt_path)
    ########### Training and validation stage
    train_val_flag = True
    if train_val_flag:
        ###############################################################
        ##### Step 0: data preprocessing
        ###############################################################
        df = pd.read_csv(os.path.join(CFG.data_path,'train_mask.csv'), sep='\t', names=['name', 'mask'])
        # print('before:%d'%len(df))
        df = df.dropna()
        df = df.reset_index(drop=True)
        df['image_path'] = df['name'].apply(lambda x: os.path.join(CFG.data_path, 'train', str(x))) 

        ###############################################################
        ##### Step 1: cross-validation setup
        ###############################################################
        kf = KFold(n_splits=CFG.n_fold, shuffle=True, random_state=CFG.seed)   
        df.loc[:,'fold'] = -1

        for fold, (train_idx, val_idx) in enumerate(kf.split(X=df)):
            df.loc[val_idx, 'fold'] = fold

        log = Logger()
        log.open(f"{CFG.ckpt_path}/log_train.txt",mode='a')
        ## start training here! ##############################################
        log.write('** start training here! **\n')
        log.write('                      |-------- -- VALID ----------|---- TRAIN/BATCH ---------\n')
        log.write('rate    epoch |    val_dice,   val_jaccard |     bce_loss,  dice_all, time          \n')
        log.write('---------------------------------- ------------------------------------------------------\n')
        
        start_fold = 0
        if CFG.resume:
            path_checkpoint = os.path.join(CFG.ckpt_path,'checkpoint.pth')
            checkpoint = paddle.load(path_checkpoint)  # paddle.load, because torch is never imported in this script
            start_fold = checkpoint['fold']  # resume from the saved fold
        
        for fold in range(start_fold, CFG.n_fold):
            log.write('---------fold---%d------------' %fold) 
            ###############################################################
            ##### Step 2: build the data loaders, model, optimizer, and loss functions
            ###############################################################
            #data_transforms = {'train':train_augment, 'valid_test': valid_augment}
            #pdb.set_trace()
            data_transforms = build_transforms(CFG) 
            train_loader, valid_loader = build_dataloader(df, fold, data_transforms, CFG) # dataset & dataloader
            
            model = build_model(CFG) # model
            #scheduler = paddle.optimizer.lr.StepDecay(learning_rate=0.01, step_size=30, gamma=0.1, verbose=False)
            optimizer = paddle.optimizer.AdamW(learning_rate=CFG.lr, parameters=model.parameters(), weight_decay=CFG.wd)#, apply_decay_param_fun=lambda x: x in CFG.decay_params)
            #optimizer = paddle.optimizer.Adam(learning_rate=scheduler, parameters=model.parameters())
            losses_dict = build_loss() # loss
            
            start_epoch = 0
            # Restore state only for the fold being resumed; later folds start from scratch.
            if CFG.resume and fold == start_fold:
                # Continue with the epoch after the saved one: a saved checkpoint means that epoch finished training.
                start_epoch = checkpoint['epoch'] + 1
                model.set_dict(checkpoint['model'])
                optimizer.set_dict(checkpoint['optimizer'])
                
            best_val_dice = 0
            best_epoch = 0

            for epoch in range(start_epoch, CFG.epoch):
                start_time = time.time()
                ###############################################################
                ##### Step 3: train and validate
                ###############################################################
                train_one_epoch(model, train_loader, optimizer, losses_dict, CFG, log, epoch)
                #lr_scheduler.step()
                images, y_preds, masks, val_dice, val_jaccard = valid_one_epoch(model, valid_loader, CFG, log)
                
                ###############################################################
                ##### Step 4: save model parameters
                ###############################################################
                is_best = (val_dice > best_val_dice)
                best_val_dice = max(best_val_dice, val_dice)
                if is_best:
                    save_path_model = f"{CFG.ckpt_path}/best_fold_model{fold}.pdmodel"
                    save_path_opt = f"{CFG.ckpt_path}/best_fold_opt{fold}.pdmodel"
                    save_path_checkpoint = f"{CFG.ckpt_path}/checkpoint.pdmodel"
                    checkpoint = {
                        'fold':fold,
                        'epoch':epoch,
                        'model':model.state_dict(),
                        'optimizer':optimizer.state_dict()}
                    if os.path.isfile(save_path_model):
                        os.remove(save_path_model)
                    if os.path.isfile(save_path_opt):
                        os.remove(save_path_opt) 
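                    # save the model parameters
                    paddle.save(model.state_dict(), save_path_model)
                    # save the optimizer state
                    paddle.save(optimizer.state_dict(), save_path_opt)
                    # save the checkpoint (fold, epoch, model, optimizer) used to resume training
                    paddle.save(checkpoint, save_path_checkpoint)
                # After each epoch, plot the segmentation results of the last batch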
                for idx in range(images.shape[0]):
                    img = images[idx]
                    pred = y_preds[idx]
                    mask = masks[idx]
                    img_path = f"{CFG.ckpt_path}/{fold}_{epoch}_{idx}.png"
                    plot_img(img, pred, mask, img_path, label=True, title='')
                
                epoch_time = time.time() - start_time
                
                log.write('%d  %5.3f %5.3f' % (\
                             epoch, epoch_time, best_val_dice[0]))
                log.write('\n')
                log.write('---------epoch---%d--- end----------' %epoch)  
                log.write('\n')
                print("epoch:{}, time:{:.2f}s, best:{:.2f}\n".format(epoch, epoch_time, best_val_dice[0]))
                
    ######## Test phase
    test_flag = True
    if test_flag:
        set_seed(CFG.seed)
        ###############################################################
        ##### Step 0: data preprocessing
        ###############################################################
        test_df = pd.read_csv(os.path.join(CFG.data_path,'test_a_samplesubmit.csv'), sep='\t', names=['name', 'mask'])
        test_df['image_path'] = test_df['name'].apply(lambda x: os.path.join(CFG.data_path, 'test_a', str(x))) 

        data_transforms = build_transforms(CFG)
        test_dataset = build_dataset(test_df, label=False, transforms=data_transforms['valid_test'], cfg=CFG)
        # shuffle=False keeps the test images in file order so repeated runs plot the same samples
        test_loader = DataLoader(test_dataset, batch_size=CFG.train_bs, num_workers=CFG.num_worker, shuffle=False, use_shared_memory=False, drop_last=False)
        ###############################################################
        ##### Step 1: inference
        ###############################################################
        images, y_preds = test_one_epoch(CFG.ckpt_path, test_loader, CFG)
        for idx in range(images.shape[0]):
            img = images[idx]
            pred = y_preds[idx]
            img_path = f"{CFG.ckpt_path}/test_{idx}.png"
            plot_img(img, pred, mask='', img_path=img_path, label=False, title='')
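
The fold assignment in step 1 can be tried on its own with a toy table. This is a minimal sketch, not the article's code: the ten made-up file names stand in for train_mask.csv, and n_splits=5 and random_state=42 are assumed values in place of CFG.n_fold and CFG.seed, which are not shown in this section. The final split by fold is one plausible way a function like build_dataloader could use the column; its real body is not shown here.

```python
import pandas as pd
from sklearn.model_selection import KFold

# Toy table standing in for train_mask.csv after dropna(); file names are made up.
df = pd.DataFrame({"name": [f"img_{i}.jpg" for i in range(10)]})
df["fold"] = -1

# n_splits and random_state are assumed values (CFG.n_fold / CFG.seed in the article).
kf = KFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, val_idx) in enumerate(kf.split(X=df)):
    # val_idx holds positional indices; they match the labels because
    # the index is 0..n-1 (this is why reset_index(drop=True) matters).
    df.loc[val_idx, "fold"] = fold

# Every row now belongs to exactly one validation fold.
fold_sizes = df["fold"].value_counts().sort_index().tolist()
print(fold_sizes)

# Hypothetical split for fold 0: validate on that fold, train on the rest.
valid_df = df[df["fold"] == 0]
train_df = df[df["fold"] != 0]
print(len(train_df), len(valid_df))
```

Skipping reset_index(drop=True) after dropna() would leave gaps in the index, and df.loc[val_idx, ...] would then address the wrong rows or fail on missing labels.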

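(3) Training results

Segmentation results on the validation set:

[Image 2 of the article: validation-set segmentation result]
[Image 3 of the article: validation-set segmentation result]

Loss values and Dice coefficients printed during training: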

** start training here! **
                      |-------- -- VALID ----------|---- TRAIN/BATCH ---------
rate    epoch |    val_dice,   val_jaccard |     bce_loss,  dice_all, time          
---------------------------------- ------------------------------------------------------
---------fold---0------------
W1209 10:17:34.537297  3759 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 11.2
W1209 10:17:34.542306  3759 gpu_resources.cc:91] device: 0, cuDNN Version: 8.2.
---------epoch---0---start----------
Train : 100%|██████████| 4650/4650 [06:04<00:00, 12.77it/s]
0.00001  0    | +428.851 0.000 |lr: 0.0000
loss: 428.851, bce_all: 428.851, dice_all: 0.000


Valid :   0%|          | 0/775 [00:00
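
The val_dice and val_jaccard columns in the log header measure the overlap between predicted and ground-truth masks. valid_one_epoch is not shown in this section, so the following is only a sketch of the standard definitions, Dice = 2|A∩B| / (|A| + |B|) and Jaccard = |A∩B| / |A∪B|, on binary NumPy masks. The function names and the eps smoothing term are assumptions, not the article's implementation.

```python
import numpy as np

def dice_coef(pred, target, eps=1e-7):
    """Dice coefficient between two binary masks (assumed helper name)."""
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    inter = np.logical_and(pred, target).sum()
    # eps keeps the ratio defined when both masks are empty
    return (2 * inter + eps) / (pred.sum() + target.sum() + eps)

def jaccard_coef(pred, target, eps=1e-7):
    """Jaccard index (IoU) between two binary masks (assumed helper name)."""
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return (inter + eps) / (union + eps)

# Tiny example: the prediction marks two pixels, one of which is a real building pixel.
pred = np.array([[1, 1], [0, 0]])
target = np.array([[1, 0], [0, 0]])
dice = dice_coef(pred, target)        # about 0.667
jaccard = jaccard_coef(pred, target)  # about 0.5
print(round(dice, 3), round(jaccard, 3))
```

For a nonzero Dice score the Jaccard index is always the lower of the two on the same pair of masks, so val_jaccard should not exceed val_dice in the log.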
