This article is a repost.
Original project link
Goal: use aerial imagery to identify buildings on the ground, classifying each pixel of an aerial image into two classes: building and non-building. Example images are shown below:
Application: with aerial imagery one can monitor building changes and construction planning across the country, especially in scenic areas, water-source protection zones, and farmland; the project can therefore serve as a useful aid for regulatory departments.
Challenges: is building identification a classification problem or a detection problem? Which model should be used?
Motivation: why not use PaddleSeg or PaddleRS? For beginners they are hard to approach: it is difficult to understand which parameters to set, why they are set that way, and what happens behind the scenes. In The Legend of the Condor Heroes, when Hong Qigong teaches Guo Jing the Eighteen Dragon-Subduing Palms, each palm looks plain enough, yet its endless variations are far too subtle for a novice. PaddleSeg feels much the same.
This project uses the Inria Aerial Image Labeling aerial dataset, available at https://aistudio.baidu.com/aistudio/datasetdetail/177948
# Unzip and rename
!unzip -oq -d /home/aistudio/data/data177948/ /home/aistudio/data/data177948/地表建筑物识别.zip
!mv /home/aistudio/data/data177948/地表建筑物识别 /home/aistudio/data/data177948/dataset
# Unzip the dataset archives
!unzip -oq -d /home/aistudio/data/data177948/dataset/ /home/aistudio/data/data177948/dataset/train.zip
!unzip -oq -d /home/aistudio/data/data177948/dataset/ /home/aistudio/data/data177948/dataset/train_mask.csv.zip
!unzip -oq -d /home/aistudio/data/data177948/dataset/ /home/aistudio/data/data177948/dataset/test_a.zip
The dataset contains a training set and a test set. The training set has 30000 images, of which 24796 are labeled; the model below is trained on those labeled images. Note: when training on the full training set (including unlabeled images), warnings like "Found inf or nan, current scale is: 0.0, decrease to: 0.0*0.5" appear.
The test set has 2500 images with no labels, i.e. no masks, so no evaluation metric can be computed on it. (Reading train_mask.csv with pandas shows the table has no column names, so the read_csv() calls below pass names=['name', 'mask'] to name the two columns.) The code to count the dataset sizes:
import pandas as pd
import os
data_path = "/home/aistudio/data/data177948/dataset/"
df = pd.read_csv(os.path.join(data_path,'train_mask.csv'), sep='\t', names=['name', 'mask'])
# number of images in the original training set
print(len(df))
# number of training images after dropping unlabeled records
df2 = df.dropna()
print(len(df2))
# number of images in the test set
df3 = pd.read_csv(os.path.join(data_path,'test_a_samplesubmit.csv'), sep='\t', names=['name', 'mask'])
print(len(df3))
Because deleting unlabeled images directly from the original training set would be laborious, we instead process train_mask.csv itself and drop the records whose mask is NaN. The code to remove unlabeled records from train_mask.csv:
import pandas as pd
import os
data_path = "/home/aistudio/data/data177948/dataset/"
df = pd.read_csv(os.path.join(data_path,'train_mask.csv'), sep='\t', names=['name', 'mask'])
df = df.dropna() # drop unlabeled records (drop rows)
df = df.reset_index(drop=True) # row indices are no longer contiguous after dropping; reset them
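The effect of dropna() followed by reset_index() can be seen on a toy frame (hypothetical values, not the real train_mask.csv):

```python
import pandas as pd
import numpy as np

# toy stand-in for train_mask.csv: two labeled rows, one unlabeled
df = pd.DataFrame({'name': ['a.jpg', 'b.jpg', 'c.jpg'],
                   'mask': ['1 4 10 2', np.nan, '3 5']})
df = df.dropna()                # drops the row whose mask is NaN
df = df.reset_index(drop=True)  # re-number rows 0..n-1 so .iloc stays contiguous
print(df['name'].tolist())      # ['a.jpg', 'c.jpg']
```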
If you prefer not to modify train_mask.csv, the following code deletes the unlabeled images directly from the training set:
import pandas as pd
import os
import pdb
data_path = "/home/aistudio/data/data177948/dataset/"
df = pd.read_csv(os.path.join(data_path,'train_mask.csv'), sep='\t', names=['name', 'mask'])
df.head()
# .fillna(''): replace NaN masks with an empty string
df['mask'] = df['mask'].fillna('')
# rle mask length; .map() applies len to every element of the column
df['rle_len'] = df['mask'].map(len)
# image path
df['image_path'] = df['name'].apply(lambda x: os.path.join(data_path, 'train', str(x)))
# a mask of length 0 means the image has no label
df['empty'] = (df.rle_len == 0)
img_paths = df['image_path'].tolist()
total = 0
for idx in range(len(df)):
    if df['empty'].iloc[idx]:
        os.remove(img_paths[idx])
        total += 1
        print('image--%d--deleted successfully, total=%d' % (idx, total))
As the analysis above shows, the data read from train_mask.csv has two columns (image name and mask) but no image path, so the path must be constructed manually:
df['image_path'] = df['name'].apply(lambda x: os.path.join(data_path, 'train', str(x)))
Since the training set is not large, this project uses K-fold cross-validation to split the original training data into training and validation subsets. The code:
from sklearn.model_selection import KFold

kf = KFold(n_splits=CFG.n_fold, shuffle=True, random_state=CFG.seed)
# initialize: create a 'fold' column in df
df.loc[:,'fold'] = -1
# tag every row of df with its fold index
for fold, (train_idx, val_idx) in enumerate(kf.split(X=df)):
    df.loc[val_idx, 'fold'] = fold
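As a sanity check, the fold-assignment pattern above can be run on a toy DataFrame (assuming scikit-learn is installed); every row ends up in exactly one validation fold:

```python
import pandas as pd
from sklearn.model_selection import KFold

df = pd.DataFrame({'name': [f'img_{i}.jpg' for i in range(12)]})  # toy data
kf = KFold(n_splits=4, shuffle=True, random_state=42)
df.loc[:, 'fold'] = -1
for fold, (train_idx, val_idx) in enumerate(kf.split(X=df)):
    df.loc[val_idx, 'fold'] = fold
# 12 rows / 4 folds -> each fold holds exactly 3 validation rows, none left at -1
print(df['fold'].value_counts().to_dict())
```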
Read an image with OpenCV:
# read the first image
img = cv2.imread('/home/aistudio/data/data177948/dataset/train/'+ df['name'].iloc[0])
# OpenCV loads channels in BGR order; convert to RGB with cv2.cvtColor()
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
# display the image:
plt.figure(figsize=(9,9))
plt.imshow(img)
plt.show()
Because the masks in train_mask.csv are stored as RLE strings (run-length encoding), they must be decoded back into images. The encoding, decoding, and display code:
# 1. Encode a mask image into an RLE string
def rle_encode(im):
    '''
    im: numpy array, 1 - mask, 0 - background
    Returns run length as a formatted string
    '''
    pixels = im.flatten(order='F')
    pixels = np.concatenate([[0], pixels, [0]])
    runs = np.where(pixels[1:] != pixels[:-1])[0] + 1
    runs[1::2] -= runs[::2]
    return ' '.join(str(x) for x in runs)

# 2. Decode an RLE string back into a mask image
def rle_decode(mask_rle, shape=(512, 512)):
    '''
    mask_rle: run-length string ("start length" pairs)
    shape: (height, width) of array to return
    Returns numpy array, 1 - mask, 0 - background
    '''
    s = mask_rle.split()
    starts, lengths = [np.asarray(x, dtype=int) for x in (s[0:][::2], s[1:][::2])]
    starts -= 1
    ends = starts + lengths
    img = np.zeros(shape[0]*shape[1], dtype=np.uint8)
    for lo, hi in zip(starts, ends):
        img[lo:hi] = 1
    return img.reshape(shape, order='F')

# 3. Call rle_decode() to obtain the mask image:
mask = rle_decode(df['mask'].iloc[1])
mask2 = np.array(mask)  # mask is single-channel
masks = np.stack((mask2, mask2, mask2), axis=2).astype('float32')  # stack into three channels, [h, w, c]
# 4. Display the mask:
plt.figure(figsize=(15,10))
plt.imshow(mask)
plt.show()
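A quick round-trip check of the two functions above on a tiny toy mask (a self-contained copy of the same logic; the 2x3 mask is made up for illustration):

```python
import numpy as np

def rle_encode(im):
    pixels = im.flatten(order='F')                     # column-major scan, as above
    pixels = np.concatenate([[0], pixels, [0]])
    runs = np.where(pixels[1:] != pixels[:-1])[0] + 1  # 1-based run boundaries
    runs[1::2] -= runs[::2]                            # convert end positions to lengths
    return ' '.join(str(x) for x in runs)

def rle_decode(mask_rle, shape=(512, 512)):
    s = mask_rle.split()
    starts, lengths = [np.asarray(x, dtype=int) for x in (s[0::2], s[1::2])]
    starts -= 1
    ends = starts + lengths
    img = np.zeros(shape[0] * shape[1], dtype=np.uint8)
    for lo, hi in zip(starts, ends):
        img[lo:hi] = 1
    return img.reshape(shape, order='F')

mask = np.array([[1, 1, 0],
                 [1, 0, 0]], dtype=np.uint8)
rle = rle_encode(mask)                    # '1 3': one run of three 1s in column-major order
restored = rle_decode(rle, shape=(2, 3))  # round trip recovers the original mask
print(rle, np.array_equal(mask, restored))
```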
The task of this project is to decide whether a given image contains buildings. Treating the whole image as a single feature and classifying it directly would clearly be difficult. Given the form of the labels, the whole-image problem can instead be reduced to per-pixel classification: a binary decision for each pixel, with 1 meaning the pixel is part of a building and 0 meaning it is not.
For this per-pixel binary classification problem, the loss function is the binary cross-entropy loss.
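As a reminder of what a "with logits" binary cross-entropy computes, here is a minimal NumPy sketch (toy logits and labels, not the project's data): the sigmoid is folded into the loss in a numerically stable form, which is why the network's last layer needs no activation.

```python
import numpy as np

def bce_with_logits(logits, targets):
    # numerically stable form of -[y*log(sigmoid(x)) + (1-y)*log(1-sigmoid(x))]
    return np.mean(np.maximum(logits, 0) - logits * targets
                   + np.log1p(np.exp(-np.abs(logits))))

logits = np.array([2.0, -1.0, 0.5])  # raw network outputs (no sigmoid applied)
targets = np.array([1.0, 0.0, 1.0])  # per-pixel labels: 1 = building, 0 = background
loss = bce_with_logits(logits, targets)

# the same value computed naively via an explicit sigmoid
p = 1 / (1 + np.exp(-logits))
naive = -np.mean(targets * np.log(p) + (1 - targets) * np.log(1 - p))
print(abs(loss - naive) < 1e-9)
```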
Pixel-level binary classification is known in computer vision as image segmentation, or more precisely semantic segmentation: pixels of the same class are grouped together, so the image is understood at the pixel level. This project builds its model on the classic U-Net. The U-Net code follows (note that the last layer of the network applies no activation function):
######################################
### U-Net ###
######################################
class Encoder(nn.Layer):  # downsampling block: two convolutions, two batch norms, then pooling
    def __init__(self, num_channels, num_filters):
        super(Encoder, self).__init__()  # parent-class initialization
        self.conv1 = nn.Conv2D(in_channels=num_channels,
                               out_channels=num_filters,
                               kernel_size=3,  # 3x3 kernel, stride 1, padding 1: spatial size [H, W] unchanged
                               stride=1,
                               padding=1)
        self.bn1 = nn.BatchNorm(num_filters, act="relu")  # batch norm followed by ReLU
        self.conv2 = nn.Conv2D(in_channels=num_filters,
                               out_channels=num_filters,
                               kernel_size=3,
                               stride=1,
                               padding=1)
        self.bn2 = nn.BatchNorm(num_filters, act="relu")
        self.pool = nn.MaxPool2D(kernel_size=2, stride=2, padding="SAME")  # pooling halves the spatial size [H/2, W/2]

    def forward(self, inputs):
        x = self.conv1(inputs)
        x = self.bn1(x)
        x = self.conv2(x)
        x = self.bn2(x)
        x_conv = x             # first output: the skip connection passed across to the decoder
        x_pool = self.pool(x)  # second output: the pooled features passed down the encoder
        return x_conv, x_pool

class Decoder(nn.Layer):  # upsampling block: one transposed convolution, two convolutions, two batch norms
    def __init__(self, num_channels, num_filters):
        super(Decoder, self).__init__()
        self.up = nn.Conv2DTranspose(in_channels=num_channels,
                                     out_channels=num_filters,
                                     kernel_size=2,
                                     stride=2,
                                     padding=0)  # doubles the spatial size [2*H, 2*W]
        self.conv1 = nn.Conv2D(in_channels=num_filters*2,
                               out_channels=num_filters,
                               kernel_size=3,
                               stride=1,
                               padding=1)
        self.bn1 = nn.BatchNorm(num_filters, act="relu")
        self.conv2 = nn.Conv2D(in_channels=num_filters,
                               out_channels=num_filters,
                               kernel_size=3,
                               stride=1,
                               padding=1)
        self.bn2 = nn.BatchNorm(num_filters, act="relu")

    def forward(self, input_conv, input_pool):
        x = self.up(input_pool)
        h_diff = (input_conv.shape[2] - x.shape[2])
        w_diff = (input_conv.shape[3] - x.shape[3])
        pad = nn.Pad2D(padding=[h_diff//2, h_diff - h_diff//2, w_diff//2, w_diff - w_diff//2])
        x = pad(x)  # pad the upsampled feature map to match the encoder feature map's size
        x = paddle.concat(x=[input_conv, x], axis=1)  # concatenate the skip connection; channel count doubles
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.conv2(x)
        x = self.bn2(x)
        return x

class UNet(nn.Layer):
    def __init__(self, num_classes=59):
        super(UNet, self).__init__()
        self.down1 = Encoder(num_channels=  3, num_filters=64)  # encoder (downsampling path)
        self.down2 = Encoder(num_channels= 64, num_filters=128)
        self.down3 = Encoder(num_channels=128, num_filters=256)
        self.down4 = Encoder(num_channels=256, num_filters=512)
        self.mid_conv1 = nn.Conv2D(512, 1024, 1)  # bottleneck
        self.mid_bn1 = nn.BatchNorm(1024, act="relu")
        self.mid_conv2 = nn.Conv2D(1024, 1024, 1)
        self.mid_bn2 = nn.BatchNorm(1024, act="relu")
        self.up4 = Decoder(1024, 512)  # decoder (upsampling path)
        self.up3 = Decoder(512, 256)
        self.up2 = Decoder(256, 128)
        self.up1 = Decoder(128, 64)
        self.last_conv = nn.Conv2D(64, num_classes, 1)  # 1x1 conv producing raw logits (no activation here)

    def forward(self, inputs):
        x1, x = self.down1(inputs)
        x2, x = self.down2(x)
        x3, x = self.down3(x)
        x4, x = self.down4(x)
        x = self.mid_conv1(x)
        x = self.mid_bn1(x)
        x = self.mid_conv2(x)
        x = self.mid_bn2(x)
        x = self.up4(x4, x)
        x = self.up3(x3, x)
        x = self.up2(x2, x)
        x = self.up1(x1, x)
        x = self.last_conv(x)
        return x
This project uses the Dice coefficient to measure the difference between predictions and ground-truth labels; it compares results at the pixel level. The Dice coefficient is computed as:
$$\mathrm{Dice}(X, Y) = \frac{2\,|X \cap Y|}{|X| + |Y|}$$
where X is the predicted result and Y is the ground truth. When X and Y are identical the Dice coefficient equals 1. The leaderboard is ranked by the mean Dice coefficient over all test images; higher is better.
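A small numerical example of the Dice coefficient on toy binary masks (hypothetical values):

```python
import numpy as np

def dice(x, y, eps=1e-6):
    # x, y: binary masks; Dice = 2|X ∩ Y| / (|X| + |Y|)
    inter = np.sum(x * y)
    return (2 * inter + eps) / (np.sum(x) + np.sum(y) + eps)

pred = np.array([[1, 1, 0],
                 [0, 1, 0]])
truth = np.array([[1, 0, 0],
                  [0, 1, 1]])
# |X ∩ Y| = 2, |X| = 3, |Y| = 3  ->  Dice = 4/6 ≈ 0.667
print(round(dice(pred, truth), 3))
```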
import paddle
# query the current device
device = paddle.device.get_device()
print(device)
# select the GPU (set_device requires a device string, e.g. 'gpu:0')
device = paddle.device.set_device('gpu:0')
print(device)
# Unzip and rename PaddleSeg
!unzip -o -d /home/aistudio/ /home/aistudio/PaddleSeg-release-2.6.zip > /dev/null
!mv /home/aistudio/PaddleSeg-release-2.6 /home/aistudio/PaddleSeg
# Install
!pip install -e /home/aistudio/PaddleSeg > /dev/null
# Update sys.path:
# `sys.path` may not be refreshed in time, so add the path manually
import sys
sys.path.append('/home/aistudio/PaddleSeg')
#####################################
# Load packages
# standard packages; no extra installation needed
import sys
sys.path.append('/home/aistudio/PaddleSeg')
import os
import pdb
import random
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold  # sklearn
import time
import cv2
from tqdm import tqdm
%matplotlib inline
from matplotlib import pyplot as plt
from skimage import color  # label2rgb
# paddle packages; no extra installation needed
import paddle
from paddle.io import Dataset
from paddle.io import DataLoader
from paddle import nn
from paddle.vision import transforms as A
# PaddleSeg package
import paddleseg
#####################################
# Set the random seeds so that training results are reproducible
def set_seed(seed=42):
    # why 42? The Answer to the Ultimate Question of Life, the Universe, and Everything is 42.
    random.seed(seed)     # python
    np.random.seed(seed)  # numpy
    paddle.seed(seed)     # paddle
# Conversion between masks and RLE strings
# Encode a mask image into an RLE string
def rle_encode(im):
    '''
    im: numpy array, 1 - mask, 0 - background
    Returns run length as a formatted string
    '''
    pixels = im.flatten(order='F')
    pixels = np.concatenate([[0], pixels, [0]])
    runs = np.where(pixels[1:] != pixels[:-1])[0] + 1
    runs[1::2] -= runs[::2]
    return ' '.join(str(x) for x in runs)

# Decode an RLE string back into a mask image
def rle_decode(mask_rle, shape=(512, 512)):
    '''
    mask_rle: run-length string ("start length" pairs)
    shape: (height, width) of array to return
    Returns numpy array, 1 - mask, 0 - background
    '''
    s = mask_rle.split()
    starts, lengths = [np.asarray(x, dtype=int) for x in (s[0:][::2], s[1:][::2])]
    starts -= 1
    ends = starts + lengths
    img = np.zeros(shape[0]*shape[1], dtype=np.uint8)
    for lo, hi in zip(starts, ends):
        img[lo:hi] = 1
    return img.reshape(shape, order='F')
# Data transforms: apart from the Resize, this project uses no transforms; the augmentations below are left commented out.
def build_transforms(CFG):
    data_transforms = {
        "train": A.Compose([
            A.Resize((CFG.img_size, CFG.img_size)),  # resize to img_size x img_size (224 x 224 here)
            #A.ColorJitter(0.4, 0.4, 0.4, 0.4),
            #A.RandomHorizontalFlip(0.5),
            #A.RandomRotation((-5, 5)),
            #A.Normalize(mean=[127.5, 127.5, 127.5], std=[127.5, 127.5, 127.5], data_format='HWC'),
            #A.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], data_format='HWC'),
            #A.Transpose(),  # would convert HWC to CHW
        ]),
        "valid_test": A.Compose([
            A.Resize((CFG.img_size, CFG.img_size)),  # resize to img_size x img_size
        ])
    }
    return data_transforms
# Data loading
class build_dataset(Dataset):
    def __init__(self, df, label=True, transforms=None):
        self.df = df
        self.label = label
        self.img_paths = df['image_path'].tolist()  # image paths
        self.masks = df['mask'].tolist()            # RLE-encoded masks
        self.transforms = transforms

    def __len__(self):
        return len(self.df)

    def __getitem__(self, index):
        #### load image
        img_path = self.img_paths[index]
        image = cv2.cvtColor(cv2.imread(img_path), cv2.COLOR_BGR2RGB).astype(np.float32) / 255
        if self.label:  # train / valid
            #### load mask
            mask = self.masks[index]
            mask = rle_decode(mask)
            mask = np.array(mask).astype('float32')
            if self.transforms:
                image = self.transforms(image)
                mask = self.transforms(mask)
            image = np.transpose(image, (2, 0, 1))  # [h, w, c] => [c, h, w]
            mask = mask.reshape((CFG.img_size, CFG.img_size, 1))
            mask = np.transpose(mask, (2, 0, 1))    # [h, w, c] => [c, h, w]
            return paddle.to_tensor(image), paddle.to_tensor(mask)
        else:  # test
            if self.transforms:
                image = self.transforms(image)
            image = np.transpose(image, (2, 0, 1))  # [h, w, c] => [c, h, w]
            return paddle.to_tensor(image)
def build_dataloader(df, fold, data_transforms, CFG):
    train_df = df.query("fold!=@fold").reset_index(drop=True)
    valid_df = df.query("fold==@fold").reset_index(drop=True)
    train_dataset = build_dataset(train_df, label=True, transforms=data_transforms['train'])
    valid_dataset = build_dataset(valid_df, label=True, transforms=data_transforms['valid_test'])
    train_loader = DataLoader(train_dataset, batch_size=CFG.train_bs, num_workers=CFG.num_worker,
                              shuffle=True, use_shared_memory=False, drop_last=False)
    valid_loader = DataLoader(valid_dataset, batch_size=CFG.valid_bs, num_workers=CFG.num_worker,
                              shuffle=False, use_shared_memory=False)
    return train_loader, valid_loader
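The [h, w, c] => [c, h, w] transpose used in __getitem__ is just an axis permutation; a quick shape check:

```python
import numpy as np

image = np.zeros((224, 224, 3), dtype=np.float32)  # HWC, as produced by cv2 after cvtColor
chw = np.transpose(image, (2, 0, 1))               # CHW, the layout paddle's Conv2D expects
print(image.shape, chw.shape)                      # (224, 224, 3) (3, 224, 224)
```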
# Build the model
######################################
### model ###
######################################
# The U-Net model code (the Encoder, Decoder, and UNet classes) is identical to the
# listing shown earlier and goes here in the full script.
def build_model(CFG, test_flag=False):
    if test_flag:
        pretrain_weights = None
    else:
        pretrain_weights = "imagenet"  # note: this plain U-Net has no pretrained backbone, so this is unused
    model = UNet(CFG.num_classes)
    return model
# - Build the loss functions
def build_loss():
    BCELoss = paddle.nn.BCEWithLogitsLoss()
    DiceLoss = paddleseg.models.losses.DiceLoss()
    return {"BCELoss": BCELoss, "DiceLoss": DiceLoss}
# - Build the evaluation metrics
def dice_coef(y_true, y_pred, thr=0.5, dim=(2, 3), epsilon=0.001):
    y_true = y_true.astype(np.float32)
    y_pred = (y_pred > thr).astype(np.float32)
    inter = paddle.sum((y_true * y_pred), axis=dim)
    den = paddle.sum(y_true, axis=dim) + paddle.sum(y_pred, axis=dim)
    dice = paddle.mean((2 * inter + epsilon) / (den + epsilon), axis=(1, 0))
    return dice

def iou_coef(y_true, y_pred, thr=0.5, dim=(2, 3), epsilon=0.001):
    y_true = y_true.astype(np.float32)
    y_pred = (y_pred > thr).astype(np.float32)
    inter = paddle.sum((y_true * y_pred), axis=dim)
    union = paddle.sum((y_true + y_pred - y_true * y_pred), axis=dim)
    iou = paddle.mean((inter + epsilon) / (union + epsilon), axis=(1, 0))
    return iou
# - Training, validation, and test functions
def train_one_epoch(model, train_loader, optimizer, losses_dict, CFG, log, epoch):
    model.train()
    scaler = paddle.amp.GradScaler()
    losses_all, bce_all, dice_all = 0, 0, 0
    log.write('---------epoch---%d---start----------' % epoch)
    log.write('\n')
    pbar = tqdm(enumerate(train_loader), total=len(train_loader), desc='Train ')
    for _, (images, masks) in pbar:
        optimizer.clear_grad()
        with paddle.amp.auto_cast(enable=True):
            y_preds = model(images)  # [b, c, h, w]
            bce_loss = losses_dict["BCELoss"](y_preds, masks)
            # dice_loss = losses_dict["DiceLoss"](y_preds, masks)
            losses = bce_loss  # + dice_loss
        scaler.scale(losses).backward()
        scaler.step(optimizer)
        scaler.update()
        losses_all += losses.item() / images.shape[0]
        bce_all += bce_loss.item() / images.shape[0]
        # dice_all += dice_loss.item() / images.shape[0]
        dice_all += 0
    current_lr = optimizer.get_lr()
    log.write('%0.5f %d | %+5.3f %5.3f |' % (current_lr, epoch, bce_all, dice_all))
    print("lr: {:.4f}".format(current_lr), flush=True)
    print("loss: {:.3f}, bce_all: {:.3f}, dice_all: {:.3f}".format(losses_all, bce_all, dice_all), flush=True)
@paddle.no_grad()
def valid_one_epoch(model, valid_loader, CFG, log):
    model.eval()
    val_scores = []
    pbar = tqdm(enumerate(valid_loader), total=len(valid_loader), desc='Valid ')
    for _, (images, masks) in pbar:
        y_preds = model(images)
        y_preds = paddle.nn.Sigmoid()(y_preds)  # [b, c, h, w]
        val_dice = dice_coef(masks, y_preds).cpu().detach().numpy()
        val_jaccard = iou_coef(masks, y_preds).cpu().detach().numpy()
        val_scores.append([val_dice, val_jaccard])
    val_scores = np.mean(val_scores, axis=0)
    val_dice, val_jaccard = val_scores
    log.write('| %+5.3f %5.3f |' % (val_dice[0], val_jaccard[0]))
    log.write('\n')
    print("val_dice: {:.4f}, val_jaccard: {:.4f}".format(val_dice[0], val_jaccard[0]), flush=True)
    return images, y_preds, masks, val_dice, val_jaccard
@paddle.no_grad()
def test_one_epoch(ckpt_path, test_loader, CFG):
    pbar = tqdm(enumerate(test_loader), total=len(test_loader), desc='Test: ')
    for _, (images) in pbar:
        size = images.shape
        masks = paddle.zeros((size[0], CFG.num_classes, size[2], size[3]), dtype=paddle.float32)  # [b, c, h, w]
        ############################################
        ##### >>>>>>> cross-validation inference <<<<<<
        ############################################
        for fold in range(CFG.n_fold):
            model = build_model(CFG, test_flag=True)
            optimizer = paddle.optimizer.AdamW(learning_rate=CFG.lr, parameters=model.parameters(), weight_decay=CFG.wd)
            save_path_model = f"{ckpt_path}/best_fold_model{fold}.pdmodel"
            save_path_opt = f"{ckpt_path}/best_fold_opt{fold}.pdmodel"
            model.set_state_dict(paddle.load(save_path_model))
            optimizer.set_state_dict(paddle.load(save_path_opt))
            model.eval()
            y_preds = model(images)  # [b, c, h, w]
            y_preds = paddle.nn.Sigmoid()(y_preds)
            masks += y_preds / CFG.n_fold  # average the per-fold predictions
        masks = (masks > CFG.thr).cpu().detach().numpy()  # [b, c, h, w]
    return images, masks
# - Function to display segmentation results
def plot_img(img, pred, mask='', img_path='', label=True, title=''):
    if label:
        # rescale pred and mask to the 0-1 range regardless of min and max value
        img = np.transpose(img, (1, 2, 0))
        pred = np.transpose(pred, (1, 2, 0))
        pred = (pred - pred.min()) / (pred.max() - pred.min())
        pred = np.nan_to_num(pred)
        pred = paddle.to_tensor(pred)
        pred = paddle.round(pred)
        pred = pred.reshape(img.shape[:2])
        mask = np.transpose(mask, (1, 2, 0))
        mask = (mask - mask.min()) / (mask.max() - mask.min())
        mask = np.nan_to_num(mask)
        mask = paddle.to_tensor(mask)
        mask = paddle.round(mask)
        mask = mask.reshape(img.shape[:2])
        fig, ax = plt.subplots(1, 5, figsize=(15, 3))
        fig.suptitle(title, fontsize=16)
        img, pred, mask = img.numpy(), pred.numpy(), mask.numpy()
        ax[0].imshow(mask); ax[0].set_title('Mask')
        ax[1].imshow(pred); ax[1].set_title('Pred')
        ax[2].imshow(img);  ax[2].set_title('Image')
        ax[3].imshow(color.label2rgb(mask, img, bg_label=0, bg_color=(1., 1., 1.), alpha=0.25))
        ax[3].set_title('Masked Image')
        ax[4].imshow(color.label2rgb(pred, img, bg_label=0, bg_color=(1., 1., 1.), alpha=0.25))
        ax[4].set_title('Predicted Image')
        plt.savefig(img_path, bbox_inches='tight')
    else:
        # rescale pred to the 0-1 range regardless of min and max value
        img = np.transpose(img, (1, 2, 0))
        pred = np.transpose(pred, (1, 2, 0))
        pred = (pred - pred.min()) / (pred.max() - pred.min())
        pred = np.nan_to_num(pred)
        pred = paddle.to_tensor(pred)
        pred = paddle.round(pred)
        pred = pred.reshape(img.shape[:2])
        fig, ax = plt.subplots(1, 3, figsize=(9, 3))
        fig.suptitle(title, fontsize=16)
        img, pred = img.numpy(), pred.numpy()
        ax[0].imshow(pred); ax[0].set_title('Pred')
        ax[1].imshow(img);  ax[1].set_title('Image')
        ax[2].imshow(color.label2rgb(pred, img, bg_label=0, bg_color=(1., 1., 1.), alpha=0.25))
        ax[2].set_title('Predicted Image')
        plt.savefig(img_path, bbox_inches='tight')
# - Class that logs training results to both the terminal and a file
class Logger(object):
    def __init__(self):
        self.terminal = sys.stdout
        self.file = None

    def open(self, file, mode=None):
        if mode is None: mode = 'w'
        self.file = open(file, mode)

    def write(self, message, is_terminal=1, is_file=1):
        if '\r' in message: is_file = 0
        if is_terminal == 1:
            self.terminal.write(message)
            self.terminal.flush()
        if is_file == 1:
            self.file.write(message)
            self.file.flush()

    def flush(self):
        # needed for python 3 compatibility; flushing is done in write(), so do nothing here
        pass
# - Main program
if __name__ == '__main__':
    ###############################################################
    ##### >>>>>>> config <<<<<<
    ###############################################################
    class CFG:
        # step1: hyper-parameters
        seed = 42
        device = paddle.device.set_device('gpu:0')
        num_worker = 0  # 0 when debugging; try 16 when training (check with "htop")
        data_path = "/home/aistudio/data/data177948/dataset/"
        ckpt_path = "/home/aistudio/work/ckpt_nonan"  # for submission
        # step2: data
        n_fold = 4
        img_size = 224
        train_bs = 4
        valid_bs = train_bs * 2
        # step3: model
        #backbone = 'resnet18'
        num_classes = 1
        # step4: optimizer
        epoch = 5
        lr = 1e-5  # learning rate
        wd = 1e-6  # weight decay
        # step5: inference
        thr = 0.3
        resume = False

    set_seed(CFG.seed)
    if not os.path.exists(CFG.ckpt_path):
        os.makedirs(CFG.ckpt_path)
    ########### training and validation
    train_val_flag = True
    if train_val_flag:
        ###############################################################
        ##### Step 0: data preprocessing
        ###############################################################
        df = pd.read_csv(os.path.join(CFG.data_path, 'train_mask.csv'), sep='\t', names=['name', 'mask'])
        df = df.dropna()
        df = df.reset_index(drop=True)
        df['image_path'] = df['name'].apply(lambda x: os.path.join(CFG.data_path, 'train', str(x)))
        ###############################################################
        ##### Step 1: cross-validation setup
        ###############################################################
        kf = KFold(n_splits=CFG.n_fold, shuffle=True, random_state=CFG.seed)
        df.loc[:, 'fold'] = -1
        for fold, (train_idx, val_idx) in enumerate(kf.split(X=df)):
            df.loc[val_idx, 'fold'] = fold
        log = Logger()
        log.open(f"{CFG.ckpt_path}/log_train.txt", mode='a')
        ## start training here! ##############################################
        log.write('** start training here! **\n')
        log.write(' |-------- -- VALID ----------|---- TRAIN/BATCH ---------\n')
        log.write('rate epoch | val_dice, val_jaccard | bce_loss, dice_all, time \n')
        log.write('----------------------------------------------------------------\n')
        start_fold = 0
        if CFG.resume:
            path_checkpoint = os.path.join(CFG.ckpt_path, 'checkpoint.pdmodel')
            checkpoint = paddle.load(path_checkpoint)
            start_fold = checkpoint['fold']  # resume from the fold that was interrupted
        for fold in range(start_fold, CFG.n_fold):
            log.write('---------fold---%d------------' % fold)
            ###############################################################
            ##### Step 2: data, model, optimizer, and loss
            ###############################################################
            data_transforms = build_transforms(CFG)
            train_loader, valid_loader = build_dataloader(df, fold, data_transforms, CFG)  # dataset & dataloader
            model = build_model(CFG)  # model
            optimizer = paddle.optimizer.AdamW(learning_rate=CFG.lr, parameters=model.parameters(), weight_decay=CFG.wd)
            losses_dict = build_loss()  # loss
            start_epoch = 0
            if CFG.resume:
                start_epoch = checkpoint['epoch'] + 1  # index of the next epoch; the saved epoch completed successfully
                model.set_dict(checkpoint['model'])
                optimizer.set_dict(checkpoint['optimizer'])
            best_val_dice = 0
            best_epoch = 0
            for epoch in range(start_epoch, CFG.epoch):
                start_time = time.time()
                ###############################################################
                ##### Step 3: training and cross-validation
                ###############################################################
                train_one_epoch(model, train_loader, optimizer, losses_dict, CFG, log, epoch)
                images, y_preds, masks, val_dice, val_jaccard = valid_one_epoch(model, valid_loader, CFG, log)
                ###############################################################
                ##### Step 4: save the model parameters
                ###############################################################
                is_best = (val_dice > best_val_dice)
                best_val_dice = max(best_val_dice, val_dice)
                if is_best:
                    save_path_model = f"{CFG.ckpt_path}/best_fold_model{fold}.pdmodel"
                    save_path_opt = f"{CFG.ckpt_path}/best_fold_opt{fold}.pdmodel"
                    save_path_checkpoint = f"{CFG.ckpt_path}/checkpoint.pdmodel"
                    checkpoint = {
                        'fold': fold,
                        'epoch': epoch,
                        'model': model.state_dict(),
                        'optimizer': optimizer.state_dict()}
                    if os.path.isfile(save_path_model):
                        os.remove(save_path_model)
                    if os.path.isfile(save_path_opt):
                        os.remove(save_path_opt)
                    # save the model parameters
                    paddle.save(model.state_dict(), save_path_model)
                    # save the optimizer parameters
                    paddle.save(optimizer.state_dict(), save_path_opt)
                    # save the checkpoint used to resume training
                    paddle.save(checkpoint, save_path_checkpoint)
                # at the end of each epoch, plot the segmentation results of the last validation batch
                for idx in range(images.shape[0]):
                    img = images[idx]
                    pred = y_preds[idx]
                    mask = masks[idx]
                    img_path = f"{CFG.ckpt_path}/{fold}_{epoch}_{idx}.png"
                    plot_img(img, pred, mask, img_path, label=True, title='')
                epoch_time = time.time() - start_time
                log.write('%d %5.3f %5.3f' % (epoch, epoch_time, best_val_dice[0]))
                log.write('\n')
                log.write('---------epoch---%d--- end----------' % epoch)
                log.write('\n')
                print("epoch:{}, time:{:.2f}s, best:{:.2f}\n".format(epoch, epoch_time, best_val_dice[0]))
    ######## test / inference
    test_flag = True
    if test_flag:
        set_seed(CFG.seed)
        ###############################################################
        ##### Step 0: data preprocessing
        ###############################################################
        test_df = pd.read_csv(os.path.join(CFG.data_path, 'test_a_samplesubmit.csv'), sep='\t', names=['name', 'mask'])
        test_df['image_path'] = test_df['name'].apply(lambda x: os.path.join(CFG.data_path, 'test_a', str(x)))
        data_transforms = build_transforms(CFG)
        test_dataset = build_dataset(test_df, label=False, transforms=data_transforms['valid_test'])
        test_loader = DataLoader(test_dataset, batch_size=CFG.train_bs, num_workers=CFG.num_worker,
                                 shuffle=False, use_shared_memory=False, drop_last=False)
        ###############################################################
        ##### Step 1: inference
        ###############################################################
        images, y_preds = test_one_epoch(CFG.ckpt_path, test_loader, CFG)
        for idx in range(images.shape[0]):
            img = images[idx]
            pred = y_preds[idx]
            img_path = f"{CFG.ckpt_path}/test_{idx}.png"
            plot_img(img, pred, mask='', img_path=img_path, label=False, title='')
Segmentation results on the validation set:
The loss values and Dice coefficients printed during training:
** start training here! **
|-------- -- VALID ----------|---- TRAIN/BATCH ---------
rate epoch | val_dice, val_jaccard | bce_loss, dice_all, time
---------------------------------- ------------------------------------------------------
---------fold---0------------
W1209 10:17:34.537297 3759 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 11.2
W1209 10:17:34.542306 3759 gpu_resources.cc:91] device: 0, cuDNN Version: 8.2.
---------epoch---0---start----------
Train : 100%|██████████| 4650/4650 [06:04<00:00, 12.77it/s]
0.00001 0 | +428.851 0.000 |lr: 0.0000
loss: 428.851, bce_all: 428.851, dice_all: 0.000
Valid : 0%| | 0/775 [00:00, ?it/s]
Valid : 100%|██████████| 775/775 [01:02<00:00, 12.42it/s]
| +0.605 0.471 |
val_dice: 0.6054, val_jaccard: 0.4714
0 431.882 0.605
---------epoch---0--- end----------
epoch:0, time:431.88s, best:0.61
---------epoch---1---start----------
Train : 0%| | 0/4650 [00:00, ?it/s]
Train : 0%| | 1/4650 [00:00<08:53, 8.71it/s]
Train : 100%|██████████| 4650/4650 [05:59<00:00, 12.94it/s]
0.00001 1 | +299.959 0.000 |lr: 0.0000
loss: 299.959, bce_all: 299.959, dice_all: 0.000
Valid : 0%| | 0/775 [00:00, ?it/s]
Valid : 0%| | 1/775 [00:00<01:22, 9.43it/s]
Valid : 100%|██████████| 775/775 [01:01<00:00, 12.53it/s]
| +0.652 0.522 |
val_dice: 0.6519, val_jaccard: 0.5219
1 427.866 0.652
---------epoch---1--- end----------
epoch:1, time:427.87s, best:0.65
---------epoch---2---start----------
Train : 0%| | 0/4650 [00:00, ?it/s]
Train : 0%| | 1/4650 [00:00<10:27, 7.40it/s]
Train : 100%|██████████| 4650/4650 [06:04<00:00, 12.77it/s]
0.00001 2 | +260.141 0.000 |lr: 0.0000
loss: 260.141, bce_all: 260.141, dice_all: 0.000
Valid : 0%| | 0/775 [00:00, ?it/s]
Valid : 100%|██████████| 775/775 [01:03<00:00, 12.21it/s]
| +0.666 0.535 |
val_dice: 0.6659, val_jaccard: 0.5353
2 433.935 0.666
---------epoch---2--- end----------
epoch:2, time:433.94s, best:0.67
---------epoch---3---start----------
Train : 0%| | 0/4650 [00:00, ?it/s]
Train : 100%|██████████| 4650/4650 [06:03<00:00, 12.80it/s]
0.00001 3 | +238.226 0.000 |lr: 0.0000
loss: 238.226, bce_all: 238.226, dice_all: 0.000
Valid : 0%| | 0/775 [00:00, ?it/s]
Valid : 100%|██████████| 775/775 [01:01<00:00, 12.66it/s]
| +0.712 0.588 |
val_dice: 0.7116, val_jaccard: 0.5882
3 431.075 0.712
---------epoch---3--- end----------
epoch:3, time:431.07s, best:0.71
---------epoch---4---start----------
Train : 0%| | 0/4650 [00:00, ?it/s]
Train : 0%| | 1/4650 [00:00<09:01, 8.59it/s]
Train : 100%|██████████| 4650/4650 [06:01<00:00, 12.86it/s]
0.00001 4 | +222.726 0.000 |lr: 0.0000
loss: 222.726, bce_all: 222.726, dice_all: 0.000
Valid : 0%| | 0/775 [00:00, ?it/s]
Valid : 100%|██████████| 775/775 [01:02<00:00, 12.44it/s]
| +0.716 0.593 |
val_dice: 0.7164, val_jaccard: 0.5932
4 430.155 0.716
---------epoch---4--- end----------
epoch:4, time:430.15s, best:0.72
## (4) Test results
Segmentation results for images that contain buildings:


Segmentation results for images without buildings:


# 6. Summary and Acknowledgements
During the development of this project, teacher Zhang Yiqiao provided a great deal of guidance that allowed the project to be completed smoothly, and teacher Liu Jianjian offered valuable advice on several difficult problems. Thanks also to Baiyu and the other teachers for their help, and to Baidu for providing the compute resources.
This project was a great learning experience. The biggest gain was learning to use pdb debugging inside a notebook; that tool helped me solve quite a few small problems.