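A semantic segmentation network classifies every pixel in an image, which yields an image segmented by class. Applications include road segmentation for autonomous driving and cancer cell segmentation for medical diagnosis. This article shows how to train the DeepLab v3+ semantic segmentation network in MATLAB and apply it to semantic segmentation of driving scenes.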
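This example trains on the CamVid dataset from the University of Cambridge, a collection of street-level images captured while driving. The dataset provides pixel-level labels for 32 semantic classes, including car, pedestrian, and road.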
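This example creates a DeepLab v3+ network whose weights are initialized from a pretrained ResNet-18 network. ResNet-18 is an efficient network that is well suited to applications with limited processing resources. Depending on the application's requirements, other pretrained networks such as MobileNet v2 or ResNet-50 can be used instead.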
Before using ResNet-18, open the Add-On Explorer and install Deep Learning Toolbox Model for ResNet-18 Network.
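% Download the pretrained DeepLab v3+ network, which is used later when training is skipped.
pretrainedFolder = 'pretrainedNetwork';
pretrainedNetwork = fullfile(pretrainedFolder,'deeplabv3plusResnet18CamVid.mat');
if ~exist(pretrainedNetwork,'file')
    if ~exist(pretrainedFolder,'dir'), mkdir(pretrainedFolder); end   % make sure the target folder exists
    disp('Downloading pretrained network (58 MB)...');
    pretrainedURL = 'https://www.mathworks.com/supportfiles/vision/data/deeplabv3plusResnet18CamVid.mat';
    websave(pretrainedNetwork, pretrainedURL);
end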
This experiment uses the CamVid dataset. The following code downloads and extracts it:
imageURL = 'http://web4.cs.ucl.ac.uk/staff/g.brostow/MotionSegRecData/files/701_StillsRaw_full.zip';
labelURL = 'http://web4.cs.ucl.ac.uk/staff/g.brostow/MotionSegRecData/data/LabeledApproved_full.zip';
outputFolder = 'CamVid';
labelsZip = fullfile(outputFolder,'labels.zip');
imagesZip = fullfile(outputFolder,'images.zip');
if ~exist(outputFolder,'file')
    mkdir(outputFolder);
end
if ~exist(labelsZip, 'file')
    disp('Downloading 16 MB CamVid dataset labels...');
    websave(labelsZip, labelURL);
    disp('Finished downloading CamVid dataset labels.');
    unzip(labelsZip, fullfile(outputFolder,'labels'));
    disp('Finished unzipping CamVid dataset labels.');
end
if ~exist(imagesZip,'file')
    disp('Downloading 557 MB CamVid dataset images...');
    websave(imagesZip, imageURL);
    disp('Finished downloading CamVid dataset images.');
    unzip(imagesZip, fullfile(outputFolder,'images'));
    disp('Finished unzipping CamVid dataset images.');
end
% Load the CamVid images and display one of them after histogram equalization.
imgDir = fullfile(outputFolder,'images');
imds = imageDatastore(imgDir);
Img = readimage(imds,559);
Img = histeq(Img);
imshow(Img)
% Class names used for training. The 32 original CamVid classes are grouped into these 11.
classes = [
    "Sky"
    "Building"
    "Pole"
    "Road"
    "Pavement"
    "Tree"
    "SignSymbol"
    "Fence"
    "Car"
    "Pedestrian"
    "Bicyclist"
    ];
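% Map the original CamVid label IDs to the classes listed above.
% camvidPixelLabelIDs is a helper function that is not listed here.
labelIDs = camvidPixelLabelIDs();
% Create a pixelLabelDatastore from the class names and label IDs.
labelDir = fullfile(outputFolder,'labels');
pxds = pixelLabelDatastore(labelDir,classes,labelIDs);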
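The label conversion step groups several RGB label colors under one class name. The snippet below is only a minimal sketch of that idea on a toy 2-by-2 label image, using base MATLAB. The names toyClasses and toyLabelIDs and the RGB values are illustrative assumptions, not the values returned by camvidPixelLabelIDs, and pixelLabelDatastore performs its own conversion internally.

```matlab
% Toy RGB label image (values are illustrative assumptions, not read from CamVid).
labelRGB = zeros(2,2,3,'uint8');
labelRGB(1,1,:) = [128 128 128];
labelRGB(1,2,:) = [128  64 128];
labelRGB(2,1,:) = [128   0 192];
labelRGB(2,2,:) = [  0   0   0];   % color that belongs to no class

% Hypothetical grouping: one M-by-3 matrix of RGB colors per class.
toyClasses  = ["Sky" "Road"];
toyLabelIDs = {[128 128 128]; [128 64 128; 128 0 192]};

% One row per pixel, in the same (column-major) order as linear indexing.
pixels   = reshape(double(labelRGB), [], 3);
classIdx = zeros(size(labelRGB,1), size(labelRGB,2));   % 0 means "no class"
for k = 1:numel(toyLabelIDs)
    hit = ismember(pixels, toyLabelIDs{k}, 'rows');     % pixels matching any color of class k
    classIdx(hit) = k;
end
disp(classIdx)
```

Each nonzero entry of classIdx indexes into toyClasses, and pixels whose color is not listed stay at 0.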
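% Split the images and labels into training and validation sets.
% partitionCamVidData is a helper function that is not listed here.
[imdsTrain, imdsVal, pxdsTrain, pxdsVal] = partitionCamVidData(imds, pxds);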
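Because the body of partitionCamVidData does not appear in this post, the following is a hypothetical sketch of how such a random split could be computed on file indices. The 60% training share, the fixed random seed, and all variable names are assumptions made for illustration.

```matlab
% Hypothetical random train/validation split on file indices.
rng(0);                                    % assumed fixed seed, for a reproducible split
numFiles  = 701;                           % number of CamVid still images
trainFrac = 0.60;                          % assumed training share
shuffled  = randperm(numFiles);            % random permutation of 1..numFiles
numTrain  = round(trainFrac * numFiles);
trainIdx  = shuffled(1:numTrain);          % indices of training images
valIdx    = shuffled(numTrain+1:end);      % remaining indices for validation
fprintf('%d training images, %d validation images\n', numel(trainIdx), numel(valIdx));
```

The same index vectors could then be used to pick matching entries from imds.Files and pxds.Files, so that each image stays paired with its label file.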
% Input image size.
imageSize = [720 960 3];
% Number of classes.
numClasses = numel(classes);
% Create DeepLab v3+.
lgraph = deeplabv3plusLayers(imageSize, numClasses, "resnet18");
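% Weight the classes to counter class imbalance: classes that cover fewer pixels get larger weights.
tbl = countEachLabel(pxds);
imageFreq = tbl.PixelCount ./ tbl.ImagePixelCount;
classWeights = median(imageFreq) ./ imageFreq;
% Replace the network's classification layer with one that uses these weights.
pxLayer = pixelClassificationLayer('Name','labels','Classes',tbl.Name,'ClassWeights',classWeights);
lgraph = replaceLayer(lgraph,"classification",pxLayer);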
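To make the weighting formula concrete, here is a small worked example with made-up pixel counts for three classes. The numbers are assumptions chosen for easy arithmetic and are not CamVid statistics. The two formulas are the same ones used in the code above.

```matlab
% Made-up counts for three classes (assumed values, not CamVid statistics).
pixelCount      = [8000; 1500; 500];      % pixels labelled with each class
imagePixelCount = [10000; 10000; 5000];   % total pixels of the images that contain the class

imageFreq    = pixelCount ./ imagePixelCount;    % per-class frequency: 0.80, 0.15, 0.10
classWeights = median(imageFreq) ./ imageFreq;   % median frequency divided by class frequency
disp([imageFreq classWeights])
```

The class at the median frequency gets a weight of 1, more frequent classes get weights below 1, and rarer classes get weights above 1, so rare classes contribute more to the loss per pixel.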
dsVal = combine(imdsVal,pxdsVal);
dsTrain = combine(imdsTrain, pxdsTrain);
% Define training options.
options = trainingOptions('sgdm', ...
    'LearnRateSchedule','piecewise', ...
    'LearnRateDropPeriod',10, ...
    'LearnRateDropFactor',0.3, ...
    'Momentum',0.9, ...
    'InitialLearnRate',1e-3, ...
    'L2Regularization',0.005, ...
    'ValidationData',dsVal, ...
    'MaxEpochs',30, ...
    'MiniBatchSize',8, ...
    'Shuffle','every-epoch', ...
    'CheckpointPath', tempdir, ...
    'VerboseFrequency',2, ...
    'Plots','training-progress', ...
    'ValidationPatience', 4);
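With the piecewise schedule, the learning rate is multiplied by the drop factor each time a drop period has elapsed. The sketch below tabulates the rate per epoch for the values above. It assumes the drop takes effect at the start of epochs 11 and 21, and the variable names are illustrative only.

```matlab
% Learning rate per epoch for the piecewise schedule configured above.
initialLearnRate = 1e-3;
dropPeriod       = 10;     % LearnRateDropPeriod
dropFactor       = 0.3;    % LearnRateDropFactor
maxEpochs        = 30;

epochs    = 1:maxEpochs;
numDrops  = floor((epochs - 1) / dropPeriod);      % drops applied before each epoch (assumed timing)
learnRate = initialLearnRate * dropFactor .^ numDrops;
disp(learnRate([1 10 11 20 21 30]))
```

Under this assumption the rate is 1e-3 for epochs 1 to 10, 3e-4 for epochs 11 to 20, and 9e-5 for epochs 21 to 30, unless validation patience stops training earlier.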
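% Set doTraining to true to train from scratch; otherwise load the downloaded pretrained network.
doTraining = false;
if doTraining
    [net, info] = trainNetwork(dsTrain,lgraph,options);
else
    data = load(fullfile('pretrainedNetwork','deeplabv3plusResnet18CamVid.mat'));
    net = data.net;
end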
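% Segment one validation image and overlay the predicted labels on it.
I = readimage(imdsVal,35);
C = semanticseg(I, net);
% camvidColorMap and pixelLabelColorbar are helper functions that are not listed here.
cmap = camvidColorMap;
B = labeloverlay(I,C,'Colormap',cmap,'Transparency',0.4);
imshow(B);
pixelLabelColorbar(cmap, classes);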