Training DeepLab V3 on the VOC Dataset

1. Introduction to DeepLab

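DeepLab is a state-of-the-art deep learning model for semantic image segmentation. Its goal is to assign a semantic label (such as person, dog, or cat) to every pixel of the input image. At the time of writing, DeepLabv3+ represents the state of the art in semantic image segmentation.
Semantic image segmentation is the task of assigning a semantic label (such as "road", "sky", "person", or "dog") to every pixel in an image. It enables new applications and, compared with other visual recognition tasks such as image classification or bounding-box detection, it places much stricter demands on localization accuracy.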

Download models-master from https://github.com/tensorflow/models.
In this article it is extracted to E:\models-master.

2. A Preview of DeepLab's Results

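Google has released models pretrained for semantic segmentation on the PASCAL VOC 2012 and Cityscapes datasets. The deeplab directory provides deeplab_demo.ipynb, so let's try it first and see what results it can achieve.
In the E:\models-master directory, run jupyter-notebook, i.e.

E:\models-master>jupyter-notebook

Then open and run deeplab_demo.ipynb.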

The default example produces the following segmentation:

The default images are downloaded from the web at run time.

Now let's test a local image.
Comment out the original code in the "Run on sample images" section and replace it with:

IMAGE_PATH = "E:/pets.test/images/Abyssinian_13.jpg"

def run_visualization(path):
    """Runs DeepLab on a local image file and shows the result."""
    original_im = Image.open(path)
    print('running deeplab on image %s...' % path)
    resized_im, seg_map = MODEL.run(original_im)
    vis_segmentation(resized_im, seg_map)

run_visualization(IMAGE_PATH)

Here is a cat image taken from the PETS dataset:

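The result is very good.
Next, a random image from the web showing a person together with an animal:

There is a small flaw: the girl's hand is labeled as part of the cat.
Next, one whose background is close in color to the subject:

The cat's white paws are close in color to the snow and were not separated from it, but the result is still quite good.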

3. Preparation

To avoid unnecessary trouble, here is my system configuration:
Python 3.6
TensorFlow 1.10
Windows 10 (64-bit), 8 GB RAM
GPU: GTX 750 Ti (2 GB VRAM)

3.1 Download models-master

https://github.com/tensorflow/models
Extract it to E:\models-master.

3.2 Download the dataset

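PASCAL VOC provides a complete, standardized, high-quality dataset for image recognition; it can be used for image classification, object detection, and image segmentation.
Here we use VOCtrainval_11-May-2012.tar, 1.86 GB (1,999,639,040 bytes).
Extract it to E:\models-master\research\deeplab\datasets\pascal_voc_seg\VOCdevkit,
that is, create a pascal_voc_seg subdirectory under research\deeplab\datasets and extract the archive there.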
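Because the batch files in section 4 expect pascal_voc_seg\VOCdevkit\VOC2012\... exactly, a quick check that the extraction landed in the right place can save a failed run later. This is a minimal sketch, not part of deeplab; missing_voc_dirs and REQUIRED_SUBDIRS are hypothetical names, and the list only covers the folders the later steps read.

```python
import os

# VOC2012 sub-directories read by the conversion steps in section 4.
# (SegmentationClassRaw is produced later by remove_gt_colormap.py.)
REQUIRED_SUBDIRS = [
    "JPEGImages",
    "SegmentationClass",
    os.path.join("ImageSets", "Segmentation"),
]


def missing_voc_dirs(datasets_dir):
    """Returns the expected VOC2012 sub-directories that are missing.

    datasets_dir is research/deeplab/datasets; the archive is assumed to
    have been extracted into its pascal_voc_seg sub-directory.
    """
    voc_root = os.path.join(datasets_dir, "pascal_voc_seg", "VOCdevkit", "VOC2012")
    return [d for d in REQUIRED_SUBDIRS
            if not os.path.isdir(os.path.join(voc_root, d))]
```

With the layout used in this article, missing_voc_dirs(r"E:\models-master\research\deeplab\datasets") should return an empty list.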

3.3 Download a pretrained model

Many trained models are available; the full list is at https://github.com/tensorflow/models/blob/master/research/deeplab/g3doc/model_zoo.md.
Here we use xception65_coco_voc_trainval.
Download the file deeplabv3_pascal_trainval_2018_01_04.tar.gz,
439 MB (460,669,898 bytes), and extract it to E:\models-master\research\deeplab.
It contains 3 files:

frozen_inference_graph.pb
model.ckpt.data-00000-of-00001
model.ckpt.index

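Note: according to the documentation, the model was trained on a CPU E5-1650 v3 @ 3.50GHz with 32GB of memory. If your configuration is different, you may need to adjust the settings for training.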

4. Data Generation

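deeplab provides the script download_and_convert_voc2012.sh for downloading the data and generating the corresponding TFRecords, but on Windows things are not that convenient and the script has to be adapted. (Incidentally, the line-continuation character in Windows batch files is ^.)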
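If the batch-file quirks get in the way, the same two conversion steps (4.1 and 4.2 below) could also be driven from Python, with os.path.join picking the right separator. A minimal sketch, assuming it is saved and run from research\deeplab\datasets with the directory layout used in this article; build_commands and run_all are hypothetical helpers, and only the flags already used in the batch files are passed. By default it only prints the commands; add --run to execute them.

```python
import os
import subprocess
import sys

# Assumed layout, relative to research\deeplab\datasets (as in this article).
VOC_ROOT = os.path.join("pascal_voc_seg", "VOCdevkit", "VOC2012")
TFRECORD_DIR = os.path.join("pascal_voc_seg", "tfrecord")


def build_commands(voc_root=VOC_ROOT, output_dir=TFRECORD_DIR):
    """Hypothetical helper: the same two steps as the .bat files in 4.1 and 4.2."""
    remove_colormap = [
        sys.executable, "remove_gt_colormap.py",
        "--original_gt_folder=" + os.path.join(voc_root, "SegmentationClass"),
        "--output_dir=" + os.path.join(voc_root, "SegmentationClassRaw"),
    ]
    build_tfrecords = [
        sys.executable, "build_voc2012_data.py",
        "--image_folder=" + os.path.join(voc_root, "JPEGImages"),
        "--semantic_segmentation_folder="
        + os.path.join(voc_root, "SegmentationClassRaw"),
        "--list_folder=" + os.path.join(voc_root, "ImageSets", "Segmentation"),
        "--image_format=jpg",
        "--output_dir=" + output_dir,
    ]
    return [remove_colormap, build_tfrecords]


def run_all(dry_run=True):
    """Prints each command and, unless dry_run, runs them in order."""
    commands = build_commands()
    if not dry_run:
        os.makedirs(TFRECORD_DIR, exist_ok=True)  # output dir must exist
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises on the first failure
    return commands


if __name__ == "__main__":
    run_all(dry_run="--run" not in sys.argv)
```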

4.1 remove_gt_colormap

In E:\models-master\research\deeplab\datasets, create the batch file X-remove_gt_colormap.bat:

echo "Removing the color map in ground truth annotations..."
python remove_gt_colormap.py ^
--original_gt_folder="./pascal_voc_seg/VOCdevkit/VOC2012/SegmentationClass" ^
--output_dir="./pascal_voc_seg/VOCdevkit/VOC2012/SegmentationClassRaw"
PAUSE

Then run it.
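The idea behind this step is small: the VOC SegmentationClass PNGs are palette ("P" mode) images, so their pixel values are already class indices and the palette only controls the display colors. The sketch below illustrates that with Pillow and NumPy. It is an approximation written for this note, not the script's actual code, and remove_colormap and save_raw_annotation are hypothetical names.

```python
import numpy as np
from PIL import Image


def remove_colormap(png_file):
    """Returns the class-index array stored in a palette ('P' mode) PNG."""
    # For a palette image, np.array() yields the palette indices (the class
    # ids, with 255 marking VOC's "void" border pixels), not RGB colors.
    return np.array(Image.open(png_file))


def save_raw_annotation(labels, png_file):
    """Writes class indices as a single-channel 8-bit PNG with no palette."""
    Image.fromarray(labels.astype(np.uint8)).save(png_file, "PNG")
```

The resulting files in SegmentationClassRaw look almost black in an image viewer, because the class ids 0 to 20 are very dark gray levels.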

4.2 build_voc2012_data

Under pascal_voc_seg, create a tfrecord subdirectory to serve as the output path.
In E:\models-master\research\deeplab\datasets, create the batch file X-build_voc2012_data.bat:

echo "Converting PASCAL VOC 2012 dataset..."
python build_voc2012_data.py ^
  --image_folder=".\pascal_voc_seg\VOCdevkit\VOC2012\JPEGImages" ^
  --semantic_segmentation_folder="./pascal_voc_seg/VOCdevkit/VOC2012/SegmentationClassRaw" ^
  --list_folder="./pascal_voc_seg/VOCdevkit/VOC2012/ImageSets/Segmentation" ^
  --image_format="jpg" ^
  --output_dir="./pascal_voc_seg/tfrecord"
PAUSE

In the paths above, it makes little difference whether forward slashes or backslashes are used.

Then run it. After a few minutes, the following files are generated under pascal_voc_seg\tfrecord:

Size (bytes)  File name
34,422,451    train-00000-of-00004.tfrecord
45,062,917    train-00001-of-00004.tfrecord
45,477,554    train-00002-of-00004.tfrecord
41,047,929    train-00003-of-00004.tfrecord
69,318,720    trainval-00000-of-00004.tfrecord
88,699,697    trainval-00001-of-00004.tfrecord
91,606,236    trainval-00002-of-00004.tfrecord
82,750,941    trainval-00003-of-00004.tfrecord
34,894,757    val-00000-of-00004.tfrecord
43,608,555    val-00001-of-00004.tfrecord
46,043,634    val-00002-of-00004.tfrecord
41,817,797    val-00003-of-00004.tfrecord
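To sanity-check the conversion, the records in each split can be counted; the totals should match the number of lines in the corresponding ImageSets/Segmentation/*.txt list. The stdlib-only sketch below walks the TFRecord framing (an 8-byte little-endian length, a 4-byte CRC of the length, the payload, then a 4-byte CRC of the payload) without verifying the CRCs. count_tfrecords and count_split are hypothetical helpers; with TensorFlow 1.x installed, tf.python_io.tf_record_iterator is the library way to iterate over records.

```python
import glob
import os
import struct


def count_tfrecords(path):
    """Counts the records in one TFRecord file by walking its framing.

    CRCs are skipped, not verified, and a truncated payload at the end of
    the file is not detected.
    """
    count = 0
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if not header:
                break
            if len(header) < 8:
                raise ValueError("truncated record header in %s" % path)
            (length,) = struct.unpack("<Q", header)
            f.seek(4 + length + 4, os.SEEK_CUR)  # length CRC, data, data CRC
            count += 1
    return count


def count_split(tfrecord_dir, split):
    """Sums the record counts over all shards of one split, e.g. 'train'."""
    pattern = os.path.join(tfrecord_dir, "%s-*.tfrecord" % split)
    return sum(count_tfrecords(p) for p in sorted(glob.glob(pattern)))
```

For example, count_split(r"E:\models-master\research\deeplab\datasets\pascal_voc_seg\tfrecord", "val"). The pattern "train-*" does not pick up the trainval shards, since those have no hyphen directly after "train".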

5. Training and Evaluation

5.1 train

In E:\models-master\research\deeplab, create the batch file X-Train.bat:

python train.py  ^
    --logtostderr  ^
    --training_number_of_steps=30000  ^
    --train_split="train"  ^
    --model_variant="xception_65"  ^
    --atrous_rates=6  ^
    --atrous_rates=12  ^
    --atrous_rates=18  ^
    --output_stride=16  ^
    --decoder_output_stride=4  ^
    --train_crop_size=321  ^
    --train_crop_size=321  ^
    --train_batch_size=1 ^
    --fine_tune_batch_norm=False ^
    --dataset="pascal_voc_seg" ^
    --tf_initial_checkpoint=.\model.ckpt ^
    --train_logdir=../deeplab ^
    --dataset_dir=./datasets/pascal_voc_seg/tfrecord

A few points need explaining here:

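training_number_of_steps: the documentation uses 30000. To save time you can use a smaller value such as 1000, but the results will be somewhat worse.
train_logdir: after the run, the trained model is written here.
train_crop_size: the documentation uses 513, which requires a lot of memory, so I reduced it to suit my machine. Several articles online claim it must be greater than 300, but I have not found where that requirement comes from. The following issue
https://github.com/tensorflow/models/issues/3939
states: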

During eval, we always do whole-image inference, meaning you need to set eval_crop_size >= largest image dimension.
We always set crop_size = output_stride * k + 1, where k is an integer. When working on PASCAL images, the largest dimension is 512. Thus, we set crop_size = 513 = 16 * 32 + 1 > 512. Similarly, we set eval_crop_size = 1025x2049 for Cityscapes images.

train_batch_size: because my machine does not have enough memory, I set it to the minimum, train_batch_size=1.
fine_tune_batch_norm: because train_batch_size=1, I set it to False. The reason can be seen in the code:

# Set to True if one wants to fine-tune the batch norm parameters in DeepLabv3.
# Set to False and use small batch size to save GPU memory.
flags.DEFINE_boolean('fine_tune_batch_norm', True,
                     'Fine tune the batch norm parameters or not.')
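The crop_size = output_stride * k + 1 rule quoted above from issue #3939 is easy to check, and both sizes used in this article satisfy it: 321 = 16 * 20 + 1 and 513 = 16 * 32 + 1. The helpers below are hypothetical, written only to illustrate the rule; smallest_crop_size returns the smallest valid size that is at least a given image dimension, which is what the quote requires of eval_crop_size.

```python
import math


def is_valid_crop_size(crop_size, output_stride=16):
    """True if crop_size has the form output_stride * k + 1."""
    return crop_size > 1 and (crop_size - 1) % output_stride == 0


def smallest_crop_size(min_size, output_stride=16):
    """Smallest output_stride * k + 1 that is >= min_size."""
    k = math.ceil((min_size - 1) / output_stride)
    return output_stride * k + 1
```

For example, smallest_crop_size(512) gives 513 and smallest_crop_size(2048) gives 2049, matching the PASCAL and Cityscapes values in the quote.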

The training output looks roughly like this:

INFO:tensorflow:global step 29970: loss = 0.1446 (0.680 sec/step)
INFO:tensorflow:global step 29980: loss = 0.2215 (0.696 sec/step)
INFO:tensorflow:global step 29990: loss = 0.2202 (0.711 sec/step)
INFO:tensorflow:global step 30000: loss = 0.1596 (0.701 sec/step)
INFO:tensorflow:Stopping Training.
INFO:tensorflow:Finished training! Saving model to disk.
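For a run this long, it helps to parse the log and estimate the total time. Since --logtostderr sends the log to stderr, it can presumably be captured by appending 2> train.log to the train.py command. The sketch assumes exactly the line format shown above, and parse_train_log and estimated_hours are hypothetical helpers. For the four lines above, the mean is 0.697 sec/step, so 30000 steps take roughly 0.697 * 30000 / 3600, or about 5.8 hours, on this machine.

```python
import re

_STEP_LINE = re.compile(
    r"global step (\d+): loss = ([\d.]+) \(([\d.]+) sec/step\)")


def parse_train_log(lines):
    """Returns (step, loss, sec_per_step) tuples for matching log lines."""
    records = []
    for line in lines:
        match = _STEP_LINE.search(line)
        if match:
            records.append((int(match.group(1)), float(match.group(2)),
                            float(match.group(3))))
    return records


def estimated_hours(records, total_steps):
    """Mean sec/step of the parsed records times total_steps, in hours."""
    mean_sec = sum(r[2] for r in records) / len(records)
    return mean_sec * total_steps / 3600.0
```

A file object can be passed directly, e.g. parse_train_log(open("train.log")).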

When training finishes, the following files are produced:

checkpoint
graph.pbtxt
model.ckpt-30000.data-00000-of-00001
model.ckpt-30000.index
model.ckpt-30000.meta
...
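The checkpoint file is a small text index whose model_checkpoint_path entry names the newest checkpoint prefix (here model.ckpt-30000). That prefix, not a single file, is what the --checkpoint_dir and --checkpoint_path flags in the following steps receive. TensorFlow's own tf.train.latest_checkpoint(train_logdir) reads this file; the stdlib sketch below only illustrates its format and assumes the usual one-entry-per-line layout. latest_checkpoint_path is a hypothetical name.

```python
import os
import re

_LATEST = re.compile(r'\s*model_checkpoint_path:\s*"(.*)"\s*$')


def latest_checkpoint_path(train_logdir):
    """Returns the model_checkpoint_path entry of train_logdir/checkpoint.

    The value is returned as written in the file (text-proto backslash
    escapes are undone); it may be relative or absolute.
    """
    with open(os.path.join(train_logdir, "checkpoint")) as f:
        for line in f:
            match = _LATEST.match(line)
            if match:
                return match.group(1).replace("\\\\", "\\")
    return None
```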

5.2 eval

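Because of how the code is written, eval.py keeps polling and waiting when it runs:
INFO:tensorflow:Waiting for new checkpoint at ./model.ckpt...
This is not what we want. Here we only want to run it once and see the result, so eval.py has to be modified: find line 163 and replace the code there as follows: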

#    slim.evaluation.evaluation_loop(
#        master=FLAGS.master,
#        checkpoint_dir=FLAGS.checkpoint_dir,
#        logdir=FLAGS.eval_logdir,
#        num_evals=num_batches,
#        eval_op=list(metrics_to_updates.values()),
#        max_number_of_evaluations=num_eval_iters,
#        eval_interval_secs=FLAGS.eval_interval_secs)

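    # Evaluate the given checkpoint a single time instead of polling.
    # eval_op must still be passed; otherwise the mIoU update ops never run.
    slim.evaluation.evaluate_once(
        master=FLAGS.master,
        checkpoint_path=FLAGS.checkpoint_dir,
        logdir=FLAGS.eval_logdir,
        num_evals=num_batches,
        eval_op=list(metrics_to_updates.values()))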

Then, in E:\models-master\research\deeplab, create the batch file X-Eval.bat:

python eval.py ^
    --logtostderr ^
    --eval_split="val" ^
    --model_variant="xception_65" ^
    --atrous_rates=6 ^
    --atrous_rates=12 ^
    --atrous_rates=18 ^
    --output_stride=16 ^
    --decoder_output_stride=4 ^
    --eval_crop_size=513  ^
    --eval_crop_size=513  ^
    --dataset="pascal_voc_seg" ^
    --checkpoint_dir=./model.ckpt-30000 ^
    --eval_logdir=../deeplab ^
    --dataset_dir=./datasets/pascal_voc_seg/tfrecord
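eval.py reports the mean intersection-over-union (mIoU) on the val split. As far as I can tell, it accumulates a confusion matrix over all pixels, skipping pixels labeled 255 (the VOC ignore label), and averages the per-class IoU = TP / (TP + FP + FN). The NumPy sketch below shows that computation; it is not eval.py's code, and confusion_matrix and mean_iou are hypothetical names.

```python
import numpy as np


def confusion_matrix(labels, preds, num_classes, ignore_label=255):
    """Counts (ground truth, prediction) pairs: rows are ground truth,
    columns are predictions. Pixels labeled ignore_label are skipped."""
    labels = np.asarray(labels)
    preds = np.asarray(preds)
    mask = labels != ignore_label
    index = num_classes * labels[mask].astype(np.int64) + preds[mask].astype(np.int64)
    counts = np.bincount(index, minlength=num_classes * num_classes)
    return counts.reshape(num_classes, num_classes)


def mean_iou(cm):
    """Mean of per-class IoU = TP / (TP + FP + FN), over the classes that
    occur in either the ground truth or the predictions."""
    tp = np.diag(cm).astype(np.float64)
    denom = cm.sum(axis=0) + cm.sum(axis=1) - tp
    valid = denom > 0
    return float(np.mean(tp[valid] / denom[valid]))
```

Over a whole split, the per-image matrices are summed first and mean_iou is applied once to the total (with num_classes=21 for PASCAL VOC).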

5.3 vis

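As with eval, because of how the code is written, vis.py keeps polling and waiting when it runs, so vis.py also has to be modified: find line 280 and replace the code there: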

#      last_checkpoint = slim.evaluation.wait_for_new_checkpoint(
#          FLAGS.checkpoint_dir, last_checkpoint)
      last_checkpoint = FLAGS.checkpoint_dir

Then, in E:\models-master\research\deeplab, create the batch file X-Vis.bat:

python vis.py ^
    --logtostderr ^
    --vis_split="val" ^
    --model_variant="xception_65" ^
    --atrous_rates=6 ^
    --atrous_rates=12 ^
    --atrous_rates=18 ^
    --output_stride=16 ^
    --decoder_output_stride=4 ^
    --vis_crop_size=513 ^
    --vis_crop_size=513 ^
    --dataset="pascal_voc_seg" ^
    --checkpoint_dir=.\model.ckpt-30000 ^
    --vis_logdir=../deeplab ^
    --dataset_dir=./datasets/pascal_voc_seg/tfrecord


PAUSE

When it runs, it creates the following in the current directory, E:\models-master\research\deeplab:

raw_segmentation_results
segmentation_results
The segmentation_results directory holds the output segmentation images.

On my machine it averages 2 images per second, which is really slow.
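As far as I can tell, the colored images in segmentation_results use the standard PASCAL VOC colormap, the same scheme that create_pascal_label_colormap builds in the test script in 5.5: the bits of the class id are spread over the R, G and B channels three at a time, starting from each channel's most significant bit. For a single class id it works like this (a stdlib sketch; pascal_color is a hypothetical helper):

```python
def pascal_color(label):
    """RGB color of one class id in the PASCAL VOC colormap."""
    r = g = b = 0
    for shift in reversed(range(8)):
        # Bit 0 of the id goes to red, bit 1 to green, bit 2 to blue,
        # filling each channel from its highest bit downwards.
        r |= (label & 1) << shift
        g |= ((label >> 1) & 1) << shift
        b |= ((label >> 2) & 1) << shift
        label >>= 3
    return (r, g, b)
```

For example, pascal_color(15) gives (192, 128, 128), the color of person, and pascal_color(8) gives (64, 0, 0), the color of cat.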

5.4 export_model

In E:\models-master\research\deeplab, create the batch file X-export_model.bat:

python export_model.py ^
    --checkpoint_path=.\model.ckpt-30000 ^
    --export_path=.\XOut\frozen_inference_graph.pb ^
    --model_variant="xception_65" ^
    --atrous_rates=6 ^
    --atrous_rates=12 ^
    --atrous_rates=18 ^
    --output_stride=16 ^
    --decoder_output_stride=4

PAUSE

After it runs, an XOut directory is created containing the generated model file frozen_inference_graph.pb.

For some reason, passing a few more parameters to export_model actually causes errors.

5.5 Testing the Model

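For convenient testing, I modeled a standalone test program on deeplab_demo.ipynb, which is easier to use (I don't like pressing Shift+Enter over and over in Jupyter; it feels a bit clunky):
In E:\models-master\research\deeplab, create the Python file X_visualization.py: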

# -*- coding: utf-8 -*-
from matplotlib import gridspec
from matplotlib import pyplot as plt
import numpy as np
from PIL import Image

import tensorflow as tf

# Path to the exported model
TEST_PB_PATH    = 'XOut/frozen_inference_graph.pb'

# Image to test
TEST_IMAGE_PATH = "E:/Abyssinian_13.jpg"


class DeepLabModel(object):
  """Class to load deeplab model and run inference."""
  INPUT_TENSOR_NAME  = 'ImageTensor:0'
  OUTPUT_TENSOR_NAME = 'SemanticPredictions:0'
  INPUT_SIZE         = 513
  FROZEN_GRAPH_NAME  = 'frozen_inference_graph'

  def __init__(self):      
    """Creates and loads pretrained deeplab model."""
    self.graph = tf.Graph()

    graph_def = None

    with open(TEST_PB_PATH, 'rb') as fhandle:
        graph_def = tf.GraphDef.FromString(fhandle.read())


    if graph_def is None:
      raise RuntimeError('Cannot load inference graph from %s.' % TEST_PB_PATH)

    with self.graph.as_default():
      tf.import_graph_def(graph_def, name='')

    self.sess = tf.Session(graph=self.graph)

  def run(self, image):
    """Runs inference on a single image.

    Args:
      image: A PIL.Image object, raw input image.

    Returns:
      resized_image: RGB image resized from original input image.
      seg_map: Segmentation map of `resized_image`.
    """
    width, height = image.size
    resize_ratio = 1.0 * self.INPUT_SIZE / max(width, height)
    target_size = (int(resize_ratio * width), int(resize_ratio * height))
    # LANCZOS is the filter formerly aliased as ANTIALIAS (removed in Pillow 10).
    resized_image = image.convert('RGB').resize(target_size, Image.LANCZOS)
    batch_seg_map = self.sess.run(
        self.OUTPUT_TENSOR_NAME,
        feed_dict={self.INPUT_TENSOR_NAME: [np.asarray(resized_image)]})
    seg_map = batch_seg_map[0]
    return resized_image, seg_map


def create_pascal_label_colormap():
  """Creates a label colormap used in PASCAL VOC segmentation benchmark.

  Returns:
    A Colormap for visualizing segmentation results.
  """
  colormap = np.zeros((256, 3), dtype=int)
  ind = np.arange(256, dtype=int)

  for shift in reversed(range(8)):
    for channel in range(3):
      colormap[:, channel] |= ((ind >> channel) & 1) << shift
    ind >>= 3

  return colormap


def label_to_color_image(label):
  """Adds color defined by the dataset colormap to the label.

  Args:
    label: A 2D array with integer type, storing the segmentation label.

  Returns:
    result: A 2D array with floating type. The element of the array
      is the color indexed by the corresponding element in the input label
      to the PASCAL color map.

  Raises:
    ValueError: If label is not of rank 2 or its value is larger than color
      map maximum entry.
  """
  if label.ndim != 2:
    raise ValueError('Expect 2-D input label')

  colormap = create_pascal_label_colormap()

  if np.max(label) >= len(colormap):
    raise ValueError('label value too large.')

  return colormap[label]


def vis_segmentation(image, seg_map):
  """Visualizes input image, segmentation map and overlay view."""
  plt.figure(figsize=(15, 5))
  grid_spec = gridspec.GridSpec(1, 4, width_ratios=[6, 6, 6, 1])

  plt.subplot(grid_spec[0])
  plt.imshow(image)
  plt.axis('off')
  plt.title('input image')

  plt.subplot(grid_spec[1])
  seg_image = label_to_color_image(seg_map).astype(np.uint8)
  plt.imshow(seg_image)
  plt.axis('off')
  plt.title('segmentation map')

  plt.subplot(grid_spec[2])
  plt.imshow(image)
  plt.imshow(seg_image, alpha=0.7)
  plt.axis('off')
  plt.title('segmentation overlay')

  unique_labels = np.unique(seg_map)
  ax = plt.subplot(grid_spec[3])
  plt.imshow(
      FULL_COLOR_MAP[unique_labels].astype(np.uint8), interpolation='nearest')
  ax.yaxis.tick_right()
  plt.yticks(range(len(unique_labels)), LABEL_NAMES[unique_labels])
  plt.xticks([], [])
  ax.tick_params(width=0.0)
  plt.grid(False)
  plt.show()


LABEL_NAMES = np.asarray([
    'background', 'aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus',
    'car', 'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike',
    'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tv'
])

FULL_LABEL_MAP = np.arange(len(LABEL_NAMES)).reshape(len(LABEL_NAMES), 1)
FULL_COLOR_MAP = label_to_color_image(FULL_LABEL_MAP)


MODEL = DeepLabModel()
print('model loaded successfully!')


#------------------------------------

def run_visualization(path):
    """Runs DeepLab on a local image file and shows the result."""
    original_im = Image.open(path)
    print('running deeplab on image %s...' % path)
    resized_im, seg_map = MODEL.run(original_im)
    vis_segmentation(resized_im, seg_map)

run_visualization(TEST_IMAGE_PATH)

Run it:

model loaded successfully!
running deeplab on image E:/Abyssinian_13.jpg...

The result looks good. Job done.
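One detail of DeepLabModel.run in the script above: the image is scaled so that its longer side becomes INPUT_SIZE (513) before inference, so seg_map matches the resized image rather than the original. The same arithmetic, pulled out into a hypothetical helper for illustration:

```python
INPUT_SIZE = 513  # same value as DeepLabModel.INPUT_SIZE above


def target_size(width, height, input_size=INPUT_SIZE):
    """Size that DeepLabModel.run resizes to: the longer side becomes input_size."""
    resize_ratio = 1.0 * input_size / max(width, height)
    return (int(resize_ratio * width), int(resize_ratio * height))
```

For a 1026 x 684 image, target_size returns (513, 342). To overlay the result on the original image, seg_map would need to be resized back with nearest-neighbor interpolation so that class ids are not blended.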

6. Other Notes

Note that training above used train_crop_size=321, while eval and vis still use the value 513 from the documentation. For lack of time, I did not analyze what effect this has.