TensorFlow implementation of the SSD algorithm: https://github.com/balancap/SSD-Tensorflow
Environment: Ubuntu 16.04
Install Python 3.5 with Anaconda3, then install the modules below.
To run on a GPU, install CUDA yourself first, then install the tensorflow-gpu module through Anaconda.
Converting VOC2007-format data to TF-Records format
Run the following script:
DATASET_DIR=./VOC2007/test/
OUTPUT_DIR=./tfrecords
python tf_convert_data.py \
--dataset_name=pascalvoc \
--dataset_dir=${DATASET_DIR} \
--output_name=voc_2007_train \
--output_dir=${OUTPUT_DIR}
On Ubuntu, create a file such as shell_name.sh, paste the code above into it, make it executable with chmod +x shell_name.sh, then run it from the terminal:
./shell_name.sh
shell_name: the name of the script file
Replace /VOC2007/test/ with the path to your own dataset.
The result is a consolidated set of TF-Records files bundling the dataset, instead of the separate per-image files.
This file evaluates SSD's performance on a dataset, plotting the precision-recall curve and computing the mAP (mean Average Precision) metric.
Run the following script:
EVAL_DIR=./logs/
CHECKPOINT_PATH=./checkpoints/VGG_VOC0712_SSD_300x300_ft_iter_120000.ckpt
python eval_ssd_network.py \
--eval_dir=${EVAL_DIR} \
--dataset_dir=${DATASET_DIR} \
--dataset_name=pascalvoc_2007 \
--dataset_split_name=test \
--model_name=ssd_300_vgg \
--checkpoint_path=${CHECKPOINT_PATH} \
--batch_size=1
CHECKPOINT_PATH: path of the model checkpoint to load; change it as needed
EVAL_DIR: directory where the evaluation results are written
batch_size: number of samples processed at a time
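mAP averages the per-class Average Precision, where each AP is the area under that class's precision-recall curve. A simplified numpy sketch of AP for one class (a raw Riemann sum over the curve, not the exact VOC interpolation the repository uses):

```python
import numpy as np

def average_precision(scores, is_correct, num_gt):
    """AP for one class: area under the precision-recall curve.

    scores     -- detection confidence scores
    is_correct -- 1 if the detection matches a ground-truth box, else 0
    num_gt     -- number of ground-truth boxes for this class
    """
    order = np.argsort(-np.asarray(scores))        # highest confidence first
    tp = np.asarray(is_correct, dtype=float)[order]
    cum_tp = np.cumsum(tp)
    precision = cum_tp / (np.arange(len(tp)) + 1)  # TP / (TP + FP)
    recall = cum_tp / num_gt                       # TP / (TP + FN)
    # Accumulate precision over each step in recall.
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precision, recall):
        ap += p * (r - prev_r)
        prev_r = r
    return ap

# Example: 4 detections ranked by confidence, 3 ground-truth boxes.
print(average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 1], 3))  # -> 29/36 ~ 0.806
```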
Converting a Caffe checkpoint to a TensorFlow checkpoint
If the model you want to run was built in Caffe, its weights are stored as a Caffe checkpoint; this file converts them into a TensorFlow checkpoint.
Run the following script:
CAFFE_MODEL=./ckpts/SSD_300x300_ft_VOC0712/VGG_VOC0712_SSD_300x300_ft_iter_120000.caffemodel
python caffe_to_tensorflow.py \
--model_name=ssd_300_vgg \
--num_classes=21 \
--caffemodel_path=${CAFFE_MODEL}
This file trains the SSD network and exposes many training options:
Run the following script:
DATASET_DIR=./tfrecords
TRAIN_DIR=./logs/
CHECKPOINT_PATH=./checkpoints/ssd_300_vgg.ckpt
python train_ssd_network.py \
--train_dir=${TRAIN_DIR} \
--dataset_dir=${DATASET_DIR} \
--dataset_name=pascalvoc_2012 \
--dataset_split_name=train \
--model_name=ssd_300_vgg \
--checkpoint_path=${CHECKPOINT_PATH} \
--save_summaries_secs=60 \
--save_interval_secs=600 \
--weight_decay=0.0005 \
--optimizer=adam \
--learning_rate=0.001 \
--batch_size=32
--model_name: name of the model architecture
--save_summaries_secs: how often (in seconds) TensorBoard summaries are saved
--save_interval_secs: how often (in seconds) a checkpoint is saved
--weight_decay: weight decay (L2 regularization) coefficient
--optimizer: choice of optimizer (several methods for minimizing the loss are available)
--learning_rate: learning rate
--batch_size: number of samples fed to the network at a time
ssd_300_vgg.ckpt: the pretrained checkpoint training starts from
logs/: directory where training output is saved
Running the evaluation script alongside training lets you monitor progress in real time:
EVAL_DIR=${TRAIN_DIR}/eval
python eval_ssd_network.py \
--eval_dir=${EVAL_DIR} \
--dataset_dir=${DATASET_DIR} \
--dataset_name=pascalvoc_2007 \
--dataset_split_name=test \
--model_name=ssd_300_vgg \
--checkpoint_path=${TRAIN_DIR} \
--wait_for_checkpoints=True \
--batch_size=1 \
--max_num_batches=500
Loading a network with a different architecture:
DATASET_DIR=./tfrecords
TRAIN_DIR=./log/
CHECKPOINT_PATH=./checkpoints/vgg_16.ckpt
python train_ssd_network.py \
--train_dir=${TRAIN_DIR} \
--dataset_dir=${DATASET_DIR} \
--dataset_name=pascalvoc_2007 \
--dataset_split_name=train \
--model_name=ssd_300_vgg \
--checkpoint_path=${CHECKPOINT_PATH} \
--checkpoint_model_scope=vgg_16 \
--checkpoint_exclude_scopes=ssd_300_vgg/conv6,ssd_300_vgg/conv7,ssd_300_vgg/block8,ssd_300_vgg/block9,ssd_300_vgg/block10,ssd_300_vgg/block11,ssd_300_vgg/block4_box,ssd_300_vgg/block7_box,ssd_300_vgg/block8_box,ssd_300_vgg/block9_box,ssd_300_vgg/block10_box,ssd_300_vgg/block11_box \
--trainable_scopes=ssd_300_vgg/conv6,ssd_300_vgg/conv7,ssd_300_vgg/block8,ssd_300_vgg/block9,ssd_300_vgg/block10,ssd_300_vgg/block11,ssd_300_vgg/block4_box,ssd_300_vgg/block7_box,ssd_300_vgg/block8_box,ssd_300_vgg/block9_box,ssd_300_vgg/block10_box,ssd_300_vgg/block11_box \
--save_summaries_secs=60 \
--save_interval_secs=600 \
--weight_decay=0.0005 \
--optimizer=adam \
--learning_rate=0.001 \
--learning_rate_decay_factor=0.94 \
--batch_size=32
trainable_scopes restricts which parameters are trained; for example, listing ssd_300_vgg/conv6 and the other SSD scopes means the VGG16 backbone parameters do not change during training.
In other words, this command trains only the SSD-specific layers and leaves the rest of the network untouched.
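The scope filtering behind --trainable_scopes amounts to a prefix match on variable names. A plain-Python sketch of the idea (the variable names here are illustrative, not taken from the repository):

```python
def filter_trainable(variable_names, trainable_scopes):
    """Keep only variables whose name falls under one of the given scopes."""
    scopes = [s.strip() for s in trainable_scopes.split(',')]
    return [v for v in variable_names
            if any(v.startswith(scope) for scope in scopes)]

variables = ['vgg_16/conv1/weights',           # backbone: stays frozen
             'ssd_300_vgg/conv6/weights',      # SSD head: trained
             'ssd_300_vgg/block8_box/biases']  # SSD head: trained
print(filter_trainable(variables, 'ssd_300_vgg/conv6,ssd_300_vgg/block8_box'))
# -> ['ssd_300_vgg/conv6/weights', 'ssd_300_vgg/block8_box/biases']
```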
Once the trained network reaches a good mAP, the whole network can be fine-tuned:
DATASET_DIR=./tfrecords
TRAIN_DIR=./log_finetune/
CHECKPOINT_PATH=./log/model.ckpt-N
python train_ssd_network.py \
--train_dir=${TRAIN_DIR} \
--dataset_dir=${DATASET_DIR} \
--dataset_name=pascalvoc_2007 \
--dataset_split_name=train \
--model_name=ssd_300_vgg \
--checkpoint_path=${CHECKPOINT_PATH} \
--checkpoint_model_scope=vgg_16 \
--save_summaries_secs=60 \
--save_interval_secs=600 \
--weight_decay=0.0005 \
--optimizer=adam \
--learning_rate=0.00001 \
--learning_rate_decay_factor=0.94 \
--batch_size=32
save_interval_secs: how often (in seconds) the training checkpoint is saved
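With --learning_rate_decay_factor set (0.94 in the scripts above), the learning rate decays exponentially with the global step. The resulting schedule can be computed directly; decay_steps below is an assumed example value, not a flag from the scripts:

```python
def decayed_lr(initial_lr, decay_factor, global_step, decay_steps):
    """Exponential decay: lr * factor^(step / decay_steps)."""
    return initial_lr * decay_factor ** (global_step / decay_steps)

# learning_rate=0.001 and decay factor 0.94 as in the fine-tuning script;
# decay_steps=10000 is an assumed example value.
for step in (0, 10000, 50000):
    print(step, decayed_lr(0.001, 0.94, step, 10000))
```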
This file performs data augmentation, e.g. image rotation, resolution changes, and random cropping.
Network parameters: feature layers, anchor boxes, and related settings.
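The augmentations above can be illustrated framework-free; a minimal numpy sketch of random crop plus horizontal flip (the repository's preprocessing file implements these as TensorFlow ops instead):

```python
import numpy as np

def random_crop_and_flip(image, crop_h, crop_w, rng=None):
    """Randomly crop an HxWxC image and flip it left-right half the time."""
    rng = rng or np.random.RandomState()
    h, w = image.shape[:2]
    top = rng.randint(0, h - crop_h + 1)
    left = rng.randint(0, w - crop_w + 1)
    patch = image[top:top + crop_h, left:left + crop_w]
    if rng.rand() < 0.5:
        patch = patch[:, ::-1]  # horizontal flip
    return patch

image = np.arange(10 * 10 * 3).reshape(10, 10, 3).astype(np.uint8)
print(random_crop_and_flip(image, 8, 8).shape)  # (8, 8, 3)
```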
First, the PascalVOC-format data must be converted to tfrecord format; here tf_convert_data.py converts the images and their annotations.
Run:
DATASET_DIR=VOC2007/trainval/
OUTPUT_DIR=PascalVOC/dataset
python tf_convert_data.py \
--dataset_name=pascalvoc \
--dataset_dir=${DATASET_DIR} \
--output_name=voc_2007_train \
--output_dir=${OUTPUT_DIR}
The DATASET_DIR and OUTPUT_DIR parameters can be changed; the directories must exist beforehand, otherwise the script fails with an error.
DATASET_DIR: must contain a PascalVOC-format dataset
A possible error here:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid
Edit pascalvoc_to_tfrecords.py in the datasets directory, changing
image_data = tf.gfile.FastGFile(filename, 'r').read()
to:
image_data = tf.gfile.FastGFile(filename, 'rb').read()
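The error occurs because every JPEG file starts with the byte 0xFF, which is not valid UTF-8: text mode ('r') tries to decode the bytes, while binary mode ('rb') returns them untouched. A standalone reproduction with plain Python file I/O:

```python
import os
import tempfile

# Write the two magic bytes that open every JPEG file.
path = os.path.join(tempfile.mkdtemp(), 'fake.jpg')
with open(path, 'wb') as f:
    f.write(b'\xff\xd8')

try:
    open(path, 'r', encoding='utf-8').read()  # text mode: decoding fails
except UnicodeDecodeError as e:
    print('text mode:', e)

data = open(path, 'rb').read()  # binary mode: raw bytes, no decoding
print('binary mode:', data)     # b'\xff\xd8'
```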
Then run the training script:
DATASET_DIR=./tfrecords
TRAIN_DIR=./logs/
CHECKPOINT_PATH=./checkpoints/ssd_300_vgg.ckpt
python train_ssd_network.py \
--train_dir=${TRAIN_DIR} \
--dataset_dir=${DATASET_DIR} \
--dataset_name=pascalvoc_2012 \
--dataset_split_name=train \
--model_name=ssd_300_vgg \
--checkpoint_path=${CHECKPOINT_PATH} \
--save_summaries_secs=60 \
--save_interval_secs=600 \
--weight_decay=0.0005 \
--optimizer=adam \
--learning_rate=0.001 \
--batch_size=32
The code is as follows:
# coding: utf-8
import os
import math
import random
import numpy as np
import tensorflow as tf
import cv2
slim = tf.contrib.slim
import sys
sys.path.append('../')
from nets import ssd_vgg_300, ssd_common, np_methods, ssd_vgg_512
from preprocessing import ssd_vgg_preprocessing
# draw boxes
from notebooks import visualization_camera
# TensorFlow session: grow memory when needed. TF, DO NOT USE ALL MY GPU MEMORY!!!
gpu_options = tf.GPUOptions(allow_growth=True)
config = tf.ConfigProto(log_device_placement=False, gpu_options=gpu_options)
isess = tf.InteractiveSession(config=config)
# Input placeholder.
# net_shape = (300, 300)
net_shape = (512, 512)
data_format = 'NHWC'
img_input = tf.placeholder(tf.uint8, shape=(None, None, 3))
# Evaluation pre-processing: resize to SSD net shape.
image_pre, labels_pre, bboxes_pre, bbox_img = ssd_vgg_preprocessing.preprocess_for_eval(
img_input, None, None, net_shape, data_format, resize=ssd_vgg_preprocessing.Resize.WARP_RESIZE)
image_4d = tf.expand_dims(image_pre, 0)
# Define the SSD model.
reuse = True if 'ssd_net' in locals() else None
# ssd_net = ssd_vgg_300.SSDNet()
ssd_net = ssd_vgg_512.SSDNet()
with slim.arg_scope(ssd_net.arg_scope(data_format=data_format)):
predictions, localisations, _, _ = ssd_net.net(image_4d, is_training=False, reuse=reuse)
# Path and name of your own trained checkpoint go here.
ckpt_filename = '../checkpoints/model/model.ckpt-314913'
isess.run(tf.global_variables_initializer())
saver = tf.train.Saver()
saver.restore(isess, ckpt_filename)
# SSD default anchor boxes.
ssd_anchors = ssd_net.anchors(net_shape)
# Main image processing routine. Two key parameters: the selection
# threshold and the non-maximum-suppression threshold.
def process_image(img, select_threshold=0.8, nms_threshold=.2, net_shape=(512, 512)):
    # def process_image(img, select_threshold=0.5, nms_threshold=.45, net_shape=(300, 300)):
    # Run SSD network.
    rimg, rpredictions, rlocalisations, rbbox_img = isess.run(
        [image_4d, predictions, localisations, bbox_img],
        feed_dict={img_input: img})
    # Get classes and bboxes from the net outputs.
    rclasses, rscores, rbboxes = np_methods.ssd_bboxes_select(
        rpredictions, rlocalisations, ssd_anchors,
        select_threshold=select_threshold, img_shape=net_shape, num_classes=2, decode=True)
    rbboxes = np_methods.bboxes_clip(rbbox_img, rbboxes)
    rclasses, rscores, rbboxes = np_methods.bboxes_sort(rclasses, rscores, rbboxes, top_k=400)
    rclasses, rscores, rbboxes = np_methods.bboxes_nms(rclasses, rscores, rbboxes, nms_threshold=nms_threshold)
    # Resize bboxes to original image shape. Note: useless for Resize.WARP!
    rbboxes = np_methods.bboxes_resize(rbbox_img, rbboxes)
    return rclasses, rscores, rbboxes
# The following was added for still-image testing.
path = 'D:/DeepLearning/VOC2007/train/JPEGImages'
files = os.listdir(path)  # every file name in the folder
i = 1
delay = 1
for picture_name in files:
    image_np = cv2.imread(os.path.join(path, picture_name))
    # image_np = cv2.resize(src, (width, height), interpolation=cv2.INTER_CUBIC)
    # Actual detection.
    rclasses, rscores, rbboxes = process_image(image_np)
    # Visualization of the results of a detection.
    visualization_camera.bboxes_draw_on_img(image_np, rclasses, rscores, rbboxes)
    pic_name = "D:/DeepLearning/SSD-Tensorflow/picture_process1/" + str(i) + ".jpg"
    cv2.imshow('Detecting Test of Vehicle', image_np)
    # cv2.imwrite(pic_name, image_np)
    # Wait `delay` ms between frames.
    cv2.waitKey(delay)
    print('Ongoing...', i)
    i = i + 1
cv2.destroyAllWindows()
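process_image exposes two key knobs: select_threshold discards low-confidence boxes, and nms_threshold drives non-maximum suppression, which removes overlapping duplicates. A minimal numpy NMS sketch (illustrating the idea, not np_methods.bboxes_nms itself):

```python
import numpy as np

def nms(boxes, scores, iou_threshold):
    """Greedy NMS: repeatedly keep the highest-scoring box and drop the
    remaining boxes whose IoU with it exceeds the threshold.

    boxes -- (N, 4) array of [ymin, xmin, ymax, xmax]
    """
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    order = np.argsort(-scores)
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        # Intersection of the kept box with every remaining box.
        ymin = np.maximum(boxes[best, 0], boxes[rest, 0])
        xmin = np.maximum(boxes[best, 1], boxes[rest, 1])
        ymax = np.minimum(boxes[best, 2], boxes[rest, 2])
        xmax = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.maximum(0, ymax - ymin) * np.maximum(0, xmax - xmin)
        iou = inter / (areas[best] + areas[rest] - inter)
        order = rest[iou <= iou_threshold]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores, 0.5))  # -> [0, 2]: the two overlapping boxes collapse
```

A lower nms_threshold suppresses more aggressively, which is why the demo uses 0.2 when duplicate detections on the same vehicle are likely.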
The code is as follows:
# coding: utf-8
import os
import math
import random
import numpy as np
import tensorflow as tf
import cv2
slim = tf.contrib.slim
import sys
sys.path.append('../')
from nets import ssd_vgg_300, ssd_common, np_methods, ssd_vgg_512
from preprocessing import ssd_vgg_preprocessing
# draw boxes
from notebooks import visualization_camera
# TensorFlow session: grow memory when needed. TF, DO NOT USE ALL MY GPU MEMORY!!!
gpu_options = tf.GPUOptions(allow_growth=True)
config = tf.ConfigProto(log_device_placement=False, gpu_options=gpu_options)
isess = tf.InteractiveSession(config=config)
# Input placeholder.
# net_shape = (300, 300)
net_shape = (512, 512)
data_format = 'NHWC'
img_input = tf.placeholder(tf.uint8, shape=(None, None, 3))
# Evaluation pre-processing: resize to SSD net shape.
image_pre, labels_pre, bboxes_pre, bbox_img = ssd_vgg_preprocessing.preprocess_for_eval(
img_input, None, None, net_shape, data_format, resize=ssd_vgg_preprocessing.Resize.WARP_RESIZE)
image_4d = tf.expand_dims(image_pre, 0)
# Define the SSD model.
reuse = True if 'ssd_net' in locals() else None
#ssd_net = ssd_vgg_300.SSDNet()
ssd_net = ssd_vgg_512.SSDNet()
with slim.arg_scope(ssd_net.arg_scope(data_format=data_format)):
predictions, localisations, _, _ = ssd_net.net(image_4d, is_training=False, reuse=reuse)
# Restore SSD model.
ckpt_filename = '../checkpoints/model/model.ckpt-314913'
#ckpt_filename = '../checkpoints/model11/model.ckpt-75822'
#ckpt_filename = '../checkpoints/model9/model.ckpt-7171'
#ckpt_filename = '../checkpoints/model7/model.ckpt-92084'
# ckpt_filename = '../checkpoints/model6/model.ckpt-72392'
# ckpt_filename = '../checkpoints/VGG_VOC0712_SSD_300x300_ft_iter_120000.ckpt'
isess.run(tf.global_variables_initializer())
saver = tf.train.Saver()
saver.restore(isess, ckpt_filename)
# SSD default anchor boxes.
ssd_anchors = ssd_net.anchors(net_shape)
# Main image processing routine.
# NMS: non-maximum suppression
def process_image(img, select_threshold=0.8, nms_threshold=.2, net_shape=(512, 512)):
    # def process_image(img, select_threshold=0.5, nms_threshold=.45, net_shape=(300, 300)):
    # Run SSD network.
    rimg, rpredictions, rlocalisations, rbbox_img = isess.run(
        [image_4d, predictions, localisations, bbox_img],
        feed_dict={img_input: img})
    # Get classes and bboxes from the net outputs.
    rclasses, rscores, rbboxes = np_methods.ssd_bboxes_select(
        rpredictions, rlocalisations, ssd_anchors,
        select_threshold=select_threshold, img_shape=net_shape, num_classes=2, decode=True)
    rbboxes = np_methods.bboxes_clip(rbbox_img, rbboxes)
    rclasses, rscores, rbboxes = np_methods.bboxes_sort(rclasses, rscores, rbboxes, top_k=400)
    rclasses, rscores, rbboxes = np_methods.bboxes_nms(rclasses, rscores, rbboxes, nms_threshold=nms_threshold)
    # Resize bboxes to original image shape. Note: useless for Resize.WARP!
    rbboxes = np_methods.bboxes_resize(rbbox_img, rbboxes)
    return rclasses, rscores, rbboxes
# The following was added for the camera/video demo.
cap = cv2.VideoCapture(r'1.mp4')
fps = cap.get(cv2.CAP_PROP_FPS)
# number_of_frames = cap.get(cv2.CAP_PROP_FRAME_COUNT)
size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)), int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
fourcc = cap.get(cv2.CAP_PROP_FOURCC)
print('fps=%d,size=%r,fourcc=%r' % (fps, size, fourcc))
# delay = int(30 / int(fps))
delay = 25
i = 1
# picture is too large
width = int(size[0])
height = int(size[1])
# width = 300
# height = 300
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break  # end of video: without this the loop would spin forever
    if i > 3296:  # skip the first 3296 frames
        image_np = cv2.resize(frame, (width, height), interpolation=cv2.INTER_CUBIC)
        # Actual detection.
        rclasses, rscores, rbboxes = process_image(image_np)
        # Visualization of the results of a detection.
        visualization_camera.bboxes_draw_on_img(image_np, rclasses, rscores, rbboxes)
        pic_name = "D:/DeepLearning/SSD-Tensorflow/picture_process/" + str(i) + ".jpg"
        cv2.imshow('Detecting Test of Vehicle', image_np)
        # cv2.imwrite(pic_name, image_np)
        # Wait `delay` ms between frames.
        cv2.waitKey(delay)
        print('Ongoing...', i)
    i = i + 1  # count every frame, including skipped ones
cap.release()
cv2.destroyAllWindows()