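As shown in the figure below, the "Deep Learning" study plan I drew up sets the main tasks for November: get familiar with the major DL network models, mainly for classification and detection; read papers; and get familiar with pathology data. We are a two-person team. This month my work focuses on the classic object detection algorithms and on running some classic examples with tensorflow or keras, mainly R-CNN, SPP-Net, Fast-RCNN, Faster-RCNN, and YOLO. The other member studies the classic classification networks, mainly the GoogLeNet (Inception) series and related models (inception-v1, inception-v2, inception-v3, resnet, etc.). Each of us will write a detailed report, one on detection and one on classification, and then we will keep refining them, discussing, and sharing, to make the most of working as a team.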
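This post does not cover theory. There are plenty of well-organized resources online, and if I gain a deeper understanding later I will write a separate post. What it documents is a hands-on Faster-RCNN run that uses resnet weights for pre-training and the KITTI dataset for fine-tuning. I recently found a generous open-source author on github who also implements deep learning examples mainly with tensorflow and keras, and the Faster-RCNN practice here is organized from that author's source code. The goal is that a reader who needs it can follow this post step by step, get the code running, and get a feel for what Faster-RCNN can do. If you run into any problem along the way, leave a comment and I will do my best to help work it out.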
This post assumes you have already set up tensorflow and keras. If not, the following posts may help:
1) http://blog.csdn.net/houchaoqun_xmu/article/details/72461592
2) http://blog.csdn.net/houchaoqun_xmu/article/details/78508783
I recommend python3, tensorflow (>=1.1), and keras (2.0.9) for running the Faster-RCNN example in this post.
github source code: https://github.com/Houchaoqun/keras_frcnn
TFFRCNN: https://github.com/CharlesShang/TFFRCNN
KITTI Datasets: http://www.cvlibs.net/datasets/kitti/index.php
h5 model weights (inception-v3, resnet50, VGG16, VGG19): http://pan.baidu.com/s/1dET5J7z password: hdp9
"Can't open attribute (can't locate attribute: 'layer_names')": http://blog.csdn.net/dugudaibo/article/details/78008918
Deep learning and computer vision, all in one post: http://blog.csdn.net/u012507022/article/details/51441629
1) Download the github source code:
git clone https://github.com/Houchaoqun/keras_frcnn
2) Download the KITTI dataset:
- Training labels: http://kitti.is.tue.mpg.de/kitti/data_object_label_2.zip
- Training images: http://kitti.is.tue.mpg.de/kitti/data_object_image_2.zip
3) Download the model weights and put them at the following path (create the model folder first). This post uses the resnet50 weights:
./model/resnet50_weights_tf_dim_ordering_tf_kernels.h5 # download links for several sets of weights are given above
4) Create the data folders that the source code expects (paths are case-sensitive):
./media/jintian/Netac/Datasets/Kitti/object/training/image_2
./media/jintian/Netac/Datasets/Kitti/object/training/label_2
5) Put the downloaded KITTI images and labels under the corresponding paths (as determined by the source code). Example paths from this post:
./media/jintian/Netac/Datasets/Kitti/object/training/image_2/002468.png # training image
./media/jintian/Netac/Datasets/Kitti/object/training/label_2/002468.txt # training label
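Before going further, it can help to confirm that every image has a label file with the same name stem. The following is only a minimal sketch: `find_unpaired` is a hypothetical helper that is not part of the repository, and it assumes images end in `.png` and labels end in `.txt`.

```python
from pathlib import Path


def find_unpaired(image_dir, label_dir):
    """Return (image stems without a label, label stems without an image)."""
    # Assumption: images are *.png and labels are *.txt with matching stems.
    image_stems = {p.stem for p in Path(image_dir).glob("*.png")}
    label_stems = {p.stem for p in Path(label_dir).glob("*.txt")}
    return sorted(image_stems - label_stems), sorted(label_stems - image_stems)


if __name__ == "__main__":
    base = Path("./media/jintian/Netac/Datasets/Kitti/object/training")
    no_label, no_image = find_unpaired(base / "image_2", base / "label_2")
    print("images without a label:", len(no_label))
    print("labels without an image:", len(no_image))
```

If both counts are zero, the two folders line up and the annotation step later on should see every image.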
hcq@hcq-home:~/document/deepLearning/github/keras_frcnn$ python
Python 3.6.3 |Anaconda, Inc.| (default, Oct 13 2017, 12:02:49)
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.__version__
'1.1.0'
>>> import keras
Using TensorFlow backend.
... ...
... ...
>>> keras.__version__
'2.0.9'
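Pay attention to the versions of each framework and tool, otherwise you may hit errors. With keras 2.1.1, for example, the following error is raised: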
Error when checking target: expected rpn_out_class to have shape (None, None
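A small guard at the top of a script can catch a mismatched keras version before training starts. This is a sketch under one assumption that I have not verified: that every keras release from 2.1.1 onward triggers the error. The post only establishes that 2.0.9 works and 2.1.1 fails. Both function names are hypothetical.

```python
import re


def version_tuple(version):
    """Turn '2.0.9' into (2, 0, 9); stop at the first non-numeric piece."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def keras_version_ok(version, first_bad=(2, 1, 1)):
    """True if `version` is older than the release reported to fail.

    Assumption: everything at or above `first_bad` is treated as failing.
    """
    return version_tuple(version) < first_bad


if __name__ == "__main__":
    # In practice, pass keras.__version__ instead of a literal string.
    for candidate in ("2.0.9", "2.1.1"):
        print(candidate, keras_version_ok(candidate))
```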
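KITTI is a public dataset for evaluating algorithms such as vehicle detection, vehicle tracking, and semantic segmentation in traffic scenes. It is widely used to benchmark vehicle recognition algorithms for autonomous driving.
KITTI home page: http://www.cvlibs.net/datasets/kitti/
The downloads for the training images and training labels are shown in the figure below:
Direct download links (the official site asks for an email address and then sends the download links to it):
1) Training labels: http://kitti.is.tue.mpg.de/kitti/data_object_label_2.zip
2) Training images: http://kitti.is.tue.mpg.de/kitti/data_object_image_2.zip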
hcq@hcq-home:~/document/deepLearning/github/keras_frcnn$ tree -L 2
.
├── config.pickle
├── extract_featuremap.py
├── generate_simple_kitti_anno_file.py
├── images
│ ├── 000000.png
│ ├── 000001.png
│ ├── 000002.png
│ ├── 000003.png
│ ├── 000004.png
│ ├── 000005.png
│ ├── 000006.png
│ ├── 000007.png
│ ├── 000008.png
│ ├── 000009.png
│ ├── 000010.png
│ ├── 000011.png
│ ├── 000012.png
│ ├── 000013.png
│ ├── 000014.png
│ └── 000015.png
├── keras_frcnn
│ ├── config.py
│ ├── config.pyc
│ ├── data_augment.py
│ ├── data_augment.pyc
│ ├── data_generators.py
│ ├── data_generators.pyc
│ ├── fixed_batch_normalization.py
│ ├── fixed_batch_normalization.pyc
│ ├── __init__.py
│ ├── __init__.pyc
│ ├── losses.py
│ ├── losses.pyc
│ ├── pascal_voc_parser.py
│ ├── __pycache__
│ ├── resnet.py
│ ├── resnet.pyc
│ ├── roi_helpers.py
│ ├── roi_helpers.pyc
│ ├── roi_pooling_conv.py
│ ├── roi_pooling_conv.pyc
│ ├── simple_parser.py
│ ├── simple_parser.pyc
│ ├── vgg.py
│ ├── visualize.py
│ └── visualize.pyc
├── kitti_simple_label-backup.txt
├── kitti_simple_label.txt
├── measure_map.py
├── media
│ ├── jintian
│ └── tri_images
├── model
│ ├── inception_resnet_v2_weights_tf_dim_ordering_tf_kernels.h5
│ ├── inception_resnet_v2_weights_tf_dim_ordering_tf_kernels_notop.h5
│ ├── kitti_frcnn_last.hdf5
│ └── resnet50_weights_tf_dim_ordering_tf_kernels.h5
├── README.md
├── requirements.txt
├── results_images
│ └── backup-images
├── test_frcnn_kitti.py
└── train_frcnn_kitti.py
9 directories, 54 files
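Note: as the code layout above shows, ./model holds several "XX.h5" files and one "XX.hdf5" file. This post uses "resnet50_weights_tf_dim_ordering_tf_kernels.h5" to initialize the model (download it yourself and put it in that directory; links are given above), and "kitti_frcnn_last.hdf5" stores the weights obtained by fine-tuning on the KITTI dataset.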
# -*- encoding: utf-8 -*-
from keras import backend as K
class Config:
    def __init__(self):
        self.verbose = True
        # use resnet50 as the pre-trained base network
        self.network = 'resnet50'
        # setting for data augmentation
        self.use_horizontal_flips = False
        self.use_vertical_flips = False
        self.rot_90 = False
        # faster-rcnn parameters
        # anchor box scales
        self.anchor_box_scales = [128, 256, 512]
        # anchor box ratios
        self.anchor_box_ratios = [[1, 1], [1, 2], [2, 1]]
        # size to resize the smallest side of the image
        self.im_size = 600
        # image channel-wise mean to subtract
        self.img_channel_mean = [103.939, 116.779, 123.68]
        self.img_scaling_factor = 1.0
        # number of ROIs at once
        self.num_rois = 4
        # stride at the RPN (this depends on the network configuration)
        self.rpn_stride = 16
        self.balanced_classes = False
        # scaling the stdev
        self.std_scaling = 4.0
        self.classifier_regr_std = [8.0, 8.0, 4.0, 4.0]
        # overlaps for RPN
        self.rpn_min_overlap = 0.3
        self.rpn_max_overlap = 0.7
        # overlaps for classifier ROIs
        self.classifier_min_overlap = 0.1
        self.classifier_max_overlap = 0.5
        # placeholder for the class mapping, automatically generated by the parser
        self.class_mapping = None
        # location of pretrained weights for the base network
        # this post mainly uses one of the following paths:
        # 1) './model/resnet50_weights_tf_dim_ordering_tf_kernels.h5'
        # 2) './model/kitti_frcnn_last.hdf5'
        self.model_path = './model/kitti_frcnn_last.hdf5'
        self.data_dir = '.data/'
        # training settings:
        self.num_epochs = 3000
        # file that stores the labels
        self.kitti_simple_label_file = 'kitti_simple_label.txt'
        # TODO: this field is set to simple_label txt, which in very simple format like:
        # TODO: /path/image_2/000000.png,712.40,143.00,810.73,307.92,Pedestrian, see kitti_simple_label.txt for detail
        self.simple_label_file = 'simple_label.txt'
        self.config_save_file = 'config.pickle'
num_epochs = 3000 means the model trains for 3000 epochs. On a 1080Ti GPU this takes several days, and setting num_epochs to 200 already gives decent results. Adjust the parameters to suit your own needs.
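To get a feel for what `im_size`, `rpn_stride`, `anchor_box_scales`, and `anchor_box_ratios` imply, here is a rough sketch of the arithmetic. It is not the repository's code. It assumes a typical KITTI image of 1242x375 pixels, assumes each anchor is `scale * ratio` in width and height, and approximates the feature-map size as the resized side divided by the stride. The real network computes its own output size, which can differ slightly.

```python
def resized_size(width, height, min_side=600):
    """Scale so the shorter side equals min_side, keeping the aspect ratio."""
    if width <= height:
        scale = min_side / width
        return min_side, int(scale * height)
    scale = min_side / height
    return int(scale * width), min_side


def anchor_shapes(scales, ratios):
    """One (width, height) pair per scale/ratio combination."""
    return [(s * rw, s * rh) for s in scales for rw, rh in ratios]


if __name__ == "__main__":
    # Assumption: 1242x375 is used here as a typical KITTI image size.
    new_w, new_h = resized_size(1242, 375, min_side=600)
    shapes = anchor_shapes([128, 256, 512], [[1, 1], [1, 2], [2, 1]])
    stride = 16
    # Rough estimate only: the base network decides the true feature-map size.
    approx_positions = (new_w // stride) * (new_h // stride)
    print("resized image:", new_w, "x", new_h)
    print("anchor shapes per position:", shapes)
    print("approximate anchors per image:", approx_positions * len(shapes))
```

With three scales and three ratios there are nine anchor shapes at every feature-map position, which is why the anchor count per image runs into the tens of thousands.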
1) Complete the preparation described above.
2) Run the following to generate the kitti_simple_label.txt label file (adjust the directories to your own setup):
python generate_simple_kitti_anno_file.py \
./data/training/image_2 \
./data/training/label_2
On success, the output looks like this:
lab406@lab406-yang:~/hcq/deep_learning/github/keras_frcnn$ python generate_simple_kitti_anno_file.py \
> ./data/training/image_2 \
> ./data/training/label_2
got 7481 label files.
convert finished.
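The conversion itself is simple once you know the KITTI label layout: each line starts with the class name, and the 2D bounding box (left, top, right, bottom) sits in the fifth to eighth fields. The sketch below shows the idea for a single line. It is not the actual generate_simple_kitti_anno_file.py, the function name is hypothetical, and the sample line is only illustrative.

```python
def kitti_to_simple(image_path, kitti_line):
    """Convert one KITTI label line to 'path,x1,y1,x2,y2,class'."""
    fields = kitti_line.split()
    class_name = fields[0]
    # Fields 4..7 hold the 2D box in pixels: left, top, right, bottom.
    x1, y1, x2, y2 = (float(value) for value in fields[4:8])
    return "{},{:.2f},{:.2f},{:.2f},{:.2f},{}".format(
        image_path, x1, y1, x2, y2, class_name)


if __name__ == "__main__":
    # Illustrative line in the KITTI object label layout.
    sample = ("Pedestrian 0.00 0 -0.20 712.40 143.00 810.73 307.92 "
              "1.89 0.48 1.20 1.84 1.47 8.41 0.01")
    print(kitti_to_simple("/path/image_2/000000.png", sample))
```

A label file with several objects simply produces several such lines, all sharing the same image path.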
The format of kitti_simple_label.txt is as follows:
./data/training/image_2/000090.png,5.08,199.56,126.68,269.46,Car
./data/training/image_2/005525.png,585.62,176.58,602.32,189.82,Car
./data/training/image_2/005525.png,475.73,177.95,510.45,202.70,Car
./data/training/image_2/005525.png,531.44,176.06,546.07,188.60,DontCare
./data/training/image_2/005525.png,566.85,171.90,581.48,188.61,DontCare
./data/training/image_2/000513.png,568.87,174.25,772.15,366.01,Car
./data/training/image_2/000513.png,1163.37,178.04,1241.00,374.00,Car
./data/training/image_2/000513.png,719.08,169.65,842.26,226.62,Car
./data/training/image_2/000513.png,688.62,172.92,762.41,208.73,Car
./data/training/image_2/000513.png,668.50,174.27,735.15,201.33,Car
./data/training/image_2/000513.png,508.35,177.79,543.90,201.43,Car
./data/training/image_2/000513.png,41.53,193.83,230.53,267.98,Car
./data/training/image_2/000513.png,581.35,173.22,605.75,189.20,Car
./data/training/image_2/000513.png,351.93,181.28,426.19,216.65,Car
./data/training/image_2/000513.png,402.42,173.81,452.22,209.80,Car
./data/training/image_2/000513.png,457.03,175.66,500.62,199.92,Car
./data/training/image_2/000513.png,514.89,176.56,554.22,193.83,Car
As you can see, although KITTI has only 7481 training images, each image may contain multiple objects. 000513.png, for example, contains many objects, so it has many label lines.
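To see the one-image-to-many-boxes structure directly, the lines can be grouped by image path. This is a minimal sketch with a hypothetical helper name, not the repository's parser. It assumes each line has exactly six comma-separated fields, as in the listing above.

```python
from collections import defaultdict


def group_boxes(lines):
    """Map each image path to its list of (x1, y1, x2, y2, class) tuples."""
    boxes = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        path, x1, y1, x2, y2, class_name = line.split(",")
        boxes[path].append(
            (float(x1), float(y1), float(x2), float(y2), class_name))
    return dict(boxes)


if __name__ == "__main__":
    sample = [
        "./data/training/image_2/000090.png,5.08,199.56,126.68,269.46,Car",
        "./data/training/image_2/005525.png,585.62,176.58,602.32,189.82,Car",
        "./data/training/image_2/005525.png,475.73,177.95,510.45,202.70,Car",
        "./data/training/image_2/005525.png,531.44,176.06,546.07,188.60,DontCare",
    ]
    for path, items in group_boxes(sample).items():
        print(path, len(items))
```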
The 7481 images in the KITTI dataset are split into training and validation data by the get_data() function in ./keras_frcnn/keras_frcnn/simple_parser.py. The counts are as follows:
Training images per class:
{'Car': 28742,
'Cyclist': 1627,
'DontCare': 11295,
'Misc': 973,
'Pedestrian': 4487,
'Person_sitting': 222,
'Tram': 511,
'Truck': 1094,
'Van': 2914,
'bg': 0}
Num classes (including bg) = 10 # number of object classes, including background
Num train samples 6220 # training data
Num val samples 1261 # validation data
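The counts above work out to roughly 83% training and 17% validation (6220 / 7481 is about 0.83). This post does not show how get_data() makes the assignment, so the sketch below is only an assumption: it assigns each image independently at random with a hypothetical `train_fraction` of 5/6 and a fixed seed so the split is repeatable.

```python
import random


def split_images(image_paths, train_fraction=5 / 6, seed=0):
    """Randomly assign each image to the training or validation list."""
    # Assumption: independent per-image assignment, not the repository's logic.
    rng = random.Random(seed)
    train, val = [], []
    for path in sorted(image_paths):
        if rng.random() < train_fraction:
            train.append(path)
        else:
            val.append(path)
    return train, val


if __name__ == "__main__":
    paths = ["{:06d}.png".format(i) for i in range(7481)]
    train, val = split_images(paths)
    print("train:", len(train), "val:", len(val))
```

Because the assignment is random, the exact counts depend on the seed and will not match the numbers above exactly.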
3) Use the resnet50 weights to initialize the model. The key places are:
# [1] ./keras_frcnn/train_frcnn_kitti.py
# base_net_weights
try:
    print('loading weights from {}'.format(cfg.base_net_weights))
    model_rpn.load_weights(cfg.base_net_weights, by_name=True)
    model_classifier.load_weights(cfg.base_net_weights, by_name=True)
except Exception as e:
    print(e)
    print('Could not load pretrained model weights. Weights can be found in the keras application folder '
          'https://github.com/fchollet/keras/tree/master/keras/applications')
# [2] ./keras_frcnn/keras_frcnn/config.py
# self.num_epochs = 3000
self.num_epochs = 100
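Since the weights path can point either at the resnet50 base weights or at a previously fine-tuned checkpoint, one option is to choose whichever file exists before calling load_weights. This is just a sketch with a hypothetical helper, `pick_weights`. The repository itself simply reads the path from the config.

```python
import os


def pick_weights(candidates):
    """Return the first path in `candidates` that exists on disk, else None."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


if __name__ == "__main__":
    # Prefer a fine-tuned checkpoint, fall back to the resnet50 base weights.
    chosen = pick_weights([
        "./model/kitti_frcnn_last.hdf5",
        "./model/resnet50_weights_tf_dim_ordering_tf_kernels.h5",
    ])
    print("weights to load:", chosen)
```

Checking up front gives a clearer message than the broad except block, which only reports the failure after the load has been attempted.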
4) Train the model with the following command:
python train_frcnn_kitti.py
The output at the start of training looks like this:
After several days of training, the model finished and the weights were saved to "./keras_frcnn/model/kitti_frcnn_last.hdf5".
Average number of overlapping bounding boxes from RPN = 34.579 for 1000 previous iterations
1000/1000 [==============================] - 1329s 1s/step - rpn_cls: 0.2544 - rpn_regr: 0.0987 - detector_cls: 0.3043 - detector_regr: 0.1163
Mean number of bounding boxes from RPN overlapping ground truth boxes: 35.861
Classifier accuracy for bounding boxes from RPN: 0.87734375
Loss RPN classifier: 0.23741177825622123
Loss RPN regression: 0.09788954605642357
Loss Detector classifier: 0.29615414681006225
Loss Detector regression: 0.11018231464223936
Elapsed time: 1329.4224348068237
Training complete, exiting.
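The four "Loss ..." lines in this log can be pulled out and added up to track a single number across epochs. The sketch below assumes the total loss is the plain sum of the two RPN losses and the two detector losses, which I have not checked against the training script. The helper name and regular expression are my own.

```python
import re

# Matches lines such as "Loss RPN classifier: 0.2374".
LOSS_LINE = re.compile(r"^Loss (.+?):\s*([0-9.eE+-]+)\s*$")


def parse_losses(log_text):
    """Return {loss name: value} for every 'Loss <name>: <value>' line."""
    losses = {}
    for line in log_text.splitlines():
        match = LOSS_LINE.match(line.strip())
        if match:
            losses[match.group(1)] = float(match.group(2))
    return losses


if __name__ == "__main__":
    log = """Classifier accuracy for bounding boxes from RPN: 0.87734375
Loss RPN classifier: 0.23741177825622123
Loss RPN regression: 0.09788954605642357
Loss Detector classifier: 0.29615414681006225
Loss Detector regression: 0.11018231464223936
Elapsed time: 1329.4224348068237"""
    losses = parse_losses(log)
    # Assumption: the total is the unweighted sum of the four components.
    print("total loss: {:.4f}".format(sum(losses.values())))
```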
5) Test the trained model with the following commands:
### usage
python test_frcnn_kitti.py # test the default image, default='images/000010.png'
python test_frcnn_kitti.py -p ./images/000010.png # test a specific image
python test_frcnn_kitti.py -p ./images # test every image in a folder, where images is a folder
If you have not installed opencv yet, install it with:
pip install opencv-python
The test results are as follows:
- python test_frcnn_kitti.py # test the default image
Loading weights from ./model/kitti_frcnn_last.hdf5
2017-11-26 14:14:11.646544: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0)
predict image from images/000010.png
Car:
[ 359. 189. 539. 299.] prob: 0.9999945163726807
[ 559. 179. 629. 229.] prob: 0.9998922348022461
[ 819. 179. 919. 249.] prob: 0.9997738003730774
[ 999. 199. 1249. 369.] prob: 0.9964742064476013
[ 789. 179. 849. 219.] prob: 0.9433906674385071
[ 589. 179. 639. 219.] prob: 0.9140269756317139
[ 809. 189. 879. 239.] prob: 0.8051236271858215
DontCare:
[ 139. 189. 179. 199.] prob: 0.8146459460258484
Truck:
[ 879. 0. 1259. 249.] prob: 0.9663242697715759
Elapsed time = 6.536675214767456
result saved into ./results_images/000010.png
Please enter any keyboard to exit...
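- python test_frcnn_kitti.py -p ./images/001.png # test my own image
As you can see, the result on this casually taken photo is not great: only one car is detected. With more data and further training the results should improve. Another reason is that I only trained for 250 epochs, although by then the model's total loss was changing very little.
- python test_frcnn_kitti.py -p ./images # test all images in the images directory and save the results to results_images
As the figure above shows, the model tests the images in the images directory one by one and saves the results to the specified folder (results_images).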
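In the default-image output earlier, some of the Car boxes overlap each other, for example [559, 179, 629, 229] and [589, 179, 639, 219]. Overlap between boxes is usually measured with intersection over union (IoU), and I assume the `rpn_min_overlap` / `rpn_max_overlap` thresholds in the config are IoU values as well. The sketch below is a generic IoU function, not code taken from the repository.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    car_a = (559, 179, 629, 229)
    car_b = (589, 179, 639, 219)
    print("IoU = {:.3f}".format(iou(car_a, car_b)))
```

A value near 1 means the two boxes cover almost the same region, and a value of 0 means they do not touch.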