coco数据集的全称是MS COCO(Microsoft Common Objects in Context), 起源于微软2014年出资标注的Microsoft COCO数据集,COCO是当前目标识别,检测领域最重要,最权威的一个标杆。
yolo3网络物体的检测
搭建好后,用预训练好的权重测试推理功能:
现在我们用coco数据库训练自己的网络:
要下载COCO数据和标签,darknet下scripts/get_coco_dataset.sh就是干这个的
执行如下命令:
$cp scripts/get_coco_dataset.sh data
$cd data
$bash get_coco_dataset.sh
查看脚本,实际上执行的工作很简单,测试集和训练集图片文件非常大,如果嫌wget过慢,可以在下载工具中输入其中的链接单独下载,不影响我们的目的
#!/bin/bash
# Clone COCO API
git clone https://github.com/pdollar/coco
cd coco
mkdir images
cd images
# Download Images
wget -c https://pjreddie.com/media/files/train2014.zip
wget -c https://pjreddie.com/media/files/val2014.zip
# Unzip
unzip -q train2014.zip
unzip -q val2014.zip
cd ..
# Download COCO Metadata
wget -c https://pjreddie.com/media/files/instances_train-val2014.zip
wget -c https://pjreddie.com/media/files/coco/5k.part
wget -c https://pjreddie.com/media/files/coco/trainvalno5k.part
wget -c https://pjreddie.com/media/files/coco/labels.tgz
tar xzf labels.tgz
unzip -q instances_train-val2014.zip
# Set Up Image Lists
paste <(awk "{print \"$PWD\"}" <5k.part) 5k.part | tr -d '\t' > 5k.txt
paste <(awk "{print \"$PWD\"}" trainvalno5k.txt
完成后,修改数据配置文件和网络配置文件,数据配置文件指向我们新下载的数据集,而网络配置文件告诉环境是训练而非推理,YOLOV3能识别80种类别的物体,如下图:
caozilong@caozilong-Vostro-3268:/media/caozilong/64bce557-96fc-4340-a2a7-973b0be41336/darknet$ git diff
diff --git a/cfg/coco.data b/cfg/coco.data
index 3003841..6391e06 100644
--- a/cfg/coco.data
+++ b/cfg/coco.data
@@ -1,8 +1,8 @@
classes= 80
-train = /home/pjreddie/data/coco/trainvalno5k.txt
+train = /media/caozilong/64bce557-96fc-4340-a2a7-973b0be41336/darknet/data/coco/trainvalno5k.txt
valid = coco_testdev
#valid = data/coco_val_5k.list
names = data/coco.names
-backup = /home/pjreddie/backup/
+backup = /media/caozilong/64bce557-96fc-4340-a2a7-973b0be41336/darknet/backup/
eval=coco
diff --git a/cfg/yolov3.cfg b/cfg/yolov3.cfg
index 938ffff..ae0f4dc 100644
--- a/cfg/yolov3.cfg
+++ b/cfg/yolov3.cfg
@@ -4,7 +4,7 @@
# subdivisions=1
# Training
batch=64
-subdivisions=16
+subdivisions=8
width=608
height=608
channels=3
caozilong@caozilong-Vostro-3268:/media/caozilong/64bce557-96fc-4340-a2a7-973b0be41336/darknet$
backup目录是存放训练过程中的权重文件的目录,训练前为空.
我们不从头训练,选择一个预训练好的权重文件
wget https://pjreddie.com/media/files/darknet53.conv.74
执行训练的命令是
./darknet detector train cfg/coco.data cfg/yolov3.cfg darknet53.conv.74
root@caozilong-Vostro-3268:/media/caozilong/64bce557-96fc-4340-a2a7-973b0be41336/darknet# ./darknet detector train cfg/coco.data cfg/yolov3.cfg darknet53.conv.74
yolov3
layer filters size input output
0 conv 32 3 x 3 / 1 608 x 608 x 3 -> 608 x 608 x 32 0.639 BFLOPs
1 conv 64 3 x 3 / 2 608 x 608 x 32 -> 304 x 304 x 64 3.407 BFLOPs
2 conv 32 1 x 1 / 1 304 x 304 x 64 -> 304 x 304 x 32 0.379 BFLOPs
3 conv 64 3 x 3 / 1 304 x 304 x 32 -> 304 x 304 x 64 3.407 BFLOPs
4 res 1 304 x 304 x 64 -> 304 x 304 x 64
5 conv 128 3 x 3 / 2 304 x 304 x 64 -> 152 x 152 x 128 3.407 BFLOPs
6 conv 64 1 x 1 / 1 152 x 152 x 128 -> 152 x 152 x 64 0.379 BFLOPs
7 conv 128 3 x 3 / 1 152 x 152 x 64 -> 152 x 152 x 128 3.407 BFLOPs
8 res 5 152 x 152 x 128 -> 152 x 152 x 128
9 conv 64 1 x 1 / 1 152 x 152 x 128 -> 152 x 152 x 64 0.379 BFLOPs
10 conv 128 3 x 3 / 1 152 x 152 x 64 -> 152 x 152 x 128 3.407 BFLOPs
11 res 8 152 x 152 x 128 -> 152 x 152 x 128
12 conv 256 3 x 3 / 2 152 x 152 x 128 -> 76 x 76 x 256 3.407 BFLOPs
13 conv 128 1 x 1 / 1 76 x 76 x 256 -> 76 x 76 x 128 0.379 BFLOPs
14 conv 256 3 x 3 / 1 76 x 76 x 128 -> 76 x 76 x 256 3.407 BFLOPs
15 res 12 76 x 76 x 256 -> 76 x 76 x 256
16 conv 128 1 x 1 / 1 76 x 76 x 256 -> 76 x 76 x 128 0.379 BFLOPs
17 conv 256 3 x 3 / 1 76 x 76 x 128 -> 76 x 76 x 256 3.407 BFLOPs
18 res 15 76 x 76 x 256 -> 76 x 76 x 256
19 conv 128 1 x 1 / 1 76 x 76 x 256 -> 76 x 76 x 128 0.379 BFLOPs
20 conv 256 3 x 3 / 1 76 x 76 x 128 -> 76 x 76 x 256 3.407 BFLOPs
21 res 18 76 x 76 x 256 -> 76 x 76 x 256
22 conv 128 1 x 1 / 1 76 x 76 x 256 -> 76 x 76 x 128 0.379 BFLOPs
23 conv 256 3 x 3 / 1 76 x 76 x 128 -> 76 x 76 x 256 3.407 BFLOPs
24 res 21 76 x 76 x 256 -> 76 x 76 x 256
25 conv 128 1 x 1 / 1 76 x 76 x 256 -> 76 x 76 x 128 0.379 BFLOPs
26 conv 256 3 x 3 / 1 76 x 76 x 128 -> 76 x 76 x 256 3.407 BFLOPs
27 res 24 76 x 76 x 256 -> 76 x 76 x 256
28 conv 128 1 x 1 / 1 76 x 76 x 256 -> 76 x 76 x 128 0.379 BFLOPs
29 conv 256 3 x 3 / 1 76 x 76 x 128 -> 76 x 76 x 256 3.407 BFLOPs
30 res 27 76 x 76 x 256 -> 76 x 76 x 256
31 conv 128 1 x 1 / 1 76 x 76 x 256 -> 76 x 76 x 128 0.379 BFLOPs
32 conv 256 3 x 3 / 1 76 x 76 x 128 -> 76 x 76 x 256 3.407 BFLOPs
33 res 30 76 x 76 x 256 -> 76 x 76 x 256
34 conv 128 1 x 1 / 1 76 x 76 x 256 -> 76 x 76 x 128 0.379 BFLOPs
35 conv 256 3 x 3 / 1 76 x 76 x 128 -> 76 x 76 x 256 3.407 BFLOPs
36 res 33 76 x 76 x 256 -> 76 x 76 x 256
37 conv 512 3 x 3 / 2 76 x 76 x 256 -> 38 x 38 x 512 3.407 BFLOPs
38 conv 256 1 x 1 / 1 38 x 38 x 512 -> 38 x 38 x 256 0.379 BFLOPs
39 conv 512 3 x 3 / 1 38 x 38 x 256 -> 38 x 38 x 512 3.407 BFLOPs
40 res 37 38 x 38 x 512 -> 38 x 38 x 512
41 conv 256 1 x 1 / 1 38 x 38 x 512 -> 38 x 38 x 256 0.379 BFLOPs
42 conv 512 3 x 3 / 1 38 x 38 x 256 -> 38 x 38 x 512 3.407 BFLOPs
43 res 40 38 x 38 x 512 -> 38 x 38 x 512
44 conv 256 1 x 1 / 1 38 x 38 x 512 -> 38 x 38 x 256 0.379 BFLOPs
45 conv 512 3 x 3 / 1 38 x 38 x 256 -> 38 x 38 x 512 3.407 BFLOPs
46 res 43 38 x 38 x 512 -> 38 x 38 x 512
47 conv 256 1 x 1 / 1 38 x 38 x 512 -> 38 x 38 x 256 0.379 BFLOPs
48 conv 512 3 x 3 / 1 38 x 38 x 256 -> 38 x 38 x 512 3.407 BFLOPs
49 res 46 38 x 38 x 512 -> 38 x 38 x 512
50 conv 256 1 x 1 / 1 38 x 38 x 512 -> 38 x 38 x 256 0.379 BFLOPs
51 conv 512 3 x 3 / 1 38 x 38 x 256 -> 38 x 38 x 512 3.407 BFLOPs
52 res 49 38 x 38 x 512 -> 38 x 38 x 512
53 conv 256 1 x 1 / 1 38 x 38 x 512 -> 38 x 38 x 256 0.379 BFLOPs
54 conv 512 3 x 3 / 1 38 x 38 x 256 -> 38 x 38 x 512 3.407 BFLOPs
55 res 52 38 x 38 x 512 -> 38 x 38 x 512
56 conv 256 1 x 1 / 1 38 x 38 x 512 -> 38 x 38 x 256 0.379 BFLOPs
57 conv 512 3 x 3 / 1 38 x 38 x 256 -> 38 x 38 x 512 3.407 BFLOPs
58 res 55 38 x 38 x 512 -> 38 x 38 x 512
59 conv 256 1 x 1 / 1 38 x 38 x 512 -> 38 x 38 x 256 0.379 BFLOPs
60 conv 512 3 x 3 / 1 38 x 38 x 256 -> 38 x 38 x 512 3.407 BFLOPs
61 res 58 38 x 38 x 512 -> 38 x 38 x 512
62 conv 1024 3 x 3 / 2 38 x 38 x 512 -> 19 x 19 x1024 3.407 BFLOPs
63 conv 512 1 x 1 / 1 19 x 19 x1024 -> 19 x 19 x 512 0.379 BFLOPs
64 conv 1024 3 x 3 / 1 19 x 19 x 512 -> 19 x 19 x1024 3.407 BFLOPs
65 res 62 19 x 19 x1024 -> 19 x 19 x1024
66 conv 512 1 x 1 / 1 19 x 19 x1024 -> 19 x 19 x 512 0.379 BFLOPs
67 conv 1024 3 x 3 / 1 19 x 19 x 512 -> 19 x 19 x1024 3.407 BFLOPs
68 res 65 19 x 19 x1024 -> 19 x 19 x1024
69 conv 512 1 x 1 / 1 19 x 19 x1024 -> 19 x 19 x 512 0.379 BFLOPs
70 conv 1024 3 x 3 / 1 19 x 19 x 512 -> 19 x 19 x1024 3.407 BFLOPs
71 res 68 19 x 19 x1024 -> 19 x 19 x1024
72 conv 512 1 x 1 / 1 19 x 19 x1024 -> 19 x 19 x 512 0.379 BFLOPs
73 conv 1024 3 x 3 / 1 19 x 19 x 512 -> 19 x 19 x1024 3.407 BFLOPs
74 res 71 19 x 19 x1024 -> 19 x 19 x1024
75 conv 512 1 x 1 / 1 19 x 19 x1024 -> 19 x 19 x 512 0.379 BFLOPs
76 conv 1024 3 x 3 / 1 19 x 19 x 512 -> 19 x 19 x1024 3.407 BFLOPs
77 conv 512 1 x 1 / 1 19 x 19 x1024 -> 19 x 19 x 512 0.379 BFLOPs
78 conv 1024 3 x 3 / 1 19 x 19 x 512 -> 19 x 19 x1024 3.407 BFLOPs
79 conv 512 1 x 1 / 1 19 x 19 x1024 -> 19 x 19 x 512 0.379 BFLOPs
80 conv 1024 3 x 3 / 1 19 x 19 x 512 -> 19 x 19 x1024 3.407 BFLOPs
81 conv 255 1 x 1 / 1 19 x 19 x1024 -> 19 x 19 x 255 0.189 BFLOPs
82 yolo
83 route 79
84 conv 256 1 x 1 / 1 19 x 19 x 512 -> 19 x 19 x 256 0.095 BFLOPs
85 upsample 2x 19 x 19 x 256 -> 38 x 38 x 256
86 route 85 61
87 conv 256 1 x 1 / 1 38 x 38 x 768 -> 38 x 38 x 256 0.568 BFLOPs
88 conv 512 3 x 3 / 1 38 x 38 x 256 -> 38 x 38 x 512 3.407 BFLOPs
89 conv 256 1 x 1 / 1 38 x 38 x 512 -> 38 x 38 x 256 0.379 BFLOPs
90 conv 512 3 x 3 / 1 38 x 38 x 256 -> 38 x 38 x 512 3.407 BFLOPs
91 conv 256 1 x 1 / 1 38 x 38 x 512 -> 38 x 38 x 256 0.379 BFLOPs
92 conv 512 3 x 3 / 1 38 x 38 x 256 -> 38 x 38 x 512 3.407 BFLOPs
93 conv 255 1 x 1 / 1 38 x 38 x 512 -> 38 x 38 x 255 0.377 BFLOPs
94 yolo
95 route 91
96 conv 128 1 x 1 / 1 38 x 38 x 256 -> 38 x 38 x 128 0.095 BFLOPs
97 upsample 2x 38 x 38 x 128 -> 76 x 76 x 128
98 route 97 36
99 conv 128 1 x 1 / 1 76 x 76 x 384 -> 76 x 76 x 128 0.568 BFLOPs
100 conv 256 3 x 3 / 1 76 x 76 x 128 -> 76 x 76 x 256 3.407 BFLOPs
101 conv 128 1 x 1 / 1 76 x 76 x 256 -> 76 x 76 x 128 0.379 BFLOPs
102 conv 256 3 x 3 / 1 76 x 76 x 128 -> 76 x 76 x 256 3.407 BFLOPs
103 conv 128 1 x 1 / 1 76 x 76 x 256 -> 76 x 76 x 128 0.379 BFLOPs
104 conv 256 3 x 3 / 1 76 x 76 x 128 -> 76 x 76 x 256 3.407 BFLOPs
105 conv 255 1 x 1 / 1 76 x 76 x 256 -> 76 x 76 x 255 0.754 BFLOPs
106 yolo
Loading weights from darknet53.conv.74...Done!
Learning Rate: 0.001, Momentum: 0.9, Decay: 0.0005
Resizing
512
Loaded: 0.421137 seconds
Region 82 Avg IOU: 0.201348, Class: 0.455102, Obj: 0.244645, No Obj: 0.390294, .5R: 0.062500, .75R: 0.000000, count: 16
Region 94 Avg IOU: 0.120858, Class: 0.537060, Obj: 0.543120, No Obj: 0.518701, .5R: 0.038462, .75R: 0.000000, count: 26
Region 106 Avg IOU: 0.165164, Class: 0.496659, Obj: 0.426623, No Obj: 0.418840, .5R: 0.020833, .75R: 0.000000, count: 48
在没有GPU的平台上,这个训练要花费很长很长的时间, region表示网络的层数,region 106表示网络的层数,例如,上面的最后一行,表示的是第106层的输出。
经过上面的操作后,coco数据集目录变成下面的样子:
比较重要的目录有images目录和labels目录,其中,images/train2014包含的是全部的训练集图片:
训练集包含13G,共计82783张照片,以其中的一个图片文件为例,查看其内容:eog COCO_train2014_000000387558.jpg
images/val2014包含验证集的图片,共计6.3G. 40504张照片, 查看其中一张的内容eog COCO_val2014_000000485307.jpg
由于深度学习属于监督学习的一种,训练集和测试集都要有对应的标签,labels/train2014即为训练集标签,直观体验一下它的内容
YOLO采用中心坐标的形式画框,标签的格式为:
class_id x y w h
其中,class_id表示类别ID的编号,标签的第一列是类别,第二列应该是归一化后的坐标.
x=x_center/width
y=y_center/height
w=(xmax-xmin)/width
h=(ymax-ymin)/height
以滑冰男人的图像为例,它的内容是vim COCO_train2014_000000387558.txt
我们可以根据上面的公式,计算实际物体的像素坐标
比如,以class id为0的object为例,图片信息为:
则
x_center = x * width = 0x381617 * 640 = 244 pixels
y_center = y * height = 0.534906 * 425 = 227 pixels
xmax-xmin = w * width = 0.270672 * 640 = 173 pixels
ymax-ymin = h * height = 0.586706 * 425 = 249 pixels
联立:
xmax+xmin = 2 * 244 = 488 =>xmax = 330, xmin=158
ymax+ymin = 2 * 227 = 454 =>ymax=352, ymin=102
得到对角线坐标分别为(158,102)(330,352)
同理,物体36:
x_center = x * width = 0.308930 * 640 = 198 pixels
y_center = y * height = 0.888165 * 425 = 377 pixels
xmax-xmin = w * width = 0.169859 * 640 = 109 pixels
ymax-ymin = h * height = 0.130729 * 425 = 56 pixels
联立:
xmax+xmin = 2 * 198 = 396 =>xmax = 253, xmin=143
ymax+ymin = 2 * 377 = 754 =>ymax=405, ymin=349
得到对角线坐标分别为(143,349)(253, 405)
我们使用下面的代码把它绘制出来:
import cv2
fname = './COCO_train2014_000000387558.jpg'
img = cv2.imread(fname)
pt1 = (158, 102)
pt2 = (330, 352)
cv2.rectangle(img, pt1, pt2, (0, 255, 0), 2)
pt3 = (143, 349)
pt4 = (253, 405)
cv2.rectangle(img, pt3, pt4, (255, 0, 0), 2)
cv2.imwrite('22.jpg', img)
绘制出来的效果如下,可以印证,我们对label文件中坐标的理解以及算法推演是正确的。
labels/val2014是验证集的标签:
以其中一个文件为例,COCO_val2014_000000581929.txt.
对应图片如下,可见类别号17表示的是马匹。
上道硬菜,用GPU加速的YOLO+OpenCV,可以做出下面精彩的方案: