YOLO: Real-Time Object Detection

- Comparison of common model results

Model           Train          Test      mAP   FLOPS (Bn)  FPS  Cfg / Weights
Old YOLO        VOC 2007+2012  2007      63.4  40.19       45   link
SSD300          VOC 2007+2012  2007      74.3  -           46   link
SSD500          VOC 2007+2012  2007      76.8  -           19   link
YOLOv2          VOC 2007+2012  2007      76.8  34.90       67   cfg, weights
YOLOv2 544x544  VOC 2007+2012  2007      78.6  59.68       40   cfg, weights
Tiny YOLO       VOC 2007+2012  2007      57.1  6.97        207  cfg, weights
SSD300          COCO trainval  test-dev  41.2  -           46   link
SSD500          COCO trainval  test-dev  46.5  -           19   link
YOLOv2 608x608  COCO trainval  test-dev  48.1  62.94       40   cfg, weights
Tiny YOLO       COCO trainval  -         -     7.07        200  cfg, weights

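The table shows that on the VOC 2007+2012 data, measured by mAP, SSD and YOLOv2 achieve similar results, and both outperform the original YOLO and Tiny YOLO. Measured by speed (FPS), SSD500 is the slowest and Tiny YOLO the fastest, while YOLOv2 is faster than both the original YOLO and SSD300.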
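The ranking described above can be reproduced directly from the VOC rows of the table. The following is a minimal sketch: the function names `voc_results` and `rank_voc_models` and the `model|mAP|FPS` layout are illustrative choices, not part of darknet.

```shell
#!/bin/sh
# VOC 2007 rows copied from the table: model|mAP|FPS
voc_results() {
cat <<'EOF'
Old YOLO|63.4|45
SSD300|74.3|46
SSD500|76.8|19
YOLOv2|76.8|67
YOLOv2 544x544|78.6|40
Tiny YOLO|57.1|207
EOF
}

# Print the rows sorted by the given column (2 = mAP, 3 = FPS), highest first.
# LC_ALL=C makes sort treat "." as the decimal point regardless of locale.
rank_voc_models() {
    voc_results | LC_ALL=C sort -t '|' -k "${1},${1}nr"
}

echo "By FPS:"
rank_voc_models 3
echo "By mAP:"
rank_voc_models 2
```

Sorting by column 3 puts Tiny YOLO first and SSD500 last, which matches the speed comparison in the text.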

- Setting up the YOLO environment

  • Clone the darknet repository
git clone https://github.com/pjreddie/darknet
cd darknet
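  • Configure the Makefile
GPU=1    # Requires a working CUDA setup; change GPU=0 to GPU=1
CUDNN=0  # The darknet author builds against cuDNN v4; if cuDNN v5 or later is installed, it is best to leave cuDNN acceleration off, otherwise make reports errors
OPENCV=1 # Enable OpenCV support

# The compute_20 line can be removed from ARCH; building for compute_20 is deprecated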
ARCH= -gencode arch=compute_30,code=sm_30 \
      -gencode arch=compute_35,code=sm_35 \
      -gencode arch=compute_50,code=[sm_50,compute_50] \
      -gencode arch=compute_52,code=[sm_52,compute_52]

# This is what I use, uncomment if you know your arch and want to specify
# ARCH=  -gencode arch=compute_52,code=compute_52

NVCC=/usr/local/cuda-8.0/bin/nvcc  # Set the nvcc path for your own system; I use CUDA 8.0
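The flag changes above can also be scripted instead of edited by hand. This is a minimal sketch, assuming each flag sits on its own line of the form `NAME=value` with nothing else on it; the helper name `set_make_flag` is hypothetical and not part of darknet.

```shell
#!/bin/sh
# Rewrite the line of FILE that starts with "NAME=" to "NAME=VALUE".
# Output goes to a temporary file first, so the same command works with
# both GNU and BSD sed (no reliance on "sed -i").
set_make_flag() {
    name=$1
    value=$2
    file=$3
    sed "s|^${name}=.*|${name}=${value}|" "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
}

# Example, run inside the darknet directory:
#   set_make_flag GPU 1 Makefile
#   set_make_flag CUDNN 0 Makefile
#   set_make_flag OPENCV 1 Makefile
```

The helper replaces the whole line, so any trailing comment on a flag line would be dropped; values containing `|` are not supported because that character is used as the sed delimiter.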
  • Build
make -j8


gcc  -DOPENCV `pkg-config --cflags opencv`  -DGPU -I/usr/local/cuda/include/ -Wall -Wfatal-errors  -Ofast -DOPENCV -DGPU -c ./src/gemm.c -o obj/gemm.o
gcc  -DOPENCV `pkg-config --cflags opencv`  -DGPU -I/usr/local/cuda/include/ -Wall -Wfatal-errors  -Ofast -DOPENCV -DGPU -c ./src/utils.c -o obj/utils.o
gcc  -DOPENCV `pkg-config --cflags opencv`  -DGPU -I/usr/local/cuda/include/ -Wall -Wfatal-errors  -Ofast -DOPENCV -DGPU -c ./src/cuda.c -o obj/cuda.o
gcc  -DOPENCV `pkg-config --cflags opencv`  -DGPU -I/usr/local/cuda/include/ -Wall -Wfatal-errors  -Ofast -DOPENCV -DGPU -c ./src/deconvolutional_layer.c -o obj/deconvolutional_layer.o
gcc  -DOPENCV `pkg-config --cflags opencv`  -DGPU -I/usr/local/cuda/include/ -Wall -Wfatal-errors  -Ofast -DOPENCV -DGPU -c ./src/convolutional_layer.c -o obj/convolutional_layer.o
gcc  -DOPENCV `pkg-config --cflags opencv`  -DGPU -I/usr/local/cuda/include/ -Wall -Wfatal-errors  -Ofast -DOPENCV -DGPU -c ./src/list.c -o obj/list.o
  • Test OpenCV
./darknet imtest data/eagle.jpg


[Images 1 and 2: result of the imtest command on data/eagle.jpg]
  • Optional settings
    (1) Change which GPU card Darknet uses
./darknet -i 1 imagenet test cfg/alexnet.cfg alexnet.weights


    (2) Force CPU computation even though Darknet was compiled with CUDA
./darknet -nogpu imagenet test cfg/alexnet.cfg alexnet.weights
  • Download the pre-trained weights
wget http://pjreddie.com/media/files/yolo.weights
  • Classification and detection
./darknet detect cfg/yolo.cfg yolo.weights data/dog.jpg


layer     filters    size              input                output
    0 conv     32  3 x 3 / 1   416 x 416 x   3   ->   416 x 416 x  32
    1 max          2 x 2 / 2   416 x 416 x  32   ->   208 x 208 x  32
    2 conv     64  3 x 3 / 1   208 x 208 x  32   ->   208 x 208 x  64
    3 max          2 x 2 / 2   208 x 208 x  64   ->   104 x 104 x  64
    4 conv    128  3 x 3 / 1   104 x 104 x  64   ->   104 x 104 x 128
    5 conv     64  1 x 1 / 1   104 x 104 x 128   ->   104 x 104 x  64
    6 conv    128  3 x 3 / 1   104 x 104 x  64   ->   104 x 104 x 128
    7 max          2 x 2 / 2   104 x 104 x 128   ->    52 x  52 x 128
    8 conv    256  3 x 3 / 1    52 x  52 x 128   ->    52 x  52 x 256
    9 conv    128  1 x 1 / 1    52 x  52 x 256   ->    52 x  52 x 128
   10 conv    256  3 x 3 / 1    52 x  52 x 128   ->    52 x  52 x 256
   11 max          2 x 2 / 2    52 x  52 x 256   ->    26 x  26 x 256
   12 conv    512  3 x 3 / 1    26 x  26 x 256   ->    26 x  26 x 512
   13 conv    256  1 x 1 / 1    26 x  26 x 512   ->    26 x  26 x 256
   14 conv    512  3 x 3 / 1    26 x  26 x 256   ->    26 x  26 x 512
   15 conv    256  1 x 1 / 1    26 x  26 x 512   ->    26 x  26 x 256
   16 conv    512  3 x 3 / 1    26 x  26 x 256   ->    26 x  26 x 512
   17 max          2 x 2 / 2    26 x  26 x 512   ->    13 x  13 x 512
   18 conv   1024  3 x 3 / 1    13 x  13 x 512   ->    13 x  13 x1024
   19 conv    512  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x 512
   20 conv   1024  3 x 3 / 1    13 x  13 x 512   ->    13 x  13 x1024
   21 conv    512  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x 512
   22 conv   1024  3 x 3 / 1    13 x  13 x 512   ->    13 x  13 x1024
   23 conv   1024  3 x 3 / 1    13 x  13 x1024   ->    13 x  13 x1024
   24 conv   1024  3 x 3 / 1    13 x  13 x1024   ->    13 x  13 x1024
   25 route  16
   26 conv     64  1 x 1 / 1    26 x  26 x 512   ->    26 x  26 x  64
   27 reorg              / 2    26 x  26 x  64   ->    13 x  13 x 256
   28 route  27 24
   29 conv   1024  3 x 3 / 1    13 x  13 x1280   ->    13 x  13 x1024
   30 conv    425  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x 425
   31 detection
Loading weights from yolo.weights...Done!
data/dog.jpg: Predicted in 0.213838 seconds.  # time taken for detection
pottedplant: 26%                              # this and the following lines list each detected object's class and its confidence
truck: 74%
bicycle: 25%
dog: 81%
bicycle: 83%
[Image 3: detection result for data/dog.jpg]
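The shapes in the layer listing above can be sanity-checked with a little integer arithmetic. This is a sketch rather than anything darknet runs: the variable names are illustrative, and the values of 80 classes and 5 anchor boxes are assumptions based on the COCO-trained YOLOv2 configuration, since they do not appear in the printed output.

```shell
#!/bin/sh
input=416
pools=5        # max-pool layers 1, 3, 7, 11 and 17, each with stride 2

# Each stride-2 max-pool halves the width and height.
grid=$input
i=0
while [ "$i" -lt "$pools" ]; do
    grid=$((grid / 2))
    i=$((i + 1))
done

# Layer 27 (reorg, stride 2) moves each 2x2 spatial block into channels:
# 26 x 26 x 64 becomes 13 x 13 x (64 * 2 * 2).
reorg_channels=$((64 * 2 * 2))

# Layer 28 (route 27 24) concatenates layers 27 and 24 along the channel
# axis, which gives the input depth of layer 29.
route_channels=$((reorg_channels + 1024))

# Layer 30 predicts, for every anchor box, 4 box coordinates, 1 objectness
# score and one score per class.
classes=80     # assumption: COCO
anchors=5      # assumption: YOLOv2 configuration
out_filters=$((anchors * (classes + 5)))

echo "grid=${grid} reorg=${reorg_channels} route=${route_channels} filters=${out_filters}"
```

The four values correspond to the 13 x 13 output grid, the 256 channels after layer 27, the 1280 input channels of layer 29, and the 425 filters of layer 30.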


Loading weights from yolo.weights...Done!
data/timg.jpg: Predicted in 0.210836 seconds.
person: 58%
person: 61%
person: 36%
person: 68%
person: 40%
person: 81%
horse: 60%
horse: 76%
horse: 84%
horse: 79%
horse: 72%
[Image 4: detection result for data/timg.jpg]
Loading weights from yolo.weights...Done!
data/plane.jpg: Predicted in 0.213104 seconds.
aeroplane: 73%
aeroplane: 63%
aeroplane: 75%
aeroplane: 72%
aeroplane: 40%
aeroplane: 78%
aeroplane: 54%
aeroplane: 65%
[Image 5: detection result for data/plane.jpg]
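The printed detections can be post-processed with standard tools, for example to keep only the confident ones. This is a minimal sketch, assuming each detection is printed as `label: NN%` on its own line as in the logs above; the function name `filter_detections` is hypothetical.

```shell
#!/bin/sh
# Read darknet detection output on stdin and print "label percent" for every
# detection whose confidence is at least MIN percent (first argument).
# Lines such as "Loading weights..." or "Predicted in ..." do not match the
# pattern and are skipped.
filter_detections() {
    awk -F': ' -v min="$1" '
        /^[A-Za-z ]+: [0-9]+%$/ {
            pct = $2
            sub(/%$/, "", pct)
            if (pct + 0 >= min + 0) print $1, pct
        }'
}

# Sample lines taken from the data/plane.jpg run; keep detections >= 70%.
filter_detections 70 <<'EOF'
data/plane.jpg: Predicted in 0.213104 seconds.
aeroplane: 73%
aeroplane: 63%
aeroplane: 75%
aeroplane: 72%
aeroplane: 40%
aeroplane: 78%
aeroplane: 54%
aeroplane: 65%
EOF
```

In practice the function would be fed from a pipe, for example `./darknet detect cfg/yolo.cfg yolo.weights data/dog.jpg | filter_detections 50`.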


  • Tiny YOLO (faster than YOLO, but less accurate)
wget http://pjreddie.com/media/files/tiny-yolo-voc.weights # download the pre-trained Tiny YOLO weights file


./darknet detector test cfg/voc.data cfg/tiny-yolo-voc.cfg tiny-yolo-voc.weights data/person.jpg


Loading weights from tiny-yolo-voc.weights...Done!
data/person.jpg: Predicted in 0.187108 seconds.
dog: 53%
person: 73%
sheep: 60%                 # incorrect detection
[Image 6: Tiny YOLO detection result for data/person.jpg]


Loading weights from yolo.weights...Done!
data/person.jpg: Predicted in 0.252314 seconds.
dog: 85%
person: 85%
horse: 91%
[Image 7: YOLO detection result for data/person.jpg]
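The two data/person.jpg runs above can be compared numerically by pulling the inference time out of each log. This is a rough sketch: the helper name `predicted_seconds` is hypothetical, and the figures are single-image timings from the logs above, so they are not directly comparable to the FPS column of the table.

```shell
#!/bin/sh
# Print the number of seconds from a "Predicted in N seconds." line on stdin.
predicted_seconds() {
    awk '/Predicted in/ {
        for (i = 1; i < NF; i++)
            if ($i == "in") { print $(i + 1); exit }
    }'
}

# Timing lines copied from the Tiny YOLO and full YOLO runs.
tiny=$(echo "data/person.jpg: Predicted in 0.187108 seconds." | predicted_seconds)
full=$(echo "data/person.jpg: Predicted in 0.252314 seconds." | predicted_seconds)

# Ratio of the full-model time to the Tiny YOLO time for this one image.
awk -v t="$tiny" -v f="$full" \
    'BEGIN { printf "tiny=%ss full=%ss ratio=%.2f\n", t, f, f / t }'
```

A single image is a small sample; averaging over several runs would give a steadier estimate of the speed difference.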


