SSD in Detail

  • 1 Data Processing
    • 1.1 Dataset split: voc2ssd.py
    • 1.2 Reading the data: voc_annotation.py
      • 1.2.1 Code steps
      • 1.2.2 Code
  • 2 Backbone Network
    • 2.1 Backbone workflow
      • 2.1.1 Steps to build the VGG16 model
      • 2.1.2 VGG16 code
      • 2.1.3 SSD300 code
    • 2.2 Classification, regression, and prior boxes per feature layer
      • 2.2.1 Classification, regression, and prior boxes per feature layer
      • 2.2.2 Classification and regression code per feature layer
  • 3 Building Labels
    • 3.1 Workflow
    • 3.2 Code steps
    • 3.3 Code
  • 4 Prediction
    • 4.1 Prediction workflow
      • 4.1.1 Prediction code steps
      • 4.1.2 Prediction code
    • 4.2 Detecting objects
      • 4.2.1 Detection steps
      • 4.2.2 Detection code
      • 4.2.3 Adding gray bars to the image
      • 4.2.4 Decode the predictions --> filter --> NMS --> pick top_k
      • 4.2.5 Keep the boxes scoring above confidence
      • 4.2.6 Removing the gray bars
      • 4.2.7 Drawing the boxes
  • 5 Training
    • 5.1 Workflow
      • 5.1.1 bbox_util code steps
      • 5.1.2 loss code steps
      • 5.1.3 loss code
      • 5.2.1 gen.generate(True) code steps
      • 5.2.2 gen.generate(True) code
  • 6 Model Evaluation
    • 6.1 Workflow
      • 6.1.1 Code steps for obtaining predicted boxes
      • 6.1.2 Code steps for obtaining ground-truth boxes
      • 6.1.3 Code steps for computing mAP
      • 6.1.4 get_dr_txt.py code
      • 6.1.5 get_gt_txt.py code
      • 6.1.6 get_map code
  • 7 Training on Your Own Dataset
    • 7.1 Workflow

SSD adds several extra convolutional layers on top of VGGNet, then applies 3×3 convolution kernels at multiple scales for classification and regression. SSD's innovations: data augmentation, VGGNet plus extra convolution blocks, PriorBox with multi-layer feature maps, and the design of positive/negative sample selection and the loss function.

SSD's strengths:

  1. Multi-feature-map detection can rival Faster R-CNN in some scenarios.
  2. Detection speed can exceed the contemporaneous Faster R-CNN and YOLO.
  3. The network is simple to optimize.

SSD's weaknesses:

  1. PriorBoxes must be configured manually.
  2. Detection accuracy is limited.

1 Data Processing

1.1 Dataset split: voc2ssd.py

This script splits the data into training, test, and validation sets according to each image's annotation file name, and stores the names belonging to each split in files ending in '.txt'.

Code steps:

  1. Set the xml directory and the save directory for the split name lists: xmlFilePath, saveBasePath
  2. Determine each split's size from the trainval ratio, the train ratio, and the total dataset size: trainval_percent, train_percent, total_xml
  3. Sample index sets for each split from the full index list and the split sizes: tv, tr, trainval, train
  4. Write each split to disk based on the previous step: ftrainval, ftest, ftrain, fval

Code

'''
xmlFilePath,saveBasePath + trainval_percent,train_percent,total_xml + tv,tr,trainval,train + ftrainval,ftest,ftrain,fval + loop
'''
import os
import random

xmlFilePath = r'/Users/liushuang/Desktop/LearnGit/Bubbliiiing资料/Keras/目标检测/ssd-keras-master/VOCdevkit/VOC2007/Annotations'
saveBasePath = r'/Users/liushuang/Desktop/LearnGit/Bubbliiiing资料/Keras/目标检测/ssd-keras-master/VOCdevkit/VOC2007/ImageSets/Main'

trainval_percent = 0.9
train_percent = 0.9
temp = os.listdir(xmlFilePath)
total_xml = []
for i in temp:
    if i.endswith('.xml'):
        total_xml.append(i)

num = len(total_xml)
tv = int(trainval_percent*num)
tr = int(tv*train_percent)
indices = range(num)   # avoid shadowing the built-in list
trainval = random.sample(indices,tv)
train = random.sample(trainval,tr)

ftrainval = open(os.path.join(saveBasePath,'LStrainval.txt'),'w')
ftest =  open(os.path.join(saveBasePath,'LStest.txt'),'w')
ftrain =  open(os.path.join(saveBasePath,'LStrain.txt'),'w')
fval =  open(os.path.join(saveBasePath,'LSval.txt'),'w')

for i in indices:
    name = total_xml[i][:-4]+'\n'
    if i in trainval:
        ftrainval.write(name)
        if i in train:
            ftrain.write(name)
        else:
            fval.write(name)
    else:
        ftest.write(name)

ftrainval.close()
ftest.close()
ftrain.close()
fval.close()
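
Each of the four txt files then holds one file basename per line (the .xml extension stripped by total_xml[i][:-4]), e.g. 000005 on one line and 000007 on the next (illustrative names).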

1.2 Reading the data: voc_annotation.py

This script reads in the image paths, boxes, and class information.

1.2.1 Code steps

  1. Write a function that reads the boxes and classes from an xml annotation.
  2. Loop over each split, writing out the image path, boxes, and class info.

1.2.2 Code

'''
convert_annotation(year, image_id, list_file): difficult,cls + cls_id ,xmlbox + b ;
year,imge_set + image_ids,list_file + list_file.write(wd, year, image_id) , convert_annotation(year, image_id, list_file)
'''
import os
import xml.etree.ElementTree as ET

sets = [('2007','train'),('2007','val'),('2007','test')]
classes = ["aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"]

def convert_annotation(year,image_id,list_file):
    in_file = open('VOCdevkit/VOC%s/Annotations/%s.xml'%(year,image_id))
    tree = ET.parse(in_file)
    root = tree.getroot()
    
    for obj in root.iter('object'):
        difficult = obj.find('difficult').text
        cls = obj.find('name').text
        if cls not in classes or int(difficult)==1:
            continue
        cls_id = classes.index(cls)
        xmlbox = obj.find('bndbox')
        b = (int(xmlbox.find('xmin').text),int(xmlbox.find('ymin').text),int(xmlbox.find('xmax').text),int(xmlbox.find('ymax').text))
        list_file.write(' '+','.join([str(a) for a in b])+','+str(cls_id))
        
wd = os.getcwd()

for year,image_set in sets:
    image_ids = open('VOCdevkit/VOC%s/ImageSets/Main/%s.txt'%(year,image_set)).read().strip().split()
    list_file = open('%s_%s.txt'%(year,image_set),'w')
    for image_id in image_ids:
        list_file.write('%s/VOCdevkit/VOC%s/JPEGImages/%s.jpg'%(wd,year,image_id ))
        convert_annotation(year,image_id,list_file)
        list_file.write('\n')
    list_file.close()
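
Each output line then contains the image path followed by one xmin,ymin,xmax,ymax,cls_id group per object, e.g. (illustrative values): /your/path/VOCdevkit/VOC2007/JPEGImages/000005.jpg 263,211,324,339,8 165,264,253,372,8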

2 Backbone Network

The backbone runs the input image through a series of convolutions, pooling, and activations to extract features at different scales, and then performs classification and regression on those features.

self.ssd_model = ssd.SSD300(self.model_image_size, self.num_classes)

2.1 Backbone workflow

  1. Feed the input image [300,300,3] into VGG16 to obtain net. Extract net['conv4_3'], net['fc7'], net['conv6_2'], net['conv7_2'], net['conv8_2'], net['conv9_2'].
  2. Run classification and regression on the features extracted above.

2.1.1 Steps to build the VGG16 model

  1. Block 1 (300,300,3 -> 150,150,64) : input --> Conv2D*2 + MaxPooling2D --> net['conv1_1'] + net['conv1_2'] + net['pool1']
  2. Block 2 (150,150,64 -> 75,75,128) : net['pool1'] --> Conv2D*2 + MaxPooling2D --> net['conv2_1'] + net['conv2_2'] + net['pool2']
  3. Block 3 (75,75,128 -> 38,38,256) : net['pool2'] --> Conv2D*3 + MaxPooling2D --> net['conv3_1'] + net['conv3_2'] + net['conv3_3'] + net['pool3']
  4. Block 4 (38,38,256 -> 19,19,512) : net['pool3'] --> Conv2D*3 + MaxPooling2D --> net['conv4_1'] + net['conv4_2'] + net['conv4_3'] + net['pool4']
  5. Block 5 (19,19,512 -> 19,19,512) : net['pool4'] --> Conv2D*3 + MaxPooling2D --> net['conv5_1'] + net['conv5_2'] + net['conv5_3'] + net['pool5']
  6. FC6 (19,19,512 -> 19,19,1024) : net['pool5'] --> Conv2D --> net['fc6']
  7. FC7 (19,19,1024 -> 19,19,1024) : net['fc6'] --> Conv2D --> net['fc7']
  8. Block 6 (19,19,1024 -> 10,10,512) : net['fc7'] --> Conv2D*2 --> net['conv6_1'] + net['conv6_2']
  9. Block 7 (10,10,512 -> 5,5,256) : net['conv6_2'] --> Conv2D*2 --> net['conv7_1'] + net['conv7_2']
  10. Block 8 (5,5,256 -> 3,3,256) : net['conv7_2'] --> Conv2D*2 --> net['conv8_1'] + net['conv8_2']
  11. Block 9 (3,3,256 -> 1,1,256) : net['conv8_2'] --> Conv2D*2 --> net['conv9_1'] + net['conv9_2']

2.1.2 VGG16 code

import keras.backend as K
from keras.layers import Activation
from keras.layers import Conv2D
from keras.layers import Dense
from keras.layers import Flatten
from keras.layers import GlobalAveragePooling2D
from keras.layers import Input
from keras.layers import MaxPooling2D
from keras.layers import merge, concatenate
from keras.layers import Reshape
from keras.layers import ZeroPadding2D
from keras.models import Model

def VGG16(input_tensor):
    #---------------------- backbone feature extraction starts ----------------------#
    # SSD structure: a dict called net
    net = {} 
    # Block 1
    net['input'] = input_tensor
    # 300,300,3 -> 150,150,64
    net['conv1_1'] = Conv2D(64, kernel_size=(3,3),
                                   activation='relu',
                                   padding='same',
                                   name='conv1_1')(net['input'])
    net['conv1_2'] = Conv2D(64, kernel_size=(3,3),
                                   activation='relu',
                                   padding='same',
                                   name='conv1_2')(net['conv1_1'])
    net['pool1'] = MaxPooling2D((2, 2), strides=(2, 2), padding='same',
                                name='pool1')(net['conv1_2'])

    
    # Block 2
    # 150,150,64 -> 75,75,128
    net['conv2_1'] = Conv2D(128, kernel_size=(3,3),
                                   activation='relu',
                                   padding='same',
                                   name='conv2_1')(net['pool1'])
    net['conv2_2'] = Conv2D(128, kernel_size=(3,3),
                                   activation='relu',
                                   padding='same',
                                   name='conv2_2')(net['conv2_1'])
    net['pool2'] = MaxPooling2D((2, 2), strides=(2, 2), padding='same',
                                name='pool2')(net['conv2_2'])
    # Block 3
    # 75,75,128 -> 38,38,256
    net['conv3_1'] = Conv2D(256, kernel_size=(3,3),
                                   activation='relu',
                                   padding='same',
                                   name='conv3_1')(net['pool2'])
    net['conv3_2'] = Conv2D(256, kernel_size=(3,3),
                                   activation='relu',
                                   padding='same',
                                   name='conv3_2')(net['conv3_1'])
    net['conv3_3'] = Conv2D(256, kernel_size=(3,3),
                                   activation='relu',
                                   padding='same',
                                   name='conv3_3')(net['conv3_2'])
    net['pool3'] = MaxPooling2D((2, 2), strides=(2, 2), padding='same',
                                name='pool3')(net['conv3_3'])
    # Block 4
    # 38,38,256 -> 19,19,512
    net['conv4_1'] = Conv2D(512, kernel_size=(3,3),
                                   activation='relu',
                                   padding='same',
                                   name='conv4_1')(net['pool3'])
    net['conv4_2'] = Conv2D(512, kernel_size=(3,3),
                                   activation='relu',
                                   padding='same',
                                   name='conv4_2')(net['conv4_1'])
    net['conv4_3'] = Conv2D(512, kernel_size=(3,3),
                                   activation='relu',
                                   padding='same',
                                   name='conv4_3')(net['conv4_2'])
    net['pool4'] = MaxPooling2D((2, 2), strides=(2, 2), padding='same',
                                name='pool4')(net['conv4_3'])
    # Block 5
    # 19,19,512 -> 19,19,512
    net['conv5_1'] = Conv2D(512, kernel_size=(3,3),
                                   activation='relu',
                                   padding='same',
                                   name='conv5_1')(net['pool4'])
    net['conv5_2'] = Conv2D(512, kernel_size=(3,3),
                                   activation='relu',
                                   padding='same',
                                   name='conv5_2')(net['conv5_1'])
    net['conv5_3'] = Conv2D(512, kernel_size=(3,3),
                                   activation='relu',
                                   padding='same',
                                   name='conv5_3')(net['conv5_2'])
    net['pool5'] = MaxPooling2D((3, 3), strides=(1, 1), padding='same',
                                name='pool5')(net['conv5_3'])
    # FC6
    # 19,19,512 -> 19,19,1024
    net['fc6'] = Conv2D(1024, kernel_size=(3,3), dilation_rate=(6, 6),
                                     activation='relu', padding='same',
                                     name='fc6')(net['pool5'])

    # x = Dropout(0.5, name='drop6')(x)
    # FC7
    # 19,19,1024 -> 19,19,1024
    net['fc7'] = Conv2D(1024, kernel_size=(1,1), activation='relu',
                               padding='same', name='fc7')(net['fc6'])

    # x = Dropout(0.5, name='drop7')(x)
    # Block 6
    # 19,19,512 -> 10,10,512
    net['conv6_1'] = Conv2D(256, kernel_size=(1,1), activation='relu',
                                   padding='same',
                                   name='conv6_1')(net['fc7'])
    net['conv6_2'] = ZeroPadding2D(padding=((1, 1), (1, 1)), name='conv6_padding')(net['conv6_1'])
    net['conv6_2'] = Conv2D(512, kernel_size=(3,3), strides=(2, 2),
                                   activation='relu',
                                   name='conv6_2')(net['conv6_2'])

    # Block 7
    # 10,10,512 -> 5,5,256
    net['conv7_1'] = Conv2D(128, kernel_size=(1,1), activation='relu',
                                   padding='same', 
                                   name='conv7_1')(net['conv6_2'])
    net['conv7_2'] = ZeroPadding2D(padding=((1, 1), (1, 1)), name='conv7_padding')(net['conv7_1'])
    net['conv7_2'] = Conv2D(256, kernel_size=(3,3), strides=(2, 2),
                                   activation='relu', padding='valid',
                                   name='conv7_2')(net['conv7_2'])
    # Block 8
    # 5,5,256 -> 3,3,256
    net['conv8_1'] = Conv2D(128, kernel_size=(1,1), activation='relu',
                                   padding='same',
                                   name='conv8_1')(net['conv7_2'])
    net['conv8_2'] = Conv2D(256, kernel_size=(3,3), strides=(1, 1),
                                   activation='relu', padding='valid',
                                   name='conv8_2')(net['conv8_1'])

    # Block 9
    # 3,3,256 -> 1,1,256
    net['conv9_1'] = Conv2D(128, kernel_size=(1,1), activation='relu',
                                   padding='same',
                                   name='conv9_1')(net['conv8_2'])
    net['conv9_2'] = Conv2D(256, kernel_size=(3,3), strides=(1, 1),
                                   activation='relu', padding='valid',
                                   name='conv9_2')(net['conv9_1'])
    #---------------------- backbone feature extraction ends ----------------------#
    return net

if __name__ == "__main__":
    from keras.layers import Input
    input_tensor = Input(shape = [300,300,3])
    net = VGG16(input_tensor)
    for i in net:
        print(net[i])
        # print('\n')

'''
Tensor("input_1:0", shape=(?, 300, 300, 3), dtype=float32)
Tensor("conv1_1/Relu:0", shape=(?, 300, 300, 64), dtype=float32)
Tensor("conv1_2/Relu:0", shape=(?, 300, 300, 64), dtype=float32)
Tensor("pool1/MaxPool:0", shape=(?, 150, 150, 64), dtype=float32)
Tensor("conv2_1/Relu:0", shape=(?, 150, 150, 128), dtype=float32)
Tensor("conv2_2/Relu:0", shape=(?, 150, 150, 128), dtype=float32)
Tensor("pool2/MaxPool:0", shape=(?, 75, 75, 128), dtype=float32)
Tensor("conv3_1/Relu:0", shape=(?, 75, 75, 256), dtype=float32)
Tensor("conv3_2/Relu:0", shape=(?, 75, 75, 256), dtype=float32)
Tensor("conv3_3/Relu:0", shape=(?, 75, 75, 256), dtype=float32)
Tensor("pool3/MaxPool:0", shape=(?, 38, 38, 256), dtype=float32)
Tensor("conv4_1/Relu:0", shape=(?, 38, 38, 512), dtype=float32)
Tensor("conv4_2/Relu:0", shape=(?, 38, 38, 512), dtype=float32)
Tensor("conv4_3/Relu:0", shape=(?, 38, 38, 512), dtype=float32)
Tensor("pool4/MaxPool:0", shape=(?, 19, 19, 512), dtype=float32)
Tensor("conv5_1/Relu:0", shape=(?, 19, 19, 512), dtype=float32)
Tensor("conv5_2/Relu:0", shape=(?, 19, 19, 512), dtype=float32)
Tensor("conv5_3/Relu:0", shape=(?, 19, 19, 512), dtype=float32)
Tensor("pool5/MaxPool:0", shape=(?, 19, 19, 512), dtype=float32)
Tensor("fc6/Relu:0", shape=(?, 19, 19, 1024), dtype=float32)
Tensor("fc7/Relu:0", shape=(?, 19, 19, 1024), dtype=float32)
Tensor("conv6_1/Relu:0", shape=(?, 19, 19, 256), dtype=float32)
Tensor("conv6_2/Relu:0", shape=(?, 10, 10, 512), dtype=float32)
Tensor("conv7_1/Relu:0", shape=(?, 10, 10, 128), dtype=float32)
Tensor("conv7_2/Relu:0", shape=(?, 5, 5, 256), dtype=float32)
Tensor("conv8_1/Relu:0", shape=(?, 5, 5, 128), dtype=float32)
Tensor("conv8_2/Relu:0", shape=(?, 3, 3, 256), dtype=float32)
Tensor("conv9_1/Relu:0", shape=(?, 3, 3, 128), dtype=float32)
Tensor("conv9_2/Relu:0", shape=(?, 1, 1, 256), dtype=float32)
'''

2.1.3 SSD300 code

import keras.backend as K
from keras.layers import Activation
#from keras.layers import AtrousConvolution2D
from keras.layers import Conv2D
from keras.layers import Dense
from keras.layers import Flatten
from keras.layers import GlobalAveragePooling2D
from keras.layers import Input
from keras.layers import MaxPooling2D
from keras.layers import merge, concatenate
from keras.layers import Reshape
from keras.layers import ZeroPadding2D
from keras.models import Model
from nets.VGG16 import VGG16
from nets.ssd_layers import Normalize
from nets.ssd_layers import PriorBox


def SSD300(input_shape, num_classes=21):
    # 300,300,3
    input_tensor = Input(shape=input_shape)
    img_size = (input_shape[1], input_shape[0])

    # SSD structure: a dict called net
    net = VGG16(input_tensor)
    #----------------- process the extracted backbone features -----------------#
    # process conv4_3: 38,38,512
    net['conv4_3_norm'] = Normalize(20, name='conv4_3_norm')(net['conv4_3'])
    num_priors = 4
    # box regression
    # num_priors is the number of prior boxes per grid cell; 4 is the x,y,h,w adjustment
    net['conv4_3_norm_mbox_loc'] = Conv2D(num_priors * 4, kernel_size=(3,3), padding='same', name='conv4_3_norm_mbox_loc')(net['conv4_3_norm'])
    net['conv4_3_norm_mbox_loc_flat'] = Flatten(name='conv4_3_norm_mbox_loc_flat')(net['conv4_3_norm_mbox_loc'])
    # num_priors is the number of prior boxes per grid cell; num_classes is the number of classes
    net['conv4_3_norm_mbox_conf'] = Conv2D(num_priors * num_classes, kernel_size=(3,3), padding='same',name='conv4_3_norm_mbox_conf')(net['conv4_3_norm'])
    net['conv4_3_norm_mbox_conf_flat'] = Flatten(name='conv4_3_norm_mbox_conf_flat')(net['conv4_3_norm_mbox_conf'])
    priorbox = PriorBox(img_size, 30.0,max_size = 60.0, aspect_ratios=[2],
                        variances=[0.1, 0.1, 0.2, 0.2],
                        name='conv4_3_norm_mbox_priorbox')
    net['conv4_3_norm_mbox_priorbox'] = priorbox(net['conv4_3_norm']) # prior_boxes_tensor shape: (None, 5776, 8)
    
    # process fc7
    num_priors = 6
    # box regression
    # num_priors is the number of prior boxes per grid cell; 4 is the x,y,h,w adjustment
    net['fc7_mbox_loc'] = Conv2D(num_priors * 4, kernel_size=(3,3),padding='same',name='fc7_mbox_loc')(net['fc7'])
    net['fc7_mbox_loc_flat'] = Flatten(name='fc7_mbox_loc_flat')(net['fc7_mbox_loc'])
    # num_priors is the number of prior boxes per grid cell; num_classes is the number of classes
    net['fc7_mbox_conf'] = Conv2D(num_priors * num_classes, kernel_size=(3,3),padding='same',name='fc7_mbox_conf')(net['fc7'])
    net['fc7_mbox_conf_flat'] = Flatten(name='fc7_mbox_conf_flat')(net['fc7_mbox_conf'])

    priorbox = PriorBox(img_size, 60.0, max_size=111.0, aspect_ratios=[2, 3],
                        variances=[0.1, 0.1, 0.2, 0.2],
                        name='fc7_mbox_priorbox')
    net['fc7_mbox_priorbox'] = priorbox(net['fc7'])

    # process conv6_2
    num_priors = 6
    # box regression
    # num_priors is the number of prior boxes per grid cell; 4 is the x,y,h,w adjustment
    x = Conv2D(num_priors * 4, kernel_size=(3,3), padding='same',name='conv6_2_mbox_loc')(net['conv6_2'])
    net['conv6_2_mbox_loc'] = x
    net['conv6_2_mbox_loc_flat'] = Flatten(name='conv6_2_mbox_loc_flat')(net['conv6_2_mbox_loc'])
    # num_priors is the number of prior boxes per grid cell; num_classes is the number of classes
    x = Conv2D(num_priors * num_classes, kernel_size=(3,3), padding='same',name='conv6_2_mbox_conf')(net['conv6_2'])
    net['conv6_2_mbox_conf'] = x
    net['conv6_2_mbox_conf_flat'] = Flatten(name='conv6_2_mbox_conf_flat')(net['conv6_2_mbox_conf'])

    priorbox = PriorBox(img_size, 111.0, max_size=162.0, aspect_ratios=[2, 3],
                        variances=[0.1, 0.1, 0.2, 0.2],
                        name='conv6_2_mbox_priorbox')
    net['conv6_2_mbox_priorbox'] = priorbox(net['conv6_2'])

    # process conv7_2
    num_priors = 6
    # box regression
    # num_priors is the number of prior boxes per grid cell; 4 is the x,y,h,w adjustment
    x = Conv2D(num_priors * 4, kernel_size=(3,3), padding='same',name='conv7_2_mbox_loc')(net['conv7_2'])
    net['conv7_2_mbox_loc'] = x
    net['conv7_2_mbox_loc_flat'] = Flatten(name='conv7_2_mbox_loc_flat')(net['conv7_2_mbox_loc'])
    # num_priors is the number of prior boxes per grid cell; num_classes is the number of classes
    x = Conv2D(num_priors * num_classes, kernel_size=(3,3), padding='same',name='conv7_2_mbox_conf')(net['conv7_2'])
    net['conv7_2_mbox_conf'] = x
    net['conv7_2_mbox_conf_flat'] = Flatten(name='conv7_2_mbox_conf_flat')(net['conv7_2_mbox_conf'])

    priorbox = PriorBox(img_size, 162.0, max_size=213.0, aspect_ratios=[2, 3],
                        variances=[0.1, 0.1, 0.2, 0.2],
                        name='conv7_2_mbox_priorbox')
    net['conv7_2_mbox_priorbox'] = priorbox(net['conv7_2'])

    # process conv8_2
    num_priors = 4
    # box regression
    # num_priors is the number of prior boxes per grid cell; 4 is the x,y,h,w adjustment
    x = Conv2D(num_priors * 4, kernel_size=(3,3), padding='same',name='conv8_2_mbox_loc')(net['conv8_2'])
    net['conv8_2_mbox_loc'] = x
    net['conv8_2_mbox_loc_flat'] = Flatten(name='conv8_2_mbox_loc_flat')(net['conv8_2_mbox_loc'])
    # num_priors is the number of prior boxes per grid cell; num_classes is the number of classes
    x = Conv2D(num_priors * num_classes, kernel_size=(3,3), padding='same',name='conv8_2_mbox_conf')(net['conv8_2'])
    net['conv8_2_mbox_conf'] = x
    net['conv8_2_mbox_conf_flat'] = Flatten(name='conv8_2_mbox_conf_flat')(net['conv8_2_mbox_conf'])

    priorbox = PriorBox(img_size, 213.0, max_size=264.0, aspect_ratios=[2],
                        variances=[0.1, 0.1, 0.2, 0.2],
                        name='conv8_2_mbox_priorbox')
    net['conv8_2_mbox_priorbox'] = priorbox(net['conv8_2'])

    # process conv9_2
    num_priors = 4
    # box regression
    # num_priors is the number of prior boxes per grid cell; 4 is the x,y,h,w adjustment
    x = Conv2D(num_priors * 4, kernel_size=(3,3), padding='same',name='conv9_2_mbox_loc')(net['conv9_2'])
    net['conv9_2_mbox_loc'] = x
    net['conv9_2_mbox_loc_flat'] = Flatten(name='conv9_2_mbox_loc_flat')(net['conv9_2_mbox_loc'])
    # num_priors is the number of prior boxes per grid cell; num_classes is the number of classes
    x = Conv2D(num_priors * num_classes, kernel_size=(3,3), padding='same',name='conv9_2_mbox_conf')(net['conv9_2'])
    net['conv9_2_mbox_conf'] = x
    net['conv9_2_mbox_conf_flat'] = Flatten(name='conv9_2_mbox_conf_flat')(net['conv9_2_mbox_conf'])
    
    priorbox = PriorBox(img_size, 264.0, max_size=315.0, aspect_ratios=[2],
                        variances=[0.1, 0.1, 0.2, 0.2],
                        name='conv9_2_mbox_priorbox')

    net['conv9_2_mbox_priorbox'] = priorbox(net['conv9_2'])

    # concatenate the results from all feature layers
    net['mbox_loc'] = concatenate([net['conv4_3_norm_mbox_loc_flat'],
                             net['fc7_mbox_loc_flat'],
                             net['conv6_2_mbox_loc_flat'],
                             net['conv7_2_mbox_loc_flat'],
                             net['conv8_2_mbox_loc_flat'],
                             net['conv9_2_mbox_loc_flat']],
                            axis=1, name='mbox_loc')
    net['mbox_conf'] = concatenate([net['conv4_3_norm_mbox_conf_flat'],
                              net['fc7_mbox_conf_flat'],
                              net['conv6_2_mbox_conf_flat'],
                              net['conv7_2_mbox_conf_flat'],
                              net['conv8_2_mbox_conf_flat'],
                              net['conv9_2_mbox_conf_flat']],
                             axis=1, name='mbox_conf')
    net['mbox_priorbox'] = concatenate([net['conv4_3_norm_mbox_priorbox'],
                                  net['fc7_mbox_priorbox'],
                                  net['conv6_2_mbox_priorbox'],
                                  net['conv7_2_mbox_priorbox'],
                                  net['conv8_2_mbox_priorbox'],
                                  net['conv9_2_mbox_priorbox']],
                                  axis=1, name='mbox_priorbox')

    if hasattr(net['mbox_loc'], '_keras_shape'):
        num_boxes = net['mbox_loc']._keras_shape[-1] // 4
    elif hasattr(net['mbox_loc'], 'int_shape'):
        num_boxes = K.int_shape(net['mbox_loc'])[-1] // 4 # 8732
    # 8732,4
    net['mbox_loc'] = Reshape((num_boxes, 4),name='mbox_loc_final')(net['mbox_loc'])
    # 8732,21
    net['mbox_conf'] = Reshape((num_boxes, num_classes),name='mbox_conf_logits')(net['mbox_conf'])
    net['mbox_conf'] = Activation('softmax',name='mbox_conf_final')(net['mbox_conf'])

    net['predictions'] = concatenate([net['mbox_loc'],
                               net['mbox_conf'],
                               net['mbox_priorbox']],
                               axis=2, name='predictions')
    # predictions(Concatenate)(None, 8732, 33)  8732= 38**2*4+19**2*6+10**2*6+5**2*6+3**2*4+1**2*4
    # print(net['predictions']) # 4+21+8=33: predicted offsets + background & classes + prior box x1,y1,x2,y2 + variances
    # print(net['predictions'].shape) : (None, 8732, 33)
    z=0
    for i ,j in net.items():
        print('{}  {}: {}'.format(z,i,j.shape))
        z+=1
    model = Model(net['input'], net['predictions'])
    return model



if __name__=='__main__':
    model = SSD300((300,300,3), num_classes=21)
    model.summary()


'''
0  input: (?, 300, 300, 3)
1  conv1_1: (?, 300, 300, 64)
2  conv1_2: (?, 300, 300, 64)
3  pool1: (?, 150, 150, 64)
4  conv2_1: (?, 150, 150, 128)
5  conv2_2: (?, 150, 150, 128)
6  pool2: (?, 75, 75, 128)
7  conv3_1: (?, 75, 75, 256)
8  conv3_2: (?, 75, 75, 256)
9  conv3_3: (?, 75, 75, 256)
10  pool3: (?, 38, 38, 256)
11  conv4_1: (?, 38, 38, 512)
12  conv4_2: (?, 38, 38, 512)
13  conv4_3: (?, 38, 38, 512)
14  pool4: (?, 19, 19, 512)
15  conv5_1: (?, 19, 19, 512)
16  conv5_2: (?, 19, 19, 512)
17  conv5_3: (?, 19, 19, 512)
18  pool5: (?, 19, 19, 512)
19  fc6: (?, 19, 19, 1024)
20  fc7: (?, 19, 19, 1024)
21  conv6_1: (?, 19, 19, 256)
22  conv6_2: (?, 10, 10, 512)
23  conv7_1: (?, 10, 10, 128)
24  conv7_2: (?, 5, 5, 256)
25  conv8_1: (?, 5, 5, 128)
26  conv8_2: (?, 3, 3, 256)
27  conv9_1: (?, 3, 3, 128)
28  conv9_2: (?, 1, 1, 256)
29  conv4_3_norm: (?, 38, 38, 512)
30  conv4_3_norm_mbox_loc: (?, 38, 38, 16)
31  conv4_3_norm_mbox_loc_flat: (?, ?)
32  conv4_3_norm_mbox_conf: (?, 38, 38, 84)
33  conv4_3_norm_mbox_conf_flat: (?, ?)
34  conv4_3_norm_mbox_priorbox: (?, 5776, 8)
35  fc7_mbox_loc: (?, 19, 19, 24)
36  fc7_mbox_loc_flat: (?, ?)
37  fc7_mbox_conf: (?, 19, 19, 126)
38  fc7_mbox_conf_flat: (?, ?)
39  fc7_mbox_priorbox: (?, 2166, 8)
40  conv6_2_mbox_loc: (?, 10, 10, 24)
41  conv6_2_mbox_loc_flat: (?, ?)
42  conv6_2_mbox_conf: (?, 10, 10, 126)
43  conv6_2_mbox_conf_flat: (?, ?)
44  conv6_2_mbox_priorbox: (?, 600, 8)
45  conv7_2_mbox_loc: (?, 5, 5, 24)
46  conv7_2_mbox_loc_flat: (?, ?)
47  conv7_2_mbox_conf: (?, 5, 5, 126)
48  conv7_2_mbox_conf_flat: (?, ?)
49  conv7_2_mbox_priorbox: (?, 150, 8)
50  conv8_2_mbox_loc: (?, 3, 3, 16)
51  conv8_2_mbox_loc_flat: (?, ?)
52  conv8_2_mbox_conf: (?, 3, 3, 84)
53  conv8_2_mbox_conf_flat: (?, ?)
54  conv8_2_mbox_priorbox: (?, 36, 8)
55  conv9_2_mbox_loc: (?, 1, 1, 16)
56  conv9_2_mbox_loc_flat: (?, ?)
57  conv9_2_mbox_conf: (?, 1, 1, 84)
58  conv9_2_mbox_conf_flat: (?, ?)
59  conv9_2_mbox_priorbox: (?, 4, 8)
60  mbox_loc: (?, 8732, 4)
61  mbox_conf: (?, 8732, 21)
62  mbox_priorbox: (?, 8732, 8)
63  predictions: (?, 8732, 33)
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            (None, 300, 300, 3)  0                                            
__________________________________________________________________________________________________
conv1_1 (Conv2D)                (None, 300, 300, 64) 1792        input_1[0][0]                    
__________________________________________________________________________________________________
conv1_2 (Conv2D)                (None, 300, 300, 64) 36928       conv1_1[0][0]                    
__________________________________________________________________________________________________
pool1 (MaxPooling2D)            (None, 150, 150, 64) 0           conv1_2[0][0]                    
__________________________________________________________________________________________________
conv2_1 (Conv2D)                (None, 150, 150, 128 73856       pool1[0][0]                      
__________________________________________________________________________________________________
conv2_2 (Conv2D)                (None, 150, 150, 128 147584      conv2_1[0][0]                    
__________________________________________________________________________________________________
pool2 (MaxPooling2D)            (None, 75, 75, 128)  0           conv2_2[0][0]                    
__________________________________________________________________________________________________
conv3_1 (Conv2D)                (None, 75, 75, 256)  295168      pool2[0][0]                      
__________________________________________________________________________________________________
conv3_2 (Conv2D)                (None, 75, 75, 256)  590080      conv3_1[0][0]                    
__________________________________________________________________________________________________
conv3_3 (Conv2D)                (None, 75, 75, 256)  590080      conv3_2[0][0]                    
__________________________________________________________________________________________________
pool3 (MaxPooling2D)            (None, 38, 38, 256)  0           conv3_3[0][0]                    
__________________________________________________________________________________________________
conv4_1 (Conv2D)                (None, 38, 38, 512)  1180160     pool3[0][0]                      
__________________________________________________________________________________________________
conv4_2 (Conv2D)                (None, 38, 38, 512)  2359808     conv4_1[0][0]                    
__________________________________________________________________________________________________
conv4_3 (Conv2D)                (None, 38, 38, 512)  2359808     conv4_2[0][0]                    
__________________________________________________________________________________________________
pool4 (MaxPooling2D)            (None, 19, 19, 512)  0           conv4_3[0][0]                    
__________________________________________________________________________________________________
conv5_1 (Conv2D)                (None, 19, 19, 512)  2359808     pool4[0][0]                      
__________________________________________________________________________________________________
conv5_2 (Conv2D)                (None, 19, 19, 512)  2359808     conv5_1[0][0]                    
__________________________________________________________________________________________________
conv5_3 (Conv2D)                (None, 19, 19, 512)  2359808     conv5_2[0][0]                    
__________________________________________________________________________________________________
pool5 (MaxPooling2D)            (None, 19, 19, 512)  0           conv5_3[0][0]                    
__________________________________________________________________________________________________
fc6 (Conv2D)                    (None, 19, 19, 1024) 4719616     pool5[0][0]                      
__________________________________________________________________________________________________
fc7 (Conv2D)                    (None, 19, 19, 1024) 1049600     fc6[0][0]                        
__________________________________________________________________________________________________
conv6_1 (Conv2D)                (None, 19, 19, 256)  262400      fc7[0][0]                        
__________________________________________________________________________________________________
conv6_padding (ZeroPadding2D)   (None, 21, 21, 256)  0           conv6_1[0][0]                    
__________________________________________________________________________________________________
conv6_2 (Conv2D)                (None, 10, 10, 512)  1180160     conv6_padding[0][0]              
__________________________________________________________________________________________________
conv7_1 (Conv2D)                (None, 10, 10, 128)  65664       conv6_2[0][0]                    
__________________________________________________________________________________________________
conv7_padding (ZeroPadding2D)   (None, 12, 12, 128)  0           conv7_1[0][0]                    
__________________________________________________________________________________________________
conv7_2 (Conv2D)                (None, 5, 5, 256)    295168      conv7_padding[0][0]              
__________________________________________________________________________________________________
conv8_1 (Conv2D)                (None, 5, 5, 128)    32896       conv7_2[0][0]                    
__________________________________________________________________________________________________
conv8_2 (Conv2D)                (None, 3, 3, 256)    295168      conv8_1[0][0]                    
__________________________________________________________________________________________________
conv9_1 (Conv2D)                (None, 3, 3, 128)    32896       conv8_2[0][0]                    
__________________________________________________________________________________________________
conv4_3_norm (Normalize)        (None, 38, 38, 512)  512         conv4_3[0][0]                    
__________________________________________________________________________________________________
conv9_2 (Conv2D)                (None, 1, 1, 256)    295168      conv9_1[0][0]                    
__________________________________________________________________________________________________
conv4_3_norm_mbox_conf (Conv2D) (None, 38, 38, 84)   387156      conv4_3_norm[0][0]               
__________________________________________________________________________________________________
fc7_mbox_conf (Conv2D)          (None, 19, 19, 126)  1161342     fc7[0][0]                        
__________________________________________________________________________________________________
conv6_2_mbox_conf (Conv2D)      (None, 10, 10, 126)  580734      conv6_2[0][0]                    
__________________________________________________________________________________________________
conv7_2_mbox_conf (Conv2D)      (None, 5, 5, 126)    290430      conv7_2[0][0]                    
__________________________________________________________________________________________________
conv8_2_mbox_conf (Conv2D)      (None, 3, 3, 84)     193620      conv8_2[0][0]                    
__________________________________________________________________________________________________
conv9_2_mbox_conf (Conv2D)      (None, 1, 1, 84)     193620      conv9_2[0][0]                    
__________________________________________________________________________________________________
conv4_3_norm_mbox_loc (Conv2D)  (None, 38, 38, 16)   73744       conv4_3_norm[0][0]               
__________________________________________________________________________________________________
fc7_mbox_loc (Conv2D)           (None, 19, 19, 24)   221208      fc7[0][0]                        
__________________________________________________________________________________________________
conv6_2_mbox_loc (Conv2D)       (None, 10, 10, 24)   110616      conv6_2[0][0]                    
__________________________________________________________________________________________________
conv7_2_mbox_loc (Conv2D)       (None, 5, 5, 24)     55320       conv7_2[0][0]                    
__________________________________________________________________________________________________
conv8_2_mbox_loc (Conv2D)       (None, 3, 3, 16)     36880       conv8_2[0][0]                    
__________________________________________________________________________________________________
conv9_2_mbox_loc (Conv2D)       (None, 1, 1, 16)     36880       conv9_2[0][0]                    
__________________________________________________________________________________________________
conv4_3_norm_mbox_conf_flat (Fl (None, 121296)       0           conv4_3_norm_mbox_conf[0][0]     
__________________________________________________________________________________________________
fc7_mbox_conf_flat (Flatten)    (None, 45486)        0           fc7_mbox_conf[0][0]              
__________________________________________________________________________________________________
conv6_2_mbox_conf_flat (Flatten (None, 12600)        0           conv6_2_mbox_conf[0][0]          
__________________________________________________________________________________________________
conv7_2_mbox_conf_flat (Flatten (None, 3150)         0           conv7_2_mbox_conf[0][0]          
__________________________________________________________________________________________________
conv8_2_mbox_conf_flat (Flatten (None, 756)          0           conv8_2_mbox_conf[0][0]          
__________________________________________________________________________________________________
conv9_2_mbox_conf_flat (Flatten (None, 84)           0           conv9_2_mbox_conf[0][0]          
__________________________________________________________________________________________________
conv4_3_norm_mbox_loc_flat (Fla (None, 23104)        0           conv4_3_norm_mbox_loc[0][0]      
__________________________________________________________________________________________________
fc7_mbox_loc_flat (Flatten)     (None, 8664)         0           fc7_mbox_loc[0][0]               
__________________________________________________________________________________________________
conv6_2_mbox_loc_flat (Flatten) (None, 2400)         0           conv6_2_mbox_loc[0][0]           
__________________________________________________________________________________________________
conv7_2_mbox_loc_flat (Flatten) (None, 600)          0           conv7_2_mbox_loc[0][0]           
__________________________________________________________________________________________________
conv8_2_mbox_loc_flat (Flatten) (None, 144)          0           conv8_2_mbox_loc[0][0]           
__________________________________________________________________________________________________
conv9_2_mbox_loc_flat (Flatten) (None, 16)           0           conv9_2_mbox_loc[0][0]           
__________________________________________________________________________________________________
mbox_conf (Concatenate)         (None, 183372)       0           conv4_3_norm_mbox_conf_flat[0][0]
                                                                 fc7_mbox_conf_flat[0][0]         
                                                                 conv6_2_mbox_conf_flat[0][0]     
                                                                 conv7_2_mbox_conf_flat[0][0]     
                                                                 conv8_2_mbox_conf_flat[0][0]     
                                                                 conv9_2_mbox_conf_flat[0][0]     
__________________________________________________________________________________________________
mbox_loc (Concatenate)          (None, 34928)        0           conv4_3_norm_mbox_loc_flat[0][0] 
                                                                 fc7_mbox_loc_flat[0][0]          
                                                                 conv6_2_mbox_loc_flat[0][0]      
                                                                 conv7_2_mbox_loc_flat[0][0]      
                                                                 conv8_2_mbox_loc_flat[0][0]      
                                                                 conv9_2_mbox_loc_flat[0][0]      
__________________________________________________________________________________________________
mbox_conf_logits (Reshape)      (None, 8732, 21)     0           mbox_conf[0][0]                  
__________________________________________________________________________________________________
conv4_3_norm_mbox_priorbox (Pri (None, 5776, 8)      0           conv4_3_norm[0][0]               
__________________________________________________________________________________________________
fc7_mbox_priorbox (PriorBox)    (None, 2166, 8)      0           fc7[0][0]                        
__________________________________________________________________________________________________
conv6_2_mbox_priorbox (PriorBox (None, 600, 8)       0           conv6_2[0][0]                    
__________________________________________________________________________________________________
conv7_2_mbox_priorbox (PriorBox (None, 150, 8)       0           conv7_2[0][0]                    
__________________________________________________________________________________________________
conv8_2_mbox_priorbox (PriorBox (None, 36, 8)        0           conv8_2[0][0]                    
__________________________________________________________________________________________________
conv9_2_mbox_priorbox (PriorBox (None, 4, 8)         0           conv9_2[0][0]                    
__________________________________________________________________________________________________
mbox_loc_final (Reshape)        (None, 8732, 4)      0           mbox_loc[0][0]                   
__________________________________________________________________________________________________
mbox_conf_final (Activation)    (None, 8732, 21)     0           mbox_conf_logits[0][0]           
__________________________________________________________________________________________________
mbox_priorbox (Concatenate)     (None, 8732, 8)      0           conv4_3_norm_mbox_priorbox[0][0] 
                                                                 fc7_mbox_priorbox[0][0]          
                                                                 conv6_2_mbox_priorbox[0][0]      
                                                                 conv7_2_mbox_priorbox[0][0]      
                                                                 conv8_2_mbox_priorbox[0][0]      
                                                                 conv9_2_mbox_priorbox[0][0]      
__________________________________________________________________________________________________
predictions (Concatenate)       (None, 8732, 33)     0           mbox_loc_final[0][0]             
                                                                 mbox_conf_final[0][0]            
                                                                 mbox_priorbox[0][0]              
==================================================================================================
Total params: 26,285,486
Trainable params: 26,285,486
Non-trainable params: 0
__________________________________________________________________________________________________


'''

2.2 Classification, regression, and prior boxes per feature layer

2.2.1 Classification, regression, and prior boxes per feature layer

  1. Generate the box offsets
  2. Generate the class scores
  3. Generate the prior boxes (a quick count check follows below)
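
These three outputs are produced for each of the six feature layers; a quick check of the total prior-box count, using the grid sizes and per-cell prior counts from the SSD300 code above:

# Total prior boxes over the six feature layers (values from the model above)
feature_sizes   = [38, 19, 10, 5, 3, 1]
priors_per_cell = [ 4,  6,  6, 6, 4, 4]
print(sum(s * s * n for s, n in zip(feature_sizes, priors_per_cell)))  # 8732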

2.2.2 Classification and regression code per feature layer

Regression: produce (num_priors * 4) channels.

# process conv9_2
num_priors = 4
# box regression
# num_priors is the number of prior boxes per grid cell; 4 is the x,y,h,w adjustment
x = Conv2D(num_priors * 4, kernel_size=(3,3), padding='same',name='conv9_2_mbox_loc')(net['conv9_2'])
net['conv9_2_mbox_loc'] = x
net['conv9_2_mbox_loc_flat'] = Flatten(name='conv9_2_mbox_loc_flat')(net['conv9_2_mbox_loc'])

Classification: produce (num_priors * num_classes) channels.

# num_priors is the number of prior boxes per grid cell; num_classes is the number of classes
x = Conv2D(num_priors * num_classes, kernel_size=(3,3), padding='same',name='conv9_2_mbox_conf')(net['conv9_2'])
net['conv9_2_mbox_conf'] = x
net['conv9_2_mbox_conf_flat'] = Flatten(name='conv9_2_mbox_conf_flat')(net['conv9_2_mbox_conf'])

Prior boxes:

  1. Generate the prior boxes' widths and heights
  2. Generate the prior boxes' center points
  3. Obtain the prior boxes' top-left and bottom-right corners

priorbox = PriorBox(img_size, 264.0, max_size=315.0, aspect_ratios=[2],
                        variances=[0.1, 0.1, 0.2, 0.2],
                        name='conv9_2_mbox_priorbox')
net['conv9_2_mbox_priorbox'] = priorbox(net['conv9_2'])

# PriorBox(img_size, min_size, max_size=None, aspect_ratios=None,
#          flip=True, variances=[0.1], clip=True, **kwargs) explained below


    def call(self, x, mask=None):
        if hasattr(x, '_keras_shape'):
            input_shape = x._keras_shape
        elif hasattr(K, 'int_shape'):
            input_shape = K.int_shape(x)
        # ------------------ #
        #   get the feature layer's width and height
        # ------------------ #
        layer_width = input_shape[self.waxis]
        layer_height = input_shape[self.haxis]

        img_width = self.img_size[0]
        img_height = self.img_size[1]
        box_widths = []
        box_heights = []
        for ar in self.aspect_ratios:
            if ar == 1 and len(box_widths) == 0:
                box_widths.append(self.min_size)
                box_heights.append(self.min_size)
            elif ar == 1 and len(box_widths) > 0:
                box_widths.append(np.sqrt(self.min_size * self.max_size))
                box_heights.append(np.sqrt(self.min_size * self.max_size))
            elif ar != 1:
                box_widths.append(self.min_size * np.sqrt(ar))
                box_heights.append(self.min_size / np.sqrt(ar))
        box_widths = 0.5 * np.array(box_widths)
        box_heights = 0.5 * np.array(box_heights)
        step_x = img_width / layer_width
        step_y = img_height / layer_height
        linx = np.linspace(0.5 * step_x, img_width - 0.5 * step_x,
                           layer_width)
        liny = np.linspace(0.5 * step_y, img_height - 0.5 * step_y,
                           layer_height)
        centers_x, centers_y = np.meshgrid(linx, liny)
        centers_x = centers_x.reshape(-1, 1)
        centers_y = centers_y.reshape(-1, 1)

        num_priors_ = len(self.aspect_ratios)
        # each prior box needs two (centers_x, centers_y) pairs: the first gives the top-left corner, the second the bottom-right
        prior_boxes = np.concatenate((centers_x, centers_y), axis=1)
        prior_boxes = np.tile(prior_boxes, (1, 2 * num_priors_))
        
        # obtain the priors' top-left and bottom-right corners
        prior_boxes[:, ::4] -= box_widths
        prior_boxes[:, 1::4] -= box_heights
        prior_boxes[:, 2::4] += box_widths
        prior_boxes[:, 3::4] += box_heights

        # convert to fractional (0-1) coordinates
        prior_boxes[:, ::2] /= img_width
        prior_boxes[:, 1::2] /= img_height
        prior_boxes = prior_boxes.reshape(-1, 4)

        prior_boxes = np.minimum(np.maximum(prior_boxes, 0.0), 1.0)

        num_boxes = len(prior_boxes)
        
        if len(self.variances) == 1:
            variances = np.ones((num_boxes, 4)) * self.variances[0]
        elif len(self.variances) == 4:
            variances = np.tile(self.variances, (num_boxes, 1))
        else:
            raise Exception('Must provide one or four variances.')

        prior_boxes = np.concatenate((prior_boxes, variances), axis=1)
        prior_boxes_tensor = K.expand_dims(K.variable(prior_boxes), 0)
    
        pattern = [tf.shape(x)[0], 1, 1]
        prior_boxes_tensor = tf.tile(prior_boxes_tensor, pattern)

        return prior_boxes_tensor
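
To make the width/height loop concrete, here is a small trace for conv9_2, assuming its expanded aspect-ratio list is [1, 1, 2, 1/2] (min_size=264, max_size=315 as in the SSD300 code; the second 1 is assumed to take the sqrt(min*max) branch):

import numpy as np

# Trace of the box_widths loop above for conv9_2 (assumed expanded
# aspect ratios [1, 1, 2, 1/2]); box_heights mirror the widths
min_size, max_size = 264.0, 315.0
box_widths = [min_size,                       # ar == 1, first entry
              np.sqrt(min_size * max_size),   # ar == 1, second entry
              min_size * np.sqrt(2),          # ar == 2
              min_size / np.sqrt(2)]          # ar == 1/2
print([round(w, 1) for w in box_widths])      # ~[264.0, 288.4, 373.4, 186.7]

This yields 4 priors per grid cell, matching num_priors = 4 for conv9_2.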

3 Building Labels

y = bbox_util.assign_boxes(y): [box_num, 4+cls] --> [8732, 4+1+cls+8]

3.1 Workflow

An image contains several ground-truth boxes; for each ground-truth box, find a prior box to predict it.

3.2 Code steps

  1. Encode every ground-truth box against the priors via IoU, giving encoded_box [box_num, 8732, 4+1(iou)] (a sketch of the encoding follows below).
  2. Filter encoded_box: take the max IoU and its argmax along axis 0; keep the priors whose best IoU > 0; each kept prior is then assigned the ground-truth box it matches best. Done!
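
Step 1 relies on encode_box, which is not shown in this post. A minimal numpy sketch of the idea (the function name and exact shapes are assumptions; the offset encoding is exactly the inverse of decode_boxes in section 4.2.4, and the real encode_box additionally zeroes out priors below an IoU threshold):

import numpy as np

# Sketch: encode ONE ground-truth box against all priors.
# priors: [N, 4] corner coordinates (x1, y1, x2, y2) in 0-1 form,
# plus variances assumed to be (0.1, 0.1, 0.2, 0.2).
def encode_box_sketch(box, priors, variances=(0.1, 0.1, 0.2, 0.2)):
    variances = np.asarray(variances)
    # IoU between the ground-truth box and every prior
    inter_tl = np.maximum(priors[:, :2], box[:2])
    inter_br = np.minimum(priors[:, 2:4], box[2:4])
    inter_wh = np.maximum(inter_br - inter_tl, 0.0)
    inter = inter_wh[:, 0] * inter_wh[:, 1]
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    area_pri = (priors[:, 2] - priors[:, 0]) * (priors[:, 3] - priors[:, 1])
    iou = inter / (area_box + area_pri - inter)

    # encode center offsets and log-scale size ratios, divided by the variances
    prior_wh = priors[:, 2:4] - priors[:, :2]
    prior_center = 0.5 * (priors[:, :2] + priors[:, 2:4])
    box_wh = box[2:4] - box[:2]
    box_center = 0.5 * (box[:2] + box[2:4])
    enc_xy = (box_center - prior_center) / (prior_wh * variances[:2])
    enc_wh = np.log(box_wh / prior_wh) / variances[2:]
    return np.concatenate([enc_xy, enc_wh, iou[:, None]], axis=1)  # [N, 4+1(iou)]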

3.3 Code

    def assign_boxes(self, boxes): # boxes.shape [-1, box+class]; filter the boxes to build y_true ;  y = self.bbox_util.assign_boxes(y)
        assignment = np.zeros((self.num_priors, 4 + self.num_classes + 8))  # assignment.shape (8732, 33); y.shape=(7, 24)
        assignment[:, 4] = 1.0  # probability of background
        if len(boxes) == 0:
            return assignment
        # compute the IoU-based encoding for every ground-truth box; encoded_boxes.shape = (7, 43660): 7 boxes in this image, 43660 = 8732*5
        encoded_boxes = np.apply_along_axis(self.encode_box, 1, boxes[:, :4])  # find and encode the boxes [num_priors, 4 + 1]
        # encoded values plus IoU for every ground-truth box
        encoded_boxes = encoded_boxes.reshape(-1, self.num_priors, 5) # encoded_boxes.shape = (7, 8732, 5)
        
        # one prior can overlap several ground-truth boxes, but a prior may only fit one of them, so find the ground-truth box each prior matches best
        best_iou = encoded_boxes[:, :, -1].max(axis=0)         # encoded_boxes[:, :, -1].shape: (7, 8732); best_iou.shape = (8732,)
        best_iou_idx = encoded_boxes[:, :, -1].argmax(axis=0)  # for each prior, the index of the ground-truth box with the highest IoU; shape (8732,)
        best_iou_mask = best_iou > 0                           # mask of priors whose best IoU is greater than zero
        best_iou_idx = best_iou_idx[best_iou_mask]             # keep only those priors; best_iou_idx.shape = (64,)

        assign_num = len(best_iou_idx)                         # number of priors used for prediction; assign_num = 64
        # keep the expected prediction for the prior with the highest overlap
        encoded_boxes = encoded_boxes[:, best_iou_mask, :]     # encoded_boxes.shape = (7, 64, 5)
        assignment[:, :4][best_iou_mask] = encoded_boxes[best_iou_idx , np.arange(assign_num),:4] # offsets
        # index 4 is the background probability, set to 0
        assignment[:, 4][best_iou_mask] = 0                    # background
        assignment[:, 5:-8][best_iou_mask] = boxes[best_iou_idx, 4:]    # class
        assignment[:, -8][best_iou_mask] = 1                   # marks that an object is present? why the extra 8? because y_pred also carries 8?
        # assign_boxes thus yields the expected prediction targets for this image
        return assignment    #  assignment.shape = (8732, 33)  33 = 4 + 21 + 8

4 Prediction

4.1 Prediction workflow

Read an image path, detect, display.

4.1.1 Prediction code steps

  1. Instantiate the network
  2. Read the image
  3. Detect the objects in the image and display the result

4.1.2 Prediction code

predict.py

from ssd import SSD
from PIL import Image

ssd = SSD()
while True:
    img = input('Picture path:')
    try:
        image = Image.open(img)
    except:
        print('Open Error! Try again!')
        continue
    else:
        r_image = ssd.detect_image(image)
        r_image.show()
ssd.close_session()

4.2 Detecting objects

r_image = ssd.detect_image(image)

4.2.1 Detection steps

  1. Add gray bars to the image so that all inputs share the same size.
  2. Preprocess the image (normalize) and run the prediction.
  3. Decode the predictions --> filter --> NMS --> pick top_k
  4. Keep the boxes scoring above confidence: results[label, conf, det_xmin, det_ymin, det_xmax, det_ymax]
  5. Remove the gray bars
  6. Draw each object's box and mark its class

4.2.2 Detection code

def detect_image(self, image):
        image_shape = np.array(np.shape(image)[0:2])  # image dimensions
        crop_img, x_offset, y_offset = letterbox_image(image, (self.model_image_size[0], self.model_image_size[1]))  # add gray bars
        photo = np.array(crop_img, dtype=np.float64) # photo.shape =(300,300,3)

        # preprocess the image: normalize, then predict
        photo = preprocess_input(np.reshape(photo, [1, self.model_image_size[0], self.model_image_size[1], 3]))
        preds = self.ssd_model.predict(photo)  # predictions(Concatenate)(None, 8732, 33) 4+21+8=33

        # decode the predictions --> filter --> NMS --> pick top_k
        results = self.bbox_util.detection_out(preds, confidence_threshold=self.confidence)

        if len(results[0]) <= 0:
            return image

        # keep the boxes scoring above confidence: results[label, conf, det_xmin, det_ymin, det_xmax, det_ymax]
        det_label = results[0][:, 0]
        det_conf = results[0][:, 1]
        det_xmin, det_ymin, det_xmax, det_ymax = results[0][:, 2], results[0][:, 3], results[0][:, 4], results[0][:, 5]
        top_indices = [i for i, conf in enumerate(det_conf) if conf >= self.confidence]
        top_conf = det_conf[top_indices]
        top_label_indices = det_label[top_indices].tolist()
        top_xmin, top_ymin, top_xmax, top_ymax = np.expand_dims(det_xmin[top_indices], -1), np.expand_dims(
            det_ymin[top_indices], -1), np.expand_dims(det_xmax[top_indices], -1), np.expand_dims(det_ymax[top_indices],
                                                                                                  -1)

        # remove the gray bars
        boxes = ssd_correct_boxes(top_ymin, top_xmin, top_ymax, top_xmax,  # [200,4]
                                  np.array([self.model_image_size[0], self.model_image_size[1]]), image_shape)

        font = ImageFont.truetype(font='model_data/simhei.ttf',
                                  size=np.floor(3e-2 * np.shape(image)[1] + 0.5).astype('int32'))

        thickness = (np.shape(image)[0] + np.shape(image)[1]) // self.model_image_size[0]

        for i, c in enumerate(top_label_indices):  #  [2.0, 15.0, 15.0, 15.0, 7.0]
            predicted_class = self.class_names[int(c) - 1]
            score = top_conf[i]

            top, left, bottom, right = boxes[i]  # np.shape(image)=(1330, 1330, 3)
            top = top - 5
            left = left - 5
            bottom = bottom + 5
            right = right + 5

            top = max(0, np.floor(top + 0.5).astype('int32'))     # round and clamp so the box stays inside the image
            left = max(0, np.floor(left + 0.5).astype('int32'))
            bottom = min(np.shape(image)[0], np.floor(bottom + 0.5).astype('int32'))
            right = min(np.shape(image)[1], np.floor(right + 0.5).astype('int32'))

            # draw the box
            label = '{} {:.2f}'.format(predicted_class, score)
            draw = ImageDraw.Draw(image)
            label_size = draw.textsize(label, font)
            label = label.encode('utf-8')
            print(label)

            # where to place the label relative to the box
            if top - label_size[1] >= 0:
                text_origin = np.array([left, top - label_size[1]])  # xy
            else:
                text_origin = np.array([left, top + 1])

            for i in range(thickness):
                draw.rectangle(
                    [left + i, top + i, right - i, bottom - i],
                    outline=self.colors[int(c) - 1])  # draw the box
            draw.rectangle(
                [tuple(text_origin), tuple(text_origin + label_size)],
                fill=self.colors[int(c) - 1])    # draw the label background
            draw.text(text_origin, str(label, 'UTF-8'), fill=(0, 0, 0), font=font) # write the text
            del draw
        return image

4.2.3 Adding gray bars to the image

Adding gray bars brings all input images to a common size while preserving the original aspect ratio, so the image is not distorted.

  1. Compute the ratios between the original image size and the target size
  2. Resize the original image by the smaller ratio
  3. Paste the resized image onto a gray canvas

'''
crop_img, x_offset, y_offset = letterbox_image(image, (self.model_image_size[0], self.model_image_size[1]))
'''
import numpy as np
from PIL import Image

def letterbox_image(image, size):
    iw, ih = image.size          # PIL gives (width, height)
    w, h = size
    scale = min(w/iw, h/ih)      # the smaller ratio preserves the aspect ratio
    nw = int(iw*scale)
    nh = int(ih*scale)
    
    image = image.resize((nw,nh), Image.BICUBIC)
    new_img = Image.new('RGB', size, (128,128,128))   # gray canvas
    new_img.paste(image, ((w-nw)//2, (h-nh)//2))      # paste centered
    x_offset, y_offset = (w-nw)//2/300, (h-nh)//2/300
    
    return new_img, x_offset, y_offset 
    
if __name__ == '__main__':
    image = Image.fromarray(np.random.randint(0, 256, [10, 20, 3], dtype=np.uint8))
    new_img, x_offset, y_offset = letterbox_image(image, (15, 15))
    print(new_img.size)
    print(x_offset, y_offset)

4.2.4 Decode the predictions --> filter --> NMS --> pick top_k

Steps:

  1. Decode
  2. Filter
  3. NMS
  4. Sort by confidence and pick the top_k

# results = self.bbox_util.detection_out(preds, confidence_threshold=self.confidence)
    def detection_out(self, predictions, background_label_id=0, keep_top_k=200,
                      confidence_threshold=0.5):
        # network predictions [4+1+20+4+4]: predicted offsets + confidences (background + classes) + prior boxes + variances [0.1,0.1,0.2,0.2]
        mbox_loc = predictions[:, :, :4]          # (1, 8732, 4)
        # 0.1,0.1,0.2,0.2
        variances = predictions[:, :, -4:]        # (1, 8732, 4)
        # prior boxes
        mbox_priorbox = predictions[:, :, -8:-4]  # (1, 8732, 4)
        # confidences
        mbox_conf = predictions[:, :, 4:-8]       # (1, 8732, 21)
        results = []
        # process each image
        for i in range(len(mbox_loc)):
            results.append([])
            ### 1. decode
            decode_bbox = self.decode_boxes(mbox_loc[i], mbox_priorbox[i],  variances[i]) # decode_bbox.shape: (8732, 4)

            for c in range(self.num_classes):
                if c == background_label_id: # index 0 is the background
                    continue
                c_confs = mbox_conf[i, :, c]
                c_confs_m = c_confs > confidence_threshold
                if len(c_confs[c_confs_m]) > 0:
                    # keep the boxes scoring above confidence_threshold
                    boxes_to_process = decode_bbox[c_confs_m]
                    confs_to_process = c_confs[c_confs_m]
                    # non-max suppression over IoU
                    feed_dict = {self.boxes: boxes_to_process,
                                 self.scores: confs_to_process}
                    idx = self.sess.run(self.nms, feed_dict=feed_dict)
                    # keep the boxes that survive non-max suppression
                    good_boxes = boxes_to_process[idx]
                    confs = confs_to_process[idx][:, None]  # reshape into a column
                    # stack the label, confidence, and box coordinates
                    labels = c * np.ones((len(idx), 1))  # c is the numeric class id
                    c_pred = np.concatenate((labels, confs, good_boxes),
                                            axis=1)
                    # append to results
                    results[-1].extend(c_pred)
            if len(results[-1]) > 0:
                # sort by confidence
                results[-1] = np.array(results[-1])
                argsort = np.argsort(results[-1][:, 1])[::-1]   # sort by score, descending
                results[-1] = results[-1][argsort]
                # keep the keep_top_k most confident detections
                results[-1] = results[-1][:keep_top_k]
        return results


(1) Decoding:

  1. Get the priors' width, height, and center point
  2. Compute the predicted box's center point, width, and height
  3. Compute the predicted box's top-left and bottom-right corners
'''
Decode
decode_bbox = self.decode_boxes(mbox_loc[i], mbox_priorbox[i],  variances[i]) # decode_bbox.shape: (8732, 4)
'''
    def decode_boxes(self, mbox_loc, mbox_priorbox, variances):
        # 1. priors' widths and heights: x1,y1,x2,y2 --> cx,cy,w,h
        prior_width = mbox_priorbox[:, 2] - mbox_priorbox[:, 0]
        prior_height = mbox_priorbox[:, 3] - mbox_priorbox[:, 1]
        # priors' centers
        prior_center_x = 0.5 * (mbox_priorbox[:, 2] + mbox_priorbox[:, 0])
        prior_center_y = 0.5 * (mbox_priorbox[:, 3] + mbox_priorbox[:, 1])

        # 2. predicted center offsets relative to the prior, along x and y
        decode_bbox_center_x = mbox_loc[:, 0] * prior_width * variances[:, 0]
        decode_bbox_center_x += prior_center_x
        decode_bbox_center_y = mbox_loc[:, 1] * prior_height * variances[:, 1]
        decode_bbox_center_y += prior_center_y

        # predicted widths and heights
        decode_bbox_width = np.exp(mbox_loc[:, 2] * variances[:, 2])
        decode_bbox_width *= prior_width
        decode_bbox_height = np.exp(mbox_loc[:, 3] * variances[:, 3])
        decode_bbox_height *= prior_height

        # 3. top-left and bottom-right corners of the predicted boxes
        decode_bbox_xmin = decode_bbox_center_x - 0.5 * decode_bbox_width
        decode_bbox_ymin = decode_bbox_center_y - 0.5 * decode_bbox_height
        decode_bbox_xmax = decode_bbox_center_x + 0.5 * decode_bbox_width
        decode_bbox_ymax = decode_bbox_center_y + 0.5 * decode_bbox_height

        # stack the corners
        decode_bbox = np.concatenate((decode_bbox_xmin[:, None],
                                      decode_bbox_ymin[:, None],
                                      decode_bbox_xmax[:, None],
                                      decode_bbox_ymax[:, None]), axis=-1)
        # clip to [0, 1]
        decode_bbox = np.minimum(np.maximum(decode_bbox, 0.0), 1.0)
        return decode_bbox
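As a sanity check on the decoding math: with zero predicted offsets the decoded box must reproduce the prior exactly, since the center shift vanishes and exp(0) = 1. A minimal standalone sketch (numpy only; the prior values are made up):

import numpy as np

# One prior in normalized corner form (x1, y1, x2, y2) and the SSD variances.
prior = np.array([[0.2, 0.2, 0.6, 0.6]])
variances = np.array([[0.1, 0.1, 0.2, 0.2]])
loc = np.zeros((1, 4))  # zero offsets -> decoded box == prior

pw, ph = prior[:, 2] - prior[:, 0], prior[:, 3] - prior[:, 1]
cx = 0.5 * (prior[:, 0] + prior[:, 2]) + loc[:, 0] * pw * variances[:, 0]
cy = 0.5 * (prior[:, 1] + prior[:, 3]) + loc[:, 1] * ph * variances[:, 1]
w = pw * np.exp(loc[:, 2] * variances[:, 2])
h = ph * np.exp(loc[:, 3] * variances[:, 3])

decoded = np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=-1)
print(decoded)  # [[0.2 0.2 0.6 0.6]] -- identical to the prior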

(2) Filtering

'''
c_confs_m = c_confs > confidence_threshold
                if len(c_confs[c_confs_m]) > 0:
                    # keep the boxes whose score exceeds confidence_threshold
                    boxes_to_process = decode_bbox[c_confs_m]
                    confs_to_process = c_confs[c_confs_m]
'''

(3) NMS

'''
feed_dict = {self.boxes: boxes_to_process,
                                 self.scores: confs_to_process}
                    idx = self.sess.run(self.nms, feed_dict=feed_dict)
'''

(4) Sort by confidence and select the top_k

'''
                    results[-1].extend(c_pred)
            if len(results[-1]) > 0:
                # sort by confidence
                results[-1] = np.array(results[-1])
                argsort = np.argsort(results[-1][:, 1])[::-1]   # descending by score
                results[-1] = results[-1][argsort]
                # keep the keep_top_k highest-confidence boxes
                results[-1] = results[-1][:keep_top_k]
'''

4.2.5 Keep the boxes whose score exceeds confidence

        det_label = results[0][:, 0]
        det_conf = results[0][:, 1]
        det_xmin, det_ymin, det_xmax, det_ymax = results[0][:, 2], results[0][:, 3], results[0][:, 4], results[0][:, 5]
        top_indices = [i for i, conf in enumerate(det_conf) if conf >= self.confidence]
        top_conf = det_conf[top_indices]
        top_label_indices = det_label[top_indices].tolist()
        top_xmin, top_ymin, top_xmax, top_ymax = np.expand_dims(det_xmin[top_indices], -1), np.expand_dims(
            det_ymin[top_indices], -1), np.expand_dims(det_xmax[top_indices], -1), np.expand_dims(det_ymax[top_indices], -1)

4.2.6 Remove the gray bars

  1. Compute offset and scale; convert the boxes' corner coordinates to centers and widths/heights
  2. Apply box_yx = (box_yx - offset) * scale and box_hw *= scale
  3. Convert the centers and sizes back to corner coordinates
  4. Map the boxes back onto the original image
'''
 boxes = ssd_correct_boxes(top_ymin, top_xmin, top_ymax, top_xmax,  # [200,4]
                                  np.array([self.model_image_size[0], self.model_image_size[1]]), image_shape)
'''
def ssd_correct_boxes(top, left, bottom, right, input_shape, image_shape):
    # 1. offset and scale of the letterboxed region inside the model input
    new_shape = image_shape*np.min(input_shape/image_shape)

    offset = (input_shape-new_shape)/2./input_shape
    scale = input_shape/new_shape

    # corners -> centers (y, x) and sizes (h, w)
    box_yx = np.concatenate(((top+bottom)/2,(left+right)/2),axis=-1)
    box_hw = np.concatenate((bottom-top,right-left),axis=-1)

    # 2. undo the letterbox offset and scaling
    box_yx = (box_yx - offset) * scale
    box_hw *= scale

    # 3. centers and sizes -> corners
    box_mins = box_yx - (box_hw / 2.)
    box_maxes = box_yx + (box_hw / 2.)
    boxes =  np.concatenate([
        box_mins[:, 0:1],
        box_mins[:, 1:2],
        box_maxes[:, 0:1],
        box_maxes[:, 1:2]
    ],axis=-1)
    # 4. scale the normalized boxes up to the original image size
    boxes *= np.concatenate([image_shape, image_shape],axis=-1)
    return boxes
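A quick numeric check of the offset/scale formulas above, assuming shapes are stored as (height, width) as elsewhere in the code: a 400x600 image letterboxed into the 300x300 input is scaled by 0.5, so the gray bars sit only along the height.

import numpy as np

input_shape = np.array([300, 300])   # model input, (h, w)
image_shape = np.array([400, 600])   # original image, (h, w)

new_shape = image_shape * np.min(input_shape / image_shape)  # [200. 300.]
offset = (input_shape - new_shape) / 2. / input_shape        # [0.1667 0.    ]
scale = input_shape / new_shape                              # [1.5 1. ]
print(new_shape, offset, scale)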

4.2.7 Draw the boxes

for i, c in enumerate(top_label_indices):  # e.g. [2.0, 15.0, 15.0, 15.0, 7.0]
            predicted_class = self.class_names[int(c) - 1]
            score = top_conf[i]

            top, left, bottom, right = boxes[i]  # np.shape(image)=(1330, 1330, 3)
            top = top - 5  # enlarge the box slightly
            left = left - 5
            bottom = bottom + 5
            right = right + 5

            top = max(0, np.floor(top + 0.5).astype('int32'))     # round and clamp so the box stays inside the image
            left = max(0, np.floor(left + 0.5).astype('int32'))
            bottom = min(np.shape(image)[0], np.floor(bottom + 0.5).astype('int32'))
            right = min(np.shape(image)[1], np.floor(right + 0.5).astype('int32'))

            # draw the box
            label = '{} {:.2f}'.format(predicted_class, score)
            draw = ImageDraw.Draw(image)
            label_size = draw.textsize(label, font)
            label = label.encode('utf-8')
            print(label)

            # place the label above the box if there is room, else just inside its top edge
            if top - label_size[1] >= 0:
                text_origin = np.array([left, top - label_size[1]])  # xy
            else:
                text_origin = np.array([left, top + 1])

            for t in range(thickness):  # use `t` so the outer loop variable `i` is not shadowed
                draw.rectangle(
                    [left + t, top + t, right - t, bottom - t],
                    outline=self.colors[int(c) - 1])  # box outline
            draw.rectangle(
                [tuple(text_origin), tuple(text_origin + label_size)],
                fill=self.colors[int(c) - 1])    # label background
            draw.text(text_origin, str(label, 'UTF-8'), fill=(0, 0, 0), font=font) # label text
            del draw
        return image

5 Training

5.1 Workflow

  1. Build the labels: y = bbox_util.assign_boxes(y) # y = [boxes + cls]
  2. Load the model: model = SSD300(input_shape, num_classes=NUM_CLASSES)
  3. Set up the training callbacks: logging + checkpoint + reduce_lr + early_stopping
  4. Build the data generator: gen.generate(True)
  5. Train: model.fit_generator() with loss = MultiboxLoss(); see the sketch after this list.
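The five steps above wire together roughly as follows. This is a hedged sketch, not the repo's train.py verbatim: the Generator constructor signature, the weight path, the train_lines/val_lines split and all hyperparameter values are assumptions for illustration.

# Minimal training sketch (Keras, TF1 era, matching the APIs used elsewhere here).
from keras.callbacks import TensorBoard, ModelCheckpoint, ReduceLROnPlateau, EarlyStopping
from keras.optimizers import Adam

# 2. load the model
model = SSD300(input_shape, num_classes=NUM_CLASSES)
model.load_weights('model_data/ssd_weights.h5', by_name=True)  # assumed weight path

# 3. training callbacks
logging = TensorBoard(log_dir='logs')
checkpoint = ModelCheckpoint('logs/ep{epoch:03d}-val_loss{val_loss:.3f}.h5',
                             monitor='val_loss', save_weights_only=True, save_best_only=True)
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=2)
early_stopping = EarlyStopping(monitor='val_loss', min_delta=0, patience=6)

# 1. + 4. the generator calls bbox_util.assign_boxes internally to build the labels
gen = Generator(bbox_util, BATCH_SIZE, train_lines, val_lines,
                (input_shape[0], input_shape[1]), NUM_CLASSES)

# 5. compile with the multibox loss and train
model.compile(optimizer=Adam(lr=5e-4),
              loss=MultiboxLoss(NUM_CLASSES, neg_pos_ratio=3.0).compute_loss)
model.fit_generator(gen.generate(True),
                    steps_per_epoch=max(1, num_train // BATCH_SIZE),
                    validation_data=gen.generate(False),
                    validation_steps=max(1, num_val // BATCH_SIZE),
                    epochs=50,
                    callbacks=[logging, checkpoint, reduce_lr, early_stopping])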

5.1.1 bbox_util code steps

See section 3 (building the labels).

5.1.2 Loss code steps

  1. Smooth L1 loss
  2. Cross-entropy loss
  3. Regression and classification losses of the positive samples

The two helper losses are sketched after the code in 5.1.3.

5.1.3 Loss code

    def compute_loss(self, y_true, y_pred):
        batch_size = tf.shape(y_true)[0]             # number of images in the batch
        num_boxes = tf.to_float(tf.shape(y_true)[1]) # number of priors per image, 8732

        # compute all the losses
        # classification loss
        # batch_size,8732,4(gt)+1(bg)+21(cls)+4(anchor)+4(variance) -> batch_size,8732
        conf_loss = self._softmax_loss(y_true[:, :, 4:-8],
                                       y_pred[:, :, 4:-8])
        # localization loss
        # batch_size,8732,4 -> batch_size,8732
        loc_loss = self._l1_smooth_loss(y_true[:, :, :4],
                                        y_pred[:, :, :4])

        # losses of the positive samples
        # number of positives per image; e.g. num_pos = array([5., 3., 1., 2., 2., 4.]), shape [batch_size]
        num_pos = tf.reduce_sum(y_true[:, :, -8], axis=-1)
        # per-image positive localization loss
        pos_loc_loss = tf.reduce_sum(loc_loss * y_true[:, :, -8],
                                     axis=1)
        # per-image positive classification loss
        pos_conf_loss = tf.reduce_sum(conf_loss * y_true[:, :, -8],
                                      axis=1)

        # number of negatives per image, capped; e.g. neg_pos_ratio * num_pos = 192.0, shape [batch_size]
        num_neg = tf.minimum(self.neg_pos_ratio * num_pos, # e.g. num_boxes - num_pos = 8668.0 = 8732.0 - 64.0
                             num_boxes - num_pos)          # num_boxes = tf.to_float(tf.shape(y_true)[1])

        # which images actually have negatives; e.g. array([ True,  True,  True,  True,  True,  True])
        pos_num_neg_mask = tf.greater(num_neg, 0)    # boolean mask
        # 1.0 if any image has negatives
        has_min = tf.to_float(tf.reduce_any(pos_num_neg_mask))   # has_min = 1.0
        num_neg = tf.concat( axis=0,values=[num_neg,  # fall back to negatives_for_hard when no image has negatives
                                [(1 - has_min) * self.negatives_for_hard]])  # e.g. array([192, 0])
        # average number of negatives to mine per image
        num_neg_batch = tf.reduce_mean(tf.boolean_mask(num_neg,  # e.g. num_neg_batch = 192
                                                      tf.greater(num_neg, 0)))
        num_neg_batch = tf.to_int32(num_neg_batch)

        # start of the class confidences, skipping the background column: [5:-8]
        confs_start = 4 + self.background_label_id + 1  # confs_start = 5
        # end of the class confidences
        confs_end = confs_start + self.num_classes - 1  # confs_end = 25

        # for priors that should contain no object, take the maximum class confidence
        # and pick the top_k of them as hard negatives
        max_confs = tf.reduce_max(y_pred[:, :, confs_start:confs_end],
                                  axis=2)
        _, indices = tf.nn.top_k(max_confs * (1 - y_true[:, :, -8]),
                                 k=num_neg_batch)  # indices.shape = (batch_size, num_neg_batch)

        # flatten the (image, prior) pairs into 1-D indices
        batch_idx = tf.expand_dims(tf.range(0, batch_size), 1) # batch_idx.shape = (batch_size, 1)
        batch_idx = tf.tile(batch_idx, (1, num_neg_batch))     # batch_idx.shape = (batch_size, num_neg_batch)
        full_indices = (tf.reshape(batch_idx, [-1]) * tf.to_int32(num_boxes) +  # num_boxes = 8732
                        tf.reshape(indices, [-1]))   # shape: (batch_size * num_neg_batch,)

        # full_indices = tf.concat(2, [tf.expand_dims(batch_idx, 2),
        #                              tf.expand_dims(indices, 2)])
        # neg_conf_loss = tf.gather_nd(conf_loss, full_indices)
        neg_conf_loss = tf.gather(tf.reshape(conf_loss, [-1]),
                                  full_indices)
        neg_conf_loss = tf.reshape(neg_conf_loss,
                                   [batch_size, num_neg_batch])
        neg_conf_loss = tf.reduce_sum(neg_conf_loss, axis=1)

        # loss is sum of positives and negatives

        num_pos = tf.where(tf.not_equal(num_pos, 0), num_pos,     # avoid dividing by zero when there are no positives
                            tf.ones_like(num_pos))
        total_loss = tf.reduce_sum(pos_conf_loss) + tf.reduce_sum(neg_conf_loss)
        total_loss /= tf.reduce_sum(num_pos)
        total_loss += tf.reduce_sum(self.alpha * pos_loc_loss) / tf.reduce_sum(num_pos)

        return total_loss
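compute_loss relies on two helpers that are not reproduced above, _l1_smooth_loss and _softmax_loss. Below is a minimal sketch of the usual ssd_keras-style definitions, reconstructed from the standard implementation with the same TF1 API as the code above; treat it as illustrative rather than this repo's exact source.

import tensorflow as tf

class MultiboxLossHelpers:  # illustrative container; in the repo these are methods of MultiboxLoss
    def _l1_smooth_loss(self, y_true, y_pred):
        # Smooth L1: 0.5*x^2 when |x| < 1, otherwise |x| - 0.5; summed over the 4 box coords.
        abs_loss = tf.abs(y_true - y_pred)
        sq_loss = 0.5 * (y_true - y_pred) ** 2
        l1_loss = tf.where(tf.less(abs_loss, 1.0), sq_loss, abs_loss - 0.5)
        return tf.reduce_sum(l1_loss, -1)

    def _softmax_loss(self, y_true, y_pred):
        # Cross-entropy, with the predictions floored so log() stays finite.
        y_pred = tf.maximum(y_pred, 1e-7)
        return -tf.reduce_sum(y_true * tf.log(y_pred), axis=-1)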

5.2.1 gen.generate(True) code steps

  1. Data augmentation
  2. Match the priors to the ground-truth boxes

5.2.2 gen.generate(True) code

def generate(self, train=True):
        while True:
            if train:
                # shuffle the training data
                shuffle(self.train_lines)
                lines = self.train_lines
            else:
                shuffle(self.val_lines)
                lines = self.val_lines
            inputs = []
            targets = []
            for annotation_line in lines:  # img, y = image_data, box_data
                img,y=self.get_random_data(annotation_line,self.image_size[0:2]) # y shape: [n, 4 + cls]; applies data augmentation
                if len(y)!=0:
                    boxes = np.array(y[:,:4],dtype=np.float32)
                    boxes[:,0] = boxes[:,0]/self.image_size[1]
                    boxes[:,1] = boxes[:,1]/self.image_size[0]
                    boxes[:,2] = boxes[:,2]/self.image_size[1]
                    boxes[:,3] = boxes[:,3]/self.image_size[0]
                    one_hot_label = np.eye(self.num_classes)[np.array(y[:,4],np.int32)]  # background not included
                    # skip samples containing degenerate boxes (non-positive width or height)
                    if ((boxes[:,3]-boxes[:,1])<=0).any() or ((boxes[:,2]-boxes[:,0])<=0).any():
                        continue

                    y = np.concatenate([boxes,one_hot_label],axis=-1) # y = [4 + 20 (one-hot)]
                # encoding: y = [boxes + cls]
                y = self.bbox_util.assign_boxes(y)    # match priors by IoU, encode, keep the best match; builds y_true[4+1+cls+8]
                inputs.append(img)
                targets.append(y)
                if len(targets) == self.batch_size:   # a full batch is ready
                    tmp_inp = np.array(inputs)
                    tmp_targets = np.array(targets)
                    inputs = []
                    targets = []
                    yield preprocess_input(tmp_inp), tmp_targets

6 Model evaluation

6.1 Workflow

  1. Obtain the predicted boxes
  2. Obtain the ground-truth boxes
  3. Compute the mAP

6.1.1 Steps to obtain the predicted boxes

  1. Preprocess and normalize the image
  2. Decode the predictions
  3. Keep the boxes whose score exceeds confidence
  4. Remove the gray bars

6.1.2 Steps to obtain the ground-truth boxes

  1. Read in the ground-truth box data

6.1.3 Steps to compute the mAP

  1. Compute precision and recall from the IoU matches (a toy numeric check follows this list)
  2. Compute the area under the P-R curve
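A toy numeric check of step 1, under the assumption that the three detections are already sorted by descending confidence and the class has two ground-truth boxes; this is the same cumulative-sum computation get_map.py performs below:

import numpy as np

tp = np.array([1, 0, 1])   # per-detection true-positive flags after IoU matching
fp = 1 - tp
n_gt = 2                   # ground-truth boxes of this class

rec = np.cumsum(tp) / n_gt                              # [0.5   0.5   1.   ]
prec = np.cumsum(tp) / (np.cumsum(tp) + np.cumsum(fp))  # [1.    0.5   0.667]
print(rec, prec)           # voc_ap() then integrates this P-R curve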

6.1.4 get_dr_txt.py code

#----------------------------------------------------#

#----------------------------------------------------#
from keras.layers import Input
from ssd import SSD
from PIL import Image
from keras.applications.imagenet_utils import preprocess_input
from utils_.utils import BBoxUtility,letterbox_image,ssd_correct_boxes
import numpy as np
import os
class mAP_SSD(SSD):
    #---------------------------------------------------#
    #   Detect a single image
    #---------------------------------------------------#
    def detect_image(self,image_id,image):
        self.confidence = 0.05  # deliberately low: keep nearly all detections so the P-R curve is complete
        f = open("./input/detection-results/"+image_id+".txt","w") 
        image_shape = np.array(np.shape(image)[0:2])
        crop_img,x_offset,y_offset = letterbox_image(image, (self.model_image_size[0],self.model_image_size[1]))
        photo = np.array(crop_img,dtype = np.float64)

        # preprocess and normalize the image
        photo = preprocess_input(np.reshape(photo,[1,self.model_image_size[0],self.model_image_size[1],3]))
        preds = self.ssd_model.predict(photo)

        # decode the predictions
        results = self.bbox_util.detection_out(preds, confidence_threshold=self.confidence)
        
        if len(results[0])<=0:
            f.close()
            return

        # keep the boxes whose score exceeds confidence
        det_label = results[0][:, 0]
        det_conf = results[0][:, 1]
        det_xmin, det_ymin, det_xmax, det_ymax = results[0][:, 2], results[0][:, 3], results[0][:, 4], results[0][:, 5]
        top_indices = [i for i, conf in enumerate(det_conf) if conf >= self.confidence]
        top_conf = det_conf[top_indices]
        top_label_indices = det_label[top_indices].tolist()
        top_xmin, top_ymin, top_xmax, top_ymax = np.expand_dims(det_xmin[top_indices],-1),np.expand_dims(det_ymin[top_indices],-1),np.expand_dims(det_xmax[top_indices],-1),np.expand_dims(det_ymax[top_indices],-1)
        
        # remove the gray bars
        boxes = ssd_correct_boxes(top_ymin,top_xmin,top_ymax,top_xmax,np.array([self.model_image_size[0],self.model_image_size[1]]),image_shape)


        for i, c in enumerate(top_label_indices):
            predicted_class = self.class_names[int(c)-1]
            score = str(top_conf[i])

            top, left, bottom, right = boxes[i]
            f.write("%s %s %s %s %s %s\n" % (predicted_class, score[:6], str(int(left)), str(int(top)), str(int(right)),str(int(bottom))))

        f.close()
        return 

ssd = mAP_SSD()
image_ids = open('VOCdevkit/VOC2007/ImageSets/Main/train.txt').read().strip().split() # image_ids = open('VOCdevkit/VOC2007/ImageSets/Main/test.txt').read().strip().split()

if not os.path.exists("./input"):
    os.makedirs("./input")
if not os.path.exists("./input/detection-results"):
    os.makedirs("./input/detection-results")
if not os.path.exists("./input/images-optional"):
    os.makedirs("./input/images-optional")


for image_id in image_ids:
    image_path = "./VOCdevkit/VOC2007/JPEGImages/"+image_id+".jpg"
    image = Image.open(image_path)
    image.save("./input/images-optional/"+image_id+".jpg")
    ssd.detect_image(image_id,image)
    print(image_id," done!")
    

print("Conversion completed!")

6.1.5 get_gt_txt.py code

#----------------------------------------------------#

#----------------------------------------------------#
import sys
import os
import glob
import xml.etree.ElementTree as ET

image_ids = open('VOCdevkit/VOC2007/ImageSets/Main/train.txt').read().strip().split() #image_ids = open('VOCdevkit/VOC2007/ImageSets/Main/test.txt').read().strip().split()

if not os.path.exists("./input"):
    os.makedirs("./input")
if not os.path.exists("./input/ground-truth"):
    os.makedirs("./input/ground-truth")

for image_id in image_ids:
    with open("./input/ground-truth/"+image_id+".txt", "w") as new_f:
        root = ET.parse("VOCdevkit/VOC2007/Annotations/"+image_id+".xml").getroot()
        for obj in root.findall('object'):
            if obj.find('difficult')!=None:
                difficult = obj.find('difficult').text
                if int(difficult)==1:
                    continue
            obj_name = obj.find('name').text
            bndbox = obj.find('bndbox')
            left = bndbox.find('xmin').text
            top = bndbox.find('ymin').text
            right = bndbox.find('xmax').text
            bottom = bndbox.find('ymax').text
            new_f.write("%s %s %s %s %s\n" % (obj_name, left, top, right, bottom))
            
print("Conversion completed!")

6.1.6 get_map code

import glob
import json
import os
import shutil
import operator
import sys
import argparse
import math

import numpy as np
#----------------------------------------------------#
#   Computes the mAP
#   Code cloned from https://github.com/Cartucho/mAP
#----------------------------------------------------#
MINOVERLAP = 0.5  # default value (defined in the PASCAL VOC2012 challenge)

parser = argparse.ArgumentParser()
parser.add_argument('-na', '--no-animation', help="no animation is shown.", action="store_true")
parser.add_argument('-np', '--no-plot', help="no plot is shown.", action="store_true")
parser.add_argument('-q', '--quiet', help="minimalistic console output.", action="store_true")
# argparse receiving list of classes to be ignored
parser.add_argument('-i', '--ignore', nargs='+', type=str, help="ignore a list of classes.")
# argparse receiving list of classes with specific IoU (e.g., python main.py --set-class-iou person 0.7)
parser.add_argument('--set-class-iou', nargs='+', type=str, help="set IoU for a specific class.")
args = parser.parse_args()

'''
    0,0 ------> x (width)
     |
     |  (Left,Top)(x1,y1)
     |      *_________
     |      |         |
            |         |
     y      |_________|
  (height)            *
                (Right,Bottom)(x2,y2)
'''

# if there are no classes to ignore then replace None by empty list
if args.ignore is None:
    args.ignore = []

specific_iou_flagged = False
if args.set_class_iou is not None:
    specific_iou_flagged = True

# make sure that the cwd() is the location of the python script (so that every path makes sense)
os.chdir(os.path.dirname(os.path.abspath(__file__)))

GT_PATH = os.path.join(os.getcwd(), 'input', 'ground-truth')
DR_PATH = os.path.join(os.getcwd(), 'input', 'detection-results')
# if there are no images then no animation can be shown
IMG_PATH = os.path.join(os.getcwd(), 'input', 'images-optional')
if os.path.exists(IMG_PATH): 
    for dirpath, dirnames, files in os.walk(IMG_PATH):
        if not files:
            # no image files found
            args.no_animation = True
else:
    args.no_animation = True

# try to import OpenCV if the user didn't choose the option --no-animation
show_animation = False
if not args.no_animation:
    try:
        import cv2
        show_animation = True
    except ImportError:
        print("\"opencv-python\" not found, please install to visualize the results.")
        args.no_animation = True

# try to import Matplotlib if the user didn't choose the option --no-plot
draw_plot = False
if not args.no_plot:
    try:
        import matplotlib.pyplot as plt
        draw_plot = True
    except ImportError:
        print("\"matplotlib\" not found, please install it to get the resulting plots.")
        args.no_plot = True


def log_average_miss_rate(precision, fp_cumsum, num_images):
    """
        log-average miss rate:
            Calculated by averaging miss rates at 9 evenly spaced FPPI points
            between 10e-2 and 10e0, in log-space.

        output:
                lamr | log-average miss rate
                mr   | miss rate
                fppi | false positives per image

        references:
            [1] Dollar, Piotr, et al. "Pedestrian Detection: An Evaluation of the
               State of the Art." Pattern Analysis and Machine Intelligence, IEEE
               Transactions on 34.4 (2012): 743 - 761.
    """

    # if there were no detections of that class
    if precision.size == 0:
        lamr = 0
        mr   = 1
        fppi = 0
        return lamr, mr, fppi

    fppi = fp_cumsum / float(num_images)
    mr = (1 - precision)

    fppi_tmp = np.insert(fppi, 0, -1.0)
    mr_tmp = np.insert(mr, 0, 1.0)

    # Use 9 evenly spaced reference points in log-space
    ref = np.logspace(-2.0, 0.0, num = 9)
    for i, ref_i in enumerate(ref):
        # np.where() will always find at least 1 index, since min(ref) = 0.01 and min(fppi_tmp) = -1.0
        j = np.where(fppi_tmp <= ref_i)[-1][-1]
        ref[i] = mr_tmp[j]

    # log(0) is undefined, so we use the np.maximum(1e-10, ref)
    lamr = math.exp(np.mean(np.log(np.maximum(1e-10, ref))))

    return lamr, mr, fppi

"""
 throw error and exit
"""
def error(msg):
    print(msg)
    sys.exit(0)

"""
 check if the number is a float between 0.0 and 1.0
"""
def is_float_between_0_and_1(value):
    try:
        val = float(value)
        if val > 0.0 and val < 1.0:
            return True
        else:
            return False
    except ValueError:
        return False

"""
 Calculate the AP given the recall and precision array
    1st) We compute a version of the measured precision/recall curve with
         precision monotonically decreasing
    2nd) We compute the AP as the area under this curve by numerical integration.
"""
def voc_ap(rec, prec):
    """
    --- Official matlab code VOC2012---
    mrec=[0 ; rec ; 1];
    mpre=[0 ; prec ; 0];
    for i=numel(mpre)-1:-1:1
            mpre(i)=max(mpre(i),mpre(i+1));
    end
    i=find(mrec(2:end)~=mrec(1:end-1))+1;
    ap=sum((mrec(i)-mrec(i-1)).*mpre(i));
    """
    rec.insert(0, 0.0) # insert 0.0 at beginning of list
    rec.append(1.0) # insert 1.0 at end of list
    mrec = rec[:]
    prec.insert(0, 0.0) # insert 0.0 at beginning of list
    prec.append(0.0) # insert 0.0 at end of list
    mpre = prec[:]
    """
     This part makes the precision monotonically decreasing
        (goes from the end to the beginning)
        matlab: for i=numel(mpre)-1:-1:1
                    mpre(i)=max(mpre(i),mpre(i+1));
    """
    # matlab indexes start in 1 but python in 0, so I have to do:
    #     range(start=(len(mpre) - 2), end=0, step=-1)
    # also the python function range excludes the end, resulting in:
    #     range(start=(len(mpre) - 2), end=-1, step=-1)
    for i in range(len(mpre)-2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i+1])
    """
     This part creates a list of indexes where the recall changes
        matlab: i=find(mrec(2:end)~=mrec(1:end-1))+1;
    """
    i_list = []
    for i in range(1, len(mrec)):
        if mrec[i] != mrec[i-1]:
            i_list.append(i) # if it was matlab would be i + 1
    """
     The Average Precision (AP) is the area under the curve
        (numerical integration)
        matlab: ap=sum((mrec(i)-mrec(i-1)).*mpre(i));
    """
    ap = 0.0
    for i in i_list:
        ap += ((mrec[i]-mrec[i-1])*mpre[i])
    return ap, mrec, mpre


"""
 Convert the lines of a file to a list
"""
def file_lines_to_list(path):
    # open txt file lines to a list
    with open(path) as f:
        content = f.readlines()
    # remove whitespace characters like `\n` at the end of each line
    content = [x.strip() for x in content]
    return content

"""
 Draws text in image
"""
def draw_text_in_image(img, text, pos, color, line_width):
    font = cv2.FONT_HERSHEY_PLAIN
    fontScale = 1
    lineType = 1
    bottomLeftCornerOfText = pos
    cv2.putText(img, text,
            bottomLeftCornerOfText,
            font,
            fontScale,
            color,
            lineType)
    text_width, _ = cv2.getTextSize(text, font, fontScale, lineType)[0]
    return img, (line_width + text_width)

"""
 Plot - adjust axes
"""
def adjust_axes(r, t, fig, axes):
    # get text width for re-scaling
    bb = t.get_window_extent(renderer=r)
    text_width_inches = bb.width / fig.dpi
    # get axis width in inches
    current_fig_width = fig.get_figwidth()
    new_fig_width = current_fig_width + text_width_inches
    proportion = new_fig_width / current_fig_width
    # get axis limit
    x_lim = axes.get_xlim()
    axes.set_xlim([x_lim[0], x_lim[1]*proportion])

"""
 Draw plot using Matplotlib
"""
def draw_plot_func(dictionary, n_classes, window_title, plot_title, x_label, output_path, to_show, plot_color, true_p_bar):
    # sort the dictionary by decreasing value, into a list of tuples
    sorted_dic_by_value = sorted(dictionary.items(), key=operator.itemgetter(1))
    # unpacking the list of tuples into two lists
    sorted_keys, sorted_values = zip(*sorted_dic_by_value)
    # 
    if true_p_bar != "":
        """
         Special case to draw in:
            - green -> TP: True Positives (object detected and matches ground-truth)
            - red -> FP: False Positives (object detected but does not match ground-truth)
            - orange -> FN: False Negatives (object not detected but present in the ground-truth)
        """
        fp_sorted = []
        tp_sorted = []
        for key in sorted_keys:
            fp_sorted.append(dictionary[key] - true_p_bar[key])
            tp_sorted.append(true_p_bar[key])
        plt.barh(range(n_classes), fp_sorted, align='center', color='crimson', label='False Positive')
        plt.barh(range(n_classes), tp_sorted, align='center', color='forestgreen', label='True Positive', left=fp_sorted)
        # add legend
        plt.legend(loc='lower right')
        """
         Write number on side of bar
        """
        fig = plt.gcf() # gcf - get current figure
        axes = plt.gca()
        r = fig.canvas.get_renderer()
        for i, val in enumerate(sorted_values):
            fp_val = fp_sorted[i]
            tp_val = tp_sorted[i]
            fp_str_val = " " + str(fp_val)
            tp_str_val = fp_str_val + " " + str(tp_val)
            # trick to paint multicolor with offset:
            # first paint everything and then repaint the first number
            t = plt.text(val, i, tp_str_val, color='forestgreen', va='center', fontweight='bold')
            plt.text(val, i, fp_str_val, color='crimson', va='center', fontweight='bold')
            if i == (len(sorted_values)-1): # largest bar
                adjust_axes(r, t, fig, axes)
    else:
        plt.barh(range(n_classes), sorted_values, color=plot_color)
        """
         Write number on side of bar
        """
        fig = plt.gcf() # gcf - get current figure
        axes = plt.gca()
        r = fig.canvas.get_renderer()
        for i, val in enumerate(sorted_values):
            str_val = " " + str(val) # add a space before
            if val < 1.0:
                str_val = " {0:.2f}".format(val)
            t = plt.text(val, i, str_val, color=plot_color, va='center', fontweight='bold')
            # re-set axes to show number inside the figure
            if i == (len(sorted_values)-1): # largest bar
                adjust_axes(r, t, fig, axes)
    # set window title
    fig.canvas.set_window_title(window_title)
    # write classes in y axis
    tick_font_size = 12
    plt.yticks(range(n_classes), sorted_keys, fontsize=tick_font_size)
    """
     Re-scale height accordingly
    """
    init_height = fig.get_figheight()
    # compute the matrix height in points and inches
    dpi = fig.dpi
    height_pt = n_classes * (tick_font_size * 1.4) # 1.4 (some spacing)
    height_in = height_pt / dpi
    # compute the required figure height 
    top_margin = 0.15 # in percentage of the figure height
    bottom_margin = 0.05 # in percentage of the figure height
    figure_height = height_in / (1 - top_margin - bottom_margin)
    # set new height
    if figure_height > init_height:
        fig.set_figheight(figure_height)

    # set plot title
    plt.title(plot_title, fontsize=14)
    # set axis titles
    # plt.xlabel('classes')
    plt.xlabel(x_label, fontsize='large')
    # adjust size of window
    fig.tight_layout()
    # save the plot
    fig.savefig(output_path)
    # show image
    if to_show:
        plt.show()
    # close the plot
    plt.close()

"""
 Create a ".temp_files/" and "results/" directory
"""
TEMP_FILES_PATH = ".temp_files"
if not os.path.exists(TEMP_FILES_PATH): # if it doesn't exist already
    os.makedirs(TEMP_FILES_PATH)
results_files_path = "results"
if os.path.exists(results_files_path): # if it exist already
    # reset the results directory
    shutil.rmtree(results_files_path)

os.makedirs(results_files_path)
if draw_plot:
    os.makedirs(os.path.join(results_files_path, "classes"))
if show_animation:
    os.makedirs(os.path.join(results_files_path, "images", "detections_one_by_one"))

"""
 ground-truth
     Load each of the ground-truth files into a temporary ".json" file.
     Create a list of all the class names present in the ground-truth (gt_classes).
"""
# get a list with the ground-truth files
ground_truth_files_list = glob.glob(GT_PATH + '/*.txt')
if len(ground_truth_files_list) == 0:
    error("Error: No ground-truth files found!")
ground_truth_files_list.sort()
# dictionary with counter per class
gt_counter_per_class = {}
counter_images_per_class = {}

for txt_file in ground_truth_files_list:
    #print(txt_file)
    file_id = txt_file.split(".txt", 1)[0]
    file_id = os.path.basename(os.path.normpath(file_id))
    # check if there is a correspondent detection-results file
    temp_path = os.path.join(DR_PATH, (file_id + ".txt"))
    if not os.path.exists(temp_path):
        error_msg = "Error. File not found: {}\n".format(temp_path)
        error_msg += "(You can avoid this error message by running extra/intersect-gt-and-dr.py)"
        error(error_msg)
    lines_list = file_lines_to_list(txt_file)
    # create ground-truth dictionary
    bounding_boxes = []
    is_difficult = False
    already_seen_classes = []
    for line in lines_list:
        try:
            if "difficult" in line:
                    class_name, left, top, right, bottom, _difficult = line.split()
                    is_difficult = True
            else:
                    class_name, left, top, right, bottom = line.split()
        except ValueError:
            error_msg = "Error: File " + txt_file + " in the wrong format.\n"
            error_msg += " Expected:      ['difficult']\n"
            error_msg += " Received: " + line
            error_msg += "\n\nIf you have a  with spaces between words you should remove them\n"
            error_msg += "by running the script \"remove_space.py\" or \"rename_class.py\" in the \"extra/\" folder."
            error(error_msg)
        # check if class is in the ignore list, if yes skip
        if class_name in args.ignore:
            continue
        bbox = left + " " + top + " " + right + " " +bottom
        if is_difficult:
                bounding_boxes.append({"class_name":class_name, "bbox":bbox, "used":False, "difficult":True})
                is_difficult = False
        else:
                bounding_boxes.append({"class_name":class_name, "bbox":bbox, "used":False})
                # count that object
                if class_name in gt_counter_per_class:
                    gt_counter_per_class[class_name] += 1
                else:
                    # if class didn't exist yet
                    gt_counter_per_class[class_name] = 1

                if class_name not in already_seen_classes:
                    if class_name in counter_images_per_class:
                        counter_images_per_class[class_name] += 1
                    else:
                        # if class didn't exist yet
                        counter_images_per_class[class_name] = 1
                    already_seen_classes.append(class_name)


    # dump bounding_boxes into a ".json" file
    with open(TEMP_FILES_PATH + "/" + file_id + "_ground_truth.json", 'w') as outfile:
        json.dump(bounding_boxes, outfile)

gt_classes = list(gt_counter_per_class.keys())
# let's sort the classes alphabetically
gt_classes = sorted(gt_classes)
n_classes = len(gt_classes)
#print(gt_classes)
#print(gt_counter_per_class)

"""
 Check format of the flag --set-class-iou (if used)
    e.g. check if class exists
"""
if specific_iou_flagged:
    n_args = len(args.set_class_iou)
    error_msg = \
        '\n --set-class-iou [class_1] [IoU_1] [class_2] [IoU_2] [...]'
    if n_args % 2 != 0:
        error('Error, missing arguments. Flag usage:' + error_msg)
    # [class_1] [IoU_1] [class_2] [IoU_2]
    # specific_iou_classes = ['class_1', 'class_2']
    specific_iou_classes = args.set_class_iou[::2] # even
    # iou_list = ['IoU_1', 'IoU_2']
    iou_list = args.set_class_iou[1::2] # odd
    if len(specific_iou_classes) != len(iou_list):
        error('Error, missing arguments. Flag usage:' + error_msg)
    for tmp_class in specific_iou_classes:
        if tmp_class not in gt_classes:
            error('Error, unknown class \"' + tmp_class + '\". Flag usage:' + error_msg)
    for num in iou_list:
        if not is_float_between_0_and_1(num):
            error('Error, IoU must be between 0.0 and 1.0. Flag usage:' + error_msg)

"""
 detection-results
     Load each of the detection-results files into a temporary ".json" file.
"""
# get a list with the detection-results files
dr_files_list = glob.glob(DR_PATH + '/*.txt')
dr_files_list.sort()

for class_index, class_name in enumerate(gt_classes):
    bounding_boxes = []
    for txt_file in dr_files_list:
        #print(txt_file)
        # the first time it checks if all the corresponding ground-truth files exist
        file_id = txt_file.split(".txt",1)[0]
        file_id = os.path.basename(os.path.normpath(file_id))
        temp_path = os.path.join(GT_PATH, (file_id + ".txt"))
        if class_index == 0:
            if not os.path.exists(temp_path):
                error_msg = "Error. File not found: {}\n".format(temp_path)
                error_msg += "(You can avoid this error message by running extra/intersect-gt-and-dr.py)"
                error(error_msg)
        lines = file_lines_to_list(txt_file)
        for line in lines:
            try:
                tmp_class_name, confidence, left, top, right, bottom = line.split()
            except ValueError:
                error_msg = "Error: File " + txt_file + " in the wrong format.\n"
                error_msg += " Expected:      \n"
                error_msg += " Received: " + line
                error(error_msg)
            if tmp_class_name == class_name:
                #print("match")
                bbox = left + " " + top + " " + right + " " +bottom
                bounding_boxes.append({"confidence":confidence, "file_id":file_id, "bbox":bbox})
                #print(bounding_boxes)
    # sort detection-results by decreasing confidence
    bounding_boxes.sort(key=lambda x:float(x['confidence']), reverse=True)
    with open(TEMP_FILES_PATH + "/" + class_name + "_dr.json", 'w') as outfile:
        json.dump(bounding_boxes, outfile)

"""
 Calculate the AP for each class
"""
sum_AP = 0.0
ap_dictionary = {}
lamr_dictionary = {}
# open file to store the results
with open(results_files_path + "/results.txt", 'w') as results_file:
    results_file.write("# AP and precision/recall per class\n")
    count_true_positives = {}
    for class_index, class_name in enumerate(gt_classes):
        count_true_positives[class_name] = 0
        """
         Load detection-results of that class
        """
        dr_file = TEMP_FILES_PATH + "/" + class_name + "_dr.json"
        dr_data = json.load(open(dr_file))

        """
         Assign detection-results to ground-truth objects
        """
        nd = len(dr_data)
        tp = [0] * nd # creates an array of zeros of size nd
        fp = [0] * nd
        for idx, detection in enumerate(dr_data):
            file_id = detection["file_id"]
            if show_animation:
                # find ground truth image
                ground_truth_img = glob.glob1(IMG_PATH, file_id + ".*")
                #tifCounter = len(glob.glob1(myPath,"*.tif"))
                if len(ground_truth_img) == 0:
                    error("Error. Image not found with id: " + file_id)
                elif len(ground_truth_img) > 1:
                    error("Error. Multiple image with id: " + file_id)
                else: # found image
                    #print(IMG_PATH + "/" + ground_truth_img[0])
                    # Load image
                    img = cv2.imread(IMG_PATH + "/" + ground_truth_img[0])
                    # load image with draws of multiple detections
                    img_cumulative_path = results_files_path + "/images/" + ground_truth_img[0]
                    if os.path.isfile(img_cumulative_path):
                        img_cumulative = cv2.imread(img_cumulative_path)
                    else:
                        img_cumulative = img.copy()
                    # Add bottom border to image
                    bottom_border = 60
                    BLACK = [0, 0, 0]
                    img = cv2.copyMakeBorder(img, 0, bottom_border, 0, 0, cv2.BORDER_CONSTANT, value=BLACK)
            # assign detection-results to ground truth object if any
            # open ground-truth with that file_id
            gt_file = TEMP_FILES_PATH + "/" + file_id + "_ground_truth.json"
            ground_truth_data = json.load(open(gt_file))
            ovmax = -1
            gt_match = -1
            # load detected object bounding-box
            bb = [ float(x) for x in detection["bbox"].split() ]
            for obj in ground_truth_data:
                # look for a class_name match
                if obj["class_name"] == class_name:
                    bbgt = [ float(x) for x in obj["bbox"].split() ]
                    bi = [max(bb[0],bbgt[0]), max(bb[1],bbgt[1]), min(bb[2],bbgt[2]), min(bb[3],bbgt[3])]
                    iw = bi[2] - bi[0] + 1
                    ih = bi[3] - bi[1] + 1
                    if iw > 0 and ih > 0:
                        # compute overlap (IoU) = area of intersection / area of union
                        ua = (bb[2] - bb[0] + 1) * (bb[3] - bb[1] + 1) + (bbgt[2] - bbgt[0]
                                        + 1) * (bbgt[3] - bbgt[1] + 1) - iw * ih
                        ov = iw * ih / ua
                        if ov > ovmax:
                            ovmax = ov
                            gt_match = obj

            # assign detection as true positive/don't care/false positive
            if show_animation:
                status = "NO MATCH FOUND!" # status is only used in the animation
            # set minimum overlap
            min_overlap = MINOVERLAP
            if specific_iou_flagged:
                if class_name in specific_iou_classes:
                    index = specific_iou_classes.index(class_name)
                    min_overlap = float(iou_list[index])
            if ovmax >= min_overlap:
                if "difficult" not in gt_match:
                        if not bool(gt_match["used"]):
                            # true positive
                            tp[idx] = 1
                            gt_match["used"] = True
                            count_true_positives[class_name] += 1
                            # update the ".json" file
                            with open(gt_file, 'w') as f:
                                    f.write(json.dumps(ground_truth_data))
                            if show_animation:
                                status = "MATCH!"
                        else:
                            # false positive (multiple detection)
                            fp[idx] = 1
                            if show_animation:
                                status = "REPEATED MATCH!"
            else:
                # false positive
                fp[idx] = 1
                if ovmax > 0:
                    status = "INSUFFICIENT OVERLAP"

            """
             Draw image to show animation
            """
            if show_animation:
                height, width = img.shape[:2]
                # colors (OpenCV works with BGR)
                white = (255,255,255)
                light_blue = (255,200,100)
                green = (0,255,0)
                light_red = (30,30,255)
                # 1st line
                margin = 10
                v_pos = int(height - margin - (bottom_border / 2.0))
                text = "Image: " + ground_truth_img[0] + " "
                img, line_width = draw_text_in_image(img, text, (margin, v_pos), white, 0)
                text = "Class [" + str(class_index) + "/" + str(n_classes) + "]: " + class_name + " "
                img, line_width = draw_text_in_image(img, text, (margin + line_width, v_pos), light_blue, line_width)
                if ovmax != -1:
                    color = light_red
                    if status == "INSUFFICIENT OVERLAP":
                        text = "IoU: {0:.2f}% ".format(ovmax*100) + "< {0:.2f}% ".format(min_overlap*100)
                    else:
                        text = "IoU: {0:.2f}% ".format(ovmax*100) + ">= {0:.2f}% ".format(min_overlap*100)
                        color = green
                    img, _ = draw_text_in_image(img, text, (margin + line_width, v_pos), color, line_width)
                # 2nd line
                v_pos += int(bottom_border / 2.0)
                rank_pos = str(idx+1) # rank position (idx starts at 0)
                text = "Detection #rank: " + rank_pos + " confidence: {0:.2f}% ".format(float(detection["confidence"])*100)
                img, line_width = draw_text_in_image(img, text, (margin, v_pos), white, 0)
                color = light_red
                if status == "MATCH!":
                    color = green
                text = "Result: " + status + " "
                img, line_width = draw_text_in_image(img, text, (margin + line_width, v_pos), color, line_width)

                font = cv2.FONT_HERSHEY_SIMPLEX
                if ovmax > 0: # if there is intersections between the bounding-boxes
                    bbgt = [ int(round(float(x))) for x in gt_match["bbox"].split() ]
                    cv2.rectangle(img,(bbgt[0],bbgt[1]),(bbgt[2],bbgt[3]),light_blue,2)
                    cv2.rectangle(img_cumulative,(bbgt[0],bbgt[1]),(bbgt[2],bbgt[3]),light_blue,2)
                    cv2.putText(img_cumulative, class_name, (bbgt[0],bbgt[1] - 5), font, 0.6, light_blue, 1, cv2.LINE_AA)
                bb = [int(i) for i in bb]
                cv2.rectangle(img,(bb[0],bb[1]),(bb[2],bb[3]),color,2)
                cv2.rectangle(img_cumulative,(bb[0],bb[1]),(bb[2],bb[3]),color,2)
                cv2.putText(img_cumulative, class_name, (bb[0],bb[1] - 5), font, 0.6, color, 1, cv2.LINE_AA)
                # show image
                cv2.imshow("Animation", img)
                cv2.waitKey(20) # show for 20 ms
                # save image to results
                output_img_path = results_files_path + "/images/detections_one_by_one/" + class_name + "_detection" + str(idx) + ".jpg"
                cv2.imwrite(output_img_path, img)
                # save the image with all the objects drawn to it
                cv2.imwrite(img_cumulative_path, img_cumulative)

        #print(tp)
        # compute precision/recall
        cumsum = 0
        for idx, val in enumerate(fp):
            fp[idx] += cumsum
            cumsum += val
        cumsum = 0
        for idx, val in enumerate(tp):
            tp[idx] += cumsum
            cumsum += val
        #print(tp)
        rec = tp[:]
        for idx, val in enumerate(tp):
            rec[idx] = float(tp[idx]) / gt_counter_per_class[class_name]
        #print(rec)
        prec = tp[:]
        for idx, val in enumerate(tp):
            prec[idx] = float(tp[idx]) / (fp[idx] + tp[idx])
        #print(prec)

        ap, mrec, mprec = voc_ap(rec[:], prec[:])
        sum_AP += ap
        text = "{0:.2f}%".format(ap*100) + " = " + class_name + " AP " #class_name + " AP = {0:.2f}%".format(ap*100)
        """
         Write to results.txt
        """
        rounded_prec = [ '%.2f' % elem for elem in prec ]
        rounded_rec = [ '%.2f' % elem for elem in rec ]
        results_file.write(text + "\n Precision: " + str(rounded_prec) + "\n Recall :" + str(rounded_rec) + "\n\n")
        if not args.quiet:
            print(text)
        ap_dictionary[class_name] = ap

        n_images = counter_images_per_class[class_name]
        lamr, mr, fppi = log_average_miss_rate(np.array(rec), np.array(fp), n_images)
        lamr_dictionary[class_name] = lamr

        """
         Draw plot
        """
        if draw_plot:
            plt.plot(rec, prec, '-o')
            # add a new penultimate point to the list (mrec[-2], 0.0)
            # since the last line segment (and respective area) do not affect the AP value
            area_under_curve_x = mrec[:-1] + [mrec[-2]] + [mrec[-1]]
            area_under_curve_y = mprec[:-1] + [0.0] + [mprec[-1]]
            plt.fill_between(area_under_curve_x, 0, area_under_curve_y, alpha=0.2, edgecolor='r')
            # set window title
            fig = plt.gcf() # gcf - get current figure
            fig.canvas.set_window_title('AP ' + class_name)
            # set plot title
            plt.title('class: ' + text)
            #plt.suptitle('This is a somewhat long figure title', fontsize=16)
            # set axis titles
            plt.xlabel('Recall')
            plt.ylabel('Precision')
            # optional - set axes
            axes = plt.gca() # gca - get current axes
            axes.set_xlim([0.0,1.0])
            axes.set_ylim([0.0,1.05]) # .05 to give some extra space
            # Alternative option -> wait for button to be pressed
            #while not plt.waitforbuttonpress(): pass # wait for key display
            # Alternative option -> normal display
            #plt.show()
            # save the plot
            fig.savefig(results_files_path + "/classes/" + class_name + ".png")
            plt.cla() # clear axes for next plot

    if show_animation:
        cv2.destroyAllWindows()

    results_file.write("\n# mAP of all classes\n")
    mAP = sum_AP / n_classes
    text = "mAP = {0:.2f}%".format(mAP*100)
    results_file.write(text + "\n")
    print(text)

# remove the temp_files directory
shutil.rmtree(TEMP_FILES_PATH)

"""
 Count total of detection-results
"""
# iterate through all the files
det_counter_per_class = {}
for txt_file in dr_files_list:
    # get lines to list
    lines_list = file_lines_to_list(txt_file)
    for line in lines_list:
        class_name = line.split()[0]
        # check if class is in the ignore list, if yes skip
        if class_name in args.ignore:
            continue
        # count that object
        if class_name in det_counter_per_class:
            det_counter_per_class[class_name] += 1
        else:
            # if class didn't exist yet
            det_counter_per_class[class_name] = 1
#print(det_counter_per_class)
dr_classes = list(det_counter_per_class.keys())


"""
 Plot the total number of occurrences of each class in the ground-truth
"""
if draw_plot:
    window_title = "ground-truth-info"
    plot_title = "ground-truth\n"
    plot_title += "(" + str(len(ground_truth_files_list)) + " files and " + str(n_classes) + " classes)"
    x_label = "Number of objects per class"
    output_path = results_files_path + "/ground-truth-info.png"
    to_show = False
    plot_color = 'forestgreen'
    draw_plot_func(
        gt_counter_per_class,
        n_classes,
        window_title,
        plot_title,
        x_label,
        output_path,
        to_show,
        plot_color,
        '',
        )

"""
 Write number of ground-truth objects per class to results.txt
"""
with open(results_files_path + "/results.txt", 'a') as results_file:
    results_file.write("\n# Number of ground-truth objects per class\n")
    for class_name in sorted(gt_counter_per_class):
        results_file.write(class_name + ": " + str(gt_counter_per_class[class_name]) + "\n")

"""
 Finish counting true positives
"""
for class_name in dr_classes:
    # if class exists in detection-result but not in ground-truth then there are no true positives in that class
    if class_name not in gt_classes:
        count_true_positives[class_name] = 0
#print(count_true_positives)

"""
 Plot the total number of occurrences of each class in the "detection-results" folder
"""
if draw_plot:
    window_title = "detection-results-info"
    # Plot title
    plot_title = "detection-results\n"
    plot_title += "(" + str(len(dr_files_list)) + " files and "
    count_non_zero_values_in_dictionary = sum(int(x) > 0 for x in list(det_counter_per_class.values()))
    plot_title += str(count_non_zero_values_in_dictionary) + " detected classes)"
    # end Plot title
    x_label = "Number of objects per class"
    output_path = results_files_path + "/detection-results-info.png"
    to_show = False
    plot_color = 'forestgreen'
    true_p_bar = count_true_positives
    draw_plot_func(
        det_counter_per_class,
        len(det_counter_per_class),
        window_title,
        plot_title,
        x_label,
        output_path,
        to_show,
        plot_color,
        true_p_bar
        )

"""
 Write number of detected objects per class to results.txt
"""
with open(results_files_path + "/results.txt", 'a') as results_file:
    results_file.write("\n# Number of detected objects per class\n")
    for class_name in sorted(dr_classes):
        n_det = det_counter_per_class[class_name]
        text = class_name + ": " + str(n_det)
        text += " (tp:" + str(count_true_positives[class_name]) + ""
        text += ", fp:" + str(n_det - count_true_positives[class_name]) + ")\n"
        results_file.write(text)

"""
 Draw log-average miss rate plot (Show lamr of all classes in decreasing order)
"""
if draw_plot:
    window_title = "lamr"
    plot_title = "log-average miss rate"
    x_label = "log-average miss rate"
    output_path = results_files_path + "/lamr.png"
    to_show = False
    plot_color = 'royalblue'
    draw_plot_func(
        lamr_dictionary,
        n_classes,
        window_title,
        plot_title,
        x_label,
        output_path,
        to_show,
        plot_color,
        ""
        )

"""
 Draw mAP plot (Show AP's of all classes in decreasing order)
"""
if draw_plot:
    window_title = "mAP"
    plot_title = "mAP = {0:.2f}%".format(mAP*100)
    x_label = "Average Precision"
    output_path = results_files_path + "/mAP.png"
    to_show = True
    plot_color = 'royalblue'
    draw_plot_func(
        ap_dictionary,
        n_classes,
        window_title,
        plot_title,
        x_label,
        output_path,
        to_show,
        plot_color,
        ""
        )

7 Steps to train on your own dataset

7.1 Workflow

  1. Draw the ground-truth boxes with an annotation tool such as labelImg; put the images into VOCdevkit/VOC2007/JPEGImages and the XML annotations into VOCdevkit/VOC2007/Annotations.
  2. Modify "model_path" and "classes_path" in ssd.py (a hedged illustration follows this list).
  3. Modify model.load_weights in train.py.
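For step 2, a hypothetical illustration, assuming ssd.py keeps its settings in a _defaults dictionary; verify the actual key names against the repo's ssd.py, and note both paths below are made up:

# In ssd.py -- point these at your own trained weights and class list:
_defaults = {
    "model_path": 'logs/ep050-val_loss1.23.h5',    # your trained weights
    "classes_path": 'model_data/my_classes.txt',   # one class name per line
    # ... keep the remaining keys unchanged
}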
