YOLOV2 _keras代码详解

YOLOV2_keras代码详解

  • 2 YOLOV2
    • 2.1 数据处理
      • 2.1.1 生成XX.hdf5格式的数据
      • 2.1.2 生成tfrecords格式的数据
      • 2.1.3 数据增强
      • 2.1.4 制作标签
    • 2.2 训练
      • 2.2.1 darknet19主体结构及主要模块儿
      • 2.2.2 utils_ls.py
      • 2.2.3 keras_yolo_ls.py
      • 2.2.4 迁移学习
    • 2.3 测试
      • 2.4 总结

2 YOLOV2

  • 论文:https:arxiv.org/pdf/1612.08242.pdf
  • 参考代码:https://github.com/allanzelener/yad2k
- YOVLOV2改进:
   - 1. 用Kmeans获取先验框的宽高
   - 2. 引入anchor。BS*7*7*2Boxes*30-->BS*13*13*5Boxes*(4+1+cls)
   - 3. 提高输入数据分辨率。224-->448
   - 4. 采用darknet-19主干网络。
   - 5. 特征融合(passthrough)。BS*26*26*512-->BS*13*13*256-->cat BS*13*13*1024-->BS*13*13*1280
   - 6. 加入BN。去掉fc、dropout,添加bn+relu,提高2%mAP。
   - 7. 预测中心点坐标偏移。解码函数改变。
   - 8. 多尺度输入训练网络。[320,352,384,448,480,512,544,576,608] 10/pre epoch
   

2.1 数据处理

已知数据VOC2007,VOC2007包含图片信息VOC2007/JPEGImages(.jpg)和VOC2007/Annotation存储的框的信息(.xml)。

数据处理步骤:

  1. 把数据划分为训练集、验证集、测试集。train_val_test_split.
  2. 把上一步得到的数据集转化为.hdf5格式或用tfrecords格式,加快数据读取速度。voc2hdf5/voc2tfrecords
  3. 数组增强
  4. 制作标签

注:train_val_test_split的代码再YOLOV1中已经解释过,不再赘述。

2.1.1 生成XX.hdf5格式的数据

第一步由 YAD2K-master_YOLOV2/voc_conversion_scripts/voc2hdf5_LS.py 生成 YAD2K-master_YOLOV2/VOCdevkit/pascal_voc_07_12_LS.hdf5

(1) _main(args)

input  :
output : 
process:
1. 
2. 
3. 

'''
为什么要把数据转换成HDF5? 产生的文件:VOCdevkit/pascal_voc_07_12_LS.hdf5
'''

import os
import h5py
import argparse
import numpy as np
import xml.etree.ElementTree as ElementTree

sets_from_2007 = [('2007','train'),('2007','val')]
train_set = [('2007','train')]
val_set = [('2007','val')]
test_set = [('2007','test')]

classes = [
    "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat",
    "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor"]

parser = argparse.ArgumentParser(description = 'Conver Pascal VOC 2007 detection dataset to HDF5')
parser.add_argument('-p','--path_to_voc',
                    help='path to VOCdevkit directory',
                    default='/Users/liushuang/Desktop/LearnGit/Bubbliiiing资料/YLOLO__pytorch系列/YAD2K-master_YOLOV2/VOCdevkit')

def get_boxes_from_id(voc_path,year,image_id):
    ''' 根据图片地址获取类别、框信息[5,N]'''
    fname = os.path.join(voc_path,'VOC{}/Annotations/{}.xml'.format(year,image_id))  # xml信息的地址
    with open(fname,'r')as f:
        xlm_tree = ElementTree.parse(f)  # 打开image_id对应的.xml文件
    root = xlm_tree.getroot()            # .xml文件开头
    boxes = []                           # 存储一张图片上取到的所有的框
    for obj in root.iter('object'):      # 遍历每一个框
        difficult = obj.find('difficult').text
        label = obj.find('name').text
        if label not in classes or int(difficult)== 1:  # 过滤非查找对象和difficult==1的框
            continue
        xlm_box = obj.find('bndbox')     # 取得框信息
        bbox = (classes.index(label),
                int(xlm_box.find('xmin').text),
                int(xlm_box.find('ymin').text),
                int(xlm_box.find('xmax').text),
                int(xlm_box.find('ymax').text))
        boxes.extend(bbox)
    return np.array(boxes)


def get_image_from_id(voc_path,year,image_id):
    ''' 读取图片信息,转换为uint8数据类型 '''
    fname = os.path.join(voc_path,'VOC{}/JPEGImages/{}.jpg'.format(year,image_id))
    with open(fname,'rb') as f:
        data = f.read()
    return np.frombuffer(data,dtype='uint8')

def get_ids(voc_path,datasets):
    ids = []
    for year,image_set in datasets:
        id_file = os.path.join(voc_path,'VOC{}/ImageSets/Main/{}.txt'.format(year,image_set))
        with open(id_file,'r')as f:
            ids.extend(map(str.strip,f.readlines()))
    return ids    #  ids = ['000027', '000025', '000018', '000022', '000014', '000028', '00001

def add_to_dataset(voc_path,year,ids,images,boxes,start = 0):
    '''取得图片数据 array of uint8、框'''
    for i ,voc_id  in enumerate(ids):
        images_data =  get_image_from_id(voc_path,year,voc_id)
        images_boxes = get_boxes_from_id(voc_path,year,voc_id)
        images[start+i] = images_data
        boxes[start+i] = images_boxes
    return i

def _main(args):
    voc_path = os.path.expanduser(args.path_to_voc)
    train_ids = get_ids(voc_path,train_set)  # train_set = [('2007', 'train')]
    val_ids = get_ids(voc_path,val_set)
    test_ids = get_ids(voc_path,test_set)
    train_ids_2007 = get_ids(voc_path, sets_from_2007)
    total_train_ids = len(train_ids)+ len(train_ids_2007)  # 原代码是2007的train和2012的train放在一起

    print('Creating HDF5 dataset structure.')
    fname = os.path.join(voc_path,'pascal_voc_07_12_LS.hdf5')
    voc_h5file = h5py.File(fname,'w')
    uint8_dt = h5py.special_dtype(vlen = np.dtype('uint8')) # variable length uint8
    int_dt = h5py.special_dtype(vlen=np.dtype(int))  # variable length default int
    train_group = voc_h5file.create_group('train')
    val_group = voc_h5file.create_group('val')
    test_group = voc_h5file.create_group('test')

    # store class
    voc_h5file.attrs['classes'] = np.string_(str.join(',',classes))  # dtype('S134')

    # store images
    train_images = train_group.create_dataset('images',shape=(total_train_ids,),dtype=uint8_dt)
    val_images = val_group.create_dataset('images',shape=(len(val_ids),),dtype=uint8_dt)
    test_images = test_group.create_dataset('images',shape=(len(test_ids),),dtype=uint8_dt)

    train_boxes = train_group.create_dataset('boxes',shape=(total_train_ids,),dtype=int_dt)
    val_boxes = val_group.create_dataset('boxes',shape=(len(val_ids),),dtype=int_dt)
    test_boxes = test_group.create_dataset('boxes',shape=(len(test_ids),),dtype=int_dt)

    # process all ids and add to datasets
    print('Process Pascal VOC 2007 datasets for training set')
    last_2007 = add_to_dataset(voc_path,'2007',train_ids_2007,train_images,train_boxes)
    print('Processing Pascal VOC 2012 training set.')
    add_to_dataset(voc_path,'2007',train_ids,train_images,train_boxes,start=last_2007+1)
    print('Processing Pascal VOC 2012 val set.')
    add_to_dataset(voc_path, '2007', val_ids, val_images, val_boxes)
    print('Processing Pascal VOC 2007 test set.')
    add_to_dataset(voc_path, '2007', test_ids, test_images, test_boxes)
    print('Closing HDF5 file.')
    voc_h5file.close()
    print('Done.')


if __name__ == '__main__':
    _main(parser.parse_args())
    print(parser.parse_args())

(2) get_boxes_from_id(voc_path,year,image_id)

input  :
output : 
process:
1. 
2. 
3. 



(3)get_image_from_id(voc_path,year,image_id)

input  :
output : 
process:
1. 
2. 
3. 



(4)get_ids(voc_path,datasets)

input  :
output : 
process:
1. 
2. 
3. 



(5)add_to_dataset(voc_path,year,ids,images,boxes,start = 0)

input  :
output : 
process:
1. 
2. 
3. 



2.1.2 生成tfrecords格式的数据

()

input  :
output : 
process:
1. 
2. 
3. 



(2) 图片数据转换为二进制数据 process_image(image_path)

input  : image_path,图片地址
output : image_data(bytes), height, width
process:
1. 读取图片
2. 图片数据有Uint8转换为二进制
3. 返回二进制数据类型图片及图片高宽

decoder_sess = tf.Session()
image_placeholder = tf.placeholder(dtype=tf.string)  # 图片
decoded_jpeg = tf.image.decode_jpeg(image_placeholder,channels=3)  # 解码

def process_image(image_path):
    ''' tf 解码jpeg图片,返回图片及宽高'''
    with open(image_path,'rb') as f:  # 打开图片
        image_data = f.read()
    image = decoder_sess.run(decoded_jpeg,feed_dict={image_placeholder:image_data})     # 图片转化为二进制数据
    assert len(image.shape) == 3   # 判断是否是三个维度
    height = image.shape[0]
    width = image.shape[1]
    assert image.shape[2] == 3
    return image_data, height, width  

if __name__ == '__main__':
    train_txt_path = '../VOCdevkit/VOC2007/ImageSets/Main/train.txt'
    with open(train_txt_path,'r')as f:
        train_ids = f.readlines()
    for id in train_ids:
        image_path =  '../VOCdevkit/VOC2007/JPEGImages/{}.jpg'.format(id.strip())
        image_data, height, width = process_image(image_path)
        print('type(image_data):',type(image_data),'height:',height, 'width:',width)
    
'''
type(image_data):  height: 500 width: 486
type(image_data):  height: 375 width: 500
type(image_data):  height: 285 width: 380
type(image_data):  height: 332 width: 500
type(image_data):  height: 333 width: 500
type(image_data):  height: 500 width: 375

Process finished with exit code 0
'''

(3) 读取xml信息 process_anno(anno_path)

input  : xxx.xml路径
output : 真实框的类别和框的左上角、右下角坐标
process:
1. 打开xml文件,获取宽高。
2. 获取类别、框的信息

def process_anno(anno_path):
    with open(anno_path) as f:
        xml_tree = ElementTree.parse(f)
    root = xml_tree.getroot()
    height = float(size.find('height').text)
    width = float(size.find('width').text)
    boxes = []
    for obj in root.iter('object'):
        difficult = obj.find('difficult').text
        label = obj.find('name').text
        if label not in classes or int(difficult)==1:
            continue
        xml_box = obj.find('bndbox')
        bbox = {
            'class':classes.index(label),
            'ymin':float(xml_box.find('ymin').text)/height,  # 归一化,得到真实框在图片上的相对位置
            'xmin':float(xml_box.find('xmin').text)/width,
            'ymax':float(xml_box.find('ymax').text)/height,
            'xmax':float(xml_box.find('xmax').text)/width,
        }
        boxes.append(bbox)
    return boxes


if __name__ == '__main__':
    train_txt_path = '../VOCdevkit/VOC2007/ImageSets/Main/train.txt'
    with open(train_txt_path,'r')as f:
        train_ids = f.readlines()
    for id in train_ids:
        anno_path =  '../VOCdevkit/VOC2007/Annotations/{}.xml'.format(id.strip())
        boxes = process_anno(anno_path)
        print(len(boxes))
        for i in boxes:
            print(i)

'''
len(boxes): 1
{'class': 14, 'ymin': 0.202, 'xmin': 0.35802469135802467, 'ymax': 0.702, 'xmax': 0.7181069958847737}
len(boxes): 10
{'class': 9, 'ymin': 0.224, 'xmin': 0.004, 'ymax': 0.6613333333333333, 'xmax': 0.118}
{'class': 9, 'ymin': 0.30666666666666664, 'xmin': 0.136, 'ymax': 0.744, 'xmax': 0.466}
{'class': 9, 'ymin': 0.4613333333333333, 'xmin': 0.128, 'ymax': 0.9946666666666667, 'xmax': 0.754}
{'class': 14, 'ymin': 0.005333333333333333, 'xmin': 0.64, 'ymax': 1.0, 'xmax': 0.992}
{'class': 14, 'ymin': 0.010666666666666666, 'xmin': 0.442, 'ymax': 0.9973333333333333, 'xmax': 0.682}
{'class': 14, 'ymin': 0.037333333333333336, 'xmin': 0.27, 'ymax': 0.39466666666666667, 'xmax': 0.44}
{'class': 9, 'ymin': 0.11466666666666667, 'xmin': 0.138, 'ymax': 0.472, 'xmax': 0.312}
{'class': 14, 'ymin': 0.144, 'xmin': 0.116, 'ymax': 0.37066666666666664, 'xmax': 0.208}
{'class': 14, 'ymin': 0.0026666666666666666, 'xmin': 0.558, 'ymax': 0.22933333333333333, 'xmax': 0.662}
{'class': 14, 'ymin': 0.058666666666666666, 'xmin': 0.64, 'ymax': 0.256, 'xmax': 0.688}
len(boxes): 1
{'class': 11, 'ymin': 0.10526315789473684, 'xmin': 0.08157894736842106, 'ymax': 0.9789473684210527, 'xmax': 0.9421052631578948}
len(boxes): 2
{'class': 12, 'ymin': 0.3102409638554217, 'xmin': 0.136, 'ymax': 0.8524096385542169, 'xmax': 0.736}
{'class': 14, 'ymin': 0.13253012048192772, 'xmin': 0.372, 'ymax': 0.6927710843373494, 'xmax': 0.51}
len(boxes): 6
{'class': 5, 'ymin': 0.4894894894894895, 'xmin': 0.144, 'ymax': 0.6846846846846847, 'xmax': 0.604}
{'class': 6, 'ymin': 0.5825825825825826, 'xmin': 0.37, 'ymax': 0.948948948948949, 'xmax': 1.0}
{'class': 6, 'ymin': 0.5405405405405406, 'xmin': 0.832, 'ymax': 0.6666666666666666, 'xmax': 1.0}
{'class': 14, 'ymin': 0.024024024024024024, 'xmin': 0.628, 'ymax': 0.19519519519519518, 'xmax': 0.688}
{'class': 14, 'ymin': 0.012012012012012012, 'xmin': 0.662, 'ymax': 0.1831831831831832, 'xmax': 0.722}
{'class': 14, 'ymin': 0.024024024024024024, 'xmin': 0.714, 'ymax': 0.1831831831831832, 'xmax': 0.802}
len(boxes): 1
{'class': 7, 'ymin': 0.036, 'xmin': 0.168, 'ymax': 1.0, 'xmax': 0.9973333333333333}

Process finished with exit code 0

'''

(4) 把图片和框转换为TFRocoreds格式。conver_to_example(image_data,boxes,filename,height,width)

inputs:
    image_data:Encoded image bytes.
    boxes: class labels + Bounding box corners
    filename: Path to image file
outputs:
    protobuf:ensorflow Example protobuf containing image and bounding boxes
process:
1. 取出类别和框的坐标
2. 按照tf的标准存储

def conver_to_example(image_data,boxes,filename,height,width):
    ''' convert images,boxes to TFRecord  protobuf.
    :inputs
    image_data:Encoded image bytes.
    boxes: class labels + Bounding box corners
    filename: Path to image file
    :outputs
    protobuf:ensorflow Example protobuf containing image and bounding boxes
    '''
    box_classes = [b['class'] for b in boxes]
    box_ymin = [b['ymin']for b in boxes]
    box_xmin = [b['xmin']for b in boxes]
    box_ymax = [b['ymax']for b in boxes]
    box_xmax = [b['xmax']for b in boxes]
    encoded_image = [tf.compat.as_bytes(image_data)]
    base_name = [tf.compat.as_bytes(os.path.basename(filename))]  # filename = '../images/dog.jpg'-->base_name = [b'dog.jpg']

    example = tf.train.Example(features = tf.train.Features(feature={
        'filename': tf.train.Feature(bytes_list = tf.train.BytesList(value=base_name)),
        'height': tf.train.Feature(int64_list=tf.train.Int64List(value=[height])),
        'width': tf.train.Feature(int64_list=tf.train.Int64List(value=[width])),
        'classes': tf.train.Feature(int64_list=tf.train.Int64List(value=box_classes)),
        'y_mins': tf.train.Feature(float_list=tf.train.FloatList(value=box_ymin)),
        'x_mins': tf.train.Feature(float_list=tf.train.FloatList(value=box_xmin)),
        'y_max': tf.train.Feature(float_list=tf.train.FloatList(value=box_ymax)),
        'x_max': tf.train.Feature(float_list=tf.train.FloatList(value=box_xmax)),
        'encoded': tf.train.Feature(bytes_list=tf.train.BytesList(value=encoded_image))
    }))
    return example


if __name__ == '__main__':
    image_path = '../VOCdevkit/VOC2007/JPEGImages/000014.jpg'
    anno_path = '../VOCdevkit/VOC2007/Annotations/000014.xml'
    filename = image_path
    image_data, height, width = process_image(image_path)
    boxes = process_anno(anno_path)
    example = conver_to_example(image_data,boxes,filename,height,width)
    print(type(example))
    print(example)
    
    # train_txt_path = '../VOCdevkit/VOC2007/ImageSets/Main/train.txt'
    # with open(train_txt_path,'r')as f:
    #     train_ids = f.readlines()
    # for id in train_ids:
    #     image_path = '../VOCdevkit/VOC2007/JPEGImages/{}.jpg'.format(id.strip())
    #     anno_path =  '../VOCdevkit/VOC2007/Annotations/{}.xml'.format(id.strip())
    #     image_data, height, width = process_image(image_path)
    #     boxes = process_anno(anno_path)
    #     example = conver_to_example(image_data,boxes,image_path,height,width)
    #     print(type(example))
    #     # print(example)
'''

features {
  feature {
    key: "classes"
    value {
      int64_list {
        value: 5
        value: 6
        value: 6
        value: 14
        value: 14
        value: 14
      }
    }
  }
  feature {
    key: "encoded"
    value {
      bytes_list {........}
  feature {
    key: "filename"
    value {
      bytes_list {
        value: "000014.jpg"
      }
    }
  }
  feature {
    key: "height"
    value {
      int64_list {
        value: 333
      }
    }
  }
  feature {
    key: "width"
    value {
      int64_list {
        value: 500
      }
    }
  }
  feature {
    key: "x_max"
    value {
      float_list {
        value: 0.6039999723434448
        value: 1.0
        value: 1.0
        value: 0.6880000233650208
        value: 0.722000002861023
        value: 0.8019999861717224
      }
    }
  }
  feature {
    key: "x_mins"
    value {
      float_list {
        value: 0.14399999380111694
        value: 0.3700000047683716
        value: 0.8320000171661377
        value: 0.628000020980835
        value: 0.6620000004768372
        value: 0.7139999866485596
      }
    }
  }
  feature {
    key: "y_max"
    value {
      float_list {
        value: 0.684684693813324
        value: 0.9489489197731018
        value: 0.6666666865348816
        value: 0.19519519805908203
        value: 0.18318317830562592
        value: 0.18318317830562592
      }
    }
  }
  feature {
    key: "y_mins"
    value {
      float_list {
        value: 0.48948949575424194
        value: 0.5825825929641724
        value: 0.5405405163764954
        value: 0.024024024605751038
        value: 0.012012012302875519
        value: 0.024024024605751038
      }
    }
  }
}


Process finished with exit code 0

'''

(5) 取得图片路径和框路径

input  :
    voc_path = '../VOCdevkit'
    year = '2007'
    Eg :image_id = '0000014'
output : 
    image_path = '../VOCdevkit/VOC2007/JPEGImages/0000014.jpg'
    anno_path = '../VOCdevkit/VOC2007/JPEGImages/0000014.xml'

def get_image_path(voc_path,year,image_id):
    ''' 图片地址'''
    return os.path.join(voc_path,'VOC{}/JPEGImages/{}.jpg'.format(year,image_id))

def get_anno_path(voc_path, year, image_id):
    ''' xml地址'''
    return os.path.join(voc_path,'VOC{}/Annotations/{}.xml'.format(year,image_id))


if __name__ == '__main__':
    voc_path = '../VOCdevkit'
    year = '2007'
    train_txt_path = '../VOCdevkit/VOC2007/ImageSets/Main/train.txt'
    with open(train_txt_path,'r')as f:
        train_ids = f.readlines()
    for id in train_ids:
        image_path = get_image_path(voc_path,year,id.strip())
        anno_path = get_anno_path(voc_path, year, id.strip())
        filename = image_path
        # print(image_path)
        # print(anno_path)
        image_data, height, width = process_image(image_path)
        boxes = process_anno(anno_path)
        example = conver_to_example(image_data,boxes,filename,height,width)
        print(type(example))
        # print(example)
 '''







Process finished with exit code 0

 '''

(6) 把数据集分块存储。process_dataset(name,image_paths,anno_paths,result_path,num_shards)

input  :
    name:'train' or 'test'.
    image_paths : List of paths to images to include in dataset.
    result_path : string.Path to put resulting TFRecord files.
    num_shards : int。Number of shards to split TFRecord files into.
output : 
    VOCdevkit/TFRecords/train/train-00000-of-00002
    VOCdevkit/TFRecords/train/train-00001-of-00002
    VOCdevkit/TFRecords/test/test-00000-of-00001
process:
1. 函数传入名称、图片地址、框地址、数据存储地址和数据分块儿存储的块儿数。
2. 划分每块儿数据的起始和终止的下标。
3. 根据临界下标存储数据

def process_dataset(name,image_paths,anno_paths,result_path,num_shards):
    ''' 把所有的数据转换成TFRecords
    name:'train' or 'test'.
    image_paths : List of paths to images to include in dataset.
    result_path : string.Path to put resulting TFRecord files.
    num_shards : int.Number of shards to split TFRecord files into.
    '''
    shard_ranges = np.linspace(0,len(image_paths),num_shards + 1).astype(int)  # 理解为每批次数据
    counter = 0     # 记录数据存储在第几块儿
    for shard in range(num_shards):
        output_filename = '{}-{:05d}-of-{:05d}'.format(name,shard,num_shards)
        output_file = os.path.join(result_path,output_filename)
        writer = tf.python_io.TFRecordWriter(output_file)

        shard_counter = 0
        files_in_shard = range(shard_ranges[shard],shard_ranges[shard+1]) # 第i块儿数据的第一个样本下标和最后一个样本下标
        for i in files_in_shard:  # 遍历数据块中的每一个样本下标
            image_file = image_paths[i]
            anno_file = anno_paths[i]

            image_data,height,width = process_image(image_file)
            boxes = process_anno(anno_file)
            example = conver_to_example(image_data,boxes,image_file,height,width)

            writer.write(example.SerializeToString())

            shard_counter += 1
            counter += 1

            if not counter%1000:
                print('{}: Processed {:d} of {:d}'.format(datetime.now(), counter, len(image_paths)))
        writer.close()
        print('{} : Wrote {} images to {}'.format(datetime.now(), shard_counter, output_filename))
    print('{} : Wrote {} images to {} shards'.
          format(datetime.now(), counter,num_shards))



if __name__ == '__main__':
    voc_path = '../VOCdevkit'
    year = '2007'
    result_path = '../VOCdevkit'
    train_txt_path = '../VOCdevkit/VOC2007/ImageSets/Main/train.txt'
    with open(train_txt_path,'r')as f:
        train_ids = f.readlines()
    image_paths = []
    anno_paths = []
    for id in train_ids:
        image_paths.append(get_image_path(voc_path,year,id.strip()))
        anno_paths.append(get_anno_path(voc_path, year, id.strip()))
    process_dataset('train',image_paths,anno_paths,result_path,num_shards=3)
    
'''
19:30:01.332069 : Wrote 2 images to train-00000-of-00003
19:30:01.348466 : Wrote 2 images to train-00001-of-00003
19:30:01.365998 : Wrote 2 images to train-00002-of-00003
19:30:01.366042 : Wrote 6 images to 3 shards

Process finished with exit code 0
'''

(7) _main(args)

 _main(args)是process_dataset()的前奏,主要是设置地址。

def _main(args):
    ''' Locate files for train and test sets and then generate TFRecords. '''
    voc_path = args.path_to_voc
    voc_path = os.path.expanduser(voc_path)  # 什么意思?
    result_path = os.path.join(voc_path, 'TFRecords')
    print('Saving results to {}'.format(result_path))

    train_path = os.path.join(result_path, 'train')
    test_path = os.path.join(result_path, 'test')

    train_ids = get_ids(voc_path, train_set)  # 2012 trainval
    test_ids = get_ids(voc_path, test_set)  # 2007 test
    train_ids_2007 = get_ids(voc_path, sets_from_2007)  # 2007 trainval
    total_train_ids = len(train_ids) + len(train_ids_2007)
    print('{} train examples and {} test examples'.format(total_train_ids,
                                                          len(test_ids)))

    train_image_paths = [
        get_image_path(voc_path, '2007', i) for i in train_ids
    ]
    train_image_paths.extend(
        [get_image_path(voc_path, '2007', i) for i in train_ids_2007])
    test_image_paths = [get_image_path(voc_path, '2007', i) for i in test_ids]

    train_anno_paths = [get_anno_path(voc_path, '2007', i) for i in train_ids]
    train_anno_paths.extend(
        [get_anno_path(voc_path, '2007', i) for i in train_ids_2007])
    test_anno_paths = [get_anno_path(voc_path, '2007', i) for i in test_ids]

    process_dataset(
        'train',
        train_image_paths,
        train_anno_paths,
        train_path,
        num_shards=2)
    process_dataset(
        'test', test_image_paths, test_anno_paths, test_path, num_shards=1)

if __name__ == '__main__':
    args = parser.parse_args()
    _main(args)

'''
Saving results to ../VOCdevkit/TFRecords
13 train examples and 1 test examples
2022-08-17 20:15:06.660740 : Wrote 6 images to train-00000-of-00002
2022-08-17 20:15:06.700461 : Wrote 7 images to train-00001-of-00002
2022-08-17 20:15:06.700501 : Wrote 13 images to 2 shards
2022-08-17 20:15:06.705831 : Wrote 1 images to test-00000-of-00001
2022-08-17 20:15:06.705862 : Wrote 1 images to 1 shards

Process finished with exit code 0
'''

2.1.3 数据增强

input  :
output : 
process:
1. 
2. 
3. 




2.1.4 制作标签

input  :
output : 
process:
1. 
2. 
3. 



2.2 训练

2.2.1 darknet19主体结构及主要模块儿

input  : [416,416,3]
output : [13,13,1000]

(1) DarknetConv2D_BN_Leaky:
conv2D(padding='same',kernel_regularizer=l2(5e-4),bias= False)
BatchNormalization(),
LeakyReLU(alpha=0.1)

(2) bottleneck_block:
DarknetConv2D_BN_Leaky(outer_filters, (3, 3)),
DarknetConv2D_BN_Leaky(bottleneck_filters, (1, 1)),
DarknetConv2D_BN_Leaky(outer_filters, (3, 3)))

(3) bottleneck_x2_block:
bottleneck_block(outer_filters, bottleneck_filters),
DarknetConv2D_BN_Leaky(bottleneck_filters, (1, 1)),
DarknetConv2D_BN_Leaky(outer_filters, (3, 3))

(4) darknet19:
DarknetConv2D_BN_Leaky(32, (3, 3)),    
MaxPooling2D(),
DarknetConv2D_BN_Leaky(64, (3, 3)),
MaxPooling2D(),
bottleneck_block(128, 64),
MaxPooling2D(),
bottleneck_block(256, 128),
MaxPooling2D(),
bottleneck_x2_block(512, 256),
MaxPooling2D(),
bottleneck_x2_block(1024, 512))
DarknetConv2D(1000, (1, 1), activation='softmax')(body)


import functools
from functools import partial
from keras.layers import Conv2D,MaxPooling2D
from keras.layers.advanced_activations import  LeakyReLU
from keras.layers.normalization import BatchNormalization
from keras.models import Model
from keras.regularizers import l2
from yad2k.utils.utils import compose

_DarknetConv2D = partial(Conv2D, padding='same')


@functools.wraps(Conv2D)
def DarknetConv2D(*args, **kwargs):
    darknet_conv_kwargs = {'kernel_regularizer': l2(5e-4)}
    darknet_conv_kwargs.update(kwargs)
    return _DarknetConv2D(*args, **darknet_conv_kwargs)

def DarknetConv2D_BN_Leaky(*args, **kwargs):
    """Darknet Convolution2D followed by BatchNormalization and LeakyReLU."""
    no_bias_kwargs = {'use_bias': False}
    no_bias_kwargs.update(kwargs)
    return compose(
        DarknetConv2D(*args, **no_bias_kwargs),
        BatchNormalization(),
        LeakyReLU(alpha=0.1))

def bottleneck_block(outer_filters, bottleneck_filters):
    """Bottleneck block of 3x3, 1x1, 3x3 convolutions."""
    return compose(
        DarknetConv2D_BN_Leaky(outer_filters, (3, 3)),
        DarknetConv2D_BN_Leaky(bottleneck_filters, (1, 1)),
        DarknetConv2D_BN_Leaky(outer_filters, (3, 3)))

def bottleneck_x2_block(outer_filters, bottleneck_filters):
    """Bottleneck block of 3x3, 1x1, 3x3, 1x1, 3x3 convolutions."""
    return compose(
        bottleneck_block(outer_filters, bottleneck_filters),
        DarknetConv2D_BN_Leaky(bottleneck_filters, (1, 1)),
        DarknetConv2D_BN_Leaky(outer_filters, (3, 3)))

def darknet_body():
    """Generate first 18 conv layers of Darknet-19."""
    return compose(
        DarknetConv2D_BN_Leaky(32, (3, 3)),
        MaxPooling2D(),
        DarknetConv2D_BN_Leaky(64, (3, 3)),
        MaxPooling2D(),
        bottleneck_block(128, 64),
        MaxPooling2D(),
        bottleneck_block(256, 128),
        MaxPooling2D(),
        bottleneck_x2_block(512, 256),
        MaxPooling2D(),
        bottleneck_x2_block(1024, 512))

def darknet19(inputs):
    """Generate Darknet-19 model for Imagenet classification."""
    body = darknet_body()(inputs)
    logits = DarknetConv2D(1000, (1, 1), activation='softmax')(body)
    return Model(inputs, logits)


if __name__ == '__main__':
    from keras.models import Input
    inputs = Input([416,416,3])
    model = darknet19(inputs)
    model.summary()


'''
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 416, 416, 3)       0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 416, 416, 32)      864       
_________________________________________________________________
batch_normalization_1 (Batch (None, 416, 416, 32)      128       
_________________________________________________________________
leaky_re_lu_1 (LeakyReLU)    (None, 416, 416, 32)      0         
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 208, 208, 32)      0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 208, 208, 64)      18432     
_________________________________________________________________
batch_normalization_2 (Batch (None, 208, 208, 64)      256       
_________________________________________________________________
leaky_re_lu_2 (LeakyReLU)    (None, 208, 208, 64)      0         
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 104, 104, 64)      0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 104, 104, 128)     73728     
_________________________________________________________________
batch_normalization_3 (Batch (None, 104, 104, 128)     512       
_________________________________________________________________
leaky_re_lu_3 (LeakyReLU)    (None, 104, 104, 128)     0         
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 104, 104, 64)      8192      
_________________________________________________________________
batch_normalization_4 (Batch (None, 104, 104, 64)      256       
_________________________________________________________________
leaky_re_lu_4 (LeakyReLU)    (None, 104, 104, 64)      0         
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 104, 104, 128)     73728     
_________________________________________________________________
batch_normalization_5 (Batch (None, 104, 104, 128)     512       
_________________________________________________________________
leaky_re_lu_5 (LeakyReLU)    (None, 104, 104, 128)     0         
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 52, 52, 128)       0         
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 52, 52, 256)       294912    
_________________________________________________________________
batch_normalization_6 (Batch (None, 52, 52, 256)       1024      
_________________________________________________________________
leaky_re_lu_6 (LeakyReLU)    (None, 52, 52, 256)       0         
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 52, 52, 128)       32768     
_________________________________________________________________
batch_normalization_7 (Batch (None, 52, 52, 128)       512       
_________________________________________________________________
leaky_re_lu_7 (LeakyReLU)    (None, 52, 52, 128)       0         
_________________________________________________________________
conv2d_8 (Conv2D)            (None, 52, 52, 256)       294912    
_________________________________________________________________
batch_normalization_8 (Batch (None, 52, 52, 256)       1024      
_________________________________________________________________
leaky_re_lu_8 (LeakyReLU)    (None, 52, 52, 256)       0         
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 26, 26, 256)       0         
_________________________________________________________________
conv2d_9 (Conv2D)            (None, 26, 26, 512)       1179648   
_________________________________________________________________
batch_normalization_9 (Batch (None, 26, 26, 512)       2048      
_________________________________________________________________
leaky_re_lu_9 (LeakyReLU)    (None, 26, 26, 512)       0         
_________________________________________________________________
conv2d_10 (Conv2D)           (None, 26, 26, 256)       131072    
_________________________________________________________________
batch_normalization_10 (Batc (None, 26, 26, 256)       1024      
_________________________________________________________________
leaky_re_lu_10 (LeakyReLU)   (None, 26, 26, 256)       0         
_________________________________________________________________
conv2d_11 (Conv2D)           (None, 26, 26, 512)       1179648   
_________________________________________________________________
batch_normalization_11 (Batc (None, 26, 26, 512)       2048      
_________________________________________________________________
leaky_re_lu_11 (LeakyReLU)   (None, 26, 26, 512)       0         
_________________________________________________________________
conv2d_12 (Conv2D)           (None, 26, 26, 256)       131072    
_________________________________________________________________
batch_normalization_12 (Batc (None, 26, 26, 256)       1024      
_________________________________________________________________
leaky_re_lu_12 (LeakyReLU)   (None, 26, 26, 256)       0         
_________________________________________________________________
conv2d_13 (Conv2D)           (None, 26, 26, 512)       1179648   
_________________________________________________________________
batch_normalization_13 (Batc (None, 26, 26, 512)       2048      
_________________________________________________________________
leaky_re_lu_13 (LeakyReLU)   (None, 26, 26, 512)       0         
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 13, 13, 512)       0         
_________________________________________________________________
conv2d_14 (Conv2D)           (None, 13, 13, 1024)      4718592   
_________________________________________________________________
batch_normalization_14 (Batc (None, 13, 13, 1024)      4096      
_________________________________________________________________
leaky_re_lu_14 (LeakyReLU)   (None, 13, 13, 1024)      0         
_________________________________________________________________
conv2d_15 (Conv2D)           (None, 13, 13, 512)       524288    
_________________________________________________________________
batch_normalization_15 (Batc (None, 13, 13, 512)       2048      
_________________________________________________________________
leaky_re_lu_15 (LeakyReLU)   (None, 13, 13, 512)       0         
_________________________________________________________________
conv2d_16 (Conv2D)           (None, 13, 13, 1024)      4718592   
_________________________________________________________________
batch_normalization_16 (Batc (None, 13, 13, 1024)      4096      
_________________________________________________________________
leaky_re_lu_16 (LeakyReLU)   (None, 13, 13, 1024)      0         
_________________________________________________________________
conv2d_17 (Conv2D)           (None, 13, 13, 512)       524288    
_________________________________________________________________
batch_normalization_17 (Batc (None, 13, 13, 512)       2048      
_________________________________________________________________
leaky_re_lu_17 (LeakyReLU)   (None, 13, 13, 512)       0         
_________________________________________________________________
conv2d_18 (Conv2D)           (None, 13, 13, 1024)      4718592   
_________________________________________________________________
batch_normalization_18 (Batc (None, 13, 13, 1024)      4096      
_________________________________________________________________
leaky_re_lu_18 (LeakyReLU)   (None, 13, 13, 1024)      0         
_________________________________________________________________
conv2d_19 (Conv2D)           (None, 13, 13, 1000)      1025000   
=================================================================
Total params: 20,856,776
Trainable params: 20,842,376
Non-trainable params: 14,400
_________________________________________________________________

Process finished with exit code 0

'''

2.2.2 utils_ls.py

compose把多个函数嵌套使用,相当于torch.nn.Squence()


from functools import reduce

def compose(*funcs):
    if funcs:
        return reduce(lambda f,g:lambda *a,**kw: g(f(*a,**kw)),funcs)
    else:
        raise ValueError('Composition of empty sequence not supported.')

class add():
    def __init__(self,a,b):
        self.a = a
        self.b = b
    def __call__(self, x):
        return (a+b)*x


class mul():
    def __init__(self,a,b):
        self.a = a
        self.b = b
    def __call__(self, x):
        return a*b+x

if __name__ == '__main__':
    a,b = 2,3
    c = compose(add(a,b),
                mul(a,b),
                add(a,b),
                mul(a,b))
    print(c(2))

'''
86

Process finished with exit code 0
'''

2.2.3 keras_yolo_ls.py

(1) 所有代码

input  :
output : 
process:
1. 
2. 
3. 



(2) passthougth

高宽减半,通道数变成原来的四倍

input  : tensor[batch_size,heigh,width,channels]
output : tensor[batch_size,heigh//2,width//2,channels*4]
process:
1. 隔行隔列取元素组成新的特征图
2. 堆叠特征图 
处理前的一个特征图:
 1  2  3  4
 5  6  7  8
 9  10 11 12
13  14 15 16
经过passthougth后生成4个特征图:
1 3      2  4     5   7     10  12
9 10     6  8     13  15    14  16

def space_to_depth_x2(x):    # passthrough
    import tensorflow as tf
    return tf.space_to_depth(x,block_size=2)

x = tf.random.normal([8,26,26,10])
tf.space_to_depth(x,block_size=2)
'''
'''

(3)计算passthougth后特征成的shape

input  : 特征层的shape
output : 特征层经过passthrough后的shape
process:
1. 如果特征层高宽存在,高宽减半,通道数变成原来的4倍。
2. 如果特征层高不存在,高宽为None,通道数变成原来的4倍。

def space_to_depth_x2_output_shape(input_shape):
    return (input_shape[0], input_shape[1] // 2, input_shape[2] // 2, 4 *
            input_shape[3])  if  input_shape[1] else(input_shape[0],None,None,4 *
                                                     input_shape[3])
 
 '''
space_to_depth_x2_output_shape([8,26,26,10])
Out[25]: (8, 13, 13, 40)
space_to_depth_x2_output_shape([8,0,0,10])
Out[26]: (8, None, None, 40)
space_to_depth_x2_output_shape([8,None,None,10])
Out[27]: (8, None, None, 40)
'''                                                    
                                                     

(4) yolo_body


input  :inputs = Input([416,416,3]),num_anchors=5,num_classes=20
output : m = yolo_body() (None, 13, 13, 125)
process:
1.  调取darknet19,对最后一层再卷积,对darknet19第43层做pathough后,两个特征层统一到相同尺寸再堆叠在一起。
2. 对上一步得到的特征层再卷积。

import sys
import numpy as np
import tensorflow as tf
from keras import backend as K
from keras.layers import Lambda
from keras.models import Model,Input
from keras.layers.merge import concatenate
from data_process.utils.utils_ls import compose
from data_process.models.keras_darknet19_ls import (DarknetConv2D,DarknetConv2D_BN_Leaky,darknet_body)

sys.path.append('..') # 这个是干什么的?
voc_anchors = np.array( [[1.08, 1.19], [3.42, 4.41],
                         [6.63, 11.38], [9.42, 5.11], [16.62, 10.52]])

voc_classes = [
    "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat",
    "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor"]

def space_to_depth_x2(x):    # passthrough
    import tensorflow as tf
    return tf.space_to_depth(x,block_size=2)

def space_to_depth_x2_output_shape(input_shape):
    return (input_shape[0], input_shape[1] // 2, input_shape[2] // 2, 4 *
            input_shape[3])  if  input_shape[1] else(input_shape[0],None,None,4 *
                                                     input_shape[3])
def yolo_body(inputs,num_anchors,num_classes):
    darknet = Model(inputs,darknet_body()(inputs))
    conv20 = compose(DarknetConv2D_BN_Leaky(1024,(3,3)),
                     DarknetConv2D_BN_Leaky(1024,(3,3)))(darknet.output)

    conv13 = darknet.layers[43].output
    conv21 = DarknetConv2D_BN_Leaky(64,(1,1))(conv13)
    conv21_reshaped = Lambda(space_to_depth_x2,
                             output_shape=space_to_depth_x2_output_shape,
                             name='space_to_depth')(conv21)

    x = concatenate([conv21_reshaped,conv20])
    x = DarknetConv2D_BN_Leaky(1024,(3,3))(x)
    x = DarknetConv2D(num_anchors*(num_classes + 5),(1,1))(x)
    return Model(inputs,x)

if __name__ == '__main__':
    inputs = Input([416,416,3])
    m = yolo_body(inputs,num_anchors=5,num_classes=20)
    print(m.summary())
'''
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            (None, 416, 416, 3)  0                                            
__________________________________________________________________________________________________
conv2d_1 (Conv2D)               (None, 416, 416, 32) 864         input_1[0][0]                    
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 416, 416, 32) 128         conv2d_1[0][0]                   
__________________________________________________________________________________________________
leaky_re_lu_1 (LeakyReLU)       (None, 416, 416, 32) 0           batch_normalization_1[0][0]      
__________________________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D)  (None, 208, 208, 32) 0           leaky_re_lu_1[0][0]              
__________________________________________________________________________________________________
conv2d_2 (Conv2D)               (None, 208, 208, 64) 18432       max_pooling2d_1[0][0]            
__________________________________________________________________________________________________
batch_normalization_2 (BatchNor (None, 208, 208, 64) 256         conv2d_2[0][0]                   
__________________________________________________________________________________________________
leaky_re_lu_2 (LeakyReLU)       (None, 208, 208, 64) 0           batch_normalization_2[0][0]      
__________________________________________________________________________________________________
max_pooling2d_2 (MaxPooling2D)  (None, 104, 104, 64) 0           leaky_re_lu_2[0][0]              
__________________________________________________________________________________________________
conv2d_3 (Conv2D)               (None, 104, 104, 128 73728       max_pooling2d_2[0][0]            
__________________________________________________________________________________________________
batch_normalization_3 (BatchNor (None, 104, 104, 128 512         conv2d_3[0][0]                   
__________________________________________________________________________________________________
leaky_re_lu_3 (LeakyReLU)       (None, 104, 104, 128 0           batch_normalization_3[0][0]      
__________________________________________________________________________________________________
conv2d_4 (Conv2D)               (None, 104, 104, 64) 8192        leaky_re_lu_3[0][0]              
__________________________________________________________________________________________________
batch_normalization_4 (BatchNor (None, 104, 104, 64) 256         conv2d_4[0][0]                   
__________________________________________________________________________________________________
leaky_re_lu_4 (LeakyReLU)       (None, 104, 104, 64) 0           batch_normalization_4[0][0]      
__________________________________________________________________________________________________
conv2d_5 (Conv2D)               (None, 104, 104, 128 73728       leaky_re_lu_4[0][0]              
__________________________________________________________________________________________________
batch_normalization_5 (BatchNor (None, 104, 104, 128 512         conv2d_5[0][0]                   
__________________________________________________________________________________________________
leaky_re_lu_5 (LeakyReLU)       (None, 104, 104, 128 0           batch_normalization_5[0][0]      
__________________________________________________________________________________________________
max_pooling2d_3 (MaxPooling2D)  (None, 52, 52, 128)  0           leaky_re_lu_5[0][0]              
__________________________________________________________________________________________________
conv2d_6 (Conv2D)               (None, 52, 52, 256)  294912      max_pooling2d_3[0][0]            
__________________________________________________________________________________________________
batch_normalization_6 (BatchNor (None, 52, 52, 256)  1024        conv2d_6[0][0]                   
__________________________________________________________________________________________________
leaky_re_lu_6 (LeakyReLU)       (None, 52, 52, 256)  0           batch_normalization_6[0][0]      
__________________________________________________________________________________________________
conv2d_7 (Conv2D)               (None, 52, 52, 128)  32768       leaky_re_lu_6[0][0]              
__________________________________________________________________________________________________
batch_normalization_7 (BatchNor (None, 52, 52, 128)  512         conv2d_7[0][0]                   
__________________________________________________________________________________________________
leaky_re_lu_7 (LeakyReLU)       (None, 52, 52, 128)  0           batch_normalization_7[0][0]      
__________________________________________________________________________________________________
conv2d_8 (Conv2D)               (None, 52, 52, 256)  294912      leaky_re_lu_7[0][0]              
__________________________________________________________________________________________________
batch_normalization_8 (BatchNor (None, 52, 52, 256)  1024        conv2d_8[0][0]                   
__________________________________________________________________________________________________
leaky_re_lu_8 (LeakyReLU)       (None, 52, 52, 256)  0           batch_normalization_8[0][0]      
__________________________________________________________________________________________________
max_pooling2d_4 (MaxPooling2D)  (None, 26, 26, 256)  0           leaky_re_lu_8[0][0]              
__________________________________________________________________________________________________
conv2d_9 (Conv2D)               (None, 26, 26, 512)  1179648     max_pooling2d_4[0][0]            
__________________________________________________________________________________________________
batch_normalization_9 (BatchNor (None, 26, 26, 512)  2048        conv2d_9[0][0]                   
__________________________________________________________________________________________________
leaky_re_lu_9 (LeakyReLU)       (None, 26, 26, 512)  0           batch_normalization_9[0][0]      
__________________________________________________________________________________________________
conv2d_10 (Conv2D)              (None, 26, 26, 256)  131072      leaky_re_lu_9[0][0]              
__________________________________________________________________________________________________
batch_normalization_10 (BatchNo (None, 26, 26, 256)  1024        conv2d_10[0][0]                  
__________________________________________________________________________________________________
leaky_re_lu_10 (LeakyReLU)      (None, 26, 26, 256)  0           batch_normalization_10[0][0]     
__________________________________________________________________________________________________
conv2d_11 (Conv2D)              (None, 26, 26, 512)  1179648     leaky_re_lu_10[0][0]             
__________________________________________________________________________________________________
batch_normalization_11 (BatchNo (None, 26, 26, 512)  2048        conv2d_11[0][0]                  
__________________________________________________________________________________________________
leaky_re_lu_11 (LeakyReLU)      (None, 26, 26, 512)  0           batch_normalization_11[0][0]     
__________________________________________________________________________________________________
conv2d_12 (Conv2D)              (None, 26, 26, 256)  131072      leaky_re_lu_11[0][0]             
__________________________________________________________________________________________________
batch_normalization_12 (BatchNo (None, 26, 26, 256)  1024        conv2d_12[0][0]                  
__________________________________________________________________________________________________
leaky_re_lu_12 (LeakyReLU)      (None, 26, 26, 256)  0           batch_normalization_12[0][0]     
__________________________________________________________________________________________________
conv2d_13 (Conv2D)              (None, 26, 26, 512)  1179648     leaky_re_lu_12[0][0]             
__________________________________________________________________________________________________
batch_normalization_13 (BatchNo (None, 26, 26, 512)  2048        conv2d_13[0][0]                  
__________________________________________________________________________________________________
leaky_re_lu_13 (LeakyReLU)      (None, 26, 26, 512)  0           batch_normalization_13[0][0]     
__________________________________________________________________________________________________
max_pooling2d_5 (MaxPooling2D)  (None, 13, 13, 512)  0           leaky_re_lu_13[0][0]             
__________________________________________________________________________________________________
conv2d_14 (Conv2D)              (None, 13, 13, 1024) 4718592     max_pooling2d_5[0][0]            
__________________________________________________________________________________________________
batch_normalization_14 (BatchNo (None, 13, 13, 1024) 4096        conv2d_14[0][0]                  
__________________________________________________________________________________________________
leaky_re_lu_14 (LeakyReLU)      (None, 13, 13, 1024) 0           batch_normalization_14[0][0]     
__________________________________________________________________________________________________
conv2d_15 (Conv2D)              (None, 13, 13, 512)  524288      leaky_re_lu_14[0][0]             
__________________________________________________________________________________________________
batch_normalization_15 (BatchNo (None, 13, 13, 512)  2048        conv2d_15[0][0]                  
__________________________________________________________________________________________________
leaky_re_lu_15 (LeakyReLU)      (None, 13, 13, 512)  0           batch_normalization_15[0][0]     
__________________________________________________________________________________________________
conv2d_16 (Conv2D)              (None, 13, 13, 1024) 4718592     leaky_re_lu_15[0][0]             
__________________________________________________________________________________________________
batch_normalization_16 (BatchNo (None, 13, 13, 1024) 4096        conv2d_16[0][0]                  
__________________________________________________________________________________________________
leaky_re_lu_16 (LeakyReLU)      (None, 13, 13, 1024) 0           batch_normalization_16[0][0]     
__________________________________________________________________________________________________
conv2d_17 (Conv2D)              (None, 13, 13, 512)  524288      leaky_re_lu_16[0][0]             
__________________________________________________________________________________________________
batch_normalization_17 (BatchNo (None, 13, 13, 512)  2048        conv2d_17[0][0]                  
__________________________________________________________________________________________________
leaky_re_lu_17 (LeakyReLU)      (None, 13, 13, 512)  0           batch_normalization_17[0][0]     
__________________________________________________________________________________________________
conv2d_18 (Conv2D)              (None, 13, 13, 1024) 4718592     leaky_re_lu_17[0][0]             
__________________________________________________________________________________________________
batch_normalization_18 (BatchNo (None, 13, 13, 1024) 4096        conv2d_18[0][0]                  
__________________________________________________________________________________________________
leaky_re_lu_18 (LeakyReLU)      (None, 13, 13, 1024) 0           batch_normalization_18[0][0]     
__________________________________________________________________________________________________
conv2d_19 (Conv2D)              (None, 13, 13, 1024) 9437184     leaky_re_lu_18[0][0]             
__________________________________________________________________________________________________
batch_normalization_19 (BatchNo (None, 13, 13, 1024) 4096        conv2d_19[0][0]                  
__________________________________________________________________________________________________
conv2d_21 (Conv2D)              (None, 26, 26, 64)   32768       leaky_re_lu_13[0][0]             
__________________________________________________________________________________________________
leaky_re_lu_19 (LeakyReLU)      (None, 13, 13, 1024) 0           batch_normalization_19[0][0]     
__________________________________________________________________________________________________
batch_normalization_21 (BatchNo (None, 26, 26, 64)   256         conv2d_21[0][0]                  
__________________________________________________________________________________________________
conv2d_20 (Conv2D)              (None, 13, 13, 1024) 9437184     leaky_re_lu_19[0][0]             
__________________________________________________________________________________________________
leaky_re_lu_21 (LeakyReLU)      (None, 26, 26, 64)   0           batch_normalization_21[0][0]     
__________________________________________________________________________________________________
batch_normalization_20 (BatchNo (None, 13, 13, 1024) 4096        conv2d_20[0][0]                  
__________________________________________________________________________________________________
space_to_depth (Lambda)         (None, 13, 13, 256)  0           leaky_re_lu_21[0][0]             
__________________________________________________________________________________________________
leaky_re_lu_20 (LeakyReLU)      (None, 13, 13, 1024) 0           batch_normalization_20[0][0]     
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     (None, 13, 13, 1280) 0           space_to_depth[0][0]             
                                                                 leaky_re_lu_20[0][0]             
__________________________________________________________________________________________________
conv2d_22 (Conv2D)              (None, 13, 13, 1024) 11796480    concatenate_1[0][0]              
__________________________________________________________________________________________________
batch_normalization_22 (BatchNo (None, 13, 13, 1024) 4096        conv2d_22[0][0]                  
__________________________________________________________________________________________________
leaky_re_lu_22 (LeakyReLU)      (None, 13, 13, 1024) 0           batch_normalization_22[0][0]     
__________________________________________________________________________________________________
conv2d_23 (Conv2D)              (None, 13, 13, 125)  128125      leaky_re_lu_22[0][0]             
==================================================================================================
Total params: 50,676,061
Trainable params: 50,655,389
Non-trainable params: 20,672
__________________________________________________________________________________________________
None

Process finished with exit code 0
'''

(5) 解码。yolo_head(feats,anchors,num_classes)

input  :
    feats:模型特征层 [None,13,13 125],125 = 5*(4+1+20)
    anchors:先验框的宽高
    num_classes:类别数 20
output : 
    box_xy :Tensor("truediv:0", shape=(?, ?, ?, 5, 2), dtype=float32)
    box_wh :Tensor("truediv_1:0", shape=(?, ?, ?, 5, 2), dtype=float32)
    box_confidence :Tensor("Sigmoid_1:0", shape=(?, ?, ?, 5, 1), dtype=float32)
    box_class_probs :Tensor("Softmax:0", shape=(?, ?, ?, 5, 20), dtype=float32)
process:
1. 获取先验框中心点坐标。
2. 预测结果处理。
3. 解码。

def yolo_head(feats,anchors,num_classes):
    ''' Convert final layer features to bounding box parameters.
    inputs:
        feats: tensor, [None,13,13 125],
        anchors: array-like,Anchor box widths and heights.
        num_classes: int, Number of target classes.
    outputs:
        box_xy ,box_wh,box_conf ,box_class_pred
    '''
    num_anchors = len(anchors)
    anchors_tensor = K.reshape(K.variable(anchors),[1,1,1,num_anchors,2])
    # 1 先验框中心点坐标
    conv_dims = K.shape(feats)[1:3]  # hw
    conv_height_index = K.arange(0,stop=conv_dims[0])
    conv_width_index = K.arange(0,stop=conv_dims[1])
    conv_height_index = K.tile(conv_height_index, [conv_dims[1]])
    conv_width_index = K.tile(
        K.expand_dims(conv_width_index, 0), [conv_dims[0], 1])
    conv_width_index = K.flatten(K.transpose(conv_width_index))
    conv_index = K.transpose(K.stack([conv_height_index, conv_width_index]))
    conv_index = K.reshape(conv_index, [1, conv_dims[0], conv_dims[1], 1, 2])
    conv_index = K.cast(conv_index, K.dtype(feats))
    # [bath,13,13,125]-->[bath,13,13,5,25]
    feats = K.reshape(
        feats, [-1, conv_dims[0], conv_dims[1], num_anchors, num_classes + 5])
    conv_dims = K.cast(K.reshape(conv_dims, [1, 1, 1, 1, 2]), K.dtype(feats))
    # 2 预测结果处理
    box_xy = K.sigmoid(feats[..., :2])
    box_wh = K.exp(feats[..., 2:4])
    box_confidence = K.sigmoid(feats[..., 4:5])
    box_class_probs = K.softmax(feats[..., 5:])
    # 3 解码
    box_xy = (box_xy + conv_index) / conv_dims
    box_wh = box_wh * anchors_tensor / conv_dims

    return box_xy, box_wh, box_confidence, box_class_probs


if __name__ == '__main__':
    feats = Input([416,416,3])# Input(tensor=K.random_uniform([1,416,416,3]),shape = [1,416,416,3])
    m = yolo_body(feats,num_anchors=5,num_classes=20)
    print(m.output)
    voc_anchors = np.array(
        [[1.08, 1.19], [3.42, 4.41], [6.63, 11.38], [9.42, 5.11], [16.62, 10.52]])
    box_xy, box_wh, box_confidence, box_class_probs =yolo_head(m.output, anchors=voc_anchors, num_classes=20)
    print(box_xy, box_wh, box_confidence, box_class_probs)

'''
Tensor("conv2d_23/BiasAdd:0", shape=(?, 13, 13, 125), dtype=float32)
Tensor("truediv:0", shape=(?, ?, ?, 5, 2), dtype=float32) Tensor("truediv_1:0", shape=(?, ?, ?, 5, 2), dtype=float32) Tensor("Sigmoid_1:0", shape=(?, ?, ?, 5, 1), dtype=float32) Tensor("Softmax:0", shape=(?, ?, ?, 5, 20), dtype=float32)

Process finished with exit code 0
'''

(6) YOLOV1模型yolo(inputs,anchors,num_classes)

input  :
    inputs:  img.shape=[b,h,w,3]
    anchors: 先验框的宽高.default(anchors.shape=[5,2])
    num_classes:类别数 default(cls=20)
output :
    box_xy:[b,h,w,2]
    box_wh:[b,h,w,2]
    box_confidence:[b,h,w,1]
    box_class_probs:[b,h,w,20]
process:
    1. yolo_body(darknet19)
    2. yolo_head

def yolo(inputs,anchors,num_classes):
    ''' yolo_body + yolo_head'''
    num_anchors = len(anchors)
    body = yolo_body(inputs,num_anchors,num_classes)
    outputs = yolo_head(body.output,anchors,num_classes)
    return outputs

(7) 框坐标转换[cx,cy,w,h]–>[y1,x1,y2,x2]

input  :
    box_xy:[b,h,w,2]
    box_wh:[b,h,w,2]
output : 
    box:[b,h,w,4]
process:
    1. 中心点坐标减去宽高的一半得到框的左上角坐标;
    2. 中心点坐标加上宽高的一半得到框的右下角坐标;
    3. 把坐标数据堆叠在一起。

def yolo_boxes_to_corners(box_xy,box_wh):
    box_mins = box_xy - (box_wh/2.)
    box_maxes = box_xy + (box_wh/2.)
    return K.concatenate(
            [box_mins[...,1:2], box_mins[...,0:1],
            box_maxes[...,1:2],box_maxes[...,0:1]])

(8) 制作标签 preprocess_true_boxes(true_boxes,anchors,image_size)

inputs:
    true_boxes:[n,4+1]
    anchors:   [5,2]
    image_size:[h,w]
outputs:
    detectors_mask:[h,w,n_anchor,1] [13,13,5,1]
    matching_true_boxes:真实框的坐标[h,w,n_anchor,5]
process:
    detectors_mask根据IoU来标记与真实框最匹配的先验框.matching_true_boxes存放预测偏移。
    1. 遍历每一个真实框,把真实框映射导特征层上,找出真实框所在的网格中心点。
    2. 遍历每个先验框,找出与真实框最匹配的先验框k,标记先验框的位置。detectors_mask[i,j,k]=1
    3. 计算中心点、宽高偏移,并将偏移值和物体类别放在matching_true_boxes中


def preprocess_true_boxes(true_boxes,anchors,image_size):
    ''' 制作标签
    
    '''
    height,width = image_size
    num_anchors = len(anchors)
    assert height % 32  == 0,'Image sizes in YOLO_v2 must be multiples of 32.'
    assert width % 32 == 0, 'Image sizes in YOLO_v2 must be multiples of 32.'
    conv_height = height // 32
    conv_width = width // 32
    num_box_params = true_boxes.shape[1]
    detectors_mask = np.zeros((conv_height, conv_width, num_anchors, 1),dtype=np.float32)
    matching_true_boxes = np.zeros(
        (conv_height, conv_width, num_anchors, num_box_params),dtype=np.float32)

    for box in true_boxes:
        box_class = box[4:5]
        box = box[0:4] * np.array(
            [conv_width, conv_height, conv_width, conv_height])
        i = np.floor(box[1]).astype('int')
        j = np.floor(box[0]).astype('int')
        best_iou = 0
        best_anchor = 0
        for k, anchor in enumerate(anchors):
            # Find IOU between box shifted to origin and anchor box.
            box_maxes = box[2:4] / 2.
            box_mins = -box_maxes
            anchor_maxes = (anchor / 2.)
            anchor_mins = -anchor_maxes

            intersect_mins = np.maximum(box_mins, anchor_mins)
            intersect_maxes = np.minimum(box_maxes, anchor_maxes)
            intersect_wh = np.maximum(intersect_maxes - intersect_mins, 0.)
            intersect_area = intersect_wh[0] * intersect_wh[1]
            box_area = box[2] * box[3]
            anchor_area = anchor[0] * anchor[1]
            iou = intersect_area / (box_area + anchor_area - intersect_area)
            if iou > best_iou:
                best_iou = iou
                best_anchor = k

        if best_iou > 0:
            detectors_mask[i, j, best_anchor] = 1
            adjusted_box = np.array(
                [
                    box[0] - j, box[1] - i,
                    np.log(box[2] / anchors[best_anchor][0]),
                    np.log(box[3] / anchors[best_anchor][1]), box_class
                ],
                dtype=np.float32)
            matching_true_boxes[i, j, best_anchor] = adjusted_box
    return detectors_mask, matching_true_boxes

if __name__=='__main__':
    true_boxes = np.array([[137,147,500,375,17],[1,154,132,374,8],
                         [147,163,394,375,14]])
    true_boxes[:,0:4] = true_boxes[:,0:4]/416                    
    voc_anchors = np.array(
        [[1.08, 1.19], [3.42, 4.41], [6.63, 11.38], [9.42, 5.11], [16.62, 10.52]])
    image_size = [416,416]
    detectors_mask, matching_true_boxes = preprocess_true_boxes(true_boxes,voc_anchors,image_size)
    print(detectors_mask.shape, matching_true_boxes.shape)
    print(detectors_mask[detectors_mask[...,0]>0])
    print(matching_true_boxes[matching_true_boxes[...,0]>0])
'''
(13, 13, 5, 1) (13, 13, 5, 5)
[[1.]
 [1.]]
[[ 0.078125    0.25        0.17982087  0.44401696  8.        ]
 [ 0.453125    0.40625    -0.19918266  0.25965098 14.        ]]

Process finished with exit code 0
'''

(9) 损失函数yolo_loss(args,anchors,num_classes,
rescore_confidence=False,print_loss=False)

inputs:
    args: yolo_output(feas), true_boxes([b,n,4+1]), detectors_mask(0~1)[13,13,5,1],与真实框匹配的anchor的mask matching_true_boxes[13,13,5,5],编码值
    anchors: tensor
    num_classes: int ,dfault=20.
    rescore_confidence: bool, default=False. if true ,confidence=IoU_best(GT,anchor)
    print_loss: bool, default=False. if true ,print the loss components.
outputs:
    mean_loss : float. mean localization loss across minibatch
process:
    1. 根据真实框和预测框的IoU得到object_detections。根据object_detections、detectors_mask计算正负样本损失。
    2. 类别损失。
    3. 框的偏移损失。

def yolo_loss(args,anchors,num_classes,
              rescore_confidence=False,print_loss=False):
    ''' YOLOV2 loss
    inputs:
        args: yolo_output(feas), true_boxes([b,n,4+1]), detectors_mask(0~1), matching_true_boxes
        anchors: tensor
        num_classes: int ,dfault=20.
        rescore_confidence: bool, default=False. if true ,confidence=IoU_best(GT,anchor)
        print_loss: bool, default=False. if true ,print the loss components.
    outputs:
        mean_loss : float. mean localization loss across minibatch
    '''
    (yolo_output, true_boxes, detectors_mask, matching_true_boxes) = args
    num_anchors = len(anchors)
    object_scale = 5
    no_object_scale = 1
    class_scale = 1
    coordinates_scale = 1
    pred_xy, pred_wh, pred_confidence, pred_class_prob = \
        yolo_head(yolo_output,anchors,num_classes)

    #1从feats提取sigmoid预测框 [b,h,w,125]-->[b,h,w,5,25]
    yolo_output_shape = K.shape(yolo_output)
    feats = K.reshape(yolo_output,[-1,yolo_output_shape[1],
                                   yolo_output_shape[2], num_anchors, num_classes+5])
    pred_boxes = K.concatenate((K.sigmoid(feats[...,0:2]),feats[...,2:4]),axis=-1)

    #2 解码的预测框[b,h,w,5,2]-->[b,h,w,5,1,2]
    pred_xy = K.expand_dims(pred_xy,4)
    pred_wh = K.expand_dims(pred_wh,4)
    # [x1,y1,x2,y2]-->[cx,cy,w,h]
    pred_mins = pred_xy - pred_wh/2.
    pred_maxes = pred_xy + pred_wh/2.

    #3 真实框 true_boxes([b,n,4+1]) -->[b,1,1,1,n,4]
    true_boxes_shape = K.shape(true_boxes)
    true_boxes = K.reshape(true_boxes, [
        true_boxes_shape[0], 1, 1, 1, true_boxes_shape[1], true_boxes_shape[2]
    ])
    true_xy = true_boxes[..., 0:2]   # [b,1,1,1,n,2]
    true_wh = true_boxes[..., 2:4]
    true_mins = true_xy - true_wh / 2.
    true_maxes = true_xy + true_wh / 2.

    #4 计算GT和Anchors的IoU
    intersect_mins = K.maximum(pred_mins,true_mins)
    intersect_maxes = K.minimum(pred_maxes,true_maxes) # [b,1,1,1,n,2]
    intersect_wh = K.maximum(intersect_maxes-intersect_mins,0)
    intersect_areas = intersect_wh[...,0]*intersect_wh[...,1] # [b,1,1,1,n]
    pred_areas = pred_wh[...,0]*pred_wh[...,1]
    true_areas = true_wh[...,0]*true_wh[...,1]
    iou_scores = intersect_areas/(pred_areas + true_areas - intersect_areas)
    # Best IOUs for each location
    best_ious = K.max(iou_scores, axis=4) # [b,1,1,1,n]-->[b,1,1,1]
    best_ious = K.expand_dims(best_ious)  # [b,1,1,1,1]
    object_detections = K.cast(best_ious > 0.6, K.dtype(best_ious))
    # 5 负样本损失
    no_object_weights = no_object_scale * (1-object_detections) * (1-detectors_mask)
    no_objects_loss = no_object_weights * K.square(1-pred_confidence)
    # 6 正样本损失
    if rescore_confidence:
        objects_loss = object_scale * detectors_mask * K.square(best_ious - pred_confidence)
    else:
        objects_loss = object_scale * detectors_mask * K.square(1 - pred_confidence)
    # 7 正负样本损失
    confidence_loss = objects_loss + no_objects_loss

    # 8 cls loss
    matching_classes = K.cast(matching_true_boxes[...,4],'int32')
    matching_classes = K.one_hot(matching_classes,num_classes)
    classification_loss = class_scale * detectors_mask * K.square(matching_classes - pred_class_prob)

    # 9 boxes loss
    matching_boxes = matching_true_boxes[...,0:4]
    coordinates_loss = coordinates_scale * detectors_mask * K.square(matching_boxes-pred_boxes)

    # total_loss
    confidence_loss_sum = K.sum(confidence_loss)
    classification_loss_sum = K.sum(classification_loss)
    coordinates_loss_sum = K.sum(coordinates_loss)
    total_loss = 0.5 * (confidence_loss_sum+classification_loss_sum+coordinates_loss_sum)
    if print_loss:
        total_loss = tf.Print(total_loss,
                              [total_loss,confidence_loss_sum,
                               classification_loss_sum,coordinates_loss_sum],
                               message= 'yolo_loss, conf_loss, class_loss, box_coord_loss:')
    return total_loss

if __name__ == '__main__':
    # # 1 mask and true_boxes_bias
    true_boxes = np.array([[137.,147,416,375,17],[1.,154,132,374,8],
                           [147.,163,394,375,14]])

    # true_boxes = K.cast(true_boxes,tf.float32)
    true_boxes[:,0:4] = true_boxes[:,0:4]/416
    cxy=(true_boxes[:,2:4]+true_boxes[:,:2])/2
    wh = (true_boxes[:,2:4]-true_boxes[:,:2])
    true_boxes[:,:2] = cxy
    true_boxes[:,2:4] = wh


    voc_anchors = np.array(
        [[1.08, 1.19], [3.42, 4.41], [6.63, 11.38], [9.42, 5.11], [16.62, 10.52]])
    image_size = [416,416]
    detectors_mask, matching_true_boxes = preprocess_true_boxes(true_boxes,voc_anchors,image_size)
    true_boxes = K.variable(true_boxes,tf.float32)
    true_boxes = K.expand_dims(true_boxes,0)

    # # 2 outputs
    num_classes = 20
    inputs = Input(tensor=K.random_uniform([1,416,416,3]))
    outputs = yolo(inputs,voc_anchors,num_classes)
    box_xy, box_wh, box_confidence, box_class_probs = outputs
    print(box_xy.shape, box_wh.shape, box_confidence.shape, box_class_probs.shape)
    # #  3 feats
    num_anchors = len(voc_anchors)
    yolo_output = yolo_body(inputs,num_anchors,num_classes)
    print(yolo_output.output)
    # #  4 loss
    args = (yolo_output.output, true_boxes, detectors_mask, matching_true_boxes)
    total_loss = yolo_loss(args,voc_anchors,num_classes,rescore_confidence=False,print_loss=False)
    print(total_loss)
'''
(1, 13, 13, 5, 2) (1, 13, 13, 5, 2) (1, 13, 13, 5, 1) (1, 13, 13, 5, 20)
Tensor("conv2d_46/BiasAdd:0", shape=(1, 13, 13, 125), dtype=float32)
Tensor("mul_11:0", shape=(), dtype=float32)

Process finished with exit code 0
'''

(10) 根据置信度对预测值筛选

input  :
    box :[b,h,w,4]
    box_confidence:[b,h,w,1]
    box_class_probs:[b,h,w,20]
output : 
    
process:
    1. 计算box_scores,根据box_scores得到prediction_mask
    2. 筛选boxes,scores,classes
def yolo_filter_boxes(boxes,box_confidence,box_class_probs,threshold=.6):
    ''' 根据score筛选  '''
    box_scores = box_confidence * box_class_probs
    box_classes = K.argmax(box_scores,axis=-1)
    box_class_scores = K.max(box_scores,axis=-1)
    prediction_mask = box_class_scores >= threshold

    boxes = tf.boolean_mask(boxes,prediction_mask)
    scores = tf.boolean_mask(box_class_scores,prediction_mask)
    classes = tf.boolean_mask(box_classes, prediction_mask)
    return boxes,scores,classes

if __name__ == '__main__':
    b,h,w=2,3,3
    boxes=np.random.random([b,h,w,4]);
    box_confidence=np.random.random([b,h,w,1]);
    box_class_probs=np.random.random([b,h,w,20]);
    threshold=.2
    boxes,scores,classes = yolo_filter_boxes(boxes,box_confidence,box_class_probs,threshold)
    print(boxes.shape,scores.shape,classes.shape)
'''
    (?, 4) (?,) (?,)
'''

(11)筛选预测值(confidence+NMS)

(scores + NMS)  筛选
def yolo_eval(yolo_outputs,image_shape,
              max_boxes=10,score_threshold=.6,iou_threshold=.5):
    ''' scores + NMS  筛选'''
    box_xy,box_wh,box_confidence,box_class_probs = yolo_outputs
    boxes = yolo_boxes_to_corners(box_xy, box_wh)
    boxes, scores, classes = yolo_filter_boxes(
        boxes, box_confidence, box_class_probs, threshold=score_threshold)

    height = image_shape[0]
    width = image_shape[1]
    image_dims = K.stack([height,width,height,width])
    image_dims = K.reshape(image_dims, [1, 4])
    boxes = boxes * image_dims

    max_boxes_tensor = K.variable(max_boxes, dtype='int32')
    K.get_session().run(tf.variables_initializer([max_boxes_tensor]))
    nms_index = tf.image.non_max_suppression(
        boxes, scores, max_boxes_tensor, iou_threshold=iou_threshold)
    boxes = K.gather(boxes, nms_index)
    scores = K.gather(scores, nms_index)
    classes = K.gather(classes, nms_index)
    return boxes, scores, classes

2.2.4 迁移学习

input  :
output : 
process:
1. 
2. 
3. 



(2)数据处理
process_data(images,boxes=None)

input  :
    ,
    
    
output : 
    image_data: shape(num_imgs, 416, 416, 3)
    boxes     : shape(num_imgs, 10, 5)
process:
    1. 图片统一到[416,416,3],转换成浮点数组,归一化到0~1.
    2. 真实框坐标转换为中心点和宽高,转换为框在对应图片上的相对位置。每张图片真实框数以框数图片的数量为准,不足的不零。方便后面操作。

import numpy as np
import tensorflow as tf
import io,os,PIL,h5py,argparse
from keras import backend as  K
from keras.layers import Input, Lambda, Conv2D
from keras.models import load_model, Model
from keras.callbacks import TensorBoard, ModelCheckpoint, EarlyStopping
from yad2k.models.keras_yolo import (preprocess_true_boxes, yolo_body,
                                     yolo_eval, yolo_head, yolo_loss)
from yad2k.utils.draw_boxes import draw_boxes

argparser = argparse.ArgumentParser(description="Retrain or 'fine-tune' a "
                                "pretrained YOLOv2 model for your own data")
argparser.add_argument('-d','--data_path',
                       help='dataset contain imgs and boxes in .hdf5 mode',
                       default= 'VOCdevkit/pascal_voc_07_12_LS.hdf5')
argparser.add_argument('-a','--anchors_path',
                       help='path to anchors file, defaults to yolo_anchors.txt',
                       default=os.path.join('model_data', 'yolo_anchors.txt'))
argparser.add_argument('-c','--classes_path',
                       help='path to classes file, defaults to pascal_classes.txt',
                       default=os.path.join('model_data', 'pascal_classes.txt'))

YOLO_ANCHORS = np.array(
    ((0.57273, 0.677385), (1.87446, 2.06253), (3.33843, 5.47434),
     (7.88282, 3.52778), (9.77052, 9.16828)))

def get_classes(classes_path):
    with open(classes_path) as f:
        class_name = f.readlines()
    class_name = [i.strip() for i in class_name]
    return class_name

def get_anchors(anchors_path):
    if os.path.isfile(anchors_path):
        with open(anchors_path) as f:
            anchors = f.readlines()
            anchors = [float(x.strip()) for x in anchors]
            return np.array(anchors).reshape(-1, 2)
    else:
        Warning('Could not open anchors file, using default.')
        return YOLO_ANCHORS

def process_data(images,boxes=None):
    images = [PIL.Image.open(io.BytesIO(i)) for i in images]
    orig_size = np.array([[i.width, i.height] for i in images])  #[num_imgs,2]

    processed_images = [i.resize((416, 416), PIL.Image.BICUBIC) for i in images]
    processed_images = [np.array(image, dtype=np.float) for image in processed_images]
    processed_images = [image/255. for image in processed_images]

    if boxes is not None:
        boxes = [box.reshape((-1,5)) for box in boxes]
        boxes_extents = [box[:, [2, 1, 4, 3, 0]] for box in boxes]
        boxes_xy = [0.5 * (box[:, 3:5] + box[:, 1:3]) for box in boxes]
        boxes_wh = [box[:, 3:5] - box[:, 1:3] for box in boxes]
        n = len(boxes_wh)
        boxes_xy = [boxes_xy[i] /orig_size[i] for i in range(n)]
        boxes_wh = [boxes_wh[i] /orig_size[i] for i in range(n)]
        boxes = [np.concatenate((boxes_xy[i], boxes_wh[i], box[:, 0:1]),axis=1) for i,box in enumerate(boxes)]

        max_boxes = 0
        for boxz in boxes:
            if boxz.shape[0] > max_boxes:
                max_boxes = boxz.shape[0]

        for i, boxz in enumerate(boxes):
            if boxz.shape[0] < max_boxes:
                zero_padding = np.zeros((max_boxes-boxz.shape[0], 5), dtype=np.float32)
                boxes[i] = np.vstack((boxz, zero_padding))
        return np.array(processed_images), np.array(boxes)
    else:
        return np.array(processed_images),,None


if __name__ == '__main__':
    args = argparser.parse_args()
    data = h5py.File(args.data_path, 'r')
    image_data, boxes = process_data(data['train/images'], data['train/boxes'])
    # print(get_anchors(args.anchors_path))
    print(image_data.shape, boxes.shape) # (13, 416, 416, 3) (13, 10, 5)
    image_data, boxes = process_data(data['train/images'])
    print(image_data.shape, type(boxes))


'''
(13, 416, 416, 3) (13, 10, 5)
(13, 416, 416, 3) 

Process finished with exit code 0
'''

(3) 制作标签

inputs:
     boxes:    (batch_size, max_num_box, 5) eg:(13, 10, 5)
     anchors:  (num_boxes , 2)              eg:(5, 2)
outputs:
    detectors_mask: (batch_size, h, w, num_boxes, 1)      eg:(13, 13, 13, 5, 1)
    matching_true_boxes:(batch_size, h, w, num_boxes, 5)  eg:(13, 13, 13, 5, 5)
process:
1. 遍历每张图片,计算每个图片中每个框在特征层的位置。
2. 计算与真实框最匹配的先验框,解码并标记有物体的位置。
3. 返回解码张量和标记张量。
注:(1)用遍历的方法计算与真实框最匹配的先验框有些麻烦,为什么不用矩阵。
(2)为什么每张图片的真实框数量都要一样(也限制了每个图片上检测的物体的数量。如果真实框少,就会有多余的框;反之,有些框会被舍去),是方便后面解码时用吗?这里每张图上最多有10个真实框,后面是怎么用的啊??????????


def get_detector_mask(boxes, anchors):
    '''
    真实框匹配的先验框,并计算真实偏移
    inputs:
         boxes:    (batch_size, max_num_box, 5) eg:(13, 10, 5)
         anchors:  (num_boxes , 2)              eg:(5, 2)
    outputs:
        detectors_mask: (batch_size, h, w, num_boxes, 1)      eg:(13, 13, 13, 5, 1)
        matching_true_boxes:(batch_size, h, w, num_boxes, 5)  eg:(13, 13, 13, 5, 5)
     '''
    detectors_mask = [0 for i in range(len(boxes))]
    matching_true_boxes = [0 for i in range(len(boxes))]
    for i,box in enumerate(boxes):
        detectors_mask[i],matching_true_boxes[i] = preprocess_true_boxes(box, anchors, [416, 416])
    return np.array(detectors_mask), np.array(matching_true_boxes)


if __name__ == '__main__':
    args = argparser.parse_args()
    data = h5py.File(args.data_path, 'r')
    anchors = get_anchors(args.anchors_path)
    image_data, boxes = process_data(data['train/images'], data['train/boxes'])
    # print(get_anchors(args.anchors_path))
    detectors_mask, matching_true_boxes = get_detector_mask(boxes, anchors)
    print(detectors_mask.shape, matching_true_boxes.shape) 

'''
(13, 13, 13, 5, 1) (13, 13, 13, 5, 5)

Process finished with exit code 0
'''

(4)导入模型

inputs:
    load_pretrained: whether or not to load the pretrained model or initialize all weights
    freeze_body: whether or not to freeze all weights except for the last layer's
outputs:
    model_body: YOLOv2 with new output layer
    model: YOLOv2 with custom loss Lambda layer
process:
    1. 加载模型参数
    2. 冻结部分层
    3. 生成损失函数
def create_model(anchors,class_names,load_pretrained = True,freeze_body = True):
    '''  returns the body of the model and the model
    # Params:
        load_pretrained: whether or not to load the pretrained model or initialize all weights
        freeze_body: whether or not to freeze all weights except for the last layer's
    # Returns:
        model_body: YOLOv2 with new output layer
        model: YOLOv2 with custom loss Lambda layer
'''
    detectors_mask_shape = (13,13,5,1)
    matching_true_shape = (13,13,5,5)
    image_input = Input(shape=(416,416,3))
    boxes_input = Input(shape=(None,5))
    detectors_mask_input = Input(shape=detectors_mask_shape)
    matching_boxes_input = Input(shape=matching_true_shape)

    yolo_model = yolo_body(image_input,len(anchors),len(class_names))
    topless_yolo = Model(yolo_model.input,yolo_model.layers[-1].output)

    if load_pretrained:
        topless_yolo_path = 'overfit_weights_ls.h5'  #
        if not os.path.exists(topless_yolo_path):
            print("CREATING TOPLESS WEIGHTS FILE")
            yolo_path = os.path.join('model_data', 'yolo.h5')
            model_body = load_model(yolo_path)
            model_body = Model(model_body.inputs, model_body.layers[-1].output)
            model_body.save_weights(topless_yolo_path)
        topless_yolo.load_weights(topless_yolo_path)

    if freeze_body:
        for layer in topless_yolo.layers:
            layer.trainable = False
    final_layer = Conv2D(len(anchors)*(5+len(class_names)),(1,1),activation='linear')(topless_yolo.output)

    model_body = Model(image_input,final_layer)
    # Place model loss on CPU to reduce GPU memory usage.
    with tf.device('/cpu:0'):
        model_loss = Lambda(
            yolo_loss,output_shape=(1,),name='yolo_loss',
            arguments={'anchors': anchors,
                      'num_classes': len(class_names)})([model_body.output,
                    boxes_input,detectors_mask_input, matching_boxes_input])
    model = Model([model_body.input, boxes_input, detectors_mask_input,
                   matching_boxes_input],model_loss)
    return model_body, model



if __name__ == '__main__':
    args = argparser.parse_args()
    data = h5py.File(args.data_path, 'r')
    anchors = get_anchors(args.anchors_path)
    class_names = get_classes(args.classes_path)
    image_data, boxes = process_data(data['train/images'], data['train/boxes'])
    # print(get_anchors(args.anchors_path))
    detectors_mask, matching_true_boxes = get_detector_mask(boxes, anchors)
    print(detectors_mask.shape, matching_true_boxes.shape) # (13, 416, 416, 3) (13, 10, 5)
    model_body, model = create_model(anchors, class_names)
    print(model_body)
    print(model)
'''



Process finished with exit code 0
'''

(5)训练

input  :
output : 
process:
1. 
2. 
3. 

def train(model,class_names,anchors,image_data,boxes, detectors_mask, matching_true_boxes, validation_split=0.1):
    '''
    retrain/fine-tune the model
    logs training with tensorboard
    saves training weights in current directory
    best weights according to val_loss is saved as trained_stage_3_best.h5
    '''
    model.compile(optimizer='adam',
                  loss={'yolo_loss':lambda y_true, y_pred: y_pred})
    logging = TensorBoard()
    checkpoint = ModelCheckpoint("trained_stage_3_best.h5", monitor='val_loss',
                                 save_weights_only=True, save_best_only=True)
    early_stopping = EarlyStopping(monitor='val_loss', min_delta=0, patience=15, verbose=1, mode='auto')

    model.fit([image_data, boxes, detectors_mask, matching_true_boxes],
              np.zeros(len(image_data)),
              validation_split=validation_split,
              batch_size=2,
              epochs=1,
              callbacks=[logging])
    model.save_weights('trained_stage_1.h5')

    model_body, model = create_model(anchors, class_names, load_pretrained=False, freeze_body=False)

    model.load_weights('trained_stage_3.h5')

    model.compile(
        optimizer='adam', loss={
            'yolo_loss': lambda y_true, y_pred: y_pred
        })  # This is a hack to use the custom loss function in the last layer.


    model.fit([image_data, boxes, detectors_mask, matching_true_boxes],
              np.zeros(len(image_data)),
              validation_split=0.1,
              batch_size=2,
              epochs=1,
              callbacks=[logging])

    model.save_weights('trained_stage_2_ls.h5')

    model.fit([image_data, boxes, detectors_mask, matching_true_boxes],
              np.zeros(len(image_data)),
              validation_split=0.1,
              batch_size=2,
              epochs=1,
              callbacks=[logging, checkpoint, early_stopping])

    model.save_weights('trained_stage_3_ls.h5')
    model.save('yolo_2.h5')
'''
 2/11 [====>.........................] - ETA: 21s - loss: 257.7118
 4/11 [=========>....................] - ETA: 11s - loss: 250.1349
 6/11 [===============>..............] - ETA: 6s - loss: 255.0045 
 8/11 [====================>.........] - ETA: 3s - loss: 250.6645
10/11 [==========================>...] - ETA: 1s - loss: 245.7306
11/11 [==============================] - 14s 1s/step - loss: 234.5688 - val_loss: 27475.1465

'''

(6)画图

input  :
output : 
process:
1. 
2. 
3. 


def draw(model_body, class_names, anchors, image_data, image_set='val',
            weights_name='trained_stage_3_best.h5', out_path="output_images", save_all=True):
    '''
    Draw bounding boxes on image data
    '''
    if image_set == 'train':
        image_data = np.array([np.expand_dims(image, axis=0)
            for image in image_data[:int(len(image_data)*.9)]])
    elif image_set == 'val':
        image_data = np.array([np.expand_dims(image, axis=0)
            for image in image_data[int(len(image_data)*.9):]])
    elif image_set == 'all':
        image_data = np.array([np.expand_dims(image, axis=0)
            for image in image_data])
    else:
        ValueError("draw argument image_set must be 'train', 'val', or 'all'")
    # model.load_weights(weights_name)
    print(image_data.shape)
    model_body.load_weights(weights_name)

    # Create output variables for prediction.
    yolo_outputs = yolo_head(model_body.output, anchors, len(class_names))
    input_image_shape = K.placeholder(shape=(2, ))
    boxes, scores, classes = yolo_eval(
        yolo_outputs, input_image_shape, score_threshold=0.07, iou_threshold=0.)

    # Run prediction on overfit image.
    sess = K.get_session()  # TODO: Remove dependence on Tensorflow session.

    if  not os.path.exists(out_path):
        os.makedirs(out_path)
    for i in range(len(image_data)):
        out_boxes, out_scores, out_classes = sess.run(
            [boxes, scores, classes],
            feed_dict={
                model_body.input: image_data[i],
                input_image_shape: [image_data.shape[2], image_data.shape[3]],
                K.learning_phase(): 0
            })
        print('Found {} boxes for image.'.format(len(out_boxes)))
        print(out_boxes)

        # Plot image with predicted boxes.
        image_with_boxes = draw_boxes(image_data[i][0], out_boxes, out_classes,
                                    class_names, out_scores)
        # Save the image:
        if save_all or (len(out_boxes) > 0):
            image = PIL.Image.fromarray(image_with_boxes)
            image.save(os.path.join(out_path,str(i)+'.png'))

        # To display (pauses the program):
        # plt.imshow(image_with_boxes, interpolation='nearest')
        # plt.show()

(7) 整合以上代码训练

画图部分中有重复代码,可以作为一个单独的模块儿使用,但是如果前面有训练代码,不用再训练模型。

process:
1. 导入数据,处理数据,制作标签。
2. 建立模型
3. 训练
4. 画图

def main(args):
    data_path = os.path.expanduser(args.data_path)
    classes_path = os.path.expanduser(args.classes_path)
    anchors_path = os.path.expanduser(args.anchors_path)

    class_names = get_classes(classes_path)
    anchors = get_anchors(anchors_path)
    data = h5py.File(data_path, 'r')
    image_data, boxes = process_data(data['train/images'], data['train/boxes'])
    
    detectors_mask, matching_true_boxes = get_detector_mask(boxes, anchors)
    model_body, model = create_model(anchors, class_names)

    train(model,class_names,anchors,image_data,boxes,detectors_mask,matching_true_boxes)
    draw(model_body,class_names,anchors,image_data,image_set='val',
         weights_name='trained_stage_3_best.h5', out_path="output_images", save_all=True)
    


if __name__ == '__main__':
    args = argparser.parse_args()
    main(args)
'''
 2/11 [====>.........................] - ETA: 1:29 - loss: 25.3031
 4/11 [=========>....................] - ETA: 55s - loss: 48.2596 
 6/11 [===============>..............] - ETA: 34s - loss: 36.1064
 8/11 [====================>.........] - ETA: 19s - loss: 30.9763
10/11 [==========================>...] - ETA: 6s - loss: 31.8478 
11/11 [==============================] - 72s 7s/step - loss: 29.7228 - val_loss: 416.9062
(2, 1, 416, 416, 3)

'''

2.3 测试

test_yolo.py
这个代码写得乱。




import numpy as np
from keras import backend as K
from keras.models import load_model,Input, Model
from PIL import Image,ImageDraw,ImageFont
import argparse,colorsys,imghdr,os,random
from data_process.models.keras_yolo_ls import yolo_eval,yolo_head, yolo_body


parser = argparse.ArgumentParser(
    description='Run a YOLO_v2 style detection model on test images..')
parser.add_argument('-m','--model_path',
    help='path to h5 model file containing body of a YOLO_v2 model',
    default= 'overfit_weights.h5')  # trained_stage_3.h5
parser.add_argument('-a','--anchors_path',
    help='path to anchors file, defaults to yolo_anchors.txt',
    default='model_data/yolo_anchors.txt')
parser.add_argument('-c','--classes_path',
    help='path to classes file, defaults to coco_classes.txt',
    default='model_data/pascal_classes.txt')
parser.add_argument('-t','--test_path',
    help='path to directory of test images, defaults to images/',
    default='images')  # default='VOCdevkit/pascal_voc_07_12_LS.hdf5'
parser.add_argument('-o','--output_path',
    help='path to output test images, defaults to images/out',default='images/out')
parser.add_argument('-s','--score_threshold',type=float,
    help='threshold for bounding box scores, default .3',default=.3)
parser.add_argument('-iou','--iou_threshold',type=float,
    help='threshold for non max suppression IOU, default .5',default=.5)

def _main(args):
    model_path = os.path.expanduser(args.model_path)     # overfit_weights_ls.h5
    assert model_path.endswith('.h5')
    anchors_path = os.path.expanduser(args.anchors_path) # model_data/yolo_anchors.txt
    classes_path = os.path.expanduser(args.classes_path) # model_data/pascal_classes.txt
    test_path = os.path.expanduser(args.test_path)       # images
    output_path = os.path.expanduser(args.output_path)   # images/out

    if not os.path.exists(output_path):
        print('Creating output path {}'.format(output_path))
        os.mkdir(output_path)

    sess = K.get_session()

    with open(classes_path) as f:
        class_names = f.readlines()
    class_name = [c.strip() for c in class_names]

    with open(anchors_path) as f:
        anchors = f.readlines()
    anchors = [float(x.strip()) for x in anchors]
    anchors = np.array(anchors).reshape(-1, 2)
    # print(anchors)
    # 建模,加载参数
    detectors_mask_shape = (13,13,5,1)
    matching_true_shape = (13,13,5,5)
    image_input = Input(shape=(416,416,3))
    boxes_input = Input(shape=(None,5))
    detectors_mask_input = Input(shape=detectors_mask_shape)
    matching_boxes_input = Input(shape=matching_true_shape)

    yolo_model = yolo_body(image_input,len(anchors),len(class_names))
    yolo_m = Model(yolo_model.input, yolo_model.output)
    yolo_m.load_weights(model_path)

    num_classes = len(class_names)
    num_anchors = len(anchors)
    model_output_channels = yolo_m.layers[-1].output_shape[-1]
    assert model_output_channels == num_anchors * (num_classes + 5), \
        'Mismatch between model and given anchor and class sizes. ' \
        'Specify matching anchors and classes with --anchors_path and ' \
        '--classes_path flags.'
    print('{} model, anchors, and classes loaded.'.format(model_path))

    model_image_size = yolo_m.layers[0].input_shape[1:3]
    is_fixed_size = model_image_size != (None, None)
    print(is_fixed_size)

    # Generate colors for drawing bounding boxes.
    hsv_tuples = [(x / len(class_names), 1., 1.)
                  for x in range(len(class_names))]
    colors = list(map(lambda x: colorsys.hsv_to_rgb(*x), hsv_tuples))
    colors = list(
        map(lambda x: (int(x[0] * 255), int(x[1] * 255), int(x[2] * 255)),
            colors))
    random.seed(10101)  # Fixed seed for consistent colors across runs.
    random.shuffle(colors)  # Shuffle colors to decorrelate adjacent classes.
    random.seed(None)  # Reset seed to default.
    ################### =================== ###################
    yolo_outputs = yolo_head(yolo_m.output, anchors, len(class_names))
    input_image_shape = K.placeholder(shape=(2, ))
    boxes, scores, classes = yolo_eval(
        yolo_outputs,
        input_image_shape,
        score_threshold=args.score_threshold,
        iou_threshold=args.iou_threshold)

    for image_file in os.listdir(test_path):
        try:
            image_type = imghdr.what(os.path.join(test_path, image_file))
            if not image_type:
                continue
        except IsADirectoryError:
            continue

        image = Image.open(os.path.join(test_path, image_file))
        if is_fixed_size:  # TODO: When resizing we can use minibatch input.
            resized_image = image.resize(
                tuple(reversed(model_image_size)), Image.BICUBIC)
            image_data = np.array(resized_image, dtype='float32')
        else:
            # Due to skip connection + max pooling in YOLO_v2, inputs must have
            # width and height as multiples of 32.
            new_image_size = (image.width - (image.width % 32),
                              image.height - (image.height % 32))
            resized_image = image.resize(new_image_size, Image.BICUBIC)
            image_data = np.array(resized_image, dtype='float32')
            print(image_data.shape)

        image_data /= 255.
        image_data = np.expand_dims(image_data, 0)  # Add batch dimension.

        out_boxes, out_scores, out_classes = sess.run(
            [boxes, scores, classes],
            feed_dict={
                yolo_model.input: image_data,
                input_image_shape: [image.size[1], image.size[0]],
                K.learning_phase(): 0
            })
        print('Found {} boxes for {}'.format(len(out_boxes), image_file))

        font = ImageFont.truetype(
            font='font/FiraMono-Medium.otf',
            size=np.floor(3e-2 * image.size[1] + 0.5).astype('int32'))
        thickness = (image.size[0] + image.size[1]) // 300

        for i, c in reversed(list(enumerate(out_classes))):
            predicted_class = class_names[c]
            box = out_boxes[i]
            score = out_scores[i]

            label = '{} {:.2f}'.format(predicted_class, score)

            draw = ImageDraw.Draw(image)
            label_size = draw.textsize(label, font)

            top, left, bottom, right = box
            top = max(0, np.floor(top + 0.5).astype('int32'))
            left = max(0, np.floor(left + 0.5).astype('int32'))
            bottom = min(image.size[1], np.floor(bottom + 0.5).astype('int32'))
            right = min(image.size[0], np.floor(right + 0.5).astype('int32'))
            print(label, (left, top), (right, bottom))

            if top - label_size[1] >= 0:
                text_origin = np.array([left, top - label_size[1]])
            else:
                text_origin = np.array([left, top + 1])

            # My kingdom for a good redistributable image drawing library.
            for i in range(thickness):
                draw.rectangle(
                    [left + i, top + i, right - i, bottom - i],
                    outline=colors[c])
            draw.rectangle(
                [tuple(text_origin), tuple(text_origin + label_size)],
                fill=colors[c])
            draw.text(text_origin, label, fill=(0, 0, 0), font=font)
            del draw

        image.save(os.path.join(output_path, image_file), quality=90)
    sess.close()





if __name__ == '__main__':
    args = parser.parse_args()
    _main(args)
    
'''
overfit_weights.h5 model, anchors, and classes loaded.
True
Found 10 boxes for dog.jpg
bicycle
 1.00 (0, 266) (-2147483648, 266)
car
 1.00 (0, 0) (-2147483648, 576)
cow
 1.00 (0, 266) (768, 266)
cow
 1.00 (0, 0) (-2147483648, -2147483648)
diningtable
 1.00 (354, 0) (354, 576)
cow
 1.00 (0, 0) (-2147483648, -2147483648)
diningtable
 1.00 (473, 0) (473, 0)
diningtable
 1.00 (236, 0) (236, 0)
diningtable
 1.00 (118, 0) (118, 0)
diningtable
 1.00 (59, 0) (59, 0)
Found 10 boxes for scream.jpg
cow
 1.00 (0, 0) (352, 448)
cow
 1.00 (0, 0) (352, -2147483648)
cow
 1.00 (0, 0) (352, 448)
cow
 1.00 (166, 69) (213, 69)
cow
 1.00 (0, 0) (352, 448)
diningtable
 1.00 (160, 0) (165, 448)
cow
 1.00 (0, 0) (352, 448)
cow
 1.00 (107, 0) (327, 448)
cow
 1.00 (0, 0) (352, 448)
cow
 1.00 (0, 0) (352, 448)
Found 10 boxes for eagle.jpg
cow
 1.00 (0, 394) (773, 394)
cow
 1.00 (0, 197) (773, 197)
cow
 1.00 (0, 394) (773, 394)
bicycle
 1.00 (0, 236) (-2147483648, 236)
cow
 1.00 (0, 0) (-2147483648, -2147483648)
cow
 1.00 (535, 0) (535, -2147483648)
cow
 1.00 (0, 0) (773, -2147483648)
cow
 1.00 (0, 0) (773, -2147483648)
cow
 1.00 (0, 0) (773, 512)
cow
 1.00 (0, 0) (773, 512)
Found 10 boxes for person.jpg
diningtable
 1.00 (345, 65) (345, 65)
cow
 1.00 (0, 337) (-2147483648, 337)
cow
 1.00 (0, 359) (640, 359)
cow
 1.00 (0, 196) (4, 196)
cow
 1.00 (249, 0) (341, -2147483648)
diningtable
 1.00 (148, 38) (148, 38)
cow
 1.00 (0, 0) (640, 424)
cow
 1.00 (0, 0) (640, -2147483648)
cow
 1.00 (0, 0) (640, 424)
cow
 1.00 (0, 0) (640, 424)
Found 10 boxes for giraffe.jpg
bicycle
 1.00 (0, 423) (-2147483648, 423)
car
 1.00 (0, 0) (-2147483648, -2147483648)
cow
 1.00 (0, 231) (0, 231)
diningtable
 1.00 (231, 0) (231, -2147483648)
diningtable
 1.00 (115, 38) (115, 38)
diningtable
 1.00 (308, 0) (308, 0)
diningtable
 1.00 (154, 0) (154, 0)
diningtable
 1.00 (77, 0) (77, 0)
aeroplane
 1.00 (0, 0) (0, -2147483648)
diningtable
 1.00 (0, 0) (0, 0)
Found 10 boxes for horses.jpg
cow
 1.00 (0, 0) (773, 512)
cow
 1.00 (0, 276) (773, 276)
cow
 1.00 (0, 276) (773, 276)
cow
 1.00 (0, 0) (773, 512)
cow
 1.00 (0, 0) (773, -2147483648)
cow
 1.00 (0, 0) (773, -2147483648)
cow
 1.00 (0, 0) (773, 512)
cow
 1.00 (0, 236) (556, 236)
cow
 1.00 (0, 0) (773, 512)
cow
 1.00 (124, 0) (233, 512)

Process finished with exit code 0

'''

2.4 总结

作者的代码相对于其模型的代码有些乱,思路不清晰。

I.收获:

  • (1)数据转化为 .hdf5格式,但是作者读取图片和 真实框数据的时候,依然用列表读取,这种方式不如torch用生成器读取数据效率高。
  • (2)作者写的主干模型较为清晰。
  • (3)生成标签的时候,找出与真实框最匹配的先验框时标记真实框的位置,比在计算损失时重新计算IoU时省计算步骤。

II .代码缺陷:

  • 用keras把代码都封装了,而且不是交互式的,对理解代码造成潜在阻碍。
  • 数据没有做数据增强处理
  • 训练代码乱,有些代码应该写成一个函数,这样思路更加简洁。
  • 计算每个真实框在原图上相对位置的时候,用的是第一张图片的宽高,而不是每个真实框对应的图片的宽高,框在图片上的相对位置有误,预测也会不准确。

你可能感兴趣的:(Pytorch,计算机视觉,深度学习,目标检测,计算机视觉)