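YOLO v7 was released last year and delivered very strong performance, and its authors published the PyTorch source code. In several earlier blog posts I analyzed that code in depth to understand YOLO v7's technical details and implementation. Since I have always worked with TensorFlow, I also wanted to try porting the code to TensorFlow.
First, download the COCO dataset by running the get_coco.sh script that ships with the YOLO v7 source code. The script is as follows: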
#!/bin/bash
# COCO 2017 dataset http://cocodataset.org
# Download command: bash ./scripts/get_coco.sh
# Download/unzip labels
d='./' # unzip directory
url=https://github.com/ultralytics/yolov5/releases/download/v1.0/
f='coco2017labels-segments.zip' # or 'coco2017labels.zip', 68 MB
echo 'Downloading' $url$f ' ...'
curl -L $url$f -o $f && unzip -q $f -d $d && rm $f & # download, unzip, remove in background
# Download/unzip images
d='./coco/images' # unzip directory
url=http://images.cocodataset.org/zips/
f1='train2017.zip' # 19G, 118k images
f2='val2017.zip' # 1G, 5k images
f3='test2017.zip' # 7G, 41k images (optional)
for f in $f1 $f2 $f3; do
  echo 'Downloading' $url$f '...'
  curl -L $url$f -o $f && unzip -q $f -d $d && rm $f & # download, unzip, remove in background
done
wait # finish background tasks
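Once the download has finished, the images and labels directories each contain three subdirectories, train2017, val2017 and test2017, holding the training, validation and test data respectively.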
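The validation code later in this post recovers the image id and the label path by slicing the file-name string, which only works for one exact directory prefix. A slightly more robust sketch, assuming the layout described above (images under coco/images/<split>/, labels under coco/labels/<split>/, and COCO's zero-padded 12-digit file names); both helper names are hypothetical and not part of the author's code:

```python
from pathlib import PurePosixPath


def image_to_label_path(image_path: str) -> str:
    """Map coco/images/<split>/<id>.jpg to coco/labels/<split>/<id>.txt."""
    parts = list(PurePosixPath(image_path).parts)
    # Replace the last 'images' directory component with 'labels'
    idx = len(parts) - 1 - parts[::-1].index("images")
    parts[idx] = "labels"
    return str(PurePosixPath(*parts).with_suffix(".txt"))


def image_id_from_path(image_path: str) -> int:
    """COCO file names are zero-padded ids, e.g. 000000000139.jpg -> 139."""
    return int(PurePosixPath(image_path).stem)


print(image_to_label_path("coco/images/val2017/000000000139.jpg"))  # coco/labels/val2017/000000000139.txt
print(image_id_from_path("coco/images/val2017/000000000139.jpg"))   # 139
```

Working on path components rather than character offsets also tolerates a leading "./" or a different dataset root.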
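Next, we build the training dataset in TensorFlow. The training images need to be augmented with Mosaic stitching, random copy-paste, random geometric (perspective) transforms, color adjustments and so on, and the object labels in each image must be transformed accordingly. For how this works, see my earlier post "Interpreting the YOLO v7 code (2): Preparing the training data" (解读YOLO v7的代码(二)训练数据的准备, CSDN).
Here I define a Dataloader class that applies these augmentations to the training data. The processing is essentially the same as in the YOLO v7 source, with one small change: after Mosaic stitching, if the random transform scales the image down, an object's bounding box can extend past the image boundary, so I clip the box using the object's segment (polygon) data so that it stays inside the image.
The validation data needs no augmentation: we only resize each image so that its long side is 640 and pad the remaining area. The TensorFlow dataset is defined as follows: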
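Moving labels along with the pixels comes down to coordinate conversions. The helpers below are modeled on the xywhn2xyxy and xyxy2xywh functions of the YOLOv5/YOLOv7 utilities; the author's own util module is not shown, so treat them as an assumption about what it contains. The padw/padh offsets are what Mosaic needs: each tile's normalized labels are converted to pixels and shifted by the tile's position on the mosaic canvas.

```python
import numpy as np


def xywhn2xyxy(x, w, h, padw=0.0, padh=0.0):
    """Normalized (xc, yc, bw, bh) -> pixel (x1, y1, x2, y2), shifted by (padw, padh)."""
    x = np.asarray(x, dtype=np.float32)
    y = np.empty_like(x)
    y[:, 0] = w * (x[:, 0] - x[:, 2] / 2) + padw  # left
    y[:, 1] = h * (x[:, 1] - x[:, 3] / 2) + padh  # top
    y[:, 2] = w * (x[:, 0] + x[:, 2] / 2) + padw  # right
    y[:, 3] = h * (x[:, 1] + x[:, 3] / 2) + padh  # bottom
    return y


def xyxy2xywh(x):
    """Pixel (x1, y1, x2, y2) -> pixel (xc, yc, bw, bh)."""
    x = np.asarray(x, dtype=np.float32)
    y = np.empty_like(x)
    y[:, 0] = (x[:, 0] + x[:, 2]) / 2
    y[:, 1] = (x[:, 1] + x[:, 3]) / 2
    y[:, 2] = x[:, 2] - x[:, 0]
    y[:, 3] = x[:, 3] - x[:, 1]
    return y


# A 640x480 tile pasted onto the mosaic canvas at offset (100, 50)
tile_labels = np.array([[0.5, 0.5, 0.25, 0.5]])
boxes = xywhn2xyxy(tile_labels, w=640, h=480, padw=100, padh=50)
print(boxes)  # [[340. 170. 500. 410.]]
```

The same xywhn2xyxy call with zero offsets is what the validation map function below uses before normalizing by img_size.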
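The segment-based clipping described above can be sketched as follows, assuming each object's segment is an (N, 2) array of polygon vertices already mapped into the output image's pixel coordinates. Only vertices that land inside the image are kept and the box is their min/max; YOLOv5/YOLOv7's random_perspective follows a similar idea (resample_segments plus segment2box). Because the clipping works on vertices, a coarse polygon should be densified first, otherwise the box can collapse when most corners fall outside. This illustrates the idea and is not the author's Dataloader code; the function names are my own.

```python
import numpy as np


def densify(segment, n=1000):
    """Linearly resample a closed polygon to n points along its edges."""
    seg = np.asarray(segment, dtype=np.float32).reshape(-1, 2)
    seg = np.concatenate([seg, seg[:1]], axis=0)  # close the polygon
    t = np.linspace(0, len(seg) - 1, n)           # positions along the edge chain
    xp = np.arange(len(seg))
    return np.stack([np.interp(t, xp, seg[:, 0]), np.interp(t, xp, seg[:, 1])], axis=1)


def segment_to_clipped_box(segment, width, height):
    """Box [x1, y1, x2, y2] of the polygon points inside the image, or None if none are."""
    seg = np.asarray(segment, dtype=np.float32).reshape(-1, 2)
    x, y = seg[:, 0], seg[:, 1]
    inside = (x >= 0) & (y >= 0) & (x <= width) & (y <= height)
    if not inside.any():
        return None  # the object left the image entirely; drop the label
    x, y = x[inside], y[inside]
    return np.array([x.min(), y.min(), x.max(), y.max()], dtype=np.float32)


# A square that sticks out over the top-left corner of a 640x480 image
square = [[-10, -10], [50, -10], [50, 50], [-10, 50]]
print(segment_to_clipped_box(square, 640, 480))           # only one corner inside: degenerate box
print(segment_to_clipped_box(densify(square), 640, 480))  # roughly [0, 0, 50, 50]
```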
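import numpy as np
import tensorflow as tf
# load_image, load_labels, xywhn2xyxy, xyxy2xywh, val_label_path, img_size and
# val_batch_size come from the author's own modules (not shown here)

def map_val_fn(t: tf.Tensor):
    filename = str(t.numpy(), encoding='utf-8')
    imgid = int(filename[20:32])  # the 12-digit COCO id after the 20-character prefix "coco/images/val2017/"
    # Load image
    img, (h0, w0), (h, w) = load_image(filename)
    #augment_hsv(img, hgain=hsv_h, sgain=hsv_s, vgain=hsv_v)
    # Labels
    label_filename = val_label_path + filename.split('/')[-1].split('.')[0] + '.txt'
    labels, _ = load_labels(label_filename)
    labels[:, 1:] = xywhn2xyxy(labels[:, 1:], w, h, 0, 0)  # normalized xywh to pixel xyxy format
    labels[:, 1:5] = xyxy2xywh(labels[:, 1:5])  # convert xyxy to xywh
    labels[:, 1:5] /= img_size  # normalize to 0-1 relative to the img_size x img_size padded canvas
    img = img[:, :, ::-1].transpose(2, 0, 1)  # BGR HWC -> RGB CHW
    img = img/255.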
    img_hw = np.array([h0, w0], dtype=np.int32)  # original height and width; tf.concat cannot join two scalars
    return img, labels, img_hw, imgid

dataset_val = tf.data.Dataset.list_files("coco/images/val2017/*.jpg", shuffle=False)
dataset_val = dataset_val.map(
    lambda x: tf.py_function(func=map_val_fn, inp=[x], Tout=[tf.float32, tf.float32, tf.int32, tf.int32]),
    num_parallel_calls=tf.data.experimental.AUTOTUNE)
dataset_val = dataset_val\
    .padded_batch(val_batch_size, padded_shapes=([3, img_size, img_size], [None, 5], [2], []), padding_values=(144/255., 0., 0, 0))\
    .prefetch(tf.data.experimental.AUTOTUNE)
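One detail of the pipeline above is easy to miss: padded_batch adds padding at the end of every axis, so each resized image sits in the top-left corner of the 640 x 640 canvas. That is why a predicted box can be mapped back to the original image simply by multiplying by max(h0, w0) / img_size, which is the ratio tf_predict_func uses later. A NumPy sketch of the resize shape and that padding; load_image itself is not shown in this post, so the rounding rule here is an assumption, and the default pad value 144 mirrors the 144/255. passed to padded_batch:

```python
import numpy as np


def resize_shape(h0, w0, img_size=640):
    """Shape after scaling the long side to img_size, plus the scale factor r."""
    r = img_size / max(h0, w0)
    return int(round(h0 * r)), int(round(w0 * r)), r


def pad_bottom_right(img, img_size=640, value=144):
    """Place an (h, w, c) image in the top-left corner of an img_size x img_size canvas,
    which is where padded_batch leaves it (padding goes at the end of each axis)."""
    h, w, c = img.shape
    canvas = np.full((img_size, img_size, c), value, dtype=img.dtype)
    canvas[:h, :w] = img
    return canvas


h, w, r = resize_shape(1000, 500)  # a 1000x500 (h x w) original becomes 640x320
canvas = pad_bottom_right(np.zeros((h, w, 3), dtype=np.uint8))
# A coordinate predicted on the canvas maps back by dividing by r,
# i.e. multiplying by max(h0, w0) / img_size; no offset is needed.
x_original = 100 / r
```

Keeping the image in the top-left corner, rather than centring it as YOLOv7's letterbox does, is what removes the offset term when mapping boxes back.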
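For the training dataset I originally planned to follow the same pattern as the validation set and simply replace the map function with the corresponding function of the Dataloader class (see dataloader.py for the code). In practice this turned out to be inefficient: the augmentation pipeline is fairly complex, so the CPU spends a lot of time on it. Although the map and prefetch operations of a TensorFlow dataset accept an AUTOTUNE parameter to optimize parallelism, the result was still unsatisfactory, and the GPU kept waiting for the CPU to finish preparing data. I therefore wrote my own parallel-processing function based on Python's multiprocessing module: while the GPU trains on 100 batches, the CPU prepares the next 100 batches in parallel, which improves throughput substantially.
Concretely, I create a shared memory block that all worker processes can access, randomly sample a subset of file names from the training images and split them among several workers. Each worker reads its images, applies the image augmentations, processes the corresponding label files, and writes the results to its own region of the shared memory. Finally, a separate process merges the data in shared memory, and a dataset can then be built directly from the merged data.
The relevant code is as follows: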
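The shared-memory layout can be illustrated without the augmentation itself. In this sketch each worker owns a fixed-size slot of one SharedMemory block, writes its pickled records at the slot's offset and reports the (start, end) span, and the merge step unpickles the spans in order. It runs in a single process for clarity; in the author's code each write happens in a separate mp.Process and the spans travel through a Queue. The slot-size check and the function names are my own additions: the original code assumes each worker's output fits into shared_memory_size // subprocess_num bytes.

```python
import pickle
from multiprocessing import shared_memory


def write_chunk(shm_name, offset, slot_size, records):
    """Pickle `records` into the slot starting at `offset`; return its (start, end)."""
    payload = pickle.dumps(records, protocol=pickle.HIGHEST_PROTOCOL)
    if len(payload) > slot_size:
        raise ValueError(f"chunk of {len(payload)} bytes does not fit a {slot_size}-byte slot")
    shm = shared_memory.SharedMemory(name=shm_name)  # attach to the existing block
    try:
        shm.buf[offset:offset + len(payload)] = payload
    finally:
        shm.close()
    return offset, offset + len(payload)


def merge_chunks(shm_name, spans):
    """Unpickle every (start, end) span and concatenate the record lists."""
    shm = shared_memory.SharedMemory(name=shm_name)
    try:
        merged = []
        for start, end in spans:
            chunk = bytes(shm.buf[start:end])  # copy out so no view outlives close()
            merged.extend(pickle.loads(chunk))
    finally:
        shm.close()
    return merged


if __name__ == "__main__":
    n_workers, slot = 3, 4096
    block = shared_memory.SharedMemory(create=True, size=n_workers * slot)
    try:
        # In the real pipeline each write_chunk call would run in its own mp.Process
        spans = [write_chunk(block.name, k * slot, slot, [f"sample-{k}-{j}" for j in range(2)])
                 for k in range(n_workers)]
        print(merge_chunks(block.name, spans))
    finally:
        block.close()
        block.unlink()  # only the creator unlinks, exactly once
```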
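import pickle
import time
import multiprocessing as mp
from multiprocessing import shared_memory
from random import sample
# Dataloader, hyp, img_size, train_image_dir, train_label_dir, imgid_train, sample_len,
# subprocess_num, imgid_num_process, data_size, shared_memory_size and the Queue q
# are defined elsewhere in the author's training script (not shown)

# Augment the images with the given IDs and write the result into shared memory
def augment_data(imgids, datasize, memory_name, offset, q):
    dataset = Dataloader(img_size, train_image_dir, train_label_dir, imgids, hyp)
    traindata = dataset.generateTrainData(datasize)
    traindata_obj = pickle.dumps(traindata, protocol=pickle.HIGHEST_PROTOCOL)
    existing_shm = shared_memory.SharedMemory(name=memory_name)
    existing_shm.buf[offset:offset+len(traindata_obj)] = traindata_obj
    q.put((offset, offset+len(traindata_obj)))
    existing_shm.close()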

# Merge the results of the augmentation worker processes
def merge_subprocess(q, subprocess_num, memory_name):
    results = []
    while(True):
        msg = q.get()
        if msg is not None:
            results.append(msg)
            if len(results)>=subprocess_num:
                break
        else:
            time.sleep(1)
    existing_shm = shared_memory.SharedMemory(name=memory_name)
    merge_data = []
    for result in results:
        merge_data.extend(pickle.loads(existing_shm.buf[result[0]:result[1]]))
    merge_data_obj = pickle.dumps(merge_data, protocol=pickle.HIGHEST_PROTOCOL)
    existing_shm.buf[:len(merge_data_obj)] = merge_data_obj
    existing_shm.close()
    q.put(len(merge_data_obj))

# Start several worker processes for augmentation, then merge their results
def prepare_traindata(memory_name):
    sample_imgid = sample(imgid_train, sample_len)  # randomly pick a subset of the training image IDs
    subprocess_list = []
    for i in range(subprocess_num):  # each worker processes its own slice of images and labels
        subprocess_list.append(
            mp.Process(
                target=augment_data,
                args=(sample_imgid[i*imgid_num_process:(i+1)*imgid_num_process], data_size//subprocess_num, memory_name, i*shared_memory_size//subprocess_num, q, )
            )
        )
    for p in subprocess_list:
        p.start()
    # Start the process that merges the workers' results
    p0 = mp.Process(target=merge_subprocess, args=(q, subprocess_num, memory_name,))
    p0.start()
    return p0
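
image_cache = shared_memory.SharedMemory(name="dataset", create=True, size=shared_memory_size)  # create the shared memory block
merge_proc = prepare_traindata("dataset")
# Take the merged data size from the Queue, wait for the merge process to exit, then deserialize.
# Reading the Queue before join() is the order the multiprocessing docs recommend for processes that use queues.
msg = q.get()
merge_proc.join()
if msg>0:
    traindata = pickle.loads(image_cache.buf[:msg])
else:
    print("Could not load training data.")
# Release the shared memory once, whether or not loading succeeded
image_cache.close()
image_cache.unlink()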

def traindata_gen():
    global traindata
    i = 0
    while i
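Next, we build the YOLO v7 model. For an analysis of the model structure, see my other post "Interpreting the YOLO v7 code (1): A study of the model structure" (解读YOLO v7的代码(一)模型结构研究, gzroy's blog on CSDN).
I define a yolo.py file that contains the custom layers and assembles the model from them.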
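All spatial sizes in the model follow from Keras "same" padding, where a convolution or pooling with stride s turns a length n into ceil(n / s). Tracing the stride-2 stages of create_model (two stride-2 YoloConvs, then three MP blocks) shows where the 80 / 40 / 20 grids come from, and that they only hold for img_size = 640: the grids passed to IDetect and the gain constants in loss.py are hard-coded for that size. The helper names are hypothetical.

```python
import math


def same_out(n, stride):
    """Output length of a Keras 'same'-padded conv/pool: ceil(n / stride)."""
    return math.ceil(n / stride)


def feature_sizes(img_size=640):
    n = same_out(img_size, 1)  # YoloConv(32, 3, 1)
    n = same_out(n, 2)         # YoloConv(64, 3, 2)           -> stride 2
    n = same_out(n, 1)         # YoloConv(64, 3, 1)
    n = same_out(n, 2)         # YoloConv(128, 3, 2) + Elan(64) -> stride 4
    p3 = same_out(n, 2)        # MP(128) + Elan(128)  -> route1, stride 8
    p4 = same_out(p3, 2)       # MP(256) + Elan(256)  -> route2, stride 16
    p5 = same_out(p4, 2)       # MP(512) + Elan + SPPCSPC -> route3, stride 32
    return p3, p4, p5


print(feature_sizes(640))  # (80, 40, 20)
print(feature_sizes(608))  # (76, 38, 19)
```

Both branches of an MP block (2x2 max-pooling and a stride-2 3x3 conv) produce ceil(n / 2), which is why their outputs can be concatenated.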
import tensorflow as tf
from tensorflow import keras
l = tf.keras.layers
from params import *  # provides img_size, weight_decay, ...

@tf.keras.utils.register_keras_serializable()
class YoloConv(keras.layers.Layer):
    # Conv2D + BatchNorm + optional swish (SiLU), all in channels_first layout
    def __init__(self, filters, kernel_size, strides, padding='same', bias=False, activation='swish', **kwargs):
        super(YoloConv, self).__init__(**kwargs)
        self.activation = activation
        self.filters = filters
        self.kernel_size = kernel_size
        self.strides = strides
        self.padding = padding
        self.bias = bias
        self.cv = l.Conv2D(filters=self.filters,
                           kernel_size=self.kernel_size,
                           strides=self.strides,
                           padding=self.padding,
                           data_format='channels_first',
                           use_bias=self.bias,
                           kernel_initializer='he_normal',
                           kernel_regularizer=tf.keras.regularizers.l2(l=weight_decay))
        self.bn = l.BatchNormalization(axis=1)
        self.swish = l.Activation('swish')

    def call(self, inputs, training=None):
        output = self.cv(inputs)
        output = self.bn(output, training)
        if self.activation == 'swish':
            output = self.swish(output)
        return output

    def get_config(self):
        config = super(YoloConv, self).get_config()
        config.update({
            "activation": self.activation,
            "filters": self.filters,
            "kernel_size": self.kernel_size,
            "strides": self.strides,
            "padding": self.padding,
            "bias": self.bias
        })
        return config

@tf.keras.utils.register_keras_serializable()
class Elan(keras.layers.Layer):
    def __init__(self, filters, **kwargs):
        super(Elan, self).__init__(**kwargs)
        self.filters = filters
        self.cv1 = YoloConv(self.filters, 1, 1)
        self.cv2 = YoloConv(self.filters, 1, 1)
        self.cv3 = YoloConv(self.filters, 3, 1)
        self.cv4 = YoloConv(self.filters, 3, 1)
        self.cv5 = YoloConv(self.filters, 3, 1)
        self.cv6 = YoloConv(self.filters, 3, 1)
        self.cv7 = YoloConv(self.filters*4, 1, 1)
        self.concat = l.Concatenate(axis=1)

    def call(self, inputs, training=None):
        output1 = self.cv1(inputs, training)
        output2 = self.cv2(inputs, training)
        output3 = self.cv4(self.cv3(output2, training), training)
        output4 = self.cv6(self.cv5(output3, training), training)
        output = self.concat([output1, output2, output3, output4])
        output = self.cv7(output, training)
        return output

    def get_config(self):
        config = super(Elan, self).get_config()
        config.update({
            "filters": self.filters
        })
        return config

@tf.keras.utils.register_keras_serializable()
class MP(keras.layers.Layer):
    # **kwargs lets from_config() pass name/trainable/dtype back in when the model is reloaded
    def __init__(self, filters, k=2, **kwargs):
        super(MP, self).__init__(**kwargs)
        self.filters = filters
        self.k = k
        self.cv1 = YoloConv(filters, 1, 1)
        self.cv2 = YoloConv(filters, 1, 1)
        self.cv3 = YoloConv(filters, 3, 2)
        self.pool = l.MaxPool2D(pool_size=self.k, strides=self.k, padding='same', data_format='channels_first')
        self.concat = l.Concatenate(axis=1)

    def call(self, inputs, training=None):
        output1 = self.pool(inputs)
        output1 = self.cv1(output1, training)
        output2 = self.cv2(inputs, training)
        output2 = self.cv3(output2, training)
        output = self.concat([output1, output2])
        return output

    def get_config(self):
        config = super(MP, self).get_config()
        config.update({
            "filters": self.filters,
            "k": self.k
        })
        return config

@tf.keras.utils.register_keras_serializable()
class SPPCSPC(keras.layers.Layer):
    def __init__(self, filters, e=0.5, k=(5,9,13), **kwargs):
        super(SPPCSPC, self).__init__(**kwargs)
        self.filters = filters
        self.e = e
        self.k = k
        c_ = int(2 * self.filters * self.e)
        self.cv1 = YoloConv(c_, 1, 1)
        self.cv2 = YoloConv(c_, 1, 1)
        self.cv3 = YoloConv(c_, 3, 1)
        self.cv4 = YoloConv(c_, 1, 1)
        self.m = [l.MaxPool2D(pool_size=x, strides=1, padding='same', data_format='channels_first') for x in k]
        self.cv5 = YoloConv(c_, 1, 1)
        self.cv6 = YoloConv(c_, 3, 1)
        self.cv7 = YoloConv(filters, 1, 1)
        self.concat = l.Concatenate(axis=1)

    def call(self, inputs, training=None):
        output1 = self.cv4(self.cv3(self.cv1(inputs, training), training), training)
        output2 = self.concat([output1] + [m(output1) for m in self.m])
        output2 = self.cv6(self.cv5(output2, training), training)
        output3 = self.cv2(inputs, training)
        output = self.cv7(self.concat([output2, output3]), training)
        return output

    def get_config(self):
        config = super(SPPCSPC, self).get_config()
        config.update({
            "filters": self.filters,
            "k": self.k,
            "e": self.e
        })
        return config

@tf.keras.utils.register_keras_serializable()
class Elan_A(keras.layers.Layer):
    def __init__(self, filters, **kwargs):
        super(Elan_A, self).__init__(**kwargs)
        self.filters = filters
        self.cv1 = YoloConv(filters, 1, 1)
        self.cv2 = YoloConv(filters, 1, 1)
        self.cv3 = YoloConv(filters//2, 3, 1)
        self.cv4 = YoloConv(filters//2, 3, 1)
        self.cv5 = YoloConv(filters//2, 3, 1)
        self.cv6 = YoloConv(filters//2, 3, 1)
        self.cv7 = YoloConv(filters, 1, 1)
        self.concat = l.Concatenate(axis=1)

    def call(self, inputs, training=None):
        output1 = self.cv1(inputs, training)
        output2 = self.cv2(inputs, training)
        output3 = self.cv3(output2, training)
        output4 = self.cv4(output3, training)
        output5 = self.cv5(output4, training)
        output6 = self.cv6(output5, training)
        output7 = self.concat([output1, output2, output3, output4, output5, output6])
        output = self.cv7(output7, training)
        return output

    def get_config(self):
        config = super(Elan_A, self).get_config()
        config.update({
            "filters": self.filters,
        })
        return config

@tf.keras.utils.register_keras_serializable()
class RepConv(keras.layers.Layer):
    # Training-time RepConv: a 3x3 and a 1x1 conv branch (each with BN) summed before the activation
    def __init__(self, filters, **kwargs):
        super(RepConv, self).__init__(**kwargs)
        self.filters = filters
        self.cv1 = YoloConv(filters, 3, 1, activation=None)
        self.cv2 = YoloConv(filters, 1, 1, activation=None)
        self.swish = l.Activation('swish')

    def call(self, inputs, training=None):
        output1 = self.cv1(inputs, training)
        output2 = self.cv2(inputs, training)
        output = self.swish(output1+output2)
        return output

    def get_config(self):
        config = super(RepConv, self).get_config()
        config.update({
            "filters": self.filters,
        })
        return config
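
@tf.keras.utils.register_keras_serializable()
class IDetect(keras.layers.Layer):
    def __init__(self, shape, no, na, grids, **kwargs):
        super(IDetect, self).__init__(**kwargs)
        # Implicit knowledge: a learned additive term (like YOLOv7's ImplicitA, initialised around 0)
        # before the 1x1 conv and a learned multiplicative term (like ImplicitM, initialised around 1)
        # after it. float16 assumes the mixed_float16 policy is active (the output is cast to float32 below).
        #self.a = tf.random.normal((1,shape,1,1), mean=0.0, stddev=0.02, dtype=tf.dtypes.float16)
        self.a = tf.Variable(tf.random.normal((1,shape,1,1), mean=0.0, stddev=0.02, dtype=tf.dtypes.float16))
        self.m = tf.Variable(tf.random.normal((1,no*na,1,1), mean=1.0, stddev=0.02, dtype=tf.dtypes.float16))
        #self.a = keras.initializers.RandomNormal(mean=0., stddev=0.02)(shape=(1,shape,1,1))
        #self.m = keras.initializers.RandomNormal(mean=1., stddev=0.02)(shape=(1,no*na,1,1))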
        self.cv = YoloConv(no*na, 1, 1, bias=True, activation=None)
        self.shape = shape
        self.no = no
        self.na = na
        self.grids = grids
        self.reshape = l.Reshape([self.na, self.no, self.grids*self.grids])
        #self.permute = l.Permute([1,3,4,2])
        self.permute = l.Permute([1,3,2])
        self.activation = l.Activation('linear', dtype='float32')

    def call(self, inputs, training=None):
        #output = l.Add()([inputs, self.a])
        output = inputs + self.a
        output = self.cv(output, training)
        output = self.m * output
        #output = self.cv(inputs)
        #output = tf.reshape(output, [-1, self.na, self.no, self.grids, self.grids])
        output = self.reshape(output)  # [batch, na, no, grids*grids]
        #output = tf.transpose(output, perm=[0,1,3,4,2])
        output = self.permute(output)  # [batch, na, grids*grids, no]
        output = self.activation(output)
        return output

    def get_config(self):
        config = super(IDetect, self).get_config()
        config.update({
            "no": self.no,
            "na": self.na,
            "grids": self.grids,
            "shape": self.shape
        })
        return config

def create_model():
    # The '#N' comments give the index of the corresponding layer in YOLOv7's yolov7.yaml
    inputs = keras.Input(shape=(3, img_size, img_size))
    x = YoloConv(32, 3, 1)(inputs)  #[32, img_size, img_size]
    x = YoloConv(64, 3, 2)(x)  #[64, img_size/2, img_size/2]
    x = YoloConv(64, 3, 1)(x)  #[64, img_size/2, img_size/2]
    x = YoloConv(128, 3, 2)(x)  #[128, img_size/4, img_size/4]
    x = Elan(64)(x)  #11
    x = MP(128)(x)  #16
    route1 = Elan(128)(x)  #24
    x = MP(256)(route1)  #29
    route2 = Elan(256)(x)  #37
    x = MP(512)(route2)  #42
    x = Elan(256)(x)  #50
    route3 = SPPCSPC(512)(x)  #51
    x = YoloConv(256, 1, 1)(route3)
    x = l.UpSampling2D(size=(2, 2), data_format='channels_first', interpolation='nearest')(x)
    x = l.Concatenate(axis=1)([x, YoloConv(256, 1, 1)(route2)])
    route4 = Elan_A(256)(x)  #63
    x = YoloConv(128, 1, 1)(route4)
    x = l.UpSampling2D(size=(2, 2), data_format='channels_first', interpolation='nearest')(x)
    x = l.Concatenate(axis=1)([x, YoloConv(128, 1, 1)(route1)])
    route5 = Elan_A(128)(x)  #75, Connect to Detector 1
    x = MP(128)(route5)
    x = l.Concatenate(axis=1)([x, route4])
    route6 = Elan_A(256)(x)  #88, Connect to Detector 2
    x = MP(256)(route6)
    x = l.Concatenate(axis=1)([x, route3])
    route7 = Elan_A(512)(x)  #101, Connect to Detector 3
    detect1 = RepConv(256)(route5)
    detect2 = RepConv(512)(route6)
    detect3 = RepConv(1024)(route7)
    output1 = IDetect(256, 85, 3, 80)(detect1)
    output2 = IDetect(512, 85, 3, 40)(detect2)
    output3 = IDetect(1024, 85, 3, 20)(detect3)
    output = l.Concatenate(axis=-2)([output1, output2, output3])
    output = l.Activation('linear', dtype='float32')(output)
    model = keras.Model(inputs=inputs, outputs=output, name="yolov7_model")
    return model
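The model returns a single tensor of shape [batch, 3, 8400, 85]: the three IDetect outputs (80 x 80, 40 x 40 and 20 x 20 cells per anchor) are concatenated along axis -2, and the 85 values per cell are 4 box terms, 1 objectness score and 80 class scores. loss.py later separates the levels again with boolean masks built from layer_no_constant, which is defined in a module not shown here; presumably it encodes the index ranges computed below. Within each range, IDetect's Reshape flattened (grid_y, grid_x) in row-major order, so a plain reshape recovers the grid. The helper name is hypothetical.

```python
import numpy as np


def prediction_layout(img_size=640, strides=(8, 16, 32), na=3, no=85):
    """(start, end) of each detection level along axis 2 of the [batch, na, N, no] output."""
    ranges, start = [], 0
    for s in strides:
        g = img_size // s
        ranges.append((start, start + g * g))
        start += g * g
    return ranges, (na, start, no)


ranges, shape = prediction_layout()
print(ranges)  # [(0, 6400), (6400, 8000), (8000, 8400)]
print(shape)   # (3, 8400, 85)

# Cell (grid_y=2, grid_x=5) of the 80x80 level sits at flat index 2 * 80 + 5
out = np.zeros((1, 3, 8400, 85), dtype=np.float32)
out[0, 0, 2 * 80 + 5, 0] = 1.0
start, end = ranges[0]
level0 = out[:, :, start:end].reshape(1, 3, 80, 80, 85)  # [batch, na, grid_y, grid_x, no]
print(level0[0, 0, 2, 5, 0])  # 1.0
```

This row-major (grid_y, grid_x) order is the one the loss code relies on when it gathers predictions with indices of the form (batch, anchor, grid_y, grid_x).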
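For how YOLOv7 defines its loss, see my other post "Interpreting the YOLO v7 code (3): The loss function" (解读YOLO v7的代码(三)损失函数, gzroy's blog on CSDN).
The implementation lives in loss.py. I rewrote it in TensorFlow following the approach of the YOLOv7 code and wrapped the functions with tf.function to make the computation more efficient. The code is as follows: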
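tf_find_3_positive in loss.py follows YOLOv7's rule for candidate positive samples: a target is paired with an anchor only if their widths and heights differ by less than a factor of anchor_t, and besides the grid cell containing the target's centre, the neighbouring cells on the sides the centre is closest to are also used, which normally gives three cells per target. A NumPy sketch of that rule for a single target; the function names are my own, all coordinates are in grid units, and anchor_t = 4.0 is assumed (YOLOv7's default hyperparameter):

```python
import numpy as np


def anchor_matches(target_wh, anchor_wh, anchor_t=4.0):
    """True if target and anchor widths and heights differ by less than anchor_t times."""
    r = np.asarray(target_wh, dtype=np.float32) / np.asarray(anchor_wh, dtype=np.float32)
    return bool(np.max(np.maximum(r, 1.0 / r)) < anchor_t)


def candidate_cells(gxy, grid, g=0.5):
    """Cells (grid_x, grid_y) used as positives for a target centred at gxy:
    the cell containing the centre plus the neighbours on its closer sides."""
    gxy = np.asarray(gxy, dtype=np.float32)
    gxi = grid - gxy  # the same position measured from the right/bottom edge
    offsets = np.array([[0, 0], [-1, 0], [0, -1], [1, 0], [0, 1]], dtype=np.float32)
    keep = np.array([
        True,
        gxy[0] % 1 < g and gxy[0] > 1,  # left neighbour
        gxy[1] % 1 < g and gxy[1] > 1,  # upper neighbour
        gxi[0] % 1 < g and gxi[0] > 1,  # right neighbour
        gxi[1] % 1 < g and gxi[1] > 1,  # lower neighbour
    ], dtype=bool)
    cells = (gxy + offsets[keep]).astype(np.int32)  # truncate to integer cell indices
    return [(int(x), int(y)) for x, y in cells]


# A 30x60 px target centred at (82.4, 165.6) px on the stride-8 level (80x80 grid),
# compared with a 12x16 px anchor
stride = 8
print(anchor_matches((30 / stride, 60 / stride), (12 / stride, 16 / stride)))  # True
print(candidate_cells((82.4 / stride, 165.6 / stride), grid=80))  # [(10, 20), (9, 20), (10, 21)]
```

Note that the TensorFlow code stores the same cells in (grid_y, grid_x) order so they can be used directly with tf.gather_nd on a [batch, na, grid, grid, no] tensor.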
import tensorflow as tf
import math
from test1 import batch_size, na, nl, img_size, stride, balance
from test1 import loss_box, loss_obj, loss_cls
from test1 import batch_no_constant, anchor_no_constant, anchors_reshape, anchor_t, anchors_constant, layer_no_constant
from test1 import val_batch_no_constant, val_layer_no_constant
from util import *
from params import *

#In param:
#   labels - the labels of the objects, dimension [batch, box_num, 5(class, xywh normalized to 0-1)]
#Out param:
#   results - candidate positive samples of all three detection levels, concatenated
#             dimension: [sample_number, 6(batch_no, anchor_no, grid_y, grid_x, class, layer_no)]
#   anch - the anchor wh (in grid units) of each candidate positive sample
#             dimension: [sample_number, 2(anchor_w, anchor_h)]
@tf.function(
    input_signature=(
        [tf.TensorSpec(shape=[batch_size, None, 5], dtype=tf.float32)]
    )
)
def tf_find_3_positive(labels):
    batch_no = tf.zeros_like(labels)[...,0:1] + batch_no_constant
    targets = tf.concat((batch_no, labels), axis=-1)  #targets dim [batch,box_num,6]
    targets = tf.reshape(targets, [batch_size, 1, -1, 6])  #targets dim [batch,1,box_num,6]
    targets = tf.tile(targets, [1,na,1,1])
    anchor_no = anchor_no_constant + tf.reshape(tf.zeros_like(batch_no), [batch_size, 1, -1, 1])
    targets = tf.concat([targets,anchor_no], axis=-1)  #targets dim [batch,na,box_num,7(batch_no, cls, xywh, anchor_no)]
    g = 0.5  # bias
    offsets = tf.expand_dims(tf.constant([[0.,0.], [-1.,0.], [0.,-1.], [1.,0.], [0.,1.]]), axis=0)  #offset dim [1,5,2]
    # Grid sizes 80/40/20 assume img_size == 640
    gain = tf.constant([[1.,1.,80.,80.,80.,80.,1.], [1.,1.,40.,40.,40.,40.,1.], [1.,1.,20.,20.,20.,20.,1.]])
    results = tf.TensorArray(tf.int32, size=nl, dynamic_size=False)
    anch = tf.TensorArray(tf.float32, size=nl, dynamic_size=False)
    for i in tf.range(nl):
        t = targets * tf.gather(gain, i)
        r = t[..., 4:6] / tf.gather(anchors_reshape, i)
        r_reciprocal = tf.math.reciprocal_no_nan(r)  #1/r
        r_max = tf.reduce_max(tf.math.maximum(r, r_reciprocal), axis=-1)
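        # Keep anchor/target pairs whose wh ratio is within anchor_t; padded (all-zero) labels
        # give r_max == 0 because of reciprocal_no_nan, so they are dropped as well
        mask_t = tf.logical_and(r_max < anchor_t, r_max > 0)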
        t = t[mask_t]
        # Offsets
        gxy = t[:, 2:4]  # grid xy
        #gxi = gain[[2, 3]] - gxy  # inverse
        gxi = tf.gather(gain, i)[2:4] - gxy
        mask_xy = tf.concat([
            tf.ones([tf.shape(t)[0], 1], dtype=tf.bool),
            ((gxy % 1. < g) & (gxy > 1.)),
            ((gxi % 1. < g) & (gxi > 1.))
        ], axis=1)
        t = tf.repeat(tf.expand_dims(t, axis=1), 5, axis=1)[mask_xy]
        offsets_xy = (tf.expand_dims(tf.zeros_like(gxy, dtype=tf.float32), axis=1) + offsets)[mask_xy]
        xy = t[...,2:4] + offsets_xy
        from_which_layer = tf.ones_like(t[...,0:1]) * tf.dtypes.cast(i, tf.float32)
        results = results.write(i, tf.dtypes.cast(tf.concat([t[...,0:1], t[...,-1:], xy[...,1:2], xy[...,0:1], t[...,1:2], from_which_layer], axis=-1), tf.int32))
        anch = anch.write(i, tf.gather(tf.gather(anchors_constant, i), tf.dtypes.cast(t[...,-1], tf.int32)))
    return results.concat(), anch.concat()

@tf.function(
    input_signature=([
        tf.TensorSpec(shape=[None, 4], dtype=tf.float32),
        tf.TensorSpec(shape=[None, 4], dtype=tf.float32)
    ])
)
def box_iou(box1, box2):
    area1 = (box1[:,2]-box1[:,0])*(box1[:,3]-box1[:,1])
    area2 = (box2[:,2]-box2[:,0])*(box2[:,3]-box2[:,1])
    intersect_wh = tf.math.minimum(box1[:,None,2:], box2[:,2:]) - tf.math.maximum(box1[:,None,:2], box2[:,:2])
    intersect_wh = tf.clip_by_value(intersect_wh, clip_value_min=0, clip_value_max=img_size)
    intersect_area = intersect_wh[...,0]*intersect_wh[...,1]
    iou = intersect_area/(area1[:,None]+area2-intersect_area)
    return iou

@tf.function(
    input_signature=([
        tf.TensorSpec(shape=[None, 4], dtype=tf.float32),
        tf.TensorSpec(shape=[None, 4], dtype=tf.float32)
    ])
)
def bbox_ciou(box1, box2):
    eps=1e-7
    b1_x1, b1_x2 = box1[:,0]-box1[:,2]/2, box1[:,0]+box1[:,2]/2
    b1_y1, b1_y2 = box1[:,1]-box1[:,3]/2, box1[:,1]+box1[:,3]/2
    b2_x1, b2_x2 = box2[:,0]-box2[:,2]/2, box2[:,0]+box2[:,2]/2
    b2_y1, b2_y2 = box2[:,1]-box2[:,3]/2, box2[:,1]+box2[:,3]/2
    # Intersection area
    inter = tf.clip_by_value(
        tf.math.minimum(b1_x2, b2_x2) - tf.math.maximum(b1_x1, b2_x1),
        clip_value_min=0,
        clip_value_max=tf.float32.max) * tf.clip_by_value(
        tf.math.minimum(b1_y2, b2_y2) - tf.math.maximum(b1_y1, b2_y1),
        clip_value_min=0,
        clip_value_max=tf.float32.max)
    # Union Area
    w1, h1 = b1_x2 - b1_x1, b1_y2 - b1_y1 + eps
    w2, h2 = b2_x2 - b2_x1, b2_y2 - b2_y1 + eps
    union = w1 * h1 + w2 * h2 - inter + eps
    iou = inter / union
    cw = tf.math.maximum(b1_x2, b2_x2) - tf.math.minimum(b1_x1, b2_x1)  # convex (smallest enclosing box) width
    ch = tf.math.maximum(b1_y2, b2_y2) - tf.math.minimum(b1_y1, b2_y1)  # convex height
    c2 = cw ** 2 + ch ** 2 + eps  # convex diagonal squared
    rho2 = ((b2_x1 + b2_x2 - b1_x1 - b1_x2) ** 2 +
            (b2_y1 + b2_y2 - b1_y1 - b1_y2) ** 2) / 4  # center distance squared
    v = (4 / math.pi ** 2) * tf.math.pow(tf.math.atan(w2 / (h2 + eps)) - tf.math.atan(w1 / (h1 + eps)), 2)
    alpha = v / (v - iou + (1 + eps))
    return iou - (rho2 / c2 + v * alpha)

@tf.function(
    input_signature=([
        tf.TensorSpec(shape=[batch_size, na, None, 85], dtype=tf.float32),
        tf.TensorSpec(shape=[batch_size, None, 5], dtype=tf.float32)
    ])
)
def tf_build_targets(p, labels):
    results, anch = tf_find_3_positive(labels)
    #stride = tf.constant([8., 16., 32.])
    grids = tf.dtypes.cast(img_size/stride, tf.int32)
    pxyxys = tf.TensorArray(tf.float32, size=nl, dynamic_size=False)
    p_obj = tf.TensorArray(tf.float32, size=nl, dynamic_size=True, element_shape=[None, 1])
    p_cls = tf.TensorArray(tf.float32, size=nl, dynamic_size=False)
    all_idx = tf.TensorArray(tf.int32, size=nl, dynamic_size=False)
    from_which_layer = tf.TensorArray(tf.int32, size=nl, dynamic_size=False)
    all_anch = tf.TensorArray(tf.float32, size=nl, dynamic_size=False)
    matching_idxs = tf.TensorArray(tf.int32, size=batch_size, dynamic_size=False)
    matching_targets = tf.TensorArray(tf.float32, size=batch_size, dynamic_size=False)
    matching_anchs = tf.TensorArray(tf.float32, size=batch_size, dynamic_size=False)
    matching_layers = tf.TensorArray(tf.int32, size=batch_size, dynamic_size=False)
    for i in tf.range(nl):
        idx_mask = results[...,-1]==i
        idx = tf.boolean_mask(results, idx_mask)
        layer_mask = layer_no_constant[...,0]==i
        grid_no = tf.gather(grids, i)
        pl = tf.boolean_mask(p, layer_mask)
        pl = tf.reshape(pl, [batch_size, na, grid_no, grid_no, -1])
        pi = tf.gather_nd(pl, idx[...,0:4])
        anchors_p = tf.boolean_mask(anch, idx_mask)
        p_obj = p_obj.write(i, pi[...,4:5])
        p_cls = p_cls.write(i, pi[...,5:])
        gij = tf.dtypes.cast(tf.concat([idx[...,3:4], idx[...,2:3]], axis=-1), tf.float32)
        pxy = (tf.math.sigmoid(pi[...,:2])*2-0.5+gij)*tf.dtypes.cast(tf.gather(stride, i), tf.float32)
        pwh = (tf.math.sigmoid(pi[...,2:4])*2)**2*anchors_p*tf.dtypes.cast(tf.gather(stride, i), tf.float32)
        pxywh = tf.concat([pxy, pwh], axis=-1)
        pxyxy = xywh2xyxy(pxywh)
        pxyxys = pxyxys.write(i, pxyxy)
        all_idx = all_idx.write(i, idx[...,0:4])
        from_which_layer = from_which_layer.write(i, idx[..., -1:])
        all_anch = all_anch.write(i, tf.boolean_mask(anch, idx_mask))
    pxyxys = pxyxys.concat()
    p_obj = p_obj.concat()
    p_cls = p_cls.concat()
    all_idx = all_idx.concat()
    from_which_layer = from_which_layer.concat()
    all_anch = all_anch.concat()
    for i in tf.range(batch_size):
        batch_mask = all_idx[...,0]==i
        if tf.math.reduce_sum(tf.dtypes.cast(batch_mask, tf.int32)) > 0:
            pxyxy_i = tf.boolean_mask(pxyxys, batch_mask)
            target_mask = labels[i][...,3]>0
            target = tf.boolean_mask(labels[i], target_mask)
            txywh = target[...,1:] * img_size
            txyxy = xywh2xyxy(txywh)
            pair_wise_iou = box_iou(txyxy, pxyxy_i)
            pair_wise_iou_loss = -tf.math.log(pair_wise_iou + 1e-8)
            top_k, _ = tf.math.top_k(pair_wise_iou, tf.math.minimum(10, tf.shape(pair_wise_iou)[1]))
            dynamic_ks = tf.clip_by_value(
                tf.dtypes.cast(tf.math.reduce_sum(top_k, axis=-1), tf.int32),
                clip_value_min=1,
                clip_value_max=10)
            gt_cls_per_image = tf.tile(
                tf.expand_dims(
                    tf.one_hot(
                        tf.dtypes.cast(target[...,0], tf.int32), nc),
                    axis = 1),
                [1,tf.shape(pxyxy_i)[0],1])
            num_gt = tf.shape(target)[0]
            cls_preds_ = (
                tf.math.sigmoid(tf.tile(tf.expand_dims(tf.boolean_mask(p_cls, batch_mask), 0), [num_gt, 1, 1])) *
                tf.math.sigmoid(tf.tile(tf.expand_dims(tf.boolean_mask(p_obj, batch_mask), 0), [num_gt, 1, 1])))  #dimension [labels_number, positive_targets_number, 80]
            y = tf.math.sqrt(cls_preds_)
            pair_wise_cls_loss = tf.math.reduce_sum(
                tf.nn.sigmoid_cross_entropy_with_logits(
                    labels = gt_cls_per_image,
                    logits = tf.math.log(y/(1-y))),
                axis = -1)
            cost = (
                pair_wise_cls_loss
                + 3.0 * pair_wise_iou_loss
            )
            matching_matrix = tf.zeros_like(cost)  #dimension [labels_number, positive_targets_number]
            matching_idx = tf.TensorArray(tf.int64, size=0, dynamic_size=True)
            for gt_idx in tf.range(num_gt):
                _, pos_idx = tf.math.top_k(
                    -cost[gt_idx], k=dynamic_ks[gt_idx], sorted=True)
                X,Y = tf.meshgrid(gt_idx, pos_idx)
                matching_idx = matching_idx.write(gt_idx, tf.dtypes.cast(tf.concat([X,Y], axis=-1), tf.int64))
            matching_idx = matching_idx.concat()
            '''
            matching_matrix = tf.scatter_nd(
                matching_idx,
                tf.ones(tf.shape(matching_idx)[0]),
                tf.dtypes.cast(tf.shape(cost), tf.int64))
            '''
            matching_matrix = tf.sparse.to_dense(
                tf.sparse.reorder(
                    tf.sparse.SparseTensor(
                        indices=tf.dtypes.cast(matching_idx, tf.int64),
                        values=tf.ones(tf.shape(matching_idx)[0]),
                        dense_shape=tf.dtypes.cast(tf.shape(cost), tf.int64))
                )
            )
            anchor_matching_gt = tf.reduce_sum(matching_matrix, axis=0)  #dimension [positive_targets_number]
            mask_1 = anchor_matching_gt>1  #a candidate is matched to several ground truths
            if tf.reduce_sum(tf.dtypes.cast(mask_1, tf.int32)) > 0:  #at least one candidate is matched to several ground truths
                #For each such candidate, keep only the ground truth with the lowest cost.
                #For example, with 100 candidates and 10 ground truths, if candidate #5 matches
                #ground truths #2 and #3 with costs 10 and 20, ground truth #2 is kept for candidate #5.
                #mask_1 dimension [positive_targets_number]
                #tf.boolean_mask(cost, mask_1, axis=1) dimension [ground_truth_number, multi_matched_candidate_number]
                cost_argmin = tf.math.argmin(
                    tf.boolean_mask(cost, mask_1, axis=1), axis=0)  #in the example above, cost_argmin is [2]
                m = tf.dtypes.cast(mask_1, tf.float32)
                _, target_indices = tf.math.top_k(
                    m,
                    k=tf.dtypes.cast(tf.math.reduce_sum(m), tf.int32))  #in the example above, target_indices is [5]
                #So set matching_matrix[2, 5] to 1 and the other elements of matching_matrix[:, 5] to 0
                target_matching_gt_indices = tf.concat(
                    [tf.reshape(tf.dtypes.cast(cost_argmin, tf.int32), [-1,1]), tf.reshape(target_indices, [-1,1])],
                    axis=1)
                matching_matrix = tf.multiply(
                    matching_matrix,
                    tf.repeat(tf.reshape(tf.dtypes.cast(anchor_matching_gt<=1, tf.float32), [1,-1]), tf.shape(cost)[0], axis=0))
                target_value = tf.sparse.to_dense(
                    tf.sparse.reorder(
                        tf.sparse.SparseTensor(
                            indices=tf.dtypes.cast(target_matching_gt_indices, tf.int64),
                            values=tf.ones(tf.shape(target_matching_gt_indices)[0]),
                            dense_shape=tf.dtypes.cast(tf.shape(matching_matrix), tf.int64)
                        )
                    )
                )
                matching_matrix = tf.add(matching_matrix, target_value)
            fg_mask_inboxes = tf.math.reduce_sum(matching_matrix, axis=0)>0.  #mask of the candidates used as positives
            if tf.shape(tf.boolean_mask(matching_matrix, fg_mask_inboxes, axis=1))[0]>0:
                matched_gt_inds = tf.math.argmax(tf.boolean_mask(matching_matrix, fg_mask_inboxes, axis=1), axis=0)  #index of the ground truth matched to each positive
                all_idx_i = tf.boolean_mask(tf.boolean_mask(all_idx, batch_mask), fg_mask_inboxes)
                from_which_layer_i = tf.boolean_mask(tf.boolean_mask(from_which_layer, batch_mask), fg_mask_inboxes)
                all_anch_i = tf.boolean_mask(tf.boolean_mask(all_anch, batch_mask), fg_mask_inboxes)
                matching_idxs = matching_idxs.write(i, all_idx_i)
                matching_layers = matching_layers.write(i, from_which_layer_i)
                matching_anchs = matching_anchs.write(i, all_anch_i)
                matching_targets = matching_targets.write(i, tf.gather(target, matched_gt_inds))
            else:
                matching_idxs = matching_idxs.write(i, tf.constant([[-1,-1,-1,-1]], dtype=tf.int32))
                matching_layers = matching_layers.write(i, tf.constant([[-1]], dtype=tf.int32))
                matching_anchs = matching_anchs.write(i, tf.constant([[-1, -1]], dtype=tf.float32))
                matching_targets = matching_targets.write(i, tf.constant([[-1, -1, -1, -1, -1]], dtype=tf.float32))
        else:
            matching_idxs = matching_idxs.write(i, tf.constant([[-1,-1,-1,-1]], dtype=tf.int32))
            matching_layers = matching_layers.write(i, tf.constant([[-1]], dtype=tf.int32))
            matching_anchs = matching_anchs.write(i, tf.constant([[-1, -1]], dtype=tf.float32))
            matching_targets = matching_targets.write(i, tf.constant([[-1, -1, -1, -1, -1]], dtype=tf.float32))
    matching_idxs = matching_idxs.concat()
    matching_layers = matching_layers.concat()
    matching_anchs = matching_anchs.concat()
    matching_targets = matching_targets.concat()
    filter_mask = matching_idxs[:,0]!=-1
    matching_idxs = tf.boolean_mask(matching_idxs, filter_mask)
    matching_layers = tf.boolean_mask(matching_layers, filter_mask)
    matching_anchs = tf.boolean_mask(matching_anchs, filter_mask)
    matching_targets = tf.boolean_mask(matching_targets, filter_mask)
    #return pxyxys, all_idx, matching_idx, matching_matrix, all_idx_i, cost, pair_wise_iou, from_which_layer_i
    return matching_idxs, matching_layers, matching_anchs, matching_targets

@tf.function(
    input_signature=([
        tf.TensorSpec(shape=[batch_size, na, None, 85], dtype=tf.float32),
        tf.TensorSpec(shape=[batch_size, None, 5], dtype=tf.float32)
    ])
)
def tf_loss_func(p, labels):
    matching_idxs, matching_layers, matching_anchs, matching_targets = tf_build_targets(p, labels)
    lcls, lbox, lobj = tf.zeros(1), tf.zeros(1), tf.zeros(1)
    grids = img_size//stride
    for i in tf.range(nl):
        layer_mask = layer_no_constant[...,0]==i
        grid = tf.gather(grids, i)
        pi = tf.reshape(tf.boolean_mask(p, layer_mask), [batch_size, na, grid, grid, -1])
        matching_layer_mask = matching_layers[:,0]==i
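        if tf.reduce_sum(tf.dtypes.cast(matching_layer_mask, tf.int32))==0:
            # No positives on this level: still apply the objectness loss with all-zero targets,
            # as YOLOv7 does for every level
            lobj += tf.math.reduce_mean(
                tf.nn.sigmoid_cross_entropy_with_logits(
                    labels = tf.zeros_like(pi[..., 4]),
                    logits = pi[..., 4]
                )
            ) * tf.gather(balance, i)
            continue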
        m_idxs = tf.boolean_mask(matching_idxs, matching_layer_mask)
        if tf.shape(m_idxs)[0]==0:
            continue
        m_targets = tf.boolean_mask(matching_targets, matching_layer_mask)
        m_anchs = tf.boolean_mask(matching_anchs, matching_layer_mask)
        ps = tf.gather_nd(pi, m_idxs)
        pxy = tf.math.sigmoid(ps[:,:2])*2-0.5
        pwh = (tf.math.sigmoid(ps[:,2:4])*2)**2*m_anchs
        pbox = tf.concat([pxy,pwh], axis=-1)
        #selected_tbox = tf.gather_nd(labels, matching_targets[i])[:, 1:]
        selected_tbox = m_targets[:, 1:]
        selected_tbox = tf.multiply(selected_tbox, tf.dtypes.cast(grid, tf.float32))
        tbox_grid = tf.concat([
            tf.dtypes.cast(m_idxs[:,3:4], tf.float32),
            tf.dtypes.cast(m_idxs[:,2:3], tf.float32),
            tf.zeros((tf.shape(m_idxs)[0],2))],
            axis=-1)
        selected_tbox = tf.subtract(selected_tbox, tbox_grid)
        iou = bbox_ciou(pbox, selected_tbox)
        lbox += tf.math.reduce_mean(1.0 - iou)  # iou loss
        # Objectness
        tobj = tf.sparse.to_dense(
            tf.sparse.reorder(
                tf.sparse.SparseTensor(
                    indices = tf.dtypes.cast(m_idxs, tf.int64),
                    values = (1.0 - gr) + gr * tf.clip_by_value(tf.stop_gradient(iou), clip_value_min=0, clip_value_max=tf.float32.max),
                    dense_shape = tf.dtypes.cast(tf.shape(pi[..., 0]), tf.int64)
                )
            ), validate_indices=False
        )
        # Classification
        tcls = tf.one_hot(
            indices = tf.dtypes.cast(m_targets[:,0], tf.int32),
            depth = 80,
            dtype = tf.float32
        )
        lcls += tf.math.reduce_mean(
            tf.nn.sigmoid_cross_entropy_with_logits(
                labels = tcls,
                logits = ps[:, 5:]
            )
        )
        '''
        lcls += tf.math.reduce_mean(
            tf.nn.sparse_softmax_cross_entropy_with_logits(
                labels = tf.dtypes.cast(m_targets[:,0], tf.int32),
                logits = ps[:, 5:]
            )
        )
        '''
        obji = tf.math.reduce_mean(
            tf.nn.sigmoid_cross_entropy_with_logits(
                labels = tobj,
                logits = pi[..., 4]
            )
        )
        lobj += obji * tf.gather(balance, i)
    lbox *= loss_box
    lobj *= loss_obj
    lcls *= loss_cls
    loss = (lbox + lobj + lcls) * batch_size
    return loss

@tf.function(
    input_signature=([
        tf.TensorSpec(shape=[None, na, 8400, 85], dtype=tf.float32),
        tf.TensorSpec(shape=[None, None, 5], dtype=tf.float32),
        tf.TensorSpec(shape=[None, 2], dtype=tf.int32),
        tf.TensorSpec(shape=[None], dtype=tf.int32),
    ])
)
def tf_predict_func(predictions, labels, imgs_hw, imgs_id):
    grids = img_size // stride
    batch_size = tf.shape(predictions)[0]
    confidence_threshold = 0.2
    probability_threshold = 0.8
    all_predict_result = tf.TensorArray(tf.float32, size=nl, dynamic_size=False)
    boxes_result = tf.TensorArray(tf.float32, size=0, dynamic_size=True)
    imgs_info = tf.TensorArray(tf.int32, size=0, dynamic_size=True)
    for i in tf.range(nl):
        grid = tf.gather(grids, i)
        grid_x, grid_y = tf.meshgrid(tf.range(grid, dtype=tf.float32), tf.range(grid, dtype=tf.float32))
        grid_x = tf.reshape(grid_x, [-1, 1])
        grid_y = tf.reshape(grid_y, [-1, 1])
        #grid_xy = tf.concat([grid_y, grid_x], axis=-1)
        grid_xy = tf.concat([grid_x, grid_y], axis=-1)
        grid_xy = tf.reshape(grid_xy, [1,1,-1,2])
        layer_mask = val_layer_no_constant[...,0]==i
        #grid = tf.gather(grids, i)
        predict_layer = tf.boolean_mask(predictions, layer_mask)
        predict_layer = tf.reshape(predict_layer, [batch_size, na, -1, 85])
        predict_conf = tf.math.sigmoid(predict_layer[...,4:5])
        predict_xy = (tf.math.sigmoid(predict_layer[...,:2])*2-0.5 + \
            tf.dtypes.cast(grid_xy,tf.float32))*tf.dtypes.cast(tf.gather(stride, i), tf.float32)
        predict_wh = (tf.math.sigmoid(predict_layer[...,2:4])*2)**2*\
            tf.reshape(tf.gather(anchors_constant,i), [1,na,1,2])*\
            tf.dtypes.cast(tf.gather(stride, i), tf.float32)
        predict_xywh = tf.concat([predict_xy, predict_wh], axis=-1)
        predict_xyxy = xywh2xyxy(predict_xywh)
        predict_cls = tf.reshape(tf.argmax(predict_layer[...,5:], axis=-1), [batch_size, na, -1, 1])
        predict_cls = tf.dtypes.cast(predict_cls, tf.float32)
        predict_proba = tf.nn.sigmoid(
            tf.reduce_max(
                predict_layer[...,5:], axis=-1, keepdims=True
            )
        )
        batch_no = tf.expand_dims(tf.tile(tf.gather(val_batch_no_constant, tf.range(batch_size)), [1,na,grid*grid]), -1)
        # Columns: batch_no, confidence, x1, y1, x2, y2, class, class probability
        predict_result = tf.concat([batch_no, predict_conf, predict_xyxy, predict_cls, predict_proba], axis=-1)
        mask = tf.math.logical_and(
            predict_result[...,1]>=confidence_threshold,
            predict_result[...,-1]>=probability_threshold
        )
        predict_result = tf.boolean_mask(predict_result, mask)
        #tf.print(tf.shape(predict_result))
        if tf.shape(predict_result)[0] > 0:
            all_predict_result = all_predict_result.write(i, predict_result)
            #tf.print(tf.shape(predict_result))
        else:
            all_predict_result = all_predict_result.write(i, tf.zeros(shape=[1,8]))
    all_predict_result = all_predict_result.concat()
    #return all_predict_result
    for i in tf.range(batch_size):
        batch_mask = tf.math.logical_and(
            all_predict_result[...,0]==tf.dtypes.cast(i, tf.float32),
            all_predict_result[...,1]>0
        )
        predict_true_box = tf.boolean_mask(all_predict_result, batch_mask)
        if tf.shape(predict_true_box)[0]==0:
            continue
        original_hw = tf.dtypes.cast(tf.gather(imgs_hw, i), tf.float32)
        ratio = tf.dtypes.cast(tf.reduce_max(original_hw/img_size), tf.float32)
        predict_classes, _ = tf.unique(predict_true_box[:,6])
        #predict_classes_list = tf.unstack(predict_classes)
        #for class_id in predict_classes_list:
        for j in tf.range(tf.shape(predict_classes)[0]):
            #class_mask = tf.math.equal(predict_true_box[:, 6], class_id)
            class_mask = tf.math.equal(predict_true_box[:, 6], tf.gather(predict_classes, j))
            predict_true_box_class = tf.boolean_mask(predict_true_box, class_mask)
            predict_true_box_xy = predict_true_box_class[:, 2:6]
            predict_true_box_score = predict_true_box_class[:, 7]*predict_true_box_class[:, 1]
            #predict_true_box_score = predict_true_box_class[:, 1]
            selected_indices = tf.image.non_max_suppression(
                predict_true_box_xy,
                predict_true_box_score,
                100,
                iou_threshold=0.2
                #score_threshold=confidence_threshold
            )
            #Shape [box_num, 8]
            selected_boxes = tf.gather(predict_true_box_class, selected_indices)
            #boxes_result = boxes_result.write(boxes_result.size(), selected_boxes)
            boxes_xyxy = selected_boxes[:,2:6]*ratio
            boxes_x1 = tf.clip_by_value(boxes_xyxy[:,0:1], 0., original_hw[1])
            boxes_x2 = tf.clip_by_value(boxes_xyxy[:,2:3], 0., original_hw[1])
            boxes_y1 = tf.clip_by_value(boxes_xyxy[:,1:2], 0., original_hw[0])
            boxes_y2 = tf.clip_by_value(boxes_xyxy[:,3:4], 0., original_hw[0])
            boxes_w = boxes_x2 - boxes_x1
            boxes_h = boxes_y2 - boxes_y1
boxes = tf.concat([selected_boxes[:,0:2], boxes_x1, boxes_y1, boxes_w, boxes_h, selected_boxes[:,6:8]], axis=-1)
boxes_result = boxes_result.write(boxes_result.size(), boxes)
img_id = tf.gather(imgs_id, i)
imgs_info = imgs_info.write(imgs_info.size(), tf.reshape(tf.stack([i, img_id]), [-1,2]))
if boxes_result.size()==0:
boxes_result = boxes_result.write(0, tf.zeros(shape=[1,8]))
if imgs_info.size()==0:
imgs_info = imgs_info.write(0, tf.dtypes.cast(tf.zeros(shape=[1,2]), tf.int32))
return boxes_result.concat(), imgs_info.concat()
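To make the math inside tf_predict_func easier to follow, here is a minimal pure-Python sketch of the same steps for individual boxes: decoding one anchor of one grid cell, mapping the box back to the original image, and a greedy per-class NMS. It assumes the anchors are stored in grid units (which is what multiplying anchors_constant by the stride suggests) and that the padding was added at the right/bottom, so only a scale factor has to be undone. The helper names (decode_cell, to_original_xywh, nms_per_class) are hypothetical, and the NMS is the usual greedy rule, not a reimplementation of tf.image.non_max_suppression.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def clamp(v, lo, hi):
    return min(max(v, lo), hi)


def decode_cell(t, gx, gy, anchor_w, anchor_h, stride):
    """Decode raw outputs t = (tx, ty, tw, th) of one anchor in grid cell (gx, gy).

    Assumption: anchor_w / anchor_h are in grid units, hence the extra * stride.
    Returns (x1, y1, x2, y2) in pixels of the padded network input.
    """
    tx, ty, tw, th = t
    cx = (sigmoid(tx) * 2 - 0.5 + gx) * stride
    cy = (sigmoid(ty) * 2 - 0.5 + gy) * stride
    w = (sigmoid(tw) * 2) ** 2 * anchor_w * stride
    h = (sigmoid(th) * 2) ** 2 * anchor_h * stride
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def to_original_xywh(box, orig_h, orig_w, img_size=640):
    """Scale an xyxy box back to the original image, clip it, return COCO xywh.

    Assumption: long side resized to img_size, padding at right/bottom only.
    """
    ratio = max(orig_h, orig_w) / img_size
    x1, y1, x2, y2 = (v * ratio for v in box)
    x1, x2 = clamp(x1, 0.0, orig_w), clamp(x2, 0.0, orig_w)
    y1, y2 = clamp(y1, 0.0, orig_h), clamp(y2, 0.0, orig_h)
    return (x1, y1, x2 - x1, y2 - y1)


def iou(a, b):
    """Intersection over union of two xyxy boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms_per_class(dets, iou_threshold=0.2, max_out=100):
    """Greedy NMS run separately for each class.

    dets: list of (box_xyxy, score, cls); score plays the role of proba * conf.
    """
    keep = []
    for cls in sorted({d[2] for d in dets}):
        candidates = sorted((d for d in dets if d[2] == cls), key=lambda d: d[1], reverse=True)
        selected = []
        for det in candidates:
            if len(selected) == max_out:
                break
            # Keep the box only if it does not overlap a kept box of the same class too much
            if all(iou(det[0], s[0]) <= iou_threshold for s in selected):
                selected.append(det)
        keep.extend(selected)
    return keep
```

The TensorFlow version does exactly this arithmetic on whole tensors at once; the scalar form is only meant to make each formula readable.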
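The last step is to train and validate the model. Training follows the YOLOv7 implementation, and during validation the mAP is computed with the pycocotools package. See the train.py file for details.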
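As a rough illustration of the glue between tf_predict_func and pycocotools, the sketch below converts its outputs into the list-of-dicts format that COCO.loadRes accepts and runs a bbox COCOeval. It assumes the row layouts produced by tf_predict_func, that the 80 training classes are in the usual COCO order (so they need the standard 80-to-91 category id table, the same one YOLOv5/YOLOv7 keep in coco80_to_coco91_class), and that the score is proba * conf as in the NMS step. to_coco_results, coco_map and COCO80_TO_91 are hypothetical names, not taken from train.py.

```python
# 80 contiguous class indices (YOLO label files) -> COCO category ids (1..90, with gaps)
COCO80_TO_91 = [
    1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21,
    22, 23, 24, 25, 27, 28, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
    46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65,
    67, 70, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 84, 85, 86, 87, 88, 89, 90,
]


def to_coco_results(boxes, imgs_info):
    """Convert one batch of tf_predict_func outputs to COCO result dicts.

    boxes: rows [batch_no, conf, x, y, w, h, cls, proba] in original-image pixels.
    imgs_info: rows [batch_no, image_id].
    """
    batch_to_img = {int(b): int(img_id) for b, img_id in imgs_info}
    results = []
    for batch_no, conf, x, y, w, h, cls, proba in boxes:
        if conf <= 0:  # skip the all-zero placeholder row
            continue
        results.append({
            "image_id": batch_to_img[int(batch_no)],
            "category_id": COCO80_TO_91[int(cls)],
            "bbox": [float(x), float(y), float(w), float(h)],
            "score": float(conf * proba),
        })
    return results


def coco_map(annotation_file, results, img_ids):
    """Run COCO bbox evaluation and return AP@[0.50:0.95] (stats[0])."""
    if not results:  # COCO.loadRes cannot handle an empty result list
        return 0.0
    # Imported here so the conversion above works without pycocotools installed
    from pycocotools.coco import COCO
    from pycocotools.cocoeval import COCOeval

    coco_gt = COCO(annotation_file)
    coco_dt = coco_gt.loadRes(results)
    evaluator = COCOeval(coco_gt, coco_dt, "bbox")
    evaluator.params.imgIds = img_ids  # evaluate only the images that were predicted
    evaluator.evaluate()
    evaluator.accumulate()
    evaluator.summarize()  # prints the 12-line AP/AR table
    return evaluator.stats[0]
```

Because batch_no restarts at 0 for every batch, to_coco_results has to be applied batch by batch, with the lists concatenated before a single call to coco_map.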
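Because the model is trained on 640×640 images, it needs a lot of GPU memory. On my local RTX 2080 Ti with 11 GB of VRAM, even with mixed precision enabled, the batch size could only be set to 8, and the training results were not very good. I therefore rented a V100 GPU with 32 GB of memory on the AutoDL platform (at 2.28 yuan per hour) and set the batch size to 32. In my experience the batch size has a fairly large effect on how well the model trains. After a little over 20 epochs, each taking slightly more than an hour (roughly one day in total), the results were as follows: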
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.270
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.411
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.289
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.162
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.302
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.334
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.268
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.476
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.528
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.338
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.576
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.661
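These twelve numbers are the standard COCOeval.summarize() printout, listed in the same order as the evaluator's stats array (stats[0] is AP@[0.50:0.95] over all object sizes, the 0.270 above). If you want to track them across epochs from a text log, a small sketch like the following can pull them out; parse_coco_summary is a hypothetical helper, and the regular expression assumes the wording that pycocotools prints.

```python
import re

# One summary line, e.g.
# "Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.270"
SUMMARY_LINE = re.compile(
    r"Average\s+(Precision|Recall)\s+\((?:AP|AR)\)\s+@\[\s*IoU=([\d.:]+)\s*\|"
    r"\s*area=\s*(\w+)\s*\|\s*maxDets=\s*(\d+)\s*\]\s*=\s*(-?[\d.]+)"
)


def parse_coco_summary(text):
    """Map (metric, iou, area, max_dets) -> value for every summary line in text."""
    stats = {}
    for m in SUMMARY_LINE.finditer(text):
        metric = "AP" if m.group(1) == "Precision" else "AR"
        stats[(metric, m.group(2), m.group(3), int(m.group(4)))] = float(m.group(5))
    return stats
```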
Below are the predictions for some images from the validation set:
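The YOLOv7 paper reports an AP (IoU=0.50:0.95, all areas) of about 51% on COCO for the base YOLOv7 model trained for 300 epochs, so continued training should improve the accuracy further. Given my limited time and resources, I have stopped training at this point for now.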
Finally, all of my source code is in the GitHub repository gzroy/yolov7_tf2 ("Yolov7 implementation on tensorflow 2.x").