git clone https://github.com/AITTSMD/MTCNN-Tensorflow
MTCNN is trained as a multi-task network, and the bounding-box and landmark annotations come from two separate datasets:
Dataset 1 is labeled with face bounding boxes only, so it is used only for training detection (classification and box regression).
Dataset 2 is labeled with both bounding boxes and facial landmarks and is used for training landmark regression.
Each line of the input data has the form
path to image, cls_label, bbox_label, landmark_label
For dataset 1, crops are sampled at random and, based on the IoU between each crop and the ground-truth bbox, split into positive, negative, and part samples. Dataset 1 has no landmarks, so the landmark labels are padded with zeros.
For dataset 2, the landmarks are extracted and the data are augmented by cropping, flipping, and rotating. At prediction time the landmarks are regressed from the predicted box, so the bbox labels of these samples are padded with zeros.
During training, a label value marks whether the current sample is used to train box classification, box regression, or landmark regression:
For pos samples: cls_label=1, bbox_label is computed, landmark_label=[0,0,0,0,0,0,0,0,0,0].
For part samples: cls_label=-1, bbox_label is computed, landmark_label=[0,0,0,0,0,0,0,0,0,0].
For landmark samples: cls_label=-2, bbox_label=[0,0,0,0], landmark_label is computed.
For neg samples: cls_label=0, bbox_label=[0,0,0,0], landmark_label=[0,0,0,0,0,0,0,0,0,0].
Download the training data:
http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/ (face bounding-box data).
http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html (facial landmark data). The author found some errors in the CelebA annotations, so http://mmlab.ie.cuhk.edu.hk/archive/CNN_FacePoint.htm is used instead.
Place the downloaded training data in the MTCNN-Tensorflow/prepare_data directory.
gen_12net_data.py
Generates the positive, part, and negative crops. For each image, 50 negative crops with np.max(IoU) < 0.3 are sampled and saved under ../../DATA/12/negative.
Positive crops (IoU >= 0.65) are saved under ../../DATA/12/positive/, and part crops (IoU >= 0.4) under ../../DATA/12/part/.
wider_face_train.txt is the annotation file; each line contains the image file name followed by bbox1, bbox2, ..., where each bbox is given by four coordinates:
im_path = annotation[0]
#print(im_path)
#boxes changed to float type
bbox = list(map(float, annotation[1:]))
#gt
boxes = np.array(bbox, dtype=np.float32).reshape(-1, 4)
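The pos/part/neg split above hinges on the IoU between a candidate crop and all ground-truth boxes of the image. A minimal sketch of that computation, assuming boxes in [x1, y1, x2, y2] format (similar in spirit to the repo's IoU helper, not a copy of it):

import numpy as np

def iou(box, gt_boxes):
    # box: [x1, y1, x2, y2]; gt_boxes: ndarray of shape (N, 4)
    box_area = (box[2] - box[0] + 1) * (box[3] - box[1] + 1)
    gt_area = (gt_boxes[:, 2] - gt_boxes[:, 0] + 1) * (gt_boxes[:, 3] - gt_boxes[:, 1] + 1)
    # intersection rectangle of the crop with every ground-truth box
    xx1 = np.maximum(box[0], gt_boxes[:, 0])
    yy1 = np.maximum(box[1], gt_boxes[:, 1])
    xx2 = np.minimum(box[2], gt_boxes[:, 2])
    yy2 = np.minimum(box[3], gt_boxes[:, 3])
    w = np.maximum(0, xx2 - xx1 + 1)
    h = np.maximum(0, yy2 - yy1 + 1)
    inter = w * h
    # IoU of the crop against each ground-truth box; np.max of this is what the thresholds are applied to
    return inter / (box_area + gt_area - inter)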
gen_landmark_aug_12.py
Generates the landmark training samples; the command to run is:
python gen_landmark_aug_12.py
The input is trainImageList.txt, which stores the image path, bounding box, and landmarks.
The script crops the face according to the bbox, normalizes the landmark coordinates relative to the cropped face, and saves both the face crop and the normalized landmarks. It also augments the data by randomly cropping the face within a certain range, flipping, rotating, and so on.
The face crop is taken as:
f_face = img[bbox.top:bbox.bottom+1,bbox.left:bbox.right+1]
The landmarks are normalized as follows:
#normalize landmarks by dividing by the width and height of the ground truth bounding box
# landmarkGt is a list of (x, y) tuples
for index, one in enumerate(landmarkGt):
    # ((x - bbox.left) / width of bounding box, (y - bbox.top) / height of bounding box)
    rv = ((one[0]-gt_box[0])/(gt_box[2]-gt_box[0]), (one[1]-gt_box[1])/(gt_box[3]-gt_box[1]))
    # put the normalized value into the new list landmark
    landmark[index] = rv
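Of the augmentations mentioned above, flipping is the one that must also touch the labels: mirroring the crop horizontally mirrors the x coordinate of every normalized landmark and swaps the left/right points. A hedged sketch, assuming the usual 5-point order (left eye, right eye, nose, left mouth corner, right mouth corner):

import cv2
import numpy as np

def flip(face, landmark):
    # mirror the face crop horizontally
    face_flipped = cv2.flip(face, 1)
    # mirror the x coordinate of every normalized landmark, keep y
    landmark_flipped = np.asarray([(1.0 - x, y) for (x, y) in landmark])
    # swap left/right eye and left/right mouth corner so the point semantics still match
    landmark_flipped[[0, 1]] = landmark_flipped[[1, 0]]
    landmark_flipped[[3, 4]] = landmark_flipped[[4, 3]]
    return face_flipped, landmark_flipped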
gen_imglist_pnet.py
Merges the sample crops and landmark samples generated above into a single list file.
gen_PNet_tfrecords.py
Converts the merged PNet training data into a tfrecords file.
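A hedged sketch of how one training sample could be serialized; the feature keys ('image/encoded', 'image/label', 'image/roi', 'image/landmark'), the output filename, and the variables image_data, cls_label, bbox, landmark are assumptions for illustration, not a copy of the repo's code:

import tensorflow as tf

def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

def _float_feature(value):
    return tf.train.Feature(float_list=tf.train.FloatList(value=value))

# one example per training sample: raw image bytes, class label, 4 bbox offsets, 10 landmark values
example = tf.train.Example(features=tf.train.Features(feature={
    'image/encoded': _bytes_feature(image_data),
    'image/label': _int64_feature(cls_label),
    'image/roi': _float_feature(bbox),           # zeros for neg and landmark samples
    'image/landmark': _float_feature(landmark),  # zeros for pos, part and neg samples
}))
with tf.python_io.TFRecordWriter('train_PNet_landmark.tfrecord') as writer:
    writer.write(example.SerializeToString())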
gen_hard_example
Reads wider_face_train_bbx_gt.txt and runs the PNet model trained above to detect face boxes in every image listed in the file. The detections are used as candidate boxes; based on their IoU with the ground-truth boxes, positive, negative, and part training samples for RNet are generated.
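A hedged sketch of how each detection could be labeled from its best IoU against the ground-truth boxes, reusing the iou() helper sketched earlier and the same thresholds as the 12-net data (0.3 / 0.4 / 0.65):

import numpy as np

def assign_sample_type(det_box, gt_boxes):
    max_iou = np.max(iou(det_box, gt_boxes))
    if max_iou < 0.3:
        return 'neg'
    if max_iou >= 0.65:
        return 'pos'
    if max_iou >= 0.4:
        return 'part'
    return None  # ambiguous crops (0.3 <= IoU < 0.4) are discarded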
gen_landmark_aug_24.py
Reads the landmarks from trainImageList.txt, crops the face boxes, and resizes them to 24x24 for RNet training.
gen_imglist_rnet.py
Merges the positive, negative, and part training crops with the ground-truth face crops and landmarks produced by gen_landmark_aug_24.py.
gen_RNet_tfrecords.py
Converts the training data into tfrecords. It needs to be run four times, generating the neg, pos, part, and landmark tfrecords separately.
gen_hard_example
Reads wider_face_train_bbx_gt.txt and runs the PNet and RNet models trained above to detect face boxes in every image listed in the file. The detections are used as candidate boxes; based on their IoU with the ground-truth boxes, positive, negative, and part training samples for ONet are generated.
gen_landmark_aug_48.py
Reads the landmarks from trainImageList.txt, crops the face boxes, and resizes them to 48x48 for ONet training.
gen_imglist_onet.py
Merges the positive, negative, and part training crops with the ground-truth face crops and landmarks produced by gen_landmark_aug_48.py.
gen_ONet_tfrecords.py
Converts the training data into tfrecords. It needs to be run four times, generating the neg, pos, part, and landmark tfrecords separately.
For PNet, the ratio of pos : part : landmark : neg samples is roughly 1:1:1:3, so they can be merged into a single tfrecords file for training.
For RNet and ONet, however, the four kinds of training data are imbalanced, so during training each mini batch
reads 64 samples from the pos, part and landmark tfrecords and 192 samples from the neg tfrecord.
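As a rough check on those numbers: assuming config.BATCH_SIZE = 384 and ratios pos : part : landmark : neg = 1/6 : 1/6 : 1/6 : 3/6 (the values used in the repo's training config), each mini batch contains 384/6 = 64 pos, 64 part, and 64 landmark samples plus 3 x 64 = 192 neg samples, i.e. 64 + 64 + 64 + 192 = 384 in total.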
The training data are stored in the format:
[path to image][cls_label][bbox_label][landmark_label]
For pos samples: cls_label=1, bbox_label is computed, landmark_label=[0,0,0,0,0,0,0,0,0,0]. There are no ground-truth landmarks; these samples train the box branch.
For part samples: cls_label=-1, bbox_label is computed, landmark_label=[0,0,0,0,0,0,0,0,0,0]. There are no ground-truth landmarks; these samples train the box branch.
For landmark samples: cls_label=-2, bbox_label=[0,0,0,0], landmark_label is computed. There is no box regression target; at prediction time the predicted box is used directly. These samples train the landmark branch.
For neg samples: cls_label=0, bbox_label=[0,0,0,0], landmark_label=[0,0,0,0,0,0,0,0,0,0].
cls_label is used to tell the different kinds of samples apart, so that each kind can later be routed to the appropriate loss term.
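The "computed" bbox_label of pos and part samples is the offset of the crop relative to its matched ground-truth box, normalized by the crop size. A minimal sketch of that computation; x1, y1, x2, y2 are the ground-truth corners and nx1, ny1, crop_size are placeholders for the square crop's top-left corner and side length:

# offsets of the ground-truth corners relative to the square crop, normalized by its side length
nx2 = nx1 + crop_size
ny2 = ny1 + crop_size
offset_x1 = (x1 - nx1) / float(crop_size)
offset_y1 = (y1 - ny1) / float(crop_size)
offset_x2 = (x2 - nx2) / float(crop_size)
offset_y2 = (y2 - ny2) / float(crop_size)
bbox_label = [offset_x1, offset_y1, offset_x2, offset_y2]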
Reading the training data
For the pos, neg, part, and landmark parts of the training data, the number of samples read per batch differs:
pos_batch_size = int(np.ceil(config.BATCH_SIZE*pos_radio))
assert pos_batch_size != 0,"Batch Size Error "
part_batch_size = int(np.ceil(config.BATCH_SIZE*part_radio))
assert part_batch_size != 0,"Batch Size Error "
neg_batch_size = int(np.ceil(config.BATCH_SIZE*neg_radio))
assert neg_batch_size != 0,"Batch Size Error "
landmark_batch_size = int(np.ceil(config.BATCH_SIZE*landmark_radio))
assert landmark_batch_size != 0,"Batch Size Error "
batch_sizes = [pos_batch_size,part_batch_size,neg_batch_size,landmark_batch_size]
image_batch, label_batch, bbox_batch, landmark_batch = read_multi_tfrecords(dataset_dirs, batch_sizes, net)
Reading a batch of data:
def read_multi_tfrecords(tfrecord_files, batch_sizes, net):
    pos_dir,part_dir,neg_dir,landmark_dir = tfrecord_files
    pos_batch_size,part_batch_size,neg_batch_size,landmark_batch_size = batch_sizes
    #assert net=='RNet' or net=='ONet', "only for RNet and ONet"
    # read each kind of sample from its own tfrecord with its own batch size
    pos_image,pos_label,pos_roi,pos_landmark = read_single_tfrecord(pos_dir, pos_batch_size, net)
    print(pos_image.get_shape())
    part_image,part_label,part_roi,part_landmark = read_single_tfrecord(part_dir, part_batch_size, net)
    print(part_image.get_shape())
    neg_image,neg_label,neg_roi,neg_landmark = read_single_tfrecord(neg_dir, neg_batch_size, net)
    print(neg_image.get_shape())
    landmark_image,landmark_label,landmark_roi,landmark_landmark = read_single_tfrecord(landmark_dir, landmark_batch_size, net)
    print(landmark_image.get_shape())
    # concatenate the four parts into a single mini batch
    images = tf.concat([pos_image,part_image,neg_image,landmark_image], 0, name="concat/image")
    print(images.get_shape())
    labels = tf.concat([pos_label,part_label,neg_label,landmark_label],0,name="concat/label")
    print(labels.get_shape())
    rois = tf.concat([pos_roi,part_roi,neg_roi,landmark_roi],0,name="concat/roi")
    print(rois.get_shape())
    landmarks = tf.concat([pos_landmark,part_landmark,neg_landmark,landmark_landmark],0,name="concat/landmark")
    return images,labels,rois,landmarks
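For reference, read_single_tfrecord essentially reverses the serialization sketched for gen_PNet_tfrecords.py above. A hedged sketch of the parsing step, assuming the same feature keys, raw uint8 image bytes, a placeholder image_size (12/24/48 depending on the net), and a serialized_example coming from a TFRecordReader:

features = tf.parse_single_example(serialized_example, features={
    'image/encoded': tf.FixedLenFeature([], tf.string),
    'image/label': tf.FixedLenFeature([], tf.int64),
    'image/roi': tf.FixedLenFeature([4], tf.float32),
    'image/landmark': tf.FixedLenFeature([10], tf.float32),
})
image = tf.decode_raw(features['image/encoded'], tf.uint8)
image = tf.reshape(image, [image_size, image_size, 3])
# scale pixels to roughly [-1, 1], as MTCNN implementations commonly do
image = (tf.cast(image, tf.float32) - 127.5) / 128.0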
Computing the loss functions
The face classification branch uses a cross-entropy loss, and only the pos and neg samples contribute to it:
def cls_ohem(cls_prob, label):
    zeros = tf.zeros_like(label)
    #label=-1 --> label=0
    #pos -> 1, neg -> 0, others -> 0
    label_filter_invalid = tf.where(tf.less(label,0), zeros, label)
    num_cls_prob = tf.size(cls_prob)
    cls_prob_reshape = tf.reshape(cls_prob,[num_cls_prob,-1])
    label_int = tf.cast(label_filter_invalid,tf.int32)
    # get the number of rows of cls_prob
    num_row = tf.to_int32(cls_prob.get_shape()[0])
    #row = [0,2,4.....]
    row = tf.range(num_row)*2
    indices_ = row + label_int
    # probability assigned to the ground-truth class of each sample
    label_prob = tf.squeeze(tf.gather(cls_prob_reshape, indices_))
    loss = -tf.log(label_prob+1e-10)
    zeros = tf.zeros_like(label_prob, dtype=tf.float32)
    ones = tf.ones_like(label_prob,dtype=tf.float32)
    # set pos and neg to be 1, rest to be 0
    valid_inds = tf.where(label < zeros,zeros,ones)
    # get the number of POS and NEG examples
    num_valid = tf.reduce_sum(valid_inds)
    keep_num = tf.cast(num_valid*num_keep_radio,dtype=tf.int32)
    # filter out part and landmark data
    loss = loss * valid_inds
    # online hard example mining: keep only the keep_num largest losses
    loss,_ = tf.nn.top_k(loss, k=keep_num)
    return tf.reduce_mean(loss)
Only the top num_keep_radio fraction of the valid (pos and neg) losses is kept for training, i.e., online hard example mining.
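As a quick sanity check on keep_num: assuming num_keep_radio = 0.7 (the value defined alongside these loss functions in the repo) and a mini batch in which 256 samples are pos or neg, keep_num = int(256 * 0.7) = 179, so only the 179 largest cross-entropy losses contribute to the mean.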
The bbox regression branch uses only the pos and part samples:
#label=1 or label=-1 then do regression
def bbox_ohem(bbox_pred,bbox_target,label):
    '''
    :param bbox_pred:
    :param bbox_target:
    :param label: class label
    :return: mean euclidean loss for all the pos and part examples
    '''
    zeros_index = tf.zeros_like(label, dtype=tf.float32)
    ones_index = tf.ones_like(label,dtype=tf.float32)
    # keep pos and part examples
    valid_inds = tf.where(tf.equal(tf.abs(label), 1),ones_index,zeros_index)
    #(batch,)
    #calculate square sum
    square_error = tf.square(bbox_pred-bbox_target)
    square_error = tf.reduce_sum(square_error,axis=1)
    #keep_num scalar
    num_valid = tf.reduce_sum(valid_inds)
    #keep_num = tf.cast(num_valid*num_keep_radio,dtype=tf.int32)
    # count the number of pos and part examples
    keep_num = tf.cast(num_valid, dtype=tf.int32)
    #keep valid index square_error
    square_error = square_error*valid_inds
    # keep top k examples, k equals the number of pos and part examples
    _, k_index = tf.nn.top_k(square_error, k=keep_num)
    square_error = tf.gather(square_error, k_index)
    return tf.reduce_mean(square_error)
The landmark branch uses only the samples with label = -2, i.e., the landmark data:
def landmark_ohem(landmark_pred,landmark_target,label):
    '''
    :param landmark_pred:
    :param landmark_target:
    :param label:
    :return: mean euclidean loss
    '''
    #keep label =-2 then do landmark detection
    ones = tf.ones_like(label,dtype=tf.float32)
    zeros = tf.zeros_like(label,dtype=tf.float32)
    valid_inds = tf.where(tf.equal(label,-2),ones,zeros)
    square_error = tf.square(landmark_pred-landmark_target)
    square_error = tf.reduce_sum(square_error,axis=1)
    num_valid = tf.reduce_sum(valid_inds)
    #keep_num = tf.cast(num_valid*num_keep_radio,dtype=tf.int32)
    keep_num = tf.cast(num_valid, dtype=tf.int32)
    square_error = square_error*valid_inds
    _, k_index = tf.nn.top_k(square_error, k=keep_num)
    square_error = tf.gather(square_error, k_index)
    return tf.reduce_mean(square_error)
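The three losses above are weighted and summed per network before optimization. A hedged sketch of that combination, using the weights from the original MTCNN paper (cls : bbox : landmark = 1 : 0.5 : 0.5 for PNet/RNet and 1 : 0.5 : 1 for ONet); learning_rate is a placeholder and the optimizer shown is only illustrative:

cls_loss = cls_ohem(cls_prob, label)
bbox_loss = bbox_ohem(bbox_pred, bbox_target, label)
landmark_loss = landmark_ohem(landmark_pred, landmark_target, label)
# PNet/RNet weighting; for ONet the landmark weight rises to 1.0
total_loss = cls_loss + 0.5 * bbox_loss + 0.5 * landmark_loss
train_op = tf.train.MomentumOptimizer(learning_rate, 0.9).minimize(total_loss)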