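Preface:
Paper readings for the YOLO series
Paper reading || Object detection in deep learning: YOLOv3 makes a big entrance
Paper reading || Object detection in deep learning: yolov2
Paper reading || Object detection in deep learning: yolov1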
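The project discussed in this post is:
YOLOv3 for TensorFlow: https://github.com/YunYang1994/tensorflow-yolov3
My own blog posts analysing this project:
YOLOv3 || 1. Training yolov3 on your own dataset
YOLOv3 || 2. dataset.py walkthrough
YOLOv3 || 3. Multiprocessing rework of dataset.py
YOLOv3 || 4. yolov3.py: building the network and defining the loss
YOLOv3 || 5. train.py
YOLOv3 || 6. Obtaining the anchor boxes with kmeans.py
YOLOv3 || 7. Testing the yolov3 pb file
This post covers the YOLOv3 training process.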
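1.1 Code overview
This script defines the entire YOLOv3 training process. It does so through the class YoloTrain:

    class YoloTrain(object):
        def __init__(self):
            """Initialise all variables needed for training."""

        def train(self):
            """Run the full training process of the network."""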
1.2 def __init__(self)
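The project author defines each group of training operations under its own name scope, which keeps the code well organised:
- 1. The network's placeholders
- 2. The network's structure and loss
- 3. The network's learn_rate
- 4. The network's define_weight_decay
- 5. The optimizer for define_first_stage_train
- 6. The optimizer for define_second_stage_train
- 7. The network's loader_and_saver
- 8. The network's summary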
    def __init__(self):
        self.anchor_per_scale = cfg.YOLO.ANCHOR_PER_SCALE         # number of anchors per scale
        self.classes = utils.read_class_names(cfg.YOLO.CLASSES)   # object class names
        self.num_classes = len(self.classes)                      # number of object classes
        self.learn_rate_init = cfg.TRAIN.LEARN_RATE_INIT          # initial (maximum) learning rate
        self.learn_rate_end = cfg.TRAIN.LEARN_RATE_END            # final learning rate
        self.first_stage_epochs = cfg.TRAIN.FISRT_STAGE_EPOCHS    # number of epochs in the first stage
        self.second_stage_epochs = cfg.TRAIN.SECOND_STAGE_EPOCHS  # number of epochs in the second stage
        self.warmup_periods = cfg.TRAIN.WARMUP_EPOCHS             # number of warmup epochs
        self.initial_weight = cfg.TRAIN.INITIAL_WEIGHT            # path of the initial (pretrained) weights
        self.time = time.strftime('%Y-%m-%d-%H-%M-%S', time.localtime(time.time()))
        self.moving_ave_decay = cfg.YOLO.MOVING_AVE_DECAY         # moving-average decay
        self.max_bbox_per_scale = 150                             # maximum number of bboxes per image
        self.train_logdir = "./data/log/train"                    # path of the training logs
        self.trainset = Dataset('train')                          # reader for the training data
        self.testset = Dataset('test')                            # reader for the test data
        self.steps_per_period = len(self.trainset)                # number of steps per epoch
        self.sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True))  # create the session
- 1 [Network placeholders]: YOLOv3 is trained on images of varying sizes, so the input and label placeholders are created without a fixed shape.

    with tf.name_scope('define_input'):
        self.input_data = tf.placeholder(dtype=tf.float32, name='input_data')
        self.label_sbbox = tf.placeholder(dtype=tf.float32, name='label_sbbox')
        self.label_mbbox = tf.placeholder(dtype=tf.float32, name='label_mbbox')
        self.label_lbbox = tf.placeholder(dtype=tf.float32, name='label_lbbox')
        self.true_sbboxes = tf.placeholder(dtype=tf.float32, name='sbboxes')
        self.true_mbboxes = tf.placeholder(dtype=tf.float32, name='mbboxes')
        self.true_lbboxes = tf.placeholder(dtype=tf.float32, name='lbboxes')
        self.trainable = tf.placeholder(dtype=tf.bool, name='training')
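To make the point about shapeless placeholders more concrete, here is a minimal sketch of how the fed array shapes could vary with the input size. The function name feed_shapes, the strides (8, 16, 32), the default of 3 anchors per scale, and the layout of the label tensors are assumptions for illustration; this post does not state them, so check them against dataset.py before relying on them.

```python
def feed_shapes(batch_size, input_size, num_classes,
                anchor_per_scale=3, max_bbox_per_scale=150,
                strides=(8, 16, 32)):
    """Return the ASSUMED shape of each array fed to the placeholders.

    Assumption: each scale predicts on a grid of input_size // stride cells,
    and each cell/anchor stores 4 box values + 1 confidence + class scores.
    """
    shapes = {"input_data": (batch_size, input_size, input_size, 3)}
    for scale, stride in zip(("s", "m", "l"), strides):
        grid = input_size // stride
        shapes["label_%sbbox" % scale] = (
            batch_size, grid, grid, anchor_per_scale, 5 + num_classes)
        shapes["true_%sbboxes" % scale] = (batch_size, max_bbox_per_scale, 4)
    return shapes


# Different input sizes give different label shapes, which a fixed-shape
# placeholder could not accept.
for size in (320, 416, 608):
    print(size, feed_shapes(4, size, 80)["label_sbbox"])
```

Under these assumptions only the grid dimensions change between input sizes, which is enough to rule out a fully specified placeholder shape.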
- 2 [Network loss]: Create a YOLOV3 object and then compute its loss. The loss itself is analysed in detail in the previous post on the yolov3 loss.

    with tf.name_scope("define_loss"):
        self.model = YOLOV3(self.input_data, self.trainable)
        self.net_var = tf.global_variables()
        self.giou_loss, self.conf_loss, self.prob_loss = self.model.compute_loss(
            self.label_sbbox, self.label_mbbox, self.label_lbbox,
            self.true_sbboxes, self.true_mbboxes, self.true_lbboxes)
        # the total loss is the sum of the three components
        self.loss = self.giou_loss + self.conf_loss + self.prob_loss
- 3 [Network learn_rate]: the learning rate changes in two phases
– Warmup phase (warmup_periods): the learning rate increases linearly up to learn_rate_init
– Normal training phase: cosine annealing from learn_rate_init down to learn_rate_end, which helps training escape local optima

    with tf.name_scope('learn_rate'):
        self.global_step = tf.Variable(1.0, dtype=tf.float64, trainable=False, name='global_step')
        # number of warmup steps
        warmup_steps = tf.constant(self.warmup_periods * self.steps_per_period,
                                   dtype=tf.float64, name='warmup_steps')
        # total number of training steps
        train_steps = tf.constant((self.first_stage_epochs + self.second_stage_epochs) * self.steps_per_period,
                                  dtype=tf.float64, name='train_steps')
        self.learn_rate = tf.cond(
            pred=self.global_step < warmup_steps,
            # warmup: linear increase
            true_fn=lambda: self.global_step / warmup_steps * self.learn_rate_init,
            # afterwards: cosine annealing
            false_fn=lambda: self.learn_rate_end + 0.5 * (self.learn_rate_init - self.learn_rate_end) *
                             (1 + tf.cos((self.global_step - warmup_steps) / (train_steps - warmup_steps) * np.pi))
        )
        global_step_update = tf.assign_add(self.global_step, 1.0)
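The schedule above can be checked without TensorFlow. The following is a minimal pure-Python sketch of the same formula; the function name learn_rate_at and the example values (200 warmup steps, 1000 total steps, 1e-4 and 1e-6 as rates) are hypothetical and not taken from the project's config.

```python
import math


def learn_rate_at(step, warmup_steps, train_steps, lr_init, lr_end):
    """Learning rate at a given global step, mirroring the tf.cond above."""
    if step < warmup_steps:
        # Warmup: grows linearly towards lr_init.
        return step / warmup_steps * lr_init
    # Cosine annealing: progress runs from 0 (end of warmup) to 1 (last step).
    progress = (step - warmup_steps) / (train_steps - warmup_steps)
    return lr_end + 0.5 * (lr_init - lr_end) * (1 + math.cos(progress * math.pi))


# Hypothetical values, for illustration only.
for step in (1, 100, 200, 600, 1000):
    print(step, learn_rate_at(step, 200, 1000, 1e-4, 1e-6))
```

At the end of warmup the cosine term equals 1, so the rate is exactly lr_init; at the final step it equals -1, so the rate reaches lr_end.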
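- 4 [Network define_weight_decay]:
Despite the scope name, this block sets up an exponential moving average of the trainable variables with tf.train.ExponentialMovingAverage. A detailed introduction to the moving average is at https://blog.csdn.net/magic_ll/article/details/107540550

    with tf.name_scope("define_weight_decay"):
        moving_ave = tf.train.ExponentialMovingAverage(self.moving_ave_decay).apply(tf.trainable_variables())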
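As a rough illustration of what an exponential moving average does, here is a minimal pure-Python sketch of the update rule shadow = decay * shadow + (1 - decay) * value. The class SimpleEMA is hypothetical and only mimics the basic rule; it is not the TensorFlow implementation, and the decay of 0.5 is chosen purely to keep the arithmetic easy to follow.

```python
class SimpleEMA:
    """Hypothetical stand-in for the basic moving-average update rule."""

    def __init__(self, decay, initial_values):
        self.decay = decay
        # Each shadow value starts as a copy of the variable's initial value.
        self.shadow = dict(initial_values)

    def update(self, current_values):
        """Move every shadow value a fraction (1 - decay) towards the current value."""
        for name, value in current_values.items():
            self.shadow[name] = (self.decay * self.shadow[name]
                                 + (1.0 - self.decay) * value)
        return self.shadow


ema = SimpleEMA(decay=0.5, initial_values={"w": 0.0})
ema.update({"w": 1.0})   # shadow["w"] becomes 0.5
ema.update({"w": 1.0})   # shadow["w"] becomes 0.75
print(ema.shadow)
```

With a decay close to 1, as is typical for training, the shadow values change much more slowly than the raw weights, which smooths out step-to-step noise.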
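- 5 [Network define_first_stage_train]: the first training stage
– Starting from a model trained on the COCO dataset, the last layer of the network is fine-tuned on your own dataset. So when the trainable parameters are configured for this stage, only the parameters of the last layer of each of the three output branches are passed to the optimizer.
– tf.control_dependencies: specifies dependencies between operations. In the example below, op_c and op_d may only run after op_a and op_b have run; the order within each pair is not specified.

    with tf.control_dependencies([op_a, op_b]):
        op_c
        op_d

– tf.no_op(): an operation that does nothing. Here it acts as a single handle: running it forces the UPDATE_OPS collection, the optimizer step, the global-step update, and the moving-average update to run first.

    with tf.name_scope("define_first_stage_train"):
        # collect the parameters of the last layer of each output branch as the trainable parameters
        self.first_stage_trainable_var_list = []
        for var in tf.trainable_variables():
            var_name = var.op.name
            var_name_mess = str(var_name).split('/')
            if var_name_mess[0] in ['conv_sbbox', 'conv_mbbox', 'conv_lbbox']:
                self.first_stage_trainable_var_list.append(var)

        # create the optimizer and pass it the trainable parameters
        first_stage_optimizer = tf.train.AdamOptimizer(self.learn_rate).minimize(
            self.loss, var_list=self.first_stage_trainable_var_list)

        with tf.control_dependencies(tf.get_collection(tf.GraphKeys.UPDATE_OPS)):
            with tf.control_dependencies([first_stage_optimizer, global_step_update]):
                with tf.control_dependencies([moving_ave]):
                    self.train_op_with_frozen_variables = tf.no_op()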
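The name filter used to pick the first-stage variables can be tried out without TensorFlow. Below is a minimal sketch that applies the same rule (first path component must be one of the three head names) to plain strings. The function select_first_stage_vars and the sample variable names are hypothetical; only the three head names come from the listing.

```python
def select_first_stage_vars(var_names,
                            heads=("conv_sbbox", "conv_mbbox", "conv_lbbox")):
    """Keep only names whose first '/'-separated component is an output head."""
    return [name for name in var_names if name.split("/")[0] in heads]


# Hypothetical variable names, for illustration only.
names = [
    "darknet/conv0/weight",
    "conv_sbbox/weight",
    "conv_sbbox/bias",
    "conv_lobj_branch/weight",
    "conv_lbbox/weight",
]
print(select_first_stage_vars(names))
```

Because the comparison is on the whole first component rather than a prefix, a scope such as conv_lobj_branch is not selected even though it starts with conv_l.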
- 6 [Network define_second_stage_train]: the second training stage
– In the second stage all trainable parameters are passed to the optimizer, so the whole network is fine-tuned.
– The other training settings are the same as in the first stage.

    with tf.name_scope("define_second_stage_train"):
        second_stage_trainable_var_list = tf.trainable_variables()
        second_stage_optimizer = tf.train.AdamOptimizer(self.learn_rate).minimize(
            self.loss, var_list=second_stage_trainable_var_list)

        with tf.control_dependencies(tf.get_collection(tf.GraphKeys.UPDATE_OPS)):
            with tf.control_dependencies([second_stage_optimizer, global_step_update]):
                with tf.control_dependencies([moving_ave]):
                    self.train_op_with_all_variables = tf.no_op()
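- 7 [Network loader_and_saver]:
– loader restores the variables recorded in self.net_var, which was captured right after the model was built; saver writes all global variables and keeps at most 10 checkpoints.

    with tf.name_scope('loader_and_saver'):
        self.loader = tf.train.Saver(self.net_var)
        self.saver = tf.train.Saver(tf.global_variables(), max_to_keep=10)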
- 8 [Network summary]:
– Sets up the log files written during training.

    with tf.name_scope('summary'):
        tf.summary.scalar("learn_rate", self.learn_rate)
        tf.summary.scalar("giou_loss", self.giou_loss)
        tf.summary.scalar("conf_loss", self.conf_loss)
        tf.summary.scalar("prob_loss", self.prob_loss)
        tf.summary.scalar("total_loss", self.loss)

        # start from an empty log directory on every run
        logdir = "./data/log/"
        if os.path.exists(logdir):
            shutil.rmtree(logdir)
        os.mkdir(logdir)
        self.write_op = tf.summary.merge_all()
        self.summary_writer = tf.summary.FileWriter(logdir, graph=self.sess.graph)
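The log-directory reset in the summary scope can be exercised on its own. Here is a minimal sketch of the same delete-then-recreate pattern, run inside a temporary directory so nothing in the working tree is touched. The helper name reset_logdir is hypothetical, and os.makedirs is swapped in for the listing's os.mkdir because os.mkdir raises an error when the parent directory (here ./data) does not exist yet.

```python
import os
import shutil
import tempfile


def reset_logdir(logdir):
    """Delete logdir if it exists, then recreate it empty (parents included)."""
    if os.path.exists(logdir):
        shutil.rmtree(logdir)
    os.makedirs(logdir)  # swapped in for os.mkdir; also creates missing parents
    return logdir


# Demonstration inside a throwaway directory.
root = tempfile.mkdtemp()
logdir = os.path.join(root, "data", "log")
reset_logdir(logdir)                                       # created fresh
open(os.path.join(logdir, "old_events.txt"), "w").close()  # leftover from a "previous run"
reset_logdir(logdir)                                       # leftover is removed
print(os.listdir(logdir))
shutil.rmtree(root)
```

Note that this pattern discards the logs of earlier runs, so anything worth keeping should be copied elsewhere before training starts.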
1.3 def train(self)
    def train(self):
        self.sess.run(tf.global_variables_initializer())
        try:
            print('=> Restoring weights from: %s ... ' % self.initial_weight)
            self.loader.restore(self.sess, self.initial_weight)
        except:
            # no pretrained weights: skip the first stage and train all variables from scratch
            print('=> %s does not exist !!!' % self.initial_weight)
            print('= > Now it starts to train YOLOV3 from scratch ...')
            self.first_stage_epochs = 0

        for epoch in range(1, 1 + self.first_stage_epochs + self.second_stage_epochs):
            # first stage: only the output layers are trained; second stage: all variables
            if epoch <= self.first_stage_epochs:
                train_op = self.train_op_with_frozen_variables
            else:
                train_op = self.train_op_with_all_variables

            pbar = tqdm(self.trainset)
            train_epoch_loss, test_epoch_loss = [], []

            # one pass over the training set
            for train_data in pbar:
                _, summary, train_step_loss, global_step_val = self.sess.run(
                    [train_op, self.write_op, self.loss, self.global_step], feed_dict={
                        self.input_data: train_data[0],
                        self.label_sbbox: train_data[1],
                        self.label_mbbox: train_data[2],
                        self.label_lbbox: train_data[3],
                        self.true_sbboxes: train_data[4],
                        self.true_mbboxes: train_data[5],
                        self.true_lbboxes: train_data[6],
                        self.trainable: True,
                    })
                train_epoch_loss.append(train_step_loss)
                self.summary_writer.add_summary(summary, global_step_val)
                pbar.set_description("train loss: %.2f" % train_step_loss)

            # one pass over the test set, loss only (no parameter update)
            for test_data in self.testset:
                test_step_loss = self.sess.run(
                    self.loss, feed_dict={
                        self.input_data: test_data[0],
                        self.label_sbbox: test_data[1],
                        self.label_mbbox: test_data[2],
                        self.label_lbbox: test_data[3],
                        self.true_sbboxes: test_data[4],
                        self.true_mbboxes: test_data[5],
                        self.true_lbboxes: test_data[6],
                        self.trainable: False,
                    })
                test_epoch_loss.append(test_step_loss)

            # average the per-step losses over the epoch
            train_epoch_loss, test_epoch_loss = np.mean(train_epoch_loss), np.mean(test_epoch_loss)
            ckpt_file = "./checkpoint/yolov3_test_loss=%.4f.ckpt" % test_epoch_loss
            log_time = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(time.time()))
            print("=> Epoch: %2d Time: %s Train loss: %.2f Test loss: %.2f Saving %s ..."
                  % (epoch, log_time, train_epoch_loss, test_epoch_loss, ckpt_file))
            # saving is commented out in this listing; uncomment to actually write the checkpoint
            # self.saver.save(self.sess, ckpt_file, global_step=epoch)
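The way train() switches between the two train ops, including the fallback when no pretrained weights can be restored, can be sketched in plain Python. The function names and the labels "frozen" and "all" are hypothetical stand-ins for train_op_with_frozen_variables and train_op_with_all_variables; the epoch counts in the example are made up.

```python
def stage_for_epoch(epoch, first_stage_epochs):
    """Mirror the if/else in train(): epochs are counted from 1."""
    return "frozen" if epoch <= first_stage_epochs else "all"


def epoch_schedule(first_stage_epochs, second_stage_epochs, restored=True):
    """List which train op each epoch would use.

    If restoring the initial weights failed, train() sets
    first_stage_epochs to 0, so every epoch trains all variables.
    """
    if not restored:
        first_stage_epochs = 0
    last_epoch = first_stage_epochs + second_stage_epochs
    return [stage_for_epoch(epoch, first_stage_epochs)
            for epoch in range(1, 1 + last_epoch)]


print(epoch_schedule(2, 3))                  # weights restored
print(epoch_schedule(2, 3, restored=False))  # training from scratch
```

One consequence worth noting: in the fallback case the total number of epochs shrinks to second_stage_epochs, while train_steps in the learn_rate scope was already fixed in __init__ from both stage counts, so the cosine schedule would presumably stop before reaching learn_rate_end.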