TensorFlow Estimator: fine-tuning with hooks

    • Assign the weights directly after defining the model in model_fn
    • Use hooks


There are two ways to implement fine-tuning:

Assign the weights directly after defining the model in model_fn

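import tensorflow as tf  # TensorFlow 1.x API


def model_fn(features, labels, mode, params):
    # .....
    # Fine-tune: only when a pretrained checkpoint is given and model_dir
    # does not contain a checkpoint of its own yet (a fresh run).
    if params.checkpoint_path and (not tf.train.latest_checkpoint(params.model_dir)):
        checkpoint_path = None
        # Accept either a checkpoint directory or a single checkpoint path.
        if tf.gfile.IsDirectory(params.checkpoint_path):
            checkpoint_path = tf.train.latest_checkpoint(params.checkpoint_path)
        else:
            checkpoint_path = params.checkpoint_path

        # Keys are scopes in the checkpoint, values are scopes in the current graph.
        tf.train.init_from_checkpoint(
            ckpt_dir_or_file=checkpoint_path,
            assignment_map={params.checkpoint_scope: params.checkpoint_scope}  # e.g. 'OptimizeLoss/': 'OptimizeLoss/'
        )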
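To make the scope-to-scope form of assignment_map more concrete, here is a minimal pure-Python sketch of how such a mapping can be resolved into per-variable pairs. It is a model of the idea only, not TensorFlow's implementation: the function name resolve_scope_assignments and all variable names in the example are made up for illustration, and both scopes are assumed to end with '/'.

```python
def resolve_scope_assignments(assignment_map, graph_var_names, ckpt_var_names):
    """Return {graph variable name: checkpoint variable name}.

    Illustrative only: keys of assignment_map are scopes in the checkpoint,
    values are scopes in the current graph, both ending with '/'.
    """
    ckpt_names = set(ckpt_var_names)
    resolved = {}
    for ckpt_scope, graph_scope in assignment_map.items():
        for name in graph_var_names:
            if not name.startswith(graph_scope):
                continue  # variable lives outside the mapped scope
            # Swap the graph scope prefix for the checkpoint scope prefix.
            candidate = ckpt_scope + name[len(graph_scope):]
            if candidate not in ckpt_names:
                raise ValueError(
                    "Variable %r expects %r, which is not in the checkpoint"
                    % (name, candidate))
            resolved[name] = candidate
    return resolved


# Hypothetical variable names, chosen only for this example.
graph_vars = ["OptimizeLoss/dense/kernel", "OptimizeLoss/dense/bias", "head/logits/kernel"]
ckpt_vars = ["OptimizeLoss/dense/kernel", "OptimizeLoss/dense/bias", "old_head/kernel"]

mapping = resolve_scope_assignments(
    {"OptimizeLoss/": "OptimizeLoss/"}, graph_vars, ckpt_vars)
renamed = resolve_scope_assignments({"old/": "new/"}, ["new/w"], ["old/w"])
print(mapping)
print(renamed)
```

Variables outside the mapped scope (here head/logits/kernel) are left alone and keep their normal initializer, which is usually what you want for a freshly added output layer.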

Use hooks

Hooks can be specified through the train_monitors argument when defining tf.contrib.learn.Experiment:

# Define the experiment
experiment = tf.contrib.learn.Experiment(
    estimator=estimator,  # Estimator
    train_input_fn=train_input_fn,  # First-class function
    eval_input_fn=eval_input_fn,  # First-class function
    train_steps=params.train_steps,  # Minibatch steps
    min_eval_frequency=params.eval_min_frequency,  # Eval frequency
    # train_monitors=[],  # Hooks for training
    # eval_hooks=[eval_input_hook],  # Hooks for evaluation
    eval_steps=params.eval_steps  # Use evaluation feeder until it's empty
)

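Alternatively, hooks can be specified through the training_chief_hooks argument when defining tf.estimator.EstimatorSpec.
Personally, I think it is better to define them in the estimator, so that the experiment only has to control how the experiment is run (number of training steps, number of evaluation steps, and so on).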

def model_fn(features, labels, mode, params):

    # ....

    return tf.estimator.EstimatorSpec(
        mode=mode,
        predictions=predictions,
        loss=loss,
        train_op=train_op,
        eval_metric_ops=eval_metric_ops,
        # scaffold=get_scaffold(),
        # training_chief_hooks=None
    )

While we are here, a quick note on what the tf.estimator.EstimatorSpec object does. It describes every aspect of a model, including:

  • Current mode
    • mode: A ModeKeys. Specifies if this is training, evaluation or prediction.
  • Computation graph
    • predictions: Predictions Tensor or dict of Tensor.
    • loss: Training loss Tensor. Must be either scalar, or with shape [1].
    • train_op: Op for the training step.
    • eval_metric_ops: Dict of metric results keyed by name. The values of the dict are the results of calling a metric function, namely a (metric_tensor, update_op) tuple. metric_tensor should be evaluated without any impact on state (typically it is a pure computation result based on variables). For example, it should not trigger the update_op or require any input fetching.
  • Export strategy
    • export_outputs: Describes the output signatures to be exported to SavedModel and used during serving. A dict {name: output} where:
      • name: An arbitrary name for this output.
      • output: An ExportOutput object such as ClassificationOutput, RegressionOutput, or PredictOutput. Single-headed models only need to specify one entry in this dictionary. Multi-headed models should specify one entry for each head, one of which must be named using signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY.
  • Chief hooks: hooks for the model-saving policy during training (CheckpointSaverHook), model restoring, and so on
    • training_chief_hooks: Iterable of tf.train.SessionRunHook objects to run on the chief worker during training.
  • Worker hooks: monitoring hooks during training, such as NanTensorHook and LoggingTensorHook
    • training_hooks: Iterable of tf.train.SessionRunHook objects to run on all workers during training.
  • Initialization and saver
    • scaffold: A tf.train.Scaffold object that can be used to set initialization, saver, and more to be used in training.
  • Evaluation hooks
    • evaluation_hooks: Iterable of tf.train.SessionRunHook objects to run during evaluation.
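The split between training_chief_hooks and training_hooks in the list above can be pictured with a toy training loop. The sketch below uses no TensorFlow: RecordingHook and run_training are hypothetical names, the "session" is a placeholder object, and the loop only models which hooks fire on which worker and the order of the SessionRunHook callbacks.

```python
class RecordingHook(object):
    """Hypothetical stand-in for a SessionRunHook that records its callbacks."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def begin(self):
        self.log.append((self.name, "begin"))

    def after_create_session(self, session, coord):
        self.log.append((self.name, "after_create_session"))

    def before_run(self, run_context):
        self.log.append((self.name, "before_run"))

    def after_run(self, run_context, run_values):
        self.log.append((self.name, "after_run"))

    def end(self, session):
        self.log.append((self.name, "end"))


def run_training(is_chief, training_hooks, training_chief_hooks, num_steps):
    """Toy loop: models only which hooks fire, and in what order."""
    hooks = list(training_hooks)
    if is_chief:
        hooks += list(training_chief_hooks)  # chief-only hooks
    for hook in hooks:
        hook.begin()  # the graph could still be modified at this point
    session = object()  # placeholder for a real session
    for hook in hooks:
        hook.after_create_session(session, None)
    for _ in range(num_steps):
        for hook in hooks:
            hook.before_run(None)
        # a real loop would call session.run(train_op) here
        for hook in hooks:
            hook.after_run(None, None)
    for hook in hooks:
        hook.end(session)


chief_log, worker_log = [], []
run_training(True, [RecordingHook("logging", chief_log)],
             [RecordingHook("restore", chief_log)], num_steps=1)
run_training(False, [RecordingHook("logging", worker_log)],
             [RecordingHook("restore", worker_log)], num_steps=1)
print(worker_log)
```

In this toy run the "restore" hook only appears in the chief's log, which matches the reason for putting a checkpoint-restoring hook in training_chief_hooks rather than training_hooks.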

The custom hook looks like this:

import os

import tensorflow as tf  # TensorFlow 1.x API

slim = tf.contrib.slim


class RestoreCheckpointHook(tf.train.SessionRunHook):
    def __init__(self,
                 checkpoint_path,
                 exclude_scope_patterns,
                 include_scope_patterns
                 ):
        tf.logging.info("Create RestoreCheckpointHook.")
        super(RestoreCheckpointHook, self).__init__()
        self.checkpoint_path = checkpoint_path

        # Patterns are passed as comma-separated strings; empty or None disables the filter.
        self.exclude_scope_patterns = None if (not exclude_scope_patterns) else exclude_scope_patterns.split(',')
        self.include_scope_patterns = None if (not include_scope_patterns) else include_scope_patterns.split(',')

    def begin(self):
        # You can add ops to the graph here.
        print('Before starting the session.')

        # 1. Create saver
        # Manual alternative that excludes variables by name prefix (left commented out):
        #exclusions = []
        #if self.checkpoint_exclude_scopes:
        #  exclusions = [scope.strip()
        #                for scope in self.checkpoint_exclude_scopes.split(',')]
        #
        #variables_to_restore = []
        #for var in slim.get_model_variables(): #tf.global_variables():
        #  excluded = False
        #  for exclusion in exclusions:
        #    if var.op.name.startswith(exclusion):
        #      excluded = True
        #      break
        #  if not excluded:
        #    variables_to_restore.append(var)
        #inclusions
        #[var for var in tf.trainable_variables() if var.op.name.startswith('InceptionResnetV1')]

        # Select the model variables whose names match the include patterns
        # and do not match the exclude patterns.
        variables_to_restore = tf.contrib.framework.filter_variables(
            slim.get_model_variables(),
            include_patterns=self.include_scope_patterns,  # e.g. ['Conv']
            exclude_patterns=self.exclude_scope_patterns,  # e.g. ['biases', 'Logits']

            # If True (default), performs re.search to find matches
            # (i.e. pattern can match any substring of the variable name).
            # If False, performs re.match (i.e. regexp should match from the beginning of the variable name).
            reg_search=True
        )
        # The saver only knows about the selected variables.
        self.saver = tf.train.Saver(variables_to_restore)

    def after_create_session(self, session, coord):
        # When this is called, the graph is finalized and
        # ops can no longer be added to the graph.
        print('Session created.')

        tf.logging.info('Fine-tuning from %s', self.checkpoint_path)
        self.saver.restore(session, os.path.expanduser(self.checkpoint_path))
        tf.logging.info('Finished fine-tuning restore from %s', self.checkpoint_path)

    def before_run(self, run_context):
        #print('Before calling session.run().')
        return None #SessionRunArgs(self.your_tensor)

    def after_run(self, run_context, run_values):
        #print('Done running one step. The value of my tensor: %s', run_values.results)
        #if you-need-to-stop-loop:
        #  run_context.request_stop()
        pass

    def end(self, session):
        #print('Done with the session.')
        pass
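The include/exclude filtering that the hook relies on can be tried out without a graph. The following is a pure-Python sketch of the selection rule as described in the comments above (include if any include pattern matches, then drop if any exclude pattern matches, with re.search or re.match depending on reg_search). The function filter_names is a stand-in written for this post, not the TensorFlow function, and the variable names are invented examples.

```python
import re


def filter_names(names, include_patterns=None, exclude_patterns=None, reg_search=True):
    """Pure-Python sketch of include/exclude filtering over variable names."""
    # re.search matches anywhere in the name, re.match only at the start.
    matcher = re.search if reg_search else re.match
    selected = list(names)
    if include_patterns is not None:
        selected = [n for n in selected
                    if any(matcher(p, n) for p in include_patterns)]
    if exclude_patterns is not None:
        selected = [n for n in selected
                    if not any(matcher(p, n) for p in exclude_patterns)]
    return selected


# Invented variable names, for illustration only.
names = [
    "InceptionResnetV1/Conv2d_1a_3x3/weights",
    "InceptionResnetV1/Conv2d_1a_3x3/biases",
    "InceptionResnetV1/Logits/weights",
]

# Same comma-separated convention as the hook's constructor arguments.
include = "Conv".split(',')
exclude = "biases,Logits".split(',')

kept = filter_names(names, include_patterns=include, exclude_patterns=exclude)
anchored = filter_names(names, include_patterns=include, reg_search=False)
print(kept)
print(anchored)
```

One thing to watch with the comma-separated strings: the hook splits on ',' without stripping whitespace, so "biases, Logits" would produce the pattern " Logits" with a leading space, which is unlikely to match any variable name.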
