TensorFlow Object Detection API: error when training Mask R-CNN

Training setup:

Training mode: distributed training on 4 × Tesla P100
Number of classes: 2
Config template: mask_rcnn_resnet101_atrous_coco_2018_01_28/pipeline.config
Pretrained model: used
Training steps: 20000

Full error message:

Traceback (most recent call last):
  File "object_detection/model_main.py", line 111, in 
    tf.app.run()
  File "/root/anaconda3/envs/cvtf/lib/python3.6/site-packages/tensorflow/python/pla
    _sys.exit(main(argv))
  File "object_detection/model_main.py", line 107, in main
    tf.estimator.train_and_evaluate(estimator, train_spec, eval_specs[0])
  File "/root/anaconda3/envs/cvtf/lib/python3.6/site-packages/tensorflow/python/est
    return executor.run()
  File "/root/anaconda3/envs/cvtf/lib/python3.6/site-packages/tensorflow/python/est
    return self.run_local()
  File "/root/anaconda3/envs/cvtf/lib/python3.6/site-packages/tensorflow/python/est
    saving_listeners=saving_listeners)
  File "/root/anaconda3/envs/cvtf/lib/python3.6/site-packages/tensorflow/python/est
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/root/anaconda3/envs/cvtf/lib/python3.6/site-packages/tensorflow/python/est
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/root/anaconda3/envs/cvtf/lib/python3.6/site-packages/tensorflow/python/est
    features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)
  File "/root/anaconda3/envs/cvtf/lib/python3.6/site-packages/tensorflow/python/est
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "/root/research/object_detection/model_lib.py", line 343, in model_fn
    train_config.optimizer)
  File "/root/research/object_detection/builders/optimizer_builder.py", line 50, in
    learning_rate = _create_learning_rate(config.learning_rate)
  File "/root/research/object_detection/builders/optimizer_builder.py", line 112, i
    learning_rate_sequence, config.warmup)
  File "/root/research/object_detection/utils/learning_schedules.py", line 160, in
    raise ValueError('First step cannot be zero.')
ValueError: First step cannot be zero.

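Solution:
As the error message says, the first step in the learning rate schedule cannot be zero. The traceback shows the check failing while the optimizer's learning rate is built from the config file (optimizer_builder.py calling learning_schedules.py).
In pipeline.config, find the learning rate schedule entry whose step value is 0, change it from 0 to 1, save the file, and restart training.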
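The same edit can be scripted instead of done by hand. Below is a minimal sketch in Python, assuming the offending entry appears in pipeline.config as a literal `step: 0` line. The helper name `fix_zero_first_step` and the sample learning rate values are hypothetical and only serve as an illustration.

```python
import re


def fix_zero_first_step(config_text: str) -> str:
    """Return the config text with every `step: 0` entry changed to `step: 1`.

    Only a value of exactly 0 is rewritten. Values such as 900000 and
    keys such as num_steps are left untouched.
    """
    # `\bstep` does not match inside `num_steps` (no word boundary after `_`,
    # and `steps:` is not `step:`). The lookahead rejects 0 followed by
    # another digit or a decimal point.
    return re.sub(r"(\bstep\s*:\s*)0(?![\d.])", r"\g<1>1", config_text)


# Hypothetical excerpt of a learning rate schedule, for illustration only.
sample = (
    "schedule {\n"
    "  step: 0\n"
    "  learning_rate: 0.0003\n"
    "}\n"
    "schedule {\n"
    "  step: 900000\n"
    "  learning_rate: 0.00003\n"
    "}\n"
    "num_steps: 20000\n"
)

fixed = fix_zero_first_step(sample)
print(fixed)
```

To apply it to a real file, read the contents of pipeline.config into a string, pass it through the function, and write the result back. Keeping a backup copy of the original config first is a sensible precaution.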
