Conv2DCustomBackpropInputOp only supports NHWC.

tensorflow.python.framework.errors_impl.InvalidArgumentError: Conv2DCustomBackpropInputOp only supports NHWC.
[[node Conv2DBackpropInput (defined at /lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1751) ]] [Op:__inference_distributed_function_20858]

andrew@macbook:~/demo/models# # run the example
export TENSORFLOW_MODELS=$(pwd)
export CIFAR_DATA=/tmp/cifar10_data
export PYTHONPATH=${TENSORFLOW_MODELS}:$PYTHONPATH
andrew@macbook:~/demo/models#
export PYTHONPATH=${PYTHONPATH}:${TENSORFLOW_MODELS}
export TF_CONFIG='{"cluster": { "chief": ["localhost:2222"], "worker": ["localhost:2223"]}, "task": {"type": "worker", "index": 0}}'                            
python3 resnet_cifar_main.py --data_dir=${CIFAR_DATA} --num_gpus=0 --ds=multi_worker_mirrored --train_epochs=1
/usr/local/Cellar/python/3.7.4_1/Frameworks/Python.framework/Versions/3.7/Resources/Python.app/Contents/MacOS/Python: can't open file 'resnet_cifar_main.py': [Errno 2] No such file or directory
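(The first attempt fails only because resnet_cifar_main.py lives in the TensorFlowOnSpark examples directory, not under tensorflow/models.) For context, the TF_CONFIG exported above declares a two-process cluster and marks this process as worker 0; the log below shows only that worker. A companion chief process would normally run in a second terminal with the same cluster spec but its own task entry. The Python snippet below is a sketch of that assumed setup, equivalent to the shell export; it is not part of the original session:

import json, os

# Assumption: the chief runs in another terminal with the same cluster spec
# but task type "chief". Setting TF_CONFIG from Python is equivalent to the
# shell export used above.
os.environ["TF_CONFIG"] = json.dumps({
    "cluster": {"chief": ["localhost:2222"], "worker": ["localhost:2223"]},
    "task": {"type": "chief", "index": 0},
})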
andrew@macbook:~/demo/models# cd ~/tensorflowonspark/TensorFlowOnSpark/examples/resnet
andrew@macbook:~/tensorflowonspark/TensorFlowOnSpark/examples/resnet#
export PYTHONPATH=${PYTHONPATH}:${TENSORFLOW_MODELS}
export TF_CONFIG='{"cluster": { "chief": ["localhost:2222"], "worker": ["localhost:2223"]}, "task": {"type": "worker", "index": 0}}'
python3 resnet_cifar_main.py --data_dir=${CIFAR_DATA} --num_gpus=0 --ds=multi_worker_mirrored --train_epochs=1
2020-05-25 14:10:22.263551: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-05-25 14:10:22.273714: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fb2ea468e10 executing computations on platform Host. Devices:
2020-05-25 14:10:22.273730: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Host, Default Version
2020-05-25 14:10:22.279316: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:258] Initialize GrpcChannelCache for job chief -> {0 -> localhost:2222}
2020-05-25 14:10:22.279329: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:258] Initialize GrpcChannelCache for job worker -> {0 -> localhost:2223}
2020-05-25 14:10:22.279803: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:365] Started server with target: grpc://localhost:2223
INFO:tensorflow:Enabled multi-worker collective ops with available devices: ['/job:worker/replica:0/task:0/device:CPU:0', '/job:worker/replica:0/task:0/device:XLA_CPU:0']
I0525 14:10:22.280099 4604100032 collective_all_reduce_strategy.py:269] Enabled multi-worker collective ops with available devices: ['/job:worker/replica:0/task:0/device:CPU:0', '/job:worker/replica:0/task:0/device:XLA_CPU:0']
INFO:tensorflow:Multi-worker CollectiveAllReduceStrategy with cluster_spec = {'chief': ['localhost:2222'], 'worker': ['localhost:2223']}, task_type = 'worker', task_id = 0, num_workers = 2, local_devices = ('/job:worker/task:0',), communication = CollectiveCommunication.AUTO
I0525 14:10:22.282436 4604100032 collective_all_reduce_strategy.py:310] Multi-worker CollectiveAllReduceStrategy with cluster_spec = {'chief': ['localhost:2222'], 'worker': ['localhost:2223']}, task_type = 'worker', task_id = 0, num_workers = 2, local_devices = ('/job:worker/task:0',), communication = CollectiveCommunication.AUTO
WARNING:tensorflow:From /usr/local/lib/python3.7/site-packages/tensorflow_core/python/ops/image_ops_impl.py:1518: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Deprecated in favor of operator or tf.math.divide.
W0525 14:10:22.569899 4604100032 deprecation.py:323] From /usr/local/lib/python3.7/site-packages/tensorflow_core/python/ops/image_ops_impl.py:1518: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Deprecated in favor of operator or tf.math.divide.
INFO:tensorflow:Collective batch_all_reduce: 1 all-reduces, num_workers = 2
I0525 14:10:22.655004 4604100032 cross_device_ops.py:1107] Collective batch_all_reduce: 1 all-reduces, num_workers = 2
INFO:tensorflow:Collective batch_all_reduce: 1 all-reduces, num_workers = 2
I0525 14:10:22.658637 4604100032 cross_device_ops.py:1107] Collective batch_all_reduce: 1 all-reduces, num_workers = 2
INFO:tensorflow:Collective batch_all_reduce: 1 all-reduces, num_workers = 2
I0525 14:10:22.696696 4604100032 cross_device_ops.py:1107] Collective batch_all_reduce: 1 all-reduces, num_workers = 2
INFO:tensorflow:Collective batch_all_reduce: 1 all-reduces, num_workers = 2
I0525 14:10:22.700250 4604100032 cross_device_ops.py:1107] Collective batch_all_reduce: 1 all-reduces, num_workers = 2
INFO:tensorflow:Collective batch_all_reduce: 1 all-reduces, num_workers = 2
I0525 14:10:22.744314 4604100032 cross_device_ops.py:1107] Collective batch_all_reduce: 1 all-reduces, num_workers = 2
INFO:tensorflow:Collective batch_all_reduce: 1 all-reduces, num_workers = 2
I0525 14:10:22.747793 4604100032 cross_device_ops.py:1107] Collective batch_all_reduce: 1 all-reduces, num_workers = 2
INFO:tensorflow:Collective batch_all_reduce: 1 all-reduces, num_workers = 2
I0525 14:10:22.789147 4604100032 cross_device_ops.py:1107] Collective batch_all_reduce: 1 all-reduces, num_workers = 2
INFO:tensorflow:Collective batch_all_reduce: 1 all-reduces, num_workers = 2
I0525 14:10:22.792857 4604100032 cross_device_ops.py:1107] Collective batch_all_reduce: 1 all-reduces, num_workers = 2
INFO:tensorflow:Collective batch_all_reduce: 1 all-reduces, num_workers = 2
I0525 14:10:22.839616 4604100032 cross_device_ops.py:1107] Collective batch_all_reduce: 1 all-reduces, num_workers = 2
INFO:tensorflow:Collective batch_all_reduce: 1 all-reduces, num_workers = 2
I0525 14:10:22.843029 4604100032 cross_device_ops.py:1107] Collective batch_all_reduce: 1 all-reduces, num_workers = 2
INFO:tensorflow:Running Distribute Coordinator with mode = 'independent_worker', cluster_spec = {'chief': ['localhost:2222'], 'worker': ['localhost:2223']}, task_type = 'worker', task_id = 0, environment = None, rpc_layer = 'grpc'
I0525 14:10:25.711827 4604100032 distribute_coordinator.py:776] Running Distribute Coordinator with mode = 'independent_worker', cluster_spec = {'chief': ['localhost:2222'], 'worker': ['localhost:2223']}, task_type = 'worker', task_id = 0, environment = None, rpc_layer = 'grpc'
WARNING:tensorflow:`eval_fn` is not passed in. The `worker_fn` will be used if an "evaluator" task exists in the cluster.
W0525 14:10:25.712001 4604100032 distribute_coordinator.py:825] `eval_fn` is not passed in. The `worker_fn` will be used if an "evaluator" task exists in the cluster.
WARNING:tensorflow:`eval_strategy` is not passed in. No distribution strategy will be used for evaluation.
W0525 14:10:25.712088 4604100032 distribute_coordinator.py:829] `eval_strategy` is not passed in. No distribution strategy will be used for evaluation.
INFO:tensorflow:Multi-worker CollectiveAllReduceStrategy with cluster_spec = {'chief': ['localhost:2222'], 'worker': ['localhost:2223']}, task_type = 'worker', task_id = 0, num_workers = 2, local_devices = ('/job:worker/task:0',), communication = CollectiveCommunication.AUTO
I0525 14:10:25.712807 4604100032 collective_all_reduce_strategy.py:310] Multi-worker CollectiveAllReduceStrategy with cluster_spec = {'chief': ['localhost:2222'], 'worker': ['localhost:2223']}, task_type = 'worker', task_id = 0, num_workers = 2, local_devices = ('/job:worker/task:0',), communication = CollectiveCommunication.AUTO
INFO:tensorflow:Multi-worker CollectiveAllReduceStrategy with cluster_spec = {'chief': ['localhost:2222'], 'worker': ['localhost:2223']}, task_type = 'worker', task_id = 0, num_workers = 2, local_devices = ('/job:worker/task:0',), communication = CollectiveCommunication.AUTO
I0525 14:10:25.713262 4604100032 collective_all_reduce_strategy.py:310] Multi-worker CollectiveAllReduceStrategy with cluster_spec = {'chief': ['localhost:2222'], 'worker': ['localhost:2223']}, task_type = 'worker', task_id = 0, num_workers = 2, local_devices = ('/job:worker/task:0',), communication = CollectiveCommunication.AUTO
WARNING:tensorflow:ModelCheckpoint callback is not provided. Workers will need to restart training if any fails.
W0525 14:10:25.713459 4604100032 distributed_training_utils.py:1163] ModelCheckpoint callback is not provided. Workers will need to restart training if any fails.
2020-05-25 14:10:25.738091: W tensorflow/core/grappler/optimizers/data/auto_shard.cc:400] Cannot find shardable dataset, adding a shard node at the end of the dataset instead. This may have performance implications.
Train for 390 steps, validate for 78 steps
INFO:tensorflow:Collective batch_all_reduce: 176 all-reduces, num_workers = 2
I0525 14:10:28.142041 4604100032 cross_device_ops.py:1107] Collective batch_all_reduce: 176 all-reduces, num_workers = 2
INFO:tensorflow:Collective batch_all_reduce: 176 all-reduces, num_workers = 2
I0525 14:10:30.562225 4604100032 cross_device_ops.py:1107] Collective batch_all_reduce: 176 all-reduces, num_workers = 2
2020-05-25 14:10:32.541300: I tensorflow/core/grappler/optimizers/scoped_allocator_optimizer.cc:316] Abandoning ScopedAllocatorOptimizer because input FusedBatchNormGradV3_15 output 2 is already assigned to scope_id 19
2020-05-25 14:10:32.541330: W tensorflow/core/grappler/optimizers/scoped_allocator_optimizer.cc:381] error: Internal: Abandoning ScopedAllocatorOptimizer because input FusedBatchNormGradV3_15 output 2 is already assigned to scope_id 19
2020-05-25 14:10:32.541343: W tensorflow/core/grappler/optimizers/scoped_allocator_optimizer.cc:990] error: Internal: Abandoning ScopedAllocatorOptimizer because input FusedBatchNormGradV3_15 output 2 is already assigned to scope_id 19
2020-05-25 14:10:32.541391: E tensorflow/core/grappler/optimizers/scoped_allocator_optimizer.cc:1007] ScopedAllocatorOptimizer: Internal: Abandoning ScopedAllocatorOptimizer because input FusedBatchNormGradV3_15 output 2 is already assigned to scope_id 19
2020-05-25 14:10:32.541399: W tensorflow/core/grappler/optimizers/scoped_allocator_optimizer.cc:782] error: Internal: Abandoning ScopedAllocatorOptimizer because input FusedBatchNormGradV3_15 output 2 is already assigned to scope_id 19
2020-05-25 14:10:32.545791: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:502] scoped_allocator_optimizer failed: Internal: Abandoning ScopedAllocatorOptimizer because input FusedBatchNormGradV3_15 output 2 is already assigned to scope_id 19
2020-05-25 14:10:32.888861: E tensorflow/core/common_runtime/executor.cc:642] Executor failed to create kernel. Invalid argument: Conv2DCustomBackpropInputOp only supports NHWC.
	 [[{{node Conv2DBackpropInput}}]]
Traceback (most recent call last):
  File "resnet_cifar_main.py", line 290, in <module>
    app.run(main)
  File "/Users/andrew/Library/Python/3.7/lib/python/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/Users/andrew/Library/Python/3.7/lib/python/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "resnet_cifar_main.py", line 284, in main
    return run(flags.FLAGS)
  File "resnet_cifar_main.py", line 260, in run
    verbose=2)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training.py", line 728, in fit
    use_multiprocessing=use_multiprocessing)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_distributed.py", line 789, in fit
    *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_distributed.py", line 776, in wrapper
    mode=dc.CoordinatorMode.INDEPENDENT_WORKER)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/distribute/distribute_coordinator.py", line 853, in run_distribute_coordinator
    task_id, session_config, rpc_layer)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/distribute/distribute_coordinator.py", line 360, in _run_single_worker
    return worker_fn(strategy)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_distributed.py", line 771, in _worker_fn
    return method(model, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 324, in fit
    total_epochs=epochs)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 123, in run_one_epoch
    batch_outs = execution_function(iterator)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2_utils.py", line 86, in execution_function
    distributed_function(input_fn))
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/eager/def_function.py", line 457, in __call__
    result = self._call(*args, **kwds)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/eager/def_function.py", line 520, in _call
    return self._stateless_fn(*args, **kwds)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 1823, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 1141, in _filtered_call
    self.captured_inputs)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 1224, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 511, in call
    ctx=ctx)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/eager/execute.py", line 67, in quick_execute
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError:  Conv2DCustomBackpropInputOp only supports NHWC.
	 [[node Conv2DBackpropInput (defined at /lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1751) ]] [Op:__inference_distributed_function_20858]

Function call stack:
distributed_function

The cause is the image data layout: the CPU kernel for the Conv2D input gradient only supports NHWC, while this graph was built with NCHW (channels_first). The letters stand for the following (illustrated in the short sketch after the list):

N = batch size
H = height
W = width
C = channels
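
For concreteness, a minimal sketch (my own illustration, not from the original script) of the two layouts for a batch of CIFAR-10 images:

import tensorflow as tf

# CIFAR-10 images are 32x32 with 3 color channels; assume a batch of 128.
nhwc = tf.zeros([128, 32, 32, 3])   # NHWC / channels_last:  the only layout the CPU backprop kernel supports
nchw = tf.zeros([128, 3, 32, 32])   # NCHW / channels_first: the layout that triggers the error above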

Training on a GPU does not hit this problem; it only appears when computing on the CPU. Recording the issue here for now; I will come back and fill in the fix once it is solved.
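
One likely workaround (my own sketch, not yet verified against resnet_cifar_main.py) is to force Keras to the channels_last (NHWC) layout before the model is built, so a CPU-only run uses the only layout the backprop kernel implements:

import tensorflow as tf

# Assumed workaround: switch the global Keras image layout to channels_last
# (NHWC) before any layers are created. The layer below is illustrative,
# not taken from the original script.
tf.keras.backend.set_image_data_format('channels_last')

conv = tf.keras.layers.Conv2D(16, 3, padding='same')
print(conv.data_format)  # channels_last

Note that the global default only applies to layers that leave data_format unset; a model that hard-codes data_format='channels_first' would need that argument changed (or made conditional on GPU availability) instead.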
