1 Technical feasibility analysis:
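Mask_RCNN is currently one of the best-performing instance segmentation models. In my experience it ranks, together with vid2vid from the GAN family and BERT in NLP, among the most practical models to work with. Today, however, I ran into a problem: the production environment has no GPU, so I have to figure out how to deploy the model there. Note: training already succeeded in the training environment, where 640 samples were enough to reach acceptable segmentation quality. The online service, however, runs on Alibaba Cloud, and GPU instances there are too expensive. So I want to try CPU inference: if it can stay under 1000 ms, it will save the company some money. After all, this application does not have strict real-time requirements.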
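Checking the 1000 ms target needs a repeatable measurement. The sketch below is an assumption, not part of the original setup: measure_latency_ms and within_budget are hypothetical helper names, and the workload is a stand-in. In practice you would wrap the real call, such as the model.detect call used in test_model.py.

```python
import statistics
import time
from typing import Callable, List


def measure_latency_ms(fn: Callable[[], object], warmup: int = 2, runs: int = 10) -> List[float]:
    """Call fn() repeatedly and return per-call wall-clock latencies in milliseconds.

    Warm-up calls are discarded because the first TensorFlow session run
    can include one-off graph setup costs.
    """
    for _ in range(warmup):
        fn()
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def within_budget(latencies_ms: List[float], budget_ms: float = 1000.0) -> bool:
    """True if the median latency fits inside the budget."""
    return statistics.median(latencies_ms) <= budget_ms


if __name__ == "__main__":
    # Stand-in workload; with the real model this would be something like
    # lambda: model.detect([image], verbose=0)
    lat = measure_latency_ms(lambda: sum(range(100_000)), warmup=1, runs=5)
    print(f"median {statistics.median(lat):.1f} ms, max {max(lat):.1f} ms, "
          f"within 1000 ms: {within_budget(lat)}")
```

The median is used as the pass/fail criterion because a single slow call (for example a cold cache) should not decide the CPU-vs-GPU question; reporting the max alongside it still shows the worst case.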
1.1 Log of setting up the production environment:
Downloaded conda and added a line sourcing conda.sh to .bashrc.
conda create -n py3cpu python=3.6.2
conda activate py3cpu
pip install numpy scipy Pillow cython matplotlib scikit-image keras==2.0.8 h5py IPython
pip install opencv-python imgaug
pip install tensorflow==1.4.0
conda and pip coexist reasonably well.
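Since mixing conda and pip can occasionally pull in a different version than the one requested, a quick sanity check that the pinned versions are the ones actually importable inside py3cpu can save debugging time later. check_versions is a hypothetical helper; it relies only on importlib and the __version__ attribute, which both tensorflow and keras expose.

```python
import importlib
from typing import Dict, Optional, Tuple

# Versions pinned in the pip commands above (Python 3.6 environment).
PINNED = {"tensorflow": "1.4.0", "keras": "2.0.8"}


def check_versions(pinned: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {module: (expected, found)} for every module whose version differs.

    found is None if the module cannot be imported, and "unknown" if it
    imports but exposes no __version__ attribute.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            mismatches[name] = (expected, None)
            continue
        found = getattr(module, "__version__", "unknown")
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches
```

Usage inside the activated environment: `print(check_versions(PINNED) or "environment OK")`.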
Next I ran the model in inference mode, and sure enough, it failed:
Processing 1 images
image shape: (512, 512, 3) min: 0.00000 max: 255.00000 uint8
molded_images shape: (1, 512, 512, 3) min: -123.70000 max: 151.10000 float64
image_metas shape: (1, 15) min: 0.00000 max: 512.00000 int64
anchors shape: (1, 65280, 4) min: -0.17712 max: 1.11450 float32
Traceback (most recent call last):
File "/home/ubuntu/anaconda3/envs/py3cpu/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1323, in _do_call
return fn(*args)
File "/home/ubuntu/anaconda3/envs/py3cpu/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1302, in _run_fn
status, run_metadata)
File "/home/ubuntu/anaconda3/envs/py3cpu/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[1] = 65343 is not in [0, 65280)
[[Node: ROI/Gather_2 = Gather[Tindices=DT_INT32, Tparams=DT_FLOAT, validate_indices=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](ROI/strided_slice_6, ROI/strided_slice_7)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/ubuntu/Documents/Mask_RCNN-master/samples/dish_food/test_model.py", line 118, in <module>
ma.test(start_id=0, stop=632)
File "/home/ubuntu/Documents/Mask_RCNN-master/samples/dish_food/test_model.py", line 90, in test
results = model.detect(patch_resized_images, verbose=1)
File "/home/ubuntu/Documents/Mask_RCNN-master/mrcnn/model.py", line 2479, in detect
self.keras_model.predict([molded_images, image_metas, anchors], verbose=0)
File "/home/ubuntu/anaconda3/envs/py3cpu/lib/python3.6/site-packages/keras/engine/training.py", line 1713, in predict
verbose=verbose, steps=steps)
File "/home/ubuntu/anaconda3/envs/py3cpu/lib/python3.6/site-packages/keras/engine/training.py", line 1269, in _predict_loop
batch_outs = f(ins_batch)
File "/home/ubuntu/anaconda3/envs/py3cpu/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2273, in __call__
**self.session_kwargs)
File "/home/ubuntu/anaconda3/envs/py3cpu/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 889, in run
run_metadata_ptr)
File "/home/ubuntu/anaconda3/envs/py3cpu/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1120, in _run
feed_dict_tensor, options, run_metadata)
File "/home/ubuntu/anaconda3/envs/py3cpu/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1317, in _do_run
options, run_metadata)
File "/home/ubuntu/anaconda3/envs/py3cpu/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1336, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[1] = 65343 is not in [0, 65280)
[[Node: ROI/Gather_2 = Gather[Tindices=DT_INT32, Tparams=DT_FLOAT, validate_indices=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](ROI/strided_slice_6, ROI/strided_slice_7)]]

Caused by op 'ROI/Gather_2', defined at:
File "/home/ubuntu/Documents/Mask_RCNN-master/samples/dish_food/test_model.py", line 29, in <module>
model_dir=MODEL_DIR)
File "/home/ubuntu/Documents/Mask_RCNN-master/mrcnn/model.py", line 1824, in __init__
self.keras_model = self.build(mode=mode, config=config)
File "/home/ubuntu/Documents/Mask_RCNN-master/mrcnn/model.py", line 1948, in build
config=config)([rpn_class, rpn_bbox, anchors])
File "/home/ubuntu/anaconda3/envs/py3cpu/lib/python3.6/site-packages/keras/engine/topology.py", line 602, in __call__
output = self.call(inputs, **kwargs)
File "/home/ubuntu/Documents/Mask_RCNN-master/mrcnn/model.py", line 294, in call
names=["pre_nms_anchors"])
File "/home/ubuntu/Documents/Mask_RCNN-master/mrcnn/utils.py", line 826, in batch_slice
output_slice = graph_fn(*inputs_slice)
File "/home/ubuntu/Documents/Mask_RCNN-master/mrcnn/model.py", line 292, in <lambda>
pre_nms_anchors = utils.batch_slice([anchors, ix], lambda a, x: tf.gather(a, x),
File "/home/ubuntu/anaconda3/envs/py3cpu/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 2486, in gather
params, indices, validate_indices=validate_indices, name=name)
File "/home/ubuntu/anaconda3/envs/py3cpu/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 1834, in gather
validate_indices=validate_indices, name=name)
File "/home/ubuntu/anaconda3/envs/py3cpu/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/ubuntu/anaconda3/envs/py3cpu/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
op_def=op_def)
File "/home/ubuntu/anaconda3/envs/py3cpu/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): indices[1] = 65343 is not in [0, 65280)
[[Node: ROI/Gather_2 = Gather[Tindices=DT_INT32, Tparams=DT_FLOAT, validate_indices=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](ROI/strided_slice_6, ROI/strided_slice_7)]]
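The error comes from the tensor created with names=["pre_nms_anchors"]. According to the API docs, tf.gather builds a new tensor by picking the elements of its input at the given indices. The indices of the top 6000 scores, however, should fall in the same range as the anchor indices: there is one score per anchor, so every index should be below 65280, yet the error reports an index of 65343.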
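The failure pattern can be reproduced without TensorFlow: rank the scores, take the top-k indices, and use them to gather rows from the anchor array. If the scores array is longer than the anchor array, some top-k indices land outside [0, len(anchors)). The sketch below uses NumPy as a stand-in (np.argsort and np.take replace tf.nn.top_k and tf.gather; they are not what Mask_RCNN calls), and the toy shapes are hypothetical. Also worth knowing: the tf.gather documentation notes that on CPU an out-of-range index returns an error, while on GPU a 0 is stored in the corresponding output value. That may explain why the same model ran without complaint in the GPU training environment.

```python
import numpy as np


def top_k_indices(scores: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k largest scores, highest first (NumPy stand-in for tf.nn.top_k)."""
    k = min(k, scores.shape[0])
    return np.argsort(-scores)[:k]


def gather_checked(params: np.ndarray, indices: np.ndarray) -> np.ndarray:
    """Gather rows like tf.gather on CPU: refuse out-of-range indices."""
    bad = indices[(indices < 0) | (indices >= params.shape[0])]
    if bad.size:
        raise IndexError(f"indices {bad.tolist()} not in [0, {params.shape[0]})")
    return np.take(params, indices, axis=0)


# Toy shapes: 8 anchors, but 10 scores (a hypothetical length mismatch).
anchors = np.random.rand(8, 4).astype(np.float32)
scores = np.arange(10, dtype=np.float32)  # largest scores sit at indices 9 and 8
ix = top_k_indices(scores, k=6)
try:
    gather_checked(anchors, ix)
except IndexError as err:
    print("reproduced:", err)  # indices [9, 8] not in [0, 8)
```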
I need to debug the tensors further.
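One check worth making first is whether the number of anchors matches the number of RPN outputs. As a hedged back-of-the-envelope calculation, assuming the Mask_RCNN defaults BACKBONE_STRIDES = [4, 8, 16, 32, 64], three anchor ratios and RPN_ANCHOR_STRIDE = 1, a 512x512 input gives 65472 anchors over five pyramid levels. The 65280 in the log is exactly the count without the stride-64 level, and the failing index 65343 lies between the two. This is consistent with, but does not prove, a mismatch between the number of anchor scales and the five feature maps (P2 to P6) the RPN runs on, for example a custom RPN_ANCHOR_SCALES with only four entries. count_anchors is a hypothetical helper that reproduces only the arithmetic; it is not Mask_RCNN code.

```python
import math
from typing import Sequence


def count_anchors(image_size: int, strides: Sequence[int],
                  ratios_per_scale: int = 3, anchor_stride: int = 1) -> int:
    """Total anchors over all pyramid levels for a square image.

    Each level with feature stride s has ceil(image_size / s) cells per side;
    anchors are placed every `anchor_stride` cells, one anchor per ratio.
    """
    total = 0
    for s in strides:
        side = math.ceil(image_size / s)
        positions_per_side = math.ceil(side / anchor_stride)
        total += positions_per_side ** 2 * ratios_per_scale
    return total


five_levels = count_anchors(512, [4, 8, 16, 32, 64])  # P2..P6
four_levels = count_anchors(512, [4, 8, 16, 32])      # stride-64 level missing
print(five_levels, four_levels)                       # 65472 65280
print(four_levels <= 65343 < five_levels)             # True
```

If the printed anchor count for the actual config differs from the length of the RPN score tensor, that mismatch alone would produce exactly this kind of out-of-range gather.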