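This post is mostly a personal memo, so please excuse the rambling!!
Problem 1: The demo throws an error as soon as it runs. Changing it to data=list(), label=list() fixes it. What a trap!
Problem 2: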
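My mxnet and mxnet-model-server are both the latest versions, built locally from the official source. I downloaded a YOLOv2 implementation for mxnet from GitHub. It ships with its own mxnet, version 0.10.1, and that bundled version has mx.sym.stack_neighbor and mx.sym.contrib.YoloOutput(), which the official source does not. The problem: the model symbol produced by the yolo2 code on the old mxnet contains stack_neighbor(), but official mxnet has no stack_neighbor (only the similar-looking stack()). So when I deploy the model with the latest mxnet-model-server, stack_neighbor cannot be parsed. The error: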
mms/mxnet_model_server.py:_arg_process:184 Failed to process arguments: Failed loading Op stack_downsample of type stack_neighbor: [06:41:11] src/core/op.cc:55: Check failed: op != nullptr Operator stack_neighbor is not registered
I don't want to downgrade the mxnet and model server I already built from the latest source, but I still need to deploy a model produced by the old version. What can I do???
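Someone suggested editing the JSON file to replace stack_neighbor with stack (but stack is NOT equivalent to stack_neighbor, so this approach is wrong!!! Details below) and then regenerating the files with mxnet-model-export, which gets around the new version's failure to parse stack_neighbor. But then contrib.YoloOutput could not be parsed either. That only treats the symptom, not the cause!!!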
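Before patching the JSON by hand, it helps to list every operator the exported symbol actually uses, so that both stack_neighbor and YoloOutput show up at once instead of one failed load at a time. A minimal sketch, assuming the standard MXNet symbol JSON layout (a "nodes" list whose entries carry "op", "name" and an attribute dict). The sample graph and the _contrib_YoloOutput op name in it are hypothetical; in practice you would read the exported *-symbol.json file.

```python
import json
from collections import Counter


def list_ops(symbol_json_text):
    """Count the operator types used in an MXNet symbol JSON graph.

    Variable nodes (data, weights) have op == "null" and are skipped.
    """
    graph = json.loads(symbol_json_text)
    return Counter(node["op"] for node in graph["nodes"] if node["op"] != "null")


def node_attrs(symbol_json_text, op_name):
    """Return (node name, attribute dict) for every node of the given op type.

    ASSUMPTION: the attribute key differs across MXNet versions
    ("attrs", "attr" or "param"), so all three are tried.
    """
    graph = json.loads(symbol_json_text)
    result = []
    for node in graph["nodes"]:
        if node["op"] == op_name:
            attrs = node.get("attrs") or node.get("attr") or node.get("param") or {}
            result.append((node["name"], attrs))
    return result


# HYPOTHETICAL miniature graph shaped like the yolo2 symbol.
SAMPLE = json.dumps({
    "nodes": [
        {"op": "null", "name": "data", "inputs": []},
        {"op": "stack_neighbor", "name": "stack_downsample",
         "attr": {"kernel": "(2, 2)"}, "inputs": [[0, 0, 0]]},
        {"op": "_contrib_YoloOutput", "name": "yolo_output", "inputs": [[1, 0, 0]]},
    ],
    "arg_nodes": [0],
    "heads": [[2, 0, 0]],
})

print(list_ops(SAMPLE))
print(node_attrs(SAMPLE, "stack_neighbor"))  # [('stack_downsample', {'kernel': '(2, 2)'})]
```

Seeing the kernel attribute next to the op name also makes it obvious why a plain rename to stack cannot work: stack has no kernel parameter.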
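So I copied the three yolo_output files (.h, .cu, .cc) from /src/operator/contrib/ in @zhreshold's yolo2 source into /incubator-mxnet/src/operator/contrib/ of the latest mxnet source and rebuilt mxnet. Miraculously, the new mxnet now had contrib.YoloOutput as well. The model loaded, but the stack vs. stack_neighbor incompatibility showed up again (the edited JSON used stack, still carrying stack_neighbor's kernel attribute). The error: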
model_server.py:_arg_process:184 Failed to process arguments: Cannot find argument 'kernel', Possible Arguments:
axis : int, optional, default='0'
The axis in the result array along which the input arrays are stacked.
num_args : int, required
Number of inputs to be stacked.
, in operator stack(name="stack_downsample", kernel="(2, 2)")
The reason is that the two operators take different parameters. stack's parameters are:
Parameters:
data (Symbol[]) - List of arrays to stack
axis (int, optional, default='0') - The axis in the result array along which the input arrays are stacked.
name (string, optional.) - Name of the resulting symbol.
stack_neighbor's parameters are:
Parameters
data : Symbol
Input data array
kernel : Shape(tuple), optional, default=(1,1)
Stack spatial neighbors defined by kernel along channel axis. The output has same elements as input, but the shape/dimension/order has been changed according to the kernel shape.
name : string, optional.
Name of the resulting symbol.
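To see concretely that the two operators are not interchangeable, here is a small sketch that computes the output shape of each from the parameter descriptions above. The helper names are hypothetical, and stack_neighbor_output_shape assumes NCHW input whose height and width are divisible by the kernel; it reproduces the (1, 1, 8, 8) to (1, 4, 4, 4) case tested below.

```python
def stack_output_shape(input_shape, num_args, axis=0):
    """stack: num_args arrays of identical shape are joined along a NEW axis."""
    ndim = len(input_shape)
    if axis < 0:
        axis += ndim + 1  # the result has one more dimension than each input
    shape = list(input_shape)
    shape.insert(axis, num_args)
    return tuple(shape)


def stack_neighbor_output_shape(input_shape, kernel=(1, 1)):
    """stack_neighbor (ASSUMED NCHW): each kh x kw spatial patch moves into channels."""
    n, c, h, w = input_shape
    kh, kw = kernel
    if h % kh or w % kw:
        raise ValueError("height and width must be divisible by the kernel")
    return (n, c * kh * kw, h // kh, w // kw)


print(stack_output_shape((1, 1, 8, 8), num_args=1))                # (1, 1, 1, 8, 8)
print(stack_neighbor_output_shape((1, 1, 8, 8), kernel=(2, 2)))   # (1, 4, 4, 4)
```

stack always adds a dimension and never touches H and W, while stack_neighbor keeps the rank and trades spatial size for channels, so no attribute edit in the JSON can turn one into the other.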
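So I brute-forced it: I ported the old stack_neighbor code, piece by piece, into the new mxnet's matrix_op (.cu, .h, .cc), then rebuilt and reinstalled mxnet. I tested stack_neighbor by hand: with an input of shape (1, 1, 8, 8) and kernel=(2, 2), the output shape is (1, 4, 4, 4). So the newly added stack_neighbor should be fine!!!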
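A shape check alone would not catch a port that scrambles the element order. Below is a pure-Python reference for a space-to-depth rearrangement on nested NCHW lists, offered as a sketch. The channel ordering (c * kh * kw + i * kw + j) is an ASSUMPTION and is not taken from zhreshold's kernel, so confirm it against the output of the bundled mxnet 0.10.1 on the same input before trusting value comparisons.

```python
def stack_neighbor_ref(x, kernel=(1, 1)):
    """Pure-Python space-to-depth on nested NCHW lists.

    ASSUMPTION: output channel index = c * kh * kw + i * kw + j, where (i, j)
    is the offset inside each kernel window.
    """
    kh, kw = kernel
    out = []
    for sample in x:
        h, w = len(sample[0]), len(sample[0][0])
        oh, ow = h // kh, w // kw
        channels = []
        for plane in sample:
            for i in range(kh):
                for j in range(kw):
                    # Pick every kh-th row / kw-th column starting at offset (i, j).
                    channels.append([[plane[oy * kh + i][ox * kw + j] for ox in range(ow)]
                                     for oy in range(oh)])
        out.append(channels)
    return out


def shape_of(x):
    """Shape of a regular nested list."""
    dims = []
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0]
    return tuple(dims)


def flatten(x):
    if isinstance(x, list):
        return [v for item in x for v in flatten(item)]
    return [x]


# Same experiment as above: a (1, 1, 8, 8) input holding 0..63, kernel (2, 2).
x = [[[[r * 8 + c for c in range(8)] for r in range(8)]]]
y = stack_neighbor_ref(x, kernel=(2, 2))
print(shape_of(y))                        # (1, 4, 4, 4)
print(sorted(flatten(y)) == flatten(x))   # True: same elements, new layout
print(y[0][0][0])                         # [0, 2, 4, 6]
```

Feeding the same 0..63 tensor through the rebuilt mxnet and comparing the first rows of each output channel is a quick way to tell whether the port preserved the original element order.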
Then deployment failed with this error:
Initialized model serving.
[ERROR 2018-03-08 03:33:56,606 PID:10704 /home/jojo/anaconda3/lib/python3.6/site-packages/mms/mxnet_model_server.py:_arg_process:184 Failed to process arguments: Parameter file in model archive is inconsistent with manifest.
I asked on the forum, and an expert said the model path was wrong. The relevant validation code in mms is:
raise Exception('Failed to open manifest file. Stacktrace: ' + str(e))
validate(manifest, schema)
# Each file named in manifest['Model'] must match exactly one file in model_dir.
assert len(glob.glob(os.path.join(model_dir, manifest['Model']['Signature']))) == 1, \
    'Signature file in model archive is inconsistent with manifest.'
assert len(glob.glob(os.path.join(model_dir, manifest['Model']['Symbol']))) == 1, \
    'Symbol file in model archive is inconsistent with manifest.'
assert len(glob.glob(os.path.join(model_dir, manifest['Model']['Parameters']))) == 1, \
    'Parameter file in model archive is inconsistent with manifest.'
assert len(glob.glob(os.path.join(model_dir, manifest['Model']['Service']))) == 1, \
    'Service file in model archive is inconsistent with manifest.'
model_name = manifest['Model']['Model-Name']
return service_name, model_name, model_dir, manifest
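Since the error only names the first failing entry, a small script that repeats the same glob check for all four manifest entries can show up front which file is missing or misnamed. A sketch, assuming the manifest is already loaded as a dict with the 'Model' keys seen in the excerpt above; the file names in the demo are hypothetical.

```python
import glob
import os
import tempfile

# The four manifest entries checked by the asserts in mms.
ENTRIES = ("Signature", "Symbol", "Parameters", "Service")


def check_archive(model_dir, manifest):
    """Report manifest entries that do not match exactly one file in model_dir.

    Returns {entry: number_of_matching_files}; an empty dict means every
    assert would pass.
    """
    problems = {}
    for key in ENTRIES:
        hits = glob.glob(os.path.join(model_dir, manifest["Model"][key]))
        if len(hits) != 1:
            problems[key] = len(hits)
    return problems


# HYPOTHETICAL file names: an unpacked archive that is missing its .params file.
manifest = {"Model": {
    "Model-Name": "yolo2",
    "Signature": "signature.json",
    "Symbol": "yolo2-symbol.json",
    "Parameters": "yolo2-0000.params",
    "Service": "yolo2_service.py",
}}
with tempfile.TemporaryDirectory() as model_dir:
    for name in ("signature.json", "yolo2-symbol.json", "yolo2_service.py"):
        open(os.path.join(model_dir, name), "w").close()
    report = check_archive(model_dir, manifest)

print(report)  # {'Parameters': 0}
```

A count of 0 means the file is missing or named differently from the manifest; a count above 1 means the pattern matches several files.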
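That shows the parameter file was not found, so I copied the params file into the exported model folder again. The next run complained that the name was wrong. Fine, if you say it's wrong, it's wrong: I made two identical copies of the params file, one named exactly as it required. Finally, it worked!!!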
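The "two identical params files" workaround amounts to copying the weights under whatever name manifest['Model']['Parameters'] expects. A minimal sketch, assuming the manifest is available as a dict; the helper name and both file names are hypothetical.

```python
import os
import shutil
import tempfile


def place_params(model_dir, manifest, params_path):
    """Copy params_path into model_dir under the file name the manifest expects.

    The file is copied, not renamed, so the original name also stays available
    (the same effect as keeping two identical .params files).
    Returns the path of the copy.
    """
    target = os.path.join(model_dir, manifest["Model"]["Parameters"])
    if os.path.abspath(params_path) != os.path.abspath(target):
        shutil.copyfile(params_path, target)
    return target


# HYPOTHETICAL names: training produced yolo2-0000.params, but the manifest
# expects yolo2_model-0000.params.
manifest = {"Model": {"Parameters": "yolo2_model-0000.params"}}
with tempfile.TemporaryDirectory() as model_dir:
    src = os.path.join(model_dir, "yolo2-0000.params")
    with open(src, "wb") as f:
        f.write(b"\x00" * 16)  # stand-in for real weights
    copied = place_params(model_dir, manifest, src)
    both_present = sorted(os.listdir(model_dir))

print(both_present)  # ['yolo2-0000.params', 'yolo2_model-0000.params']
```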
One more note: copying the official SSD deployment example as-is will not work for yolo2, because it does not subtract the mean. Details in my other posts.
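The missing step is subtracting a per-channel mean from the input before it reaches the network. A minimal sketch on a CHW image held as nested lists. The mean values are a hypothetical placeholder (the common ImageNet RGB means), not necessarily what the yolo2 training code used, and where this hooks into the service's preprocessing depends on your MMS setup, which is not shown here.

```python
# HYPOTHETICAL per-channel means (the usual ImageNet RGB values); substitute
# whatever mean the yolo2 training pipeline actually subtracted.
MEAN_RGB = (123.68, 116.779, 103.939)


def subtract_mean(chw, mean=MEAN_RGB):
    """Subtract a per-channel mean from a CHW image stored as nested lists."""
    if len(chw) != len(mean):
        raise ValueError("expected %d channels, got %d" % (len(mean), len(chw)))
    return [[[px - m for px in row] for row in plane]
            for plane, m in zip(chw, mean)]


# A tiny 3 x 1 x 2 image where every pixel is 128.
image = [[[128, 128]] for _ in range(3)]
centered = subtract_mean(image)
print([round(plane[0][0], 3) for plane in centered])  # [4.32, 11.221, 24.061]
```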