Learning MXNet 2: Module

Overview

This section introduces Module, an important concept in MXNet. It covers building a simple neural network, constructing the model, training and updating its parameters, and running prediction. It should give you a rough picture of Module's key attributes and methods and a first look at how MXNet works. Note that the code here skips the model-saving part of the official tutorial, and some of its explanatory text is not excerpted; if you are interested, see the official tutorial:
http://mxnet.io/tutorials/python/module.html

Main Content

import mxnet as mx
from data_iter import SyntheticData

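SyntheticData is a small helper from the official tutorial's data_iter.py that generates random classification data; the file itself is not reproduced here. A minimal sketch of an equivalent helper might look like this (the class name and constructor arguments follow the import above; the body is an assumption):

import numpy as np

class SyntheticData(object):
    # Random data: num_classes label classes, num_features features per sample.
    def __init__(self, num_classes, num_features):
        self.num_classes = num_classes
        self.num_features = num_features

    def get_iter(self, batch_size, num_batches=10):
        # num_batches batches of random features, integer labels in [0, num_classes)
        data = np.random.uniform(-1, 1,
                                 (batch_size * num_batches, self.num_features))
        label = np.random.randint(0, self.num_classes, (batch_size * num_batches,))
        return mx.io.NDArrayIter(data, label, batch_size,
                                 label_name='softmax_label')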
### Build the network's symbolic connections
net = mx.sym.Variable('data')
net = mx.sym.FullyConnected(net, name='fc1',num_hidden=64)
net = mx.sym.Activation(net, name='relu1', act_type='relu')
net = mx.sym.FullyConnected(net, name='fc2', num_hidden=10)
net = mx.sym.SoftmaxOutput(net, name='softmax')

data = SyntheticData(10, 128)  # synthetic dataset: 10 classes, 128 features per sample
mx.viz.plot_network(net).view()  # visualize the network graph (requires the graphviz package)

### A minimal Module needs a Symbol; context specifies which device(s) to compute on; data_names and label_names tell the module where to find the training data and the corresponding labels. Note that these names matter!
mod = mx.mod.Module(symbol=net,context=mx.cpu(), data_names=['data'], label_names=['softmax_label'])
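The names are not arbitrary: they must match names defined in the Symbol itself. You can check what the Symbol expects before constructing the Module:

print(net.list_arguments())  # inputs and parameters
print(net.list_outputs())    # outputs

['data', 'fc1_weight', 'fc1_bias', 'fc2_weight', 'fc2_bias', 'softmax_label']
['softmax_output']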

import logging
logging.basicConfig(level=logging.INFO)

batch_size = 32
### A single fit() call does all of the training; once it finishes, the parameters are fully trained and you can move on to prediction. No extra bind or initialization is needed (see the API docs). 'sgd' and 'acc' select stochastic gradient descent as the optimizer and accuracy as the evaluation metric; num_epoch is the number of training epochs
mod.fit(data.get_iter(batch_size),
        eval_data=data.get_iter(batch_size),
        optimizer='sgd',
        optimizer_params={'learning_rate': 0.1},
        eval_metric='acc',
        num_epoch=5)

### predict returns all the predictions: batch_size=32, num_batches=10, 10 output classes
y = mod.predict(data.get_iter(batch_size))
print('shape of predict: %s' % (y.shape,))

shape of predict: (320L, 10L)

# For iter_predict, the official API docs note: pred is a list of outputs from the module, i_batch is an integer (the batch index), and batch is the data batch from the data iterator

### preds[0] is a 2-D array: the first dimension is batch_size (the number of predictions), and the second holds the predicted probability of each class, one per digit 0-9, so its shape is (batch_size, 10)
for preds, i_batch, batch in mod.iter_predict(data.get_iter(batch_size)):
    pred_label = preds[0].asnumpy().argmax(axis=1)
    label = batch.label[0].asnumpy().astype('int32')
    print('batch %d, accuracy %f' % (i_batch, float(sum(pred_label==label))/len(label)))

batch 0, accuracy 0.062500
batch 1, accuracy 0.156250
batch 2, accuracy 0.187500
batch 3, accuracy 0.000000
batch 4, accuracy 0.062500
batch 5, accuracy 0.000000
batch 6, accuracy 0.062500
batch 7, accuracy 0.093750
batch 8, accuracy 0.062500
batch 9, accuracy 0.062500

### mx.metric.create builds an evaluation metric: 'mse' is mean squared error, 'acc' is accuracy. (With random synthetic data, accuracy stays near the 10% chance level.)
metric = [mx.metric.create('mse'), mx.metric.create('acc')]
mod.score(data.get_iter(batch_size), metric)
print([metric[0].get(),metric[1].get()])

[('mse', 27.438781929016113), ('accuracy', 0.115625)]

The above covers Module's most basic training workflow; what follows is the more involved, lower-level usage.

A Module moves through the following states:

  • Initial state: memory has not been allocated yet, so the module is not ready for computation
  • Binded: the shapes of the inputs, outputs, and parameters are all known, memory has been allocated, and the module is ready to compute
  • Parameter initialized: the parameters have been initialized; computing with uninitialized parameters can give undefined results
  • Optimizer installed: an optimizer has been attached; once it is, parameter values are updated according to the optimizer after each gradient computation (forward-backward). Each step flips a status flag, as the sketch after this list shows.
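These states correspond one-to-one to the flag attributes listed near the end of this section; a quick sketch of how the flags flip as the calls below are made:

m = mx.mod.Module(symbol=net)
print((m.binded, m.params_initialized, m.optimizer_initialized))  # (False, False, False)
m.bind(data_shapes=data.get_iter(batch_size).provide_data,
       label_shapes=data.get_iter(batch_size).provide_label)
print(m.binded)                  # True
m.init_params()
print(m.params_initialized)      # True
m.init_optimizer()
print(m.optimizer_initialized)   # True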

mod = mx.mod.Module(symbol=net)

train_iter = data.get_iter(batch_size)
mod.bind(data_shapes=train_iter.provide_data, label_shapes=train_iter.provide_label)
### When you bind manually like this, init_params and init_optimizer are required before training

### Xavier is an initialization scheme, similar in spirit to Gaussian initialization; not covered in detail here
mod.init_params(initializer=mx.init.Xavier(magnitude=2.))

mod.init_optimizer(optimizer='sgd', optimizer_params=(('learning_rate',0.1),))

### Create at least one metric to monitor training; the metric only measures progress, while the parameter updates themselves are driven by the gradients
metric = mx.metric.create('acc')

for batch in train_iter:
    mod.forward(batch, is_train=True)
    mod.update_metric(metric, batch.label)
    mod.backward()
    mod.update()
"""
forward是计算输出,从前往后直到output,is_train表示是否训练,或者说参数是否更新,不更新训练就没有意义了,默认是None,这时值就取self.for_training,而self.for_training在bind时默认设置为True了
update_metric更新(累加)评价指标,比如准确率,label来确定是哪一种
backward计算梯度
update更新参数,比如w,b
"""
print(metric.get())

('accuracy', 0.59375)
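The same low-level calls also handle inference: pass is_train=False to forward so that no backward state is recorded, then read the results back with get_outputs. A minimal sketch:

for batch in data.get_iter(batch_size):
    mod.forward(batch, is_train=False)
    probs = mod.get_outputs()[0]           # (32, 10) softmax probabilities
    print(probs.asnumpy().argmax(axis=1))  # predicted class for each sample
    break                                  # one batch is enough to illustrate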

Besides the operations above, Module also exposes plenty of useful information. The items below involve nothing complicated and are quoted from the official docs:

basic names:

  • data_names: list of strings indicating the names of the required data.
  • output_names: list of strings indicating the names of the outputs.

state information

  • binded: bool, indicating whether the memory buffers needed for computation have been allocated.
  • for_training: whether the module is bound for training (if binded).
  • params_initialized: bool, indicating whether the parameters of this module have been initialized.
  • optimizer_initialized: bool, indicating whether an optimizer is defined and initialized.
  • inputs_need_grad: bool, indicating whether gradients with respect to the input data are needed. Might be useful when implementing composition of modules.

input/output information

  • data_shapes: a list of (name, shape). In theory, since the memory is allocated, we could directly provide the data arrays. But in the case of data parallelization, the data arrays might not be of the same shape as viewed from the external world.
  • label_shapes: a list of (name, shape). This might be [] if the module does not need labels (e.g. it does not contain a loss function at the top), or if the module is not bound for training.
  • output_shapes: a list of (name, shape) for the outputs of the module.

parameters (for modules with parameters)

  • get_params(): returns a tuple (arg_params, aux_params). Each of those is a dictionary of name-to-NDArray mappings. Those NDArrays always live on the CPU; the actual parameters used for computation might live on other devices (GPUs), and this function retrieves (a copy of) the latest parameters.
  • get_outputs(): get the outputs of the previous forward operation.
  • get_input_grads(): get the gradients with respect to the inputs computed in the previous backward operation.

print((mod.data_shapes, mod.label_shapes, mod.output_shapes))
print(mod.get_params())


([DataDesc[data,(32, 128),<class 'numpy.float32'>,NCHW]], [DataDesc[softmax_label,(32,),<class 'numpy.float32'>,NCHW]], [('softmax_output', (32, 10L))])
({'fc2_bias': <NDArray 10 @cpu(0)>, 'fc2_weight': <NDArray 10x64 @cpu(0)>, 'fc1_bias': <NDArray 64 @cpu(0)>, 'fc1_weight': <NDArray 64x128 @cpu(0)>}, {})
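As a quick illustration of the last two calls: get_input_grads only returns meaningful values if the module was bound with inputs_need_grad=True, so the sketch below re-binds a fresh module (the module above was bound without it):

grad_mod = mx.mod.Module(symbol=net)
grad_mod.bind(data_shapes=train_iter.provide_data,
              label_shapes=train_iter.provide_label,
              inputs_need_grad=True)
grad_mod.init_params(initializer=mx.init.Xavier(magnitude=2.))
batch = data.get_iter(batch_size).next()
grad_mod.forward(batch, is_train=True)
grad_mod.backward()
print(grad_mod.get_outputs()[0].shape)      # (32, 10): softmax outputs
print(grad_mod.get_input_grads()[0].shape)  # (32, 128): gradient w.r.t. the input data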
