【MindSpore易点通】Network Construction Experience Notes (Part 2)

A Simple GAN Example Implemented with MindSpore

Background

Generating MNIST handwritten digits from random noise is a simple GAN task that can be implemented easily on any of today's mainstream deep learning frameworks; here we use it as a minimal GAN example on the MindSpore platform.

Key Takeaways

Key detail 1: define two optimizers, one to update the generator and one to update the discriminator

# Optimizers: one for the generator, one for the discriminator
gen_opt = Momentum(params=gen_network.trainable_params(),
                   learning_rate=Tensor(lr_schedule),
                   momentum=args.momentum,
                   weight_decay=0,
                   loss_scale=args.loss_scale,
                   decay_filter=default_wd_filter)

dis_opt = Momentum(params=dis_network.trainable_params(),
                   learning_rate=Tensor(lr_schedule),
                   momentum=args.momentum,
                   weight_decay=0,
                   loss_scale=args.loss_scale,
                   decay_filter=default_wd_filter)

Key detail 2: define two TrainOneStepCells to compute the gradients

The discriminator's TrainOneStepCell is straightforward: during training, simply feed it the pairs (generated image, 0) and (real image, 1).

class TrainOneStepCellDIS(Cell):
    """One training step for the discriminator: only the discriminator's parameters are updated."""
    def __init__(self, network, optimizer, sens=1.0):
        super(TrainOneStepCellDIS, self).__init__(auto_prefix=False)
        self.network = network
        self.weights = ParameterTuple(network.trainable_params())
        self.optimizer = optimizer
        # gradients w.r.t. the discriminator's own parameters, with an external sensitivity
        self.grad = C.GradOperation('grad', get_by_list=True, sens_param=True)
        self.sens = sens
        self.reducer_flag = False
        self.grad_reducer = None
        parallel_mode = _get_parallel_mode()
        if parallel_mode in (ParallelMode.DATA_PARALLEL, ParallelMode.HYBRID_PARALLEL):
            self.reducer_flag = True
        if self.reducer_flag:
            mean = _get_mirror_mean()
            degree = _get_device_num()
            self.grad_reducer = DistributedGradReducer(optimizer.parameters, mean, degree)

    def construct(self, loss, img, label):
        sens = P.Fill()(P.DType()(loss), P.Shape()(loss), self.sens)
        grads = self.grad(self.network, self.weights)(img, label, sens)

        if self.reducer_flag:
            # apply grad reducer on grads
            grads = self.grad_reducer(grads)

        return F.depend(loss, self.optimizer(grads))

Training the generator is a bit more involved, because the generator cannot produce a loss on its own: its output has to be fed into the discriminator to obtain a loss, yet only the generator's parameters should be updated. When constructing the generator's TrainOneStepCell, the discriminator network must therefore also be passed in, so that the gradient with respect to the discriminator's input can be computed. That input is exactly the generator's output, so this gradient is then used as the sensitivity to backpropagate through the generator and update its parameters. Note that during training the pair fed in is (generated image, 1).

class TrainOneStepCellGEN(Cell):
    """One training step for the generator: gradients flow back through the discriminator
    (postnetwork), but only the generator's parameters are updated."""
    def __init__(self, network, optimizer, postnetwork, sens=3.0):
        super(TrainOneStepCellGEN, self).__init__(auto_prefix=False)
        self.network = network          # generator
        self.postnetwork = postnetwork  # discriminator (with loss)
        self.weights = ParameterTuple(network.trainable_params())
        self.postweights = ParameterTuple(postnetwork.trainable_params())
        self.optimizer = optimizer
        # gradients w.r.t. the generator's own parameters
        self.grad = C.GradOperation('grad', get_by_list=True, sens_param=True)
        # gradients w.r.t. both the inputs and the parameters of the discriminator;
        # the input gradient is what gets passed back to the generator
        self.postgrad = C.GradOperation('grad', get_all=True, get_by_list=True, sens_param=True)
        self.sens = sens
        self.reducer_flag = False
        self.grad_reducer = None
        parallel_mode = _get_parallel_mode()

        if parallel_mode in (ParallelMode.DATA_PARALLEL, ParallelMode.HYBRID_PARALLEL):
            self.reducer_flag = True

        if self.reducer_flag:
            mean = _get_mirror_mean()
            degree = _get_device_num()
            self.grad_reducer = DistributedGradReducer(optimizer.parameters, mean, degree)

        self.cast = P.Cast()
        self.print = P.Print()

    def construct(self, loss, z, fake_img, inverse_fake_label):
        sens_d = P.Fill()(P.DType()(loss), P.Shape()(loss), self.sens)
        # gradient of the discriminator loss w.r.t. its input (the fake image)
        grads_d = self.postgrad(self.postnetwork, self.postweights)(fake_img, inverse_fake_label, sens_d)
        sens_g = grads_d[0][0]
        # use that gradient as the sensitivity to backpropagate through the generator
        grads_g = self.grad(self.network, self.weights)(z, sens_g)

        if self.reducer_flag:
            # apply grad reducer on grads
            grads_g = self.grad_reducer(grads_g)

        return F.depend(loss, self.optimizer(grads_g))

Note that in a static graph, variables with the same name are the same variable. To guarantee that the discriminator parameters used inside TrainOneStepCellDIS and TrainOneStepCellGEN stay identical, auto_prefix must be set to False. That way the TrainOneStepCell namespace does not add a new prefix to the parameter names, and weight sharing is achieved.
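
As a rough usage sketch of how the two cells fit together (the names dis_with_loss, gen_network, data_iterator and the label Tensors are illustrative placeholders, not part of the original code), training could look like this:

# Sketch only: dis_with_loss(img, label) is assumed to return the discriminator loss,
# and gen_network(z) the generated image.
train_dis = TrainOneStepCellDIS(dis_with_loss, dis_opt)
train_gen = TrainOneStepCellGEN(gen_network, gen_opt, postnetwork=dis_with_loss)
train_dis.set_train()
train_gen.set_train()

for z, real_img in data_iterator:
    fake_img = gen_network(z)
    # discriminator step: (generated image, 0) and (real image, 1)
    d_loss_fake = dis_with_loss(fake_img, zeros_label)
    train_dis(d_loss_fake, fake_img, zeros_label)
    d_loss_real = dis_with_loss(real_img, ones_label)
    train_dis(d_loss_real, real_img, ones_label)
    # generator step: (generated image, 1); the gradient w.r.t. the fake image
    # flows back through the discriminator into the generator
    g_loss = dis_with_loss(fake_img, ones_label)
    train_gen(g_loss, z, fake_img, ones_label)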

Sharing One Weight Across Multiple Operators in MindSpore

Background

During training it is sometimes necessary for several layers to share their weights: the forward pass uses the same weight, and the backward pass updates it only once.

Key Takeaways

Example: a Conv+ReLU block is used several times in the network, and these blocks should share one set of weights.

1. PyTorch implementation

conv1x1 = nn.Conv2d(16, 16, 1, bias=True)
self.predict_conv_relu = nn.Sequential(conv1x1, nn.ReLU())
self.predict_conv_relu2 = nn.Sequential(conv1x1, nn.ReLU())
self.predict_conv_relu3 = nn.Sequential(conv1x1, nn.ReLU())

In PyTorch, if the same Module instance is passed into Sequential multiple times, its parameters are shared automatically.

2. MindSpore implementation: first initialize the weights of predict_conv_relu, then assign them to the other layers that need to share them:

self.predict_conv_relu2[0].weight = self.predict_conv_relu[0].weight
self.predict_conv_relu2[0].bias = self.predict_conv_relu[0].bias
self.predict_conv_relu3[0].weight = self.predict_conv_relu[0].weight
self.predict_conv_relu3[0].bias = self.predict_conv_relu[0].bias
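
Putting it together, a minimal sketch of a Cell that ties the weights this way (the class name, the 16-channel sizes and the construct body are illustrative assumptions, not from the original article):

import mindspore.nn as nn

class SharedConvReLU(nn.Cell):
    def __init__(self):
        super(SharedConvReLU, self).__init__()
        # each branch first gets its own Conv2d instance
        self.predict_conv_relu = nn.SequentialCell([nn.Conv2d(16, 16, 1, has_bias=True), nn.ReLU()])
        self.predict_conv_relu2 = nn.SequentialCell([nn.Conv2d(16, 16, 1, has_bias=True), nn.ReLU()])
        self.predict_conv_relu3 = nn.SequentialCell([nn.Conv2d(16, 16, 1, has_bias=True), nn.ReLU()])
        # then the extra branches are pointed at the first branch's Parameters,
        # so the forward pass reads one weight and the backward pass updates it once
        self.predict_conv_relu2[0].weight = self.predict_conv_relu[0].weight
        self.predict_conv_relu2[0].bias = self.predict_conv_relu[0].bias
        self.predict_conv_relu3[0].weight = self.predict_conv_relu[0].weight
        self.predict_conv_relu3[0].bias = self.predict_conv_relu[0].bias

    def construct(self, x):
        return self.predict_conv_relu3(self.predict_conv_relu2(self.predict_conv_relu(x)))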

Manually Modifying Convolution Weights in the construct of a MindSpore Network

Background

During the forward pass of training, the convolution weights sometimes need to be modified, for example to apply a mask.

Key Takeaways

Take the Conv's weight out in __init__, then modify it in construct:

class MaskedConv2d(nn.Cell):
    def __init__(self, in_channels, out_channels, kernel_size):
        super(MaskedConv2d, self).__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=kernel_size, weight_init='ones')
        # grab the underlying conv2d primitive so a custom weight can be passed in construct
        self.conv_p = self.conv.conv2d
        self.p = ParameterTuple((self.conv.weight,))
        # mask that zeroes out part of the kernel (the tail of the center row and all rows below it)
        self.mask = np.ones_like(self.conv.weight.data.asnumpy())
        self.mask[:, :, kernel_size // 2, kernel_size // 2:] = 0
        self.mask[:, :, kernel_size // 2 + 1:] = 0
        self.mask = Tensor(self.mask)
        self.mul = P.Mul()
        self.filter = np.ones_like(self.conv.weight.data.asnumpy())
        self.filter = Tensor(self.filter)

    def construct(self, x):
        # mask the weight, write it back into the Parameter, then convolve with the updated weight
        filter = self.mul(self.p[0], self.mask)
        P.Assign()(self.p[0], filter)
        update_weight = self.p[0] * 1
        return self.conv_p(x, update_weight)


class Context(nn.Cell):
    def __init__(self, N=3):
        super(Context, self).__init__()
        self.mask_conv = MaskedConv2d(N, N*2, kernel_size=5)

    def construct(self, x):
        x = self.mask_conv(x)
        return x
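
A quick smoke test of the masked convolution (the input shape, batch size and N value here are just for illustration):

import numpy as np
from mindspore import Tensor

net = Context(N=3)
x = Tensor(np.random.randn(1, 3, 8, 8).astype(np.float32))
out = net(x)
# nn.Conv2d defaults to pad_mode='same', so the spatial size is preserved
print(out.shape)  # expected: (1, 6, 8, 8)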
