Generating MNIST handwritten digits from random noise is a simple GAN task that any mainstream deep learning framework can handle with little effort; here we use it as a simple GAN example on the MindSpore platform.
Key detail 1: define two Optimizers, one to update the generator and one to update the discriminator.
# opt
gen_opt = Momentum(params=gen_network.trainable_params(),
                   learning_rate=Tensor(lr_schedule),
                   momentum=args.momentum,
                   weight_decay=0,
                   loss_scale=args.loss_scale,
                   decay_filter=default_wd_filter)
dis_opt = Momentum(params=dis_network.trainable_params(),
                   learning_rate=Tensor(lr_schedule),
                   momentum=args.momentum,
                   weight_decay=0,
                   loss_scale=args.loss_scale,
                   decay_filter=default_wd_filter)
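The snippet above refers to several names defined elsewhere in the training script (gen_network, dis_network, args, lr_schedule, default_wd_filter). Purely as an illustration, and not the original definitions, they might look like this:

# Assumed definitions for illustration only; the real script defines these elsewhere.
import numpy as np

total_steps = 5000                                           # arbitrary number of training steps
lr_schedule = np.full(total_steps, 2e-4, dtype=np.float32)   # one learning rate value per step

def default_wd_filter(param):
    # apply weight decay only to weights, not to bias / normalization parameters
    return 'beta' not in param.name and 'gamma' not in param.name and 'bias' not in param.name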
Key detail 2: define two TrainOneStepCells to compute the gradients.
The discriminator's TrainOneStepCell is straightforward: during training, simply feed it (generated image, 0) and (real image, 1) pairs.
class TrainOneStepCellDIS(Cell):
    def __init__(self, network, optimizer, sens=1.0):
        super(TrainOneStepCellDIS, self).__init__(auto_prefix=False)
        self.network = network
        self.weights = ParameterTuple(network.trainable_params())
        self.optimizer = optimizer
        self.grad = C.GradOperation('grad', get_by_list=True, sens_param=True)
        self.sens = sens
        self.reducer_flag = False
        self.grad_reducer = None
        parallel_mode = _get_parallel_mode()
        if parallel_mode in (ParallelMode.DATA_PARALLEL, ParallelMode.HYBRID_PARALLEL):
            self.reducer_flag = True
        if self.reducer_flag:
            mean = _get_mirror_mean()
            degree = _get_device_num()
            self.grad_reducer = DistributedGradReducer(optimizer.parameters, mean, degree)

    def construct(self, loss, img, label):
        sens = P.Fill()(P.DType()(loss), P.Shape()(loss), self.sens)
        grads = self.grad(self.network, self.weights)(img, label, sens)
        if self.reducer_flag:
            # apply grad reducer on grads
            grads = self.grad_reducer(grads)
        return F.depend(loss, self.optimizer(grads))
Training the generator is a bit more involved. The generator cannot produce a loss on its own; its output has to be fed through the discriminator to obtain a loss, yet only the generator's parameters are updated. When constructing the generator's TrainOneStepCell, the discriminator network must therefore be passed in as well, so that we can compute the gradient of the loss with respect to the discriminator's input. That input is exactly the generator's output, so this gradient is propagated backward through the generator to update its parameters. Note that for this step the pair fed in is (generated image, 1).
class TrainOneStepCellGEN(Cell):
    def __init__(self, network, optimizer, postnetwork, sens=3.0):
        super(TrainOneStepCellGEN, self).__init__(auto_prefix=False)
        self.network = network          # generator
        self.postnetwork = postnetwork  # discriminator (with its loss)
        self.weights = ParameterTuple(network.trainable_params())
        self.postweights = ParameterTuple(postnetwork.trainable_params())
        self.optimizer = optimizer
        self.grad = C.GradOperation('grad', get_by_list=True, sens_param=True)
        self.postgrad = C.GradOperation('grad', get_all=True, get_by_list=True, sens_param=True)
        self.sens = sens
        self.reducer_flag = False
        self.grad_reducer = None
        parallel_mode = _get_parallel_mode()
        if parallel_mode in (ParallelMode.DATA_PARALLEL, ParallelMode.HYBRID_PARALLEL):
            self.reducer_flag = True
        if self.reducer_flag:
            mean = _get_mirror_mean()
            degree = _get_device_num()
            self.grad_reducer = DistributedGradReducer(optimizer.parameters, mean, degree)
        self.cast = P.Cast()
        self.print = P.Print()

    def construct(self, loss, z, fake_img, inverse_fake_label):
        sens_d = P.Fill()(P.DType()(loss), P.Shape()(loss), self.sens)
        # gradients of the discriminator loss; with get_all=True the gradient w.r.t. the
        # inputs is also returned, and grads_d[0][0] is the gradient w.r.t. fake_img,
        # i.e. w.r.t. the generator's output
        grads_d = self.postgrad(self.postnetwork, self.postweights)(fake_img, inverse_fake_label, sens_d)
        sens_g = grads_d[0][0]
        # use that gradient as the sensitivity to backpropagate through the generator
        grads_g = self.grad(self.network, self.weights)(z, sens_g)
        if self.reducer_flag:
            # apply grad reducer on grads
            grads_g = self.grad_reducer(grads_g)
        return F.depend(loss, self.optimizer(grads_g))
Note that in a static graph, parameters with the same name are the same variable. To guarantee that the discriminator parameters used in TrainOneStepCellDIS and TrainOneStepCellGEN are identical, auto_prefix must be set to False. That way the TrainOneStepCell namespace does not prepend a new prefix to the parameter names, and the weights are shared.
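To make the data flow concrete, here is one possible way to wire the two cells together in a training loop. This is a sketch only: dis_with_loss stands for the discriminator wrapped with its loss function, sample_noise for a noise sampler, and real_label / fake_label / inverse_fake_label for label tensors holding 1, 0 and 1; none of these names come from the original code.

# Sketch only -- everything except the two cells and the optimizers is an illustrative assumption.
dis_train_step = TrainOneStepCellDIS(dis_with_loss, dis_opt)                # updates only the discriminator
gen_train_step = TrainOneStepCellGEN(gen_network, gen_opt, dis_with_loss)   # updates only the generator

for real_img in dataset:                      # assumed iterator over real MNIST batches
    z = sample_noise(batch_size)              # hypothetical noise sampler
    fake_img = gen_network(z)

    # discriminator step: feed (generated image, 0) and (real image, 1)
    d_loss = dis_with_loss(fake_img, fake_label)
    dis_train_step(d_loss, fake_img, fake_label)
    d_loss = dis_with_loss(real_img, real_label)
    dis_train_step(d_loss, real_img, real_label)

    # generator step: feed (generated image, 1) so the gradient pushes D(G(z)) toward "real"
    g_loss = dis_with_loss(fake_img, inverse_fake_label)
    gen_train_step(g_loss, z, fake_img, inverse_fake_label)

Because both cells are constructed with auto_prefix=False around the same dis_with_loss instance, the discriminator parameters seen by the two cells are literally the same Parameter objects.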
During training, it is sometimes necessary to share weights across several layers: the forward pass uses the same weight in each place, and the backward pass updates it only once.
Example: a Conv+ReLU block is used several times in the network, and these blocks should share one set of weights.
1. PyTorch implementation
conv1x1 = nn.Conv2d(16, 16, 1, bias=True)
self.predict_conv_relu = nn.Sequential(conv1x1, nn.ReLU())
self.predict_conv_relu2 = nn.Sequential(conv1x1, nn.ReLU())
self.predict_conv_relu3 = nn.Sequential(conv1x1, nn.ReLU())
In PyTorch, if the modules passed into Sequential are the same Module instance, their parameters are shared.
2. MindSpore implementation: first initialize the weights of predict_conv_relu, then assign them to the other layers that should share them.
self.predict_conv_relu2[0].weight = self.predict_conv_relu[0].weight
self.predict_conv_relu2[0].bias = self.predict_conv_relu[0].bias
self.predict_conv_relu3[0].weight = self.predict_conv_relu[0].weight
self.predict_conv_relu3[0].bias = self.predict_conv_relu[0].bias
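For completeness, a minimal self-contained sketch of how these assignments might sit inside a Cell is given below; the surrounding class, the 16-channel sizes, and the way the three blocks are chained in construct are assumptions for illustration, not the original code.

class SharedConvHead(nn.Cell):
    # Sketch only: three Conv+ReLU blocks whose Conv layers share one weight and one bias.
    def __init__(self):
        super(SharedConvHead, self).__init__()
        self.predict_conv_relu = nn.SequentialCell([nn.Conv2d(16, 16, 1, has_bias=True), nn.ReLU()])
        self.predict_conv_relu2 = nn.SequentialCell([nn.Conv2d(16, 16, 1, has_bias=True), nn.ReLU()])
        self.predict_conv_relu3 = nn.SequentialCell([nn.Conv2d(16, 16, 1, has_bias=True), nn.ReLU()])
        # point the other two Conv layers at the first one's Parameters
        self.predict_conv_relu2[0].weight = self.predict_conv_relu[0].weight
        self.predict_conv_relu2[0].bias = self.predict_conv_relu[0].bias
        self.predict_conv_relu3[0].weight = self.predict_conv_relu[0].weight
        self.predict_conv_relu3[0].bias = self.predict_conv_relu[0].bias

    def construct(self, x):
        x = self.predict_conv_relu(x)
        x = self.predict_conv_relu2(x)
        x = self.predict_conv_relu3(x)
        return x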
Sometimes the forward pass needs to modify a convolution's weights, for example to apply a mask operation.
This can be done by grabbing the Conv's weight in __init__ and modifying it in construct.
class MaskedConv2d(nn.Cell):
    def __init__(self, in_channels, out_channels, kernel_size):
        super(MaskedConv2d, self).__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=kernel_size, weight_init='ones')
        self.conv_p = self.conv.conv2d
        self.p = ParameterTuple((self.conv.weight,))
        # mask (hard-coded for a 5x5 kernel): zero the center position and everything to
        # its right in the center row, plus all rows below the center
        self.mask = np.ones_like(self.conv.weight.data.asnumpy())
        self.mask[:, :, 5 // 2, 5 // 2:] = 0
        self.mask[:, :, 5 // 2 + 1:] = 0
        self.mask = Tensor(self.mask)
        self.mul = P.Mul()
        self.filter = np.ones_like(self.conv.weight.data.asnumpy())
        self.filter = Tensor(self.filter)

    def construct(self, x):
        # mask the weight, write it back into the Parameter, then convolve with the masked weight
        filter = self.mul(self.p[0], self.mask)
        P.Assign()(self.p[0], filter)
        update_weight = self.p[0] * 1
        return self.conv_p(x, update_weight)
class Context(nn.Cell):
    def __init__(self, N=3):
        super(Context, self).__init__()
        self.mask_conv = MaskedConv2d(N, N * 2, kernel_size=5)

    def construct(self, x):
        x = self.mask_conv(x)
        return x
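Assuming an arbitrary input shape purely for illustration (the shapes below are not from the original text), the masked convolution can be exercised like this:

# Usage sketch; batch size and spatial size are arbitrary assumptions.
import numpy as np
from mindspore import Tensor

net = Context(N=3)
x = Tensor(np.ones((1, 3, 32, 32), dtype=np.float32))
out = net(x)
print(out.asnumpy().shape)   # (1, 6, 32, 32): channels go from N to N*2, pad_mode defaults to 'same'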