Implementing the NoisyNet Reinforcement Learning Network in MXNet

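The code in this post mainly follows the article "Enhancing a Model's Exploration Ability: Principles and Implementation of NoisyNet in Reinforcement Learning" (增强模型的探索能力-强化学习NoisyNet原理及实现！).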

Paper: https://arxiv.org/abs/1706.10295

Prerequisites

The reinforcement learning workflow
The Q-learning algorithm
The DQN algorithm
The epsilon-greedy policy

The goal of NoisyNet

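In reinforcement learning, the usual way to make an agent explore more actions is the epsilon-greedy policy. The NoisyNet paper reports that adding noise to the fully connected layers can replace epsilon-greedy exploration.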
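For comparison, the epsilon-greedy rule that NoisyNet aims to replace can be sketched as below. This is a minimal illustration, not code from the original post: the function name `epsilon_greedy` and the Q-values are assumptions made up for the example.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore: uniform random action
    # exploit: index of the largest Q-value
    return max(range(len(q_values)), key=lambda a: q_values[a])


q_values = [0.1, 0.7, 0.2]  # hypothetical Q-values for three actions
greedy_action = epsilon_greedy(q_values, epsilon=0.0)  # epsilon=0 never explores
print(greedy_action)
```

With a noisy network the agent would simply take the arg-max of the (noisy) Q-values, and the randomness in the weights supplies the exploration.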

NoisyNet theory

A noisy network is a neural network whose weights and biases are perturbed by a function of noise.

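Let y denote the network output, x the input, and f the mapping computed by the network. The network can then be written as

$$y = f_\theta(x)$$

where theta denotes the network parameters. For a single fully connected layer this becomes

$$y = x w^\top + b$$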

The input x has shape (batch_size, feature_size), so the weight w has shape (units, feature_size) and the bias b has shape (units,). Multiplying x by w.T gives a result of shape (batch_size, units), which is the shape of y.
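The shape bookkeeping above can be checked with a few lines of NumPy. This is only an illustrative sketch; the sizes and constant values are arbitrary assumptions.

```python
import numpy as np

batch_size, feature_size, units = 4, 10, 5  # arbitrary sizes for illustration

x = np.ones((batch_size, feature_size))
w = np.full((units, feature_size), 0.1)
b = np.zeros(units)

# (batch_size, feature_size) @ (feature_size, units) -> (batch_size, units)
y = x @ w.T + b
print(y.shape)  # (4, 5)
```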

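With noise added, the layer becomes

$$y = x\,(\mu^w + \sigma^w \odot \varepsilon^w)^\top + \mu^b + \sigma^b \odot \varepsilon^b$$

where mu, sigma and epsilon have the same shapes as w and b respectively, and \odot denotes element-wise multiplication.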

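During the computation, mu and sigma are the parameters the network learns and updates, whereas epsilon is random noise that is resampled on every forward pass and is not trained.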
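A minimal NumPy sketch of such a noisy layer is shown below. It is not the MXNet implementation from this post; the variable names (`mu_w`, `sigma_w`, `noisy_forward`) and the initial values are assumptions chosen for illustration. It shows that mu and sigma stay fixed between calls while epsilon is redrawn each time.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, feature_size, units = 4, 10, 5  # arbitrary sizes

x = rng.normal(size=(batch_size, feature_size))

# Learnable parameters (here just fixed arrays with arbitrary initial values).
mu_w = rng.uniform(-0.1, 0.1, size=(units, feature_size))
sigma_w = np.full((units, feature_size), 0.05)
mu_b = np.zeros(units)
sigma_b = np.full(units, 0.05)


def noisy_forward(x):
    """One forward pass: fresh epsilon, then y = x (mu_w + sigma_w * eps_w)^T + mu_b + sigma_b * eps_b."""
    eps_w = rng.normal(size=mu_w.shape)  # resampled on every call
    eps_b = rng.normal(size=mu_b.shape)
    w = mu_w + sigma_w * eps_w  # element-wise product
    b = mu_b + sigma_b * eps_b
    return x @ w.T + b


y1 = noisy_forward(x)
y2 = noisy_forward(x)  # same input, different noise, so a different output
print(y1.shape)
```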

Two ways to generate the noise

Independent Gaussian noise

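Gaussian (normal) noise is sampled directly on every forward pass, with the same shapes as w and b: one noise tensor of shape (units, feature_size) and one of shape (units,). The total number of noise values to sample is therefore units * feature_size + units.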
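As a quick illustration of that count, the following NumPy sketch draws independent noise for an assumed layer with 5 units and 10 input features; the sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
units, feature_size = 5, 10  # arbitrary sizes

eps_w = rng.normal(size=(units, feature_size))  # one value per weight
eps_b = rng.normal(size=(units,))               # one value per bias

num_samples = eps_w.size + eps_b.size  # units * feature_size + units
print(num_samples)  # 55
```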

Factorised Gaussian noise

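Factorised Gaussian noise avoids sampling a full (units, feature_size) noise tensor. Instead, it samples one noise vector of shape (units, 1) and one of shape (1, feature_size), passes each through the noise function f, and takes their matrix product to obtain a (units, feature_size) noise tensor for the weights. The bias noise keeps its shape of (units,). If the bias noise is sampled separately, the total number of noise values is 2 * units + feature_size; if the (units, 1) vector is reused for the bias, as in the paper and in the code below, it drops to units + feature_size. Either way, far fewer random numbers are needed than with independent Gaussian noise.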
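The factorisation can be sketched in NumPy as follows. This is an illustrative sketch rather than the MXNet code of this post: the names `eps_out` and `eps_in` are assumptions, and the bias noise reuses the output-side vector, which is one of the two options described above.

```python
import numpy as np


def f(e):
    """Noise-shaping function: f(e) = sign(e) * sqrt(|e|)."""
    return np.sign(e) * np.sqrt(np.abs(e))


rng = np.random.default_rng(0)
units, feature_size = 5, 10  # arbitrary sizes

eps_out = f(rng.normal(size=(units, 1)))        # one value per output unit
eps_in = f(rng.normal(size=(1, feature_size)))  # one value per input feature

eps_w = eps_out @ eps_in        # outer product, shape (units, feature_size)
eps_b = eps_out.reshape(units)  # bias noise reuses the output-side vector

num_samples = eps_out.size + eps_in.size  # units + feature_size
print(eps_w.shape, num_samples)  # (5, 10) 15
```

Because the weight noise is an outer product of two vectors, its entries are no longer independent of each other; that is the price paid for drawing 15 random numbers here instead of 55.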

Code implementation

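import mxnet as mx
from mxnet import init, nd
from mxnet.gluon import nn

# GPU_INDEX and try_gpu are not defined in the original post;
# the definitions below are an assumed stand-in.
GPU_INDEX = 0


def try_gpu(index=0):
    """Return mx.gpu(index) if it is usable, otherwise fall back to mx.cpu()."""
    try:
        ctx = mx.gpu(index)
        nd.zeros((1,), ctx=ctx)
        return ctx
    except mx.base.MXNetError:
        return mx.cpu()


class Noisy(nn.HybridBlock):
    def __init__(self, units, activation=None, use_bias=True, flatten=True, dtype='float32', weight_initializer=None, bias_initializer='zeros', in_units=0, noisy_distribution='independent', **kwargs):
        super(Noisy, self).__init__(**kwargs)
        self._flatten = flatten  # whether to flatten the input; passed through to FullyConnected
        self.dtype = dtype
        self.ctx = try_gpu(GPU_INDEX)  # context (GPU or CPU) on which the noise is sampled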
        with self.name_scope():
            self._units = units  # number of output units of the fully connected layer
            self._in_units = in_units  # number of input units of the fully connected layer
            self.weights = self.params.get('weights', shape=(units, in_units), init=weight_initializer, dtype=dtype,
                                           allow_deferred_init=False)  # weight mean (mu); deferred initialization is disabled
            self.weight_sigma = self.params.get('weight_sigma', shape=(units, in_units),
                                                init=weight_initializer, dtype=dtype,
                                                allow_deferred_init=False)  # weight noise scale (sigma)

            if use_bias:
                self.bias = self.params.get('bias', shape=(units, ), init=bias_initializer, dtype=dtype,
                                            allow_deferred_init=False)  # bias mean (mu)
                self.bias_noise = self.params.get('bias_noise', shape=(1, units), init=bias_initializer, dtype=dtype,
                                                  allow_deferred_init=False)  # bias noise scale (sigma), despite the name
            else:
                self.bias = None
            if activation is not None:
                self.act = nn.Activation(activation, prefix=activation+'_')  # optional activation layer
            else:
                self.act = None
        self.noisy_distribution = noisy_distribution  # how noise is generated: 'independent' or factorised

    def hybrid_forward(self, F, x, weights, weight_sigma, bias=None, bias_noise=None):
        # Noise-shaping function for the factorised variant: f(e) = sign(e) * sqrt(|e|)
        def real_valued_f(e_list):
            return F.sign(e_list) * F.sqrt(F.abs(e_list))

        # Use the stored sizes instead of reading .shape from the parameter arguments.
        weight_shape = (self._units, self._in_units)
        bias_shape = (self._units,)
        noise_1 = None  # output-side noise, reused for the bias in the factorised case
        b = None  # stays None when the layer has no bias
        if self.noisy_distribution == 'independent':
            w = weights + F.random_normal(scale=0.1, shape=weight_shape, ctx=self.ctx,
                                          dtype=self.dtype) * weight_sigma  # element-wise product
        else:
            noise_1 = real_valued_f(F.random_normal(scale=0.1, shape=(self._units, 1), ctx=self.ctx, dtype=self.dtype))
            noise_2 = real_valued_f(F.random_normal(scale=0.1, shape=(1, self._in_units), ctx=self.ctx, dtype=self.dtype))
            w = weights + F.dot(noise_1, noise_2) * weight_sigma
        if bias is not None:
            if self.noisy_distribution == 'independent':
                b = bias + F.random_normal(scale=0.1, shape=bias_shape, ctx=self.ctx,
                                           dtype=self.dtype) * bias_noise.reshape(bias_shape)
            else:
                b = bias + noise_1.reshape(bias_shape) * bias_noise.reshape(bias_shape)
        act = F.FullyConnected(x, w, b, no_bias=bias is None, num_hidden=self._units,
                               flatten=self._flatten, name='fwd')  # feed the perturbed weight and bias to the fully connected op
        if self.act is not None:
            act = self.act(act)  # apply the activation if one was requested
        return act

Using the class

if __name__ == '__main__':
    net = nn.Sequential()
    layer = Noisy(5, in_units=10,
                  weight_initializer=init.Uniform(0.1),
                  bias_initializer=init.Constant(0.1))
    net.add(layer)
    # net.add(nn.Dense(5, weight_initializer=init.Uniform(0.1)))
    ctx = layer.ctx  # keep parameters and input on the context where the layer samples its noise
    net.initialize(init=init.Xavier(), ctx=ctx)
    x = nd.arange(20 * 10, ctx=ctx).reshape((20, 10))
    y = net(x)
    print(y)

Summary

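A noisy network has more parameters than a plain one, since every noisy layer also stores a sigma for each weight and bias. When compute resources are limited, you may need to shrink the original network to keep the parameter footprint down.
The implementation is modelled mainly on the source code of nn.Dense. For the differences between HybridBlock and Block, see the previous post, "mxnet采坑记" (notes on MXNet pitfalls).
In the Dense source code allow_deferred_init is set to True, meaning deferred initialization is supported: the number of input units is inferred from the data only when the first batch is passed in. In the noisy layer, the forward computation depends on the already initialized weight and bias parameters, so deferred initialization is not supported and in_units must be specified when the layer is constructed.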
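To put a number on the first point, the sketch below compares the parameter count of a plain fully connected layer with that of a noisy layer of the same size. The helper names are hypothetical, and the count assumes one mu and one sigma per weight and per bias, as in the Noisy class above with use_bias=True.

```python
def dense_param_count(in_units, units):
    """Weights plus biases of a plain fully connected layer."""
    return units * in_units + units


def noisy_param_count(in_units, units):
    """A noisy layer stores a mu and a sigma for every weight and bias."""
    return 2 * dense_param_count(in_units, units)


print(dense_param_count(10, 5))  # 55
print(noisy_param_count(10, 5))  # 110
```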