Deep Learning Neural Networks: Understanding batchnorm (Python + TensorFlow) (by shany shang)

batchnorm: the operation of normalizing the data within one batch, i.e. batch normalization.
y_{i}=BN\left ( x_{i} \right )
I. The steps are as follows:


1. First compute the mean of the current batch of data X (m is the number of samples in the batch):
 

\mu _{\beta } =\frac{1}{m}\sum_{i=1}^{m}x_{i}

2. Then compute the variance of the batch X:

\sigma _{\beta }^{2}=\frac{1}{m}\sum_{i=1}^{m}\left ( x_{i}-\mu _{\beta } \right )^{2}

3. Normalize (\varepsilon is a small constant that keeps the denominator away from zero):

\bar{x}_{i}=\frac{x_{i}-\mu _{\beta }}{\sqrt{\left ( \sigma _{\beta }^{2}+\varepsilon \right )}}

4. Linear transformation (scale by the learnable \lambda and shift by the learnable \theta):

y_{i}=\lambda \bar{x}_{i}+\theta \equiv BN\left ( x_{i} \right )
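
The four steps above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the TensorFlow implementation: the function name batch_norm_forward and the sample data are made up for this example, the input is assumed to have shape (m, d) with the batch along axis 0, and the scale and shift (\lambda and \theta in step 4) are named gamma and beta to match the TensorFlow code in section II.

```python
import numpy as np


def batch_norm_forward(x, gamma, beta, eps=1e-3):
    """Hypothetical helper: apply steps 1-4 to a batch x of shape (m, d)."""
    mu = x.mean(axis=0)                    # step 1: per-feature batch mean
    var = ((x - mu) ** 2).mean(axis=0)     # step 2: per-feature batch variance (divides by m)
    x_hat = (x - mu) / np.sqrt(var + eps)  # step 3: normalize
    y = gamma * x_hat + beta               # step 4: scale and shift
    return y, mu, var


# Made-up batch: 4 samples, 2 features on very different scales.
x = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0],
              [4.0, 40.0]])
y, mu, var = batch_norm_forward(x, gamma=np.ones(2), beta=np.zeros(2))

print(mu)   # per-feature mean of the batch
print(var)  # per-feature variance of the batch
print(y)    # both columns now have mean 0 and roughly unit variance
```

With gamma fixed at 1 and beta at 0 the output is simply the normalized batch; during training these two parameters are learned, so the network can undo the normalization wherever that helps.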

 

II. The code is as follows:

The code comes from Zhihu; comments have been added here to make it easier to read.
import tensorflow as tf  # TensorFlow 1.x API (tf.variable_scope, tf.train)


def batch_norm_layer(x, train_phase, scope_bn):
    with tf.variable_scope(scope_bn):
        # Create two trainable variables: the shift (beta) and the scale (gamma)
        beta = tf.Variable(tf.constant(0.0, shape=[x.shape[-1]]), name='beta', trainable=True)
        gamma = tf.Variable(tf.constant(1.0, shape=[x.shape[-1]]), name='gamma', trainable=True)

        # Mean and variance of the current batch, reduced over every axis except the last
        axes = list(range(len(x.shape) - 1))
        batch_mean, batch_var = tf.nn.moments(x, axes, name='moments')

        # Exponential moving average that tracks the batch statistics
        ema = tf.train.ExponentialMovingAverage(decay=0.5)

        def mean_var_with_update():
            ema_apply_op = ema.apply([batch_mean, batch_var])
            with tf.control_dependencies([ema_apply_op]):
                return tf.identity(batch_mean), tf.identity(batch_var)

        # train_phase is a boolean tensor: True when training, False when testing.
        # Training: normalize with the batch statistics and update the running
        # mean and running variance through mean_var_with_update().
        # Testing: reuse the stored averages, ema.average(batch_mean) and ema.average(batch_var).
        mean, var = tf.cond(train_phase, mean_var_with_update,
                            lambda: (ema.average(batch_mean), ema.average(batch_var)))
        normed = tf.nn.batch_normalization(x, mean, var, beta, gamma, 1e-3)
    return normed
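
To make the moving-average step concrete, here is a small pure-Python sketch of the update rule that tf.train.ExponentialMovingAverage applies, namely shadow = decay * shadow + (1 - decay) * value. The helper ema_update and the batch means are invented for illustration, and the stored average is assumed to start at zero.

```python
def ema_update(shadow, value, decay):
    """One moving-average step: keep `decay` of the old estimate, blend in the rest."""
    return decay * shadow + (1.0 - decay) * value


batch_means = [4.0, 4.0, 4.0]  # made-up per-batch means, for illustration only
running_mean = 0.0             # assumption: the stored average starts at zero
history = []
for m in batch_means:
    running_mean = ema_update(running_mean, m, decay=0.5)
    history.append(running_mean)

print(history)  # [2.0, 3.0, 3.5]
```

With decay=0.5, as in the code above, the stored value moves halfway toward each new batch statistic, so it follows recent batches closely but is also noisy. A decay closer to 1 averages over more batches and gives a smoother estimate for test time.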

 

The tf.nn.batch_normalization() interface is defined as follows:

def batch_normalization(x,
                        mean,
                        variance,
                        offset,
                        scale,
                        variance_epsilon,
                        name=None):
                        
    with ops.name_scope(name, "batchnorm", [x, mean, variance, scale, offset]):
        inv = math_ops.rsqrt(variance + variance_epsilon)
        if scale is not None:
            inv *= scale
        return x * inv + (offset - mean * inv
                      if offset is not None else -mean * inv)
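
The return expression above folds the scale into the inverse standard deviation, so it may not look like step 4 at first glance. The following NumPy check, a sketch with randomly generated data and arbitrary scale and offset values, suggests that x * inv + (offset - mean * inv) is the same quantity as scale * (x - mean) / sqrt(variance + epsilon) + offset.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 3))          # made-up batch: 8 samples, 3 features
mean = x.mean(axis=0)
variance = x.var(axis=0)
scale = np.array([1.5, 0.5, 2.0])    # arbitrary values standing in for gamma
offset = np.array([0.1, -0.2, 0.3])  # arbitrary values standing in for beta
eps = 1e-3

# Textbook form: normalize first, then scale and shift.
direct = scale * (x - mean) / np.sqrt(variance + eps) + offset

# Folded form used in the function body above.
inv = scale / np.sqrt(variance + eps)
folded = x * inv + (offset - mean * inv)

print(np.allclose(direct, folded))  # True
```

The folded form computes inv and (offset - mean * inv) once per feature, leaving a single multiply and a single add per element of x.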
 


 

 

 
