Transformer - layer norm

  1. Encoder structure

[Figure 1: Transformer encoder structure]
[Figure 2: Transformer encoder structure]
  2. Layer normalization

  • What is covariate shift?
    Covariate shift is a change in the distribution of the covariates, that is, the independent variables (the model's inputs), while the relationship between inputs and outputs stays the same.
    [Figure 3: covariate shift]
    In machine-learning practice, we must always watch for the effects of a mismatch between the training-data distribution and the distribution of the data actually encountered in deployment.
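As a minimal NumPy sketch (not from the original post; the distributions and sample sizes are made up for illustration), covariate shift means the input distribution P(x) differs between training and deployment even when the underlying input-output relationship is unchanged:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: the same input feature drawn from two
# different distributions (training vs. deployment).
train_x = rng.normal(loc=0.0, scale=1.0, size=10_000)  # training inputs
test_x = rng.normal(loc=2.0, scale=1.5, size=10_000)   # shifted deployment inputs

# Only P(x) shifts; a model fit on train_x will see inputs it
# rarely encountered during training.
print(train_x.mean(), test_x.mean())
```

A model evaluated on `test_x` can degrade badly even though nothing about the task itself changed, which is why monitoring input statistics in production matters.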

  • batch norm vs layer norm

BN normalizes each feature over the batch dimension, so its statistics depend on the other examples in the mini-batch; LN normalizes over the feature dimension of a single example, so it is independent of the batch size.

BN:
[Figure 4: batch normalization]
LN:
[Figure 5: layer normalization]
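The difference between the two is just the axis over which the statistics are computed. A minimal NumPy sketch (without the learnable gain/bias, which both methods also have):

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # Normalize each feature over the batch dimension (axis 0):
    # statistics are shared across examples, computed per feature.
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def layer_norm(x, eps=1e-5):
    # Normalize each example over its features (last axis):
    # statistics are computed independently per example.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

x = np.random.default_rng(0).normal(size=(4, 8))  # (batch, features)
bn, ln = batch_norm(x), layer_norm(x)
# After BN every column is ~zero-mean; after LN every row is ~zero-mean.
```

Because LN's statistics never depend on the other examples in the batch, it behaves identically at train and test time and works with batch size 1, which is one reason it is the default in Transformers.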

  • Layer-normalized recurrent neural networks

[Figure 6: layer-normalized RNN]
Its normalization terms depend only on the summed inputs to a layer at the current time-step. It also has only one set of gain (g) and bias (b) parameters, shared over all time-steps.
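A single time-step of this can be sketched in NumPy as follows (a simplified tanh-RNN cell, with hypothetical shapes; the real model may differ in details such as gating):

```python
import numpy as np

def ln_rnn_step(x_t, h_prev, W_xh, W_hh, g, b, eps=1e-5):
    # Summed inputs to the layer at the current time-step only.
    a_t = x_t @ W_xh + h_prev @ W_hh
    # Normalization statistics depend only on a_t, not on the batch
    # or on earlier time-steps.
    mu = a_t.mean(axis=-1, keepdims=True)
    sigma = a_t.std(axis=-1, keepdims=True)
    # One shared gain g and bias b, reused at every time-step.
    return np.tanh(g * (a_t - mu) / (sigma + eps) + b)

rng = np.random.default_rng(0)
H, D = 16, 8  # hidden size, input size (arbitrary for illustration)
W_xh = rng.normal(size=(D, H)) * 0.1
W_hh = rng.normal(size=(H, H)) * 0.1
g, b = np.ones(H), np.zeros(H)  # single parameter set for all steps

h = np.zeros((1, H))
for t in range(5):  # unroll a few steps, reusing the same g and b
    h = ln_rnn_step(rng.normal(size=(1, D)), h, W_xh, W_hh, g, b)
```

Note that the same `g` and `b` are applied at every step of the unrolled loop, which is what "shared over all time-steps" means in practice.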
