The TensorFlow code is as follows:
import math

import tensorflow as tf


def get_position_encoding(
        length, hidden_size, min_timescale=1.0, max_timescale=1.0e4):
    """Return positional encoding.

    Calculates the position encoding as a mix of sine and cosine functions with
    geometrically increasing wavelengths.
    Defined and formulized in Attention is All You Need, section 3.5.

    Args:
        length: Sequence length.
        hidden_size: Size of the
        min_timescale: Minimum scale that will be applied at each position
        max_timescale: Maximum scale that will be applied at each position

    Returns:
        Tensor with shape [length, hidden_size]
    """
    # tf.to_float is a TensorFlow 1.x API; in 2.x use tf.cast(x, tf.float32).
    position = tf.to_float(tf.range(length))
    num_timescales = hidden_size // 2
    log_timescale_increment = (
        math.log(float(max_timescale) / float(min_timescale)) /
        (tf.to_float(num_timescales) - 1))
    inv_timescales = min_timescale * tf.exp(
        tf.to_float(tf.range(num_timescales)) * -log_timescale_increment)
    # Outer product: [length, 1] * [1, num_timescales] -> [length, num_timescales]
    scaled_time = tf.expand_dims(position, 1) * tf.expand_dims(inv_timescales, 0)
    # sin fills the first half of the columns, cos the second half.
    signal = tf.concat([tf.sin(scaled_time), tf.cos(scaled_time)], axis=1)
    return signal
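Assume length=5 and hidden_size=14, i.e. a sentence of 5 words where each word vector has 14 dimensions.
The code is explained statement by statement below.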
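num_timescales: the word-vector dimension is divided by 2, because half of the dimensions take sin and the other half take cos. In this example num_timescales = 14 // 2 = 7.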
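log_timescale_increment: the factor $\frac{\log(max/min)}{7-1}$, where max and min stand for max_timescale and min_timescale.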
inv_timescales: the row vector [0,1,2,3,4,5,6] is multiplied by $-\frac{\log(max/min)}{6}$, which gives the exponents
$$\left[-\frac{0\cdot\log(max/min)}{6},\ -\frac{1\cdot\log(max/min)}{6},\ -\frac{2\cdot\log(max/min)}{6},\ -\frac{3\cdot\log(max/min)}{6},\ -\frac{4\cdot\log(max/min)}{6},\ -\frac{5\cdot\log(max/min)}{6},\ -\frac{6\cdot\log(max/min)}{6}\right]$$
In other words, the range from 0 to $\log(max/min)$ is covered by 7 evenly spaced values (7 being half the word dimension), each with a negative sign. The code then applies exp and multiplies by min_timescale, so entry $i$ of the row vector is $min \cdot (max/min)^{-i/6}$.
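As a quick check of the two statements above, here is a minimal pure-Python sketch that mirrors the `log_timescale_increment` and `inv_timescales` computations with `math` instead of TensorFlow. The helper name `inverse_timescales` is made up for this illustration and is not part of the original code.

```python
import math


def inverse_timescales(hidden_size, min_timescale=1.0, max_timescale=1.0e4):
    """Hypothetical helper: same arithmetic as the TensorFlow code, without tensors."""
    num_timescales = hidden_size // 2
    log_timescale_increment = (
        math.log(float(max_timescale) / float(min_timescale))
        / (num_timescales - 1))
    # Entry i is min_timescale * exp(-i * increment).
    return [min_timescale * math.exp(-i * log_timescale_increment)
            for i in range(num_timescales)]


inv_timescales = inverse_timescales(14)
for i, value in enumerate(inv_timescales):
    print(i, value)
```

With the default timescales the values should fall geometrically from 1 at index 0 to roughly 1e-4 at index 6.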
scaled_time: the column vector [0,1,2,3,4] of the positions (its size is length) is multiplied by the row vector from the previous step,
giving the matrix scaled_time of shape $[length, \frac{hidden\_size}{2}]$, here [5, 7].
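The column-times-row product can be sketched in plain Python as an outer product. This assumes the default timescales (min_timescale=1, max_timescale=1e4), in which case entry i of the row vector works out to 10000^(-i/6); the helper name `outer_product` is hypothetical.

```python
def outer_product(column, row):
    """Hypothetical helper: result[p][i] = column[p] * row[i]."""
    return [[c * r for r in row] for c in column]


position = [float(p) for p in range(5)]                    # length = 5
inv_timescales = [10000.0 ** (-i / 6) for i in range(7)]   # hidden_size // 2 = 7
scaled_time = outer_product(position, inv_timescales)

print(len(scaled_time), len(scaled_time[0]))  # 5 7
```

Row p of scaled_time holds position p scaled by each of the 7 inverse timescales, which is what the two `tf.expand_dims` calls achieve through broadcasting.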
signal: sin and cos of scaled_time are computed separately, giving two matrices of shape $[length, \frac{hidden\_size}{2}]$, which are then concatenated column-wise into
a matrix of shape $[length, hidden\_size]$, here [5, 14].
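To tie the steps together, the following is a rough pure-Python mirror of the whole function, meant only for checking shapes and values by hand. It is not the TensorFlow implementation; the name `position_encoding` and the nested-list return type are assumptions made for this sketch.

```python
import math


def position_encoding(length, hidden_size, min_timescale=1.0, max_timescale=1.0e4):
    """Illustrative mirror of get_position_encoding using nested lists."""
    num_timescales = hidden_size // 2
    log_timescale_increment = (
        math.log(float(max_timescale) / float(min_timescale))
        / (num_timescales - 1))
    inv_timescales = [min_timescale * math.exp(-i * log_timescale_increment)
                      for i in range(num_timescales)]
    signal = []
    for position in range(length):
        scaled_time = [position * inv for inv in inv_timescales]
        # sin values fill the first half of the row, cos values the second half.
        signal.append([math.sin(t) for t in scaled_time]
                      + [math.cos(t) for t in scaled_time])
    return signal


signal = position_encoding(length=5, hidden_size=14)
print(len(signal), len(signal[0]))  # 5 14
```

Position 0 gives sin(0) = 0 in the first 7 columns and cos(0) = 1 in the last 7. Note that the sin and cos halves are concatenated rather than interleaved, and that an odd hidden_size would yield only 2 * (hidden_size // 2) columns.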