How positional embedding (pos embedding) works in the Transformer

The TensorFlow (TF1.x-style) reference code is as follows:

import math

import tensorflow as tf  # TF1.x API: tf.to_float was removed in TF2


def get_position_encoding(
    length, hidden_size, min_timescale=1.0, max_timescale=1.0e4):
  """Return positional encoding.

  Calculates the position encoding as a mix of sine and cosine functions with
  geometrically increasing wavelengths.
  Defined and formalized in "Attention Is All You Need", section 3.5.

  Args:
    length: Sequence length.
    hidden_size: Size of the embedding dimension (must be even).
    min_timescale: Minimum scale that will be applied at each position.
    max_timescale: Maximum scale that will be applied at each position.

  Returns:
    Tensor with shape [length, hidden_size]
  """
  position = tf.to_float(tf.range(length))
  num_timescales = hidden_size // 2
  log_timescale_increment = (
      math.log(float(max_timescale) / float(min_timescale)) /
      (tf.to_float(num_timescales) - 1))
  inv_timescales = min_timescale * tf.exp(
      tf.to_float(tf.range(num_timescales)) * -log_timescale_increment)
  # Outer product: [length, 1] * [1, num_timescales] -> [length, num_timescales]
  scaled_time = tf.expand_dims(position, 1) * tf.expand_dims(inv_timescales, 0)
  # Concatenate the sine half and the cosine half along the feature axis.
  signal = tf.concat([tf.sin(scaled_time), tf.cos(scaled_time)], axis=1)
  return signal
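
A minimal usage sketch, assuming a TF1.x session since the function uses the TF1 API:

signal = get_position_encoding(length=5, hidden_size=14)
with tf.Session() as sess:
  pe = sess.run(signal)
print(pe.shape)  # (5, 14): one 14-dimensional encoding per position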

Suppose length=5 and hidden_size=14, i.e., a sentence of 5 words where each word vector has 14 dimensions.
Each step below is explained in terms of the variable it computes:
num_timescales: the embedding dimension is divided by 2, because half of the dimensions will hold sine values and the other half cosine values.

log_timescale_increment: the step factor $\frac{\log(max/min)}{7-1} = \frac{\log(max/min)}{6}$.
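
A quick numeric check of this factor with the default timescales (plain Python):

import math

num_timescales = 14 // 2                                   # 7
increment = math.log(1.0e4 / 1.0) / (num_timescales - 1)   # log(max/min) / 6
print(increment)                                           # ≈ 1.5351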

inv_timescales: the row vector $[0,1,2,3,4,5,6]$ is multiplied by $\frac{\log(max/min)}{6}$, giving

$[\frac{0\log(max/min)}{6},\ \frac{1\log(max/min)}{6},\ \frac{2\log(max/min)}{6},\ \frac{3\log(max/min)}{6},\ \frac{4\log(max/min)}{6},\ \frac{5\log(max/min)}{6},\ \frac{6\log(max/min)}{6}]$

In other words, $\log(max/min)$ is divided into 6 equal steps, giving 7 evenly spaced points (half the embedding dimension). Applying $min\_timescale \cdot e^{-x}$ elementwise to this vector then yields inv_timescales: a geometrically decreasing sequence of frequencies from $min\_timescale$ down to $min\_timescale^2/max\_timescale$ (from $1$ down to $10^{-4}$ with the defaults).
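
The same sequence in NumPy, as a quick check of the endpoints:

import numpy as np

num_timescales = 7
increment = np.log(1.0e4) / (num_timescales - 1)
inv_timescales = 1.0 * np.exp(np.arange(num_timescales) * -increment)
print(inv_timescales[0], inv_timescales[-1])               # 1.0 and ≈ 1e-4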

scaled_time: the column vector $[0,1,2,3,4]$ of positions (length = 5 entries) is multiplied by the row vector from the previous step (an outer product), producing the matrix scaled_time of shape $[length, \frac{hidden\_size}{2}]$, whose entry $(pos, i)$ is $pos \cdot inv\_timescales[i]$.
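
The same outer product written with NumPy broadcasting:

import numpy as np

position = np.arange(5, dtype=np.float32)                  # [0, 1, 2, 3, 4]
inv_timescales = np.exp(np.arange(7) * -np.log(1.0e4) / 6)
scaled_time = position[:, None] * inv_timescales[None, :]  # outer product
print(scaled_time.shape)                                   # (5, 7)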

signal: the elementwise sin and cos of scaled_time give two $[length, \frac{hidden\_size}{2}]$ matrices, which are concatenated along the feature axis into a single $[length, hidden\_size]$ matrix.
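
For reference, the whole table can be reproduced in NumPy. Note that this implementation lays the output out as [sin block | cos block], while the formula in the paper interleaves sin and cos across even/odd dimensions; the two layouts differ only by a fixed permutation of the columns. A self-contained sketch:

import numpy as np

def position_encoding_np(length, hidden_size,
                         min_timescale=1.0, max_timescale=1.0e4):
  # Mirrors the TensorFlow function above, step by step.
  position = np.arange(length, dtype=np.float32)
  num_timescales = hidden_size // 2
  log_increment = np.log(max_timescale / min_timescale) / (num_timescales - 1)
  inv_timescales = min_timescale * np.exp(
      np.arange(num_timescales) * -log_increment)
  scaled_time = position[:, None] * inv_timescales[None, :]
  return np.concatenate([np.sin(scaled_time), np.cos(scaled_time)], axis=1)

pe = position_encoding_np(5, 14)
print(pe.shape)                                            # (5, 14)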
