TensorFlow -- embedding_attention_seq2seq -- Encoder Part Notes


    In the seq2seq model, the embedding_attention_seq2seq function is invoked as follows:

decoder_outputs, _ = tf.contrib.legacy_seq2seq.embedding_attention_seq2seq(
    self.encoder_inputs, self.decoder_inputs, encoCell,
    num_encoder_symbols=source_vocab_size,
    num_decoder_symbols=target_vocab_size,
    embedding_size=hidden_size,
    output_projection=output_projection,
    feed_previous=False)

    The parameters above are (a minimal call sketch follows this list):

                       encoder_inputs: the encoder inputs

                       decoder_inputs: the decoder inputs

                       encoCell: the RNN cell used by both encoder and decoder

                       num_encoder_symbols: the size of the input vocabulary

                       num_decoder_symbols: the size of the output vocabulary

                       output_projection: the weight matrix used to project decoder outputs to predictions, of shape (512, 40000) here

                       feed_previous: if True, only the first decoder input is used, and every later decoder input is the previous decoder output;

                                              if False, the supplied decoder_inputs are used as-is
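    For context, here is a minimal, hedged sketch of how such a call might be wired up under TF 1.x. The placeholder shapes, the GRU cell, and the concrete sizes below are illustrative assumptions, not taken from the original project:

import tensorflow as tf

batch_size, src_len, tgt_len = 32, 10, 12
source_vocab_size = target_vocab_size = 40000
hidden_size = 512

# encoder_inputs/decoder_inputs are Python lists, one int32 tensor of
# shape [batch_size] per time step -- this is the legacy_seq2seq format.
encoder_inputs = [tf.placeholder(tf.int32, [batch_size], name="enc%d" % i)
                  for i in range(src_len)]
decoder_inputs = [tf.placeholder(tf.int32, [batch_size], name="dec%d" % i)
                  for i in range(tgt_len)]
encoCell = tf.nn.rnn_cell.GRUCell(hidden_size)

# Without output_projection, each output is already projected to the vocabulary.
outputs, state = tf.contrib.legacy_seq2seq.embedding_attention_seq2seq(
    encoder_inputs, decoder_inputs, encoCell,
    num_encoder_symbols=source_vocab_size,
    num_decoder_symbols=target_vocab_size,
    embedding_size=hidden_size,
    feed_previous=False)
# outputs: list of tgt_len tensors, each [batch_size, target_vocab_size]
# state:   final decoder state, [batch_size, cell.state_size]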

The function signature given by the TensorFlow API is as follows:

https://tensorflow.google.cn/api_docs/python/tf/contrib/legacy_seq2seq/embedding_attention_seq2seq?hl=zh-cn

tf.contrib.legacy_seq2seq.embedding_attention_seq2seq(
    encoder_inputs,
    decoder_inputs,
    cell,
    num_encoder_symbols,
    num_decoder_symbols,
    embedding_size,
    num_heads=1,
    output_projection=None,
    feed_previous=False,
    dtype=None,
    scope=None,
    initial_state_attention=False
)

    The model first embeds encoder_inputs using an embedding matrix of shape [num_encoder_symbols x input_size], then runs an RNN to encode the embedded sequence, keeping every RNN output so the attention step can use them later. Similarly, an embedding matrix of shape [num_decoder_symbols x input_size] is created as the word-embedding matrix for decoder_inputs; the decoder then runs with attention, initialized from the encoder's final hidden state, and the decoding results are returned. The return value is a tuple (outputs, state): outputs is a list of the same length as decoder_inputs, each element a 2D tensor of shape [batch_size x num_decoder_symbols]; state is the state of each decoder cell at the final time step, a 2D tensor of shape [batch_size x cell.state_size].
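When output_projection is supplied instead (as in the call at the top, where it has shape (512, 40000)), each element of outputs stays at shape [batch_size x cell.output_size] and must be projected to vocabulary logits by hand. A hedged sketch of that projection, with hypothetical variable names:

import tensorflow as tf

hidden_size, target_vocab_size, batch_size = 512, 40000, 32
w = tf.get_variable("proj_w", [hidden_size, target_vocab_size])
b = tf.get_variable("proj_b", [target_vocab_size])
output_projection = (w, b)  # what gets passed to embedding_attention_seq2seq

# stand-in for one decoder output step of shape [batch_size, hidden_size]
decoder_output = tf.zeros([batch_size, hidden_size])
logits = tf.matmul(decoder_output, w) + b  # [batch_size, target_vocab_size]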

The embedding_attention_seq2seq source is as follows (note: this copy carries extra beam_search and beam_size parameters from a modified version; the stock API above does not have them):

def embedding_attention_seq2seq(encoder_inputs,
                                decoder_inputs,
                                cell,
                                num_encoder_symbols,
                                num_decoder_symbols,
                                embedding_size,
                                num_heads=1,
                                output_projection=None,
                                feed_previous=False,
                                dtype=None,
                                scope=None,
                                initial_state_attention=False,
                                beam_search=True,
                                beam_size=10):
    with variable_scope.variable_scope(scope or "embedding_attention_seq2seq", dtype=dtype) as scope:
        dtype = scope.dtype
        # Encoder.
        encoder_cell = copy.deepcopy(cell)
        encoder_cell = core_rnn_cell.EmbeddingWrapper(encoder_cell, embedding_classes=num_encoder_symbols, embedding_size=embedding_size)
        encoder_outputs, encoder_state = rnn.static_rnn(encoder_cell, encoder_inputs, dtype=dtype)

        # First calculate a concatenation of encoder outputs to put attention on.
        top_states = [array_ops.reshape(e, [-1, 1, cell.output_size]) for e in encoder_outputs]
        attention_states = array_ops.concat(top_states, 1)

        # Decoder.
        output_size = None
        if output_projection is None:
            cell = core_rnn_cell.OutputProjectionWrapper(cell, num_decoder_symbols)
            output_size = num_decoder_symbols

        return embedding_attention_decoder(
            decoder_inputs,
            encoder_state,
            attention_states,
            cell,
            num_decoder_symbols,
            embedding_size,
            num_heads=num_heads,
            output_size=output_size,
            output_projection=output_projection,
            feed_previous=feed_previous,
            initial_state_attention=initial_state_attention,
            beam_search=beam_search,
            beam_size=beam_size)

Python copy() vs. copy.deepcopy()

        Python's copy module offers both a deep copy and a plain (shallow) copy. copy.deepcopy() allocates new memory and stores a full copy of the content there, while copy.copy() merely makes the copy point at the existing data rather than reallocating it. Consequently, with copy.copy(), mutating the data at the original address also changes the copied object; with copy.deepcopy(), the copy lives at its own address, so changes to the original do not affect it. This is why embedding_attention_seq2seq uses copy.deepcopy() to duplicate the cell that was passed in: one copy serves as the encoder_cell and the other as the decoder_cell, so the two do not share state. A small demonstration follows.
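This is standard-library behavior, shown here with nested lists:

import copy

a = [[1, 2], [3, 4]]
shallow = copy.copy(a)    # new outer list, but inner lists are still shared
deep = copy.deepcopy(a)   # fully independent copy at a new address

a[0][0] = 99
print(shallow[0][0])  # 99 -- the shallow copy sees the change
print(deep[0][0])     # 1  -- the deep copy is unaffected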

The EmbeddingWrapper class:

        EmbeddingWrapper inherits from RNNCell; its job is to embed the integer inputs before handing them to the given cell.

class EmbeddingWrapper(RNNCell):
  """Operator adding input embedding to the given cell.

  Note: in many cases it may be more efficient to not use this wrapper,
  but instead concatenate the whole sequence of your inputs in time,
  do the embedding on this batch-concatenated sequence, then split it and
  feed into your RNN.
  """

  def __init__(self,
               cell,
               embedding_classes,
               embedding_size,
               initializer=None,
               reuse=None):
    """Create a cell with an added input embedding.

    Args:
      cell: an RNNCell, an embedding will be put before its inputs.
      embedding_classes: integer, how many symbols will be embedded.
      embedding_size: integer, the size of the vectors we embed into.
      initializer: an initializer to use when creating the embedding;
        if None, the initializer from variable scope or a default one is used.
      reuse: (optional) Python boolean describing whether to reuse variables
        in an existing scope.  If not `True`, and the existing scope already has
        the given variables, an error is raised.

    Raises:
      TypeError: if cell is not an RNNCell.
      ValueError: if embedding_classes is not positive.
    """
    super(EmbeddingWrapper, self).__init__(_reuse=reuse)
    if not _like_rnncell(cell):
      raise TypeError("The parameter cell is not RNNCell.")
    if embedding_classes <= 0 or embedding_size <= 0:
      raise ValueError("Both embedding_classes and embedding_size must be > 0: "
                       "%d, %d." % (embedding_classes, embedding_size))
    self._cell = cell
    self._embedding_classes = embedding_classes
    self._embedding_size = embedding_size
    self._initializer = initializer

  @property
  def state_size(self):
    return self._cell.state_size

  @property
  def output_size(self):
    return self._cell.output_size

  def zero_state(self, batch_size, dtype):
    with ops.name_scope(type(self).__name__ + "ZeroState", values=[batch_size]):
      return self._cell.zero_state(batch_size, dtype)

  def call(self, inputs, state):
    """Run the cell on embedded inputs."""
    with ops.device("/cpu:0"):
      if self._initializer:
        initializer = self._initializer
      elif vs.get_variable_scope().initializer:
        initializer = vs.get_variable_scope().initializer
      else:
        # Default initializer for embeddings should have variance=1.
        sqrt3 = math.sqrt(3)  # Uniform(-sqrt(3), sqrt(3)) has variance=1.
        initializer = init_ops.random_uniform_initializer(-sqrt3, sqrt3)

      if isinstance(state, tuple):
        data_type = state[0].dtype
      else:
        data_type = state.dtype

      embedding = vs.get_variable(
          "embedding", [self._embedding_classes, self._embedding_size],
          initializer=initializer,
          dtype=data_type)
      embedded = embedding_ops.embedding_lookup(embedding,
                                                array_ops.reshape(inputs, [-1]))

      return self._cell(embedded, state)

Constructor arguments (a usage sketch follows this list): cell: an RNNCell, here the previously defined encoder cell

                    embedding_classes: int, the size of the vocabulary to be embedded

                    embedding_size: int, the dimension of the generated word embeddings

                    initializer: the initializer used when creating the embedding; defaults to None

                    reuse: a Python boolean describing whether to reuse variables in an existing scope; defaults to None
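A minimal usage sketch (assuming the TF 1.x contrib layout; the sizes are illustrative): wrapping a GRU cell so that integer token IDs can be fed straight into static_rnn, as embedding_attention_seq2seq does internally:

import tensorflow as tf

cell = tf.nn.rnn_cell.GRUCell(512)
encoder_cell = tf.contrib.rnn.EmbeddingWrapper(
    cell, embedding_classes=40000, embedding_size=512)

# static_rnn can now consume raw int32 token IDs, one [batch_size]
# tensor per time step; the wrapper embeds them before the GRU runs.
ids = [tf.placeholder(tf.int32, [None], name="ids%d" % t) for t in range(10)]
outputs, state = tf.nn.static_rnn(encoder_cell, ids, dtype=tf.float32)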

embedding = vs.get_variable(
    "embedding", [self._embedding_classes, self._embedding_size],
    initializer=initializer,
    dtype=data_type)
embedded = embedding_ops.embedding_lookup(embedding,
                                          array_ops.reshape(inputs, [-1]))
return self._cell(embedded, state)

    Looking at the class as a whole, all it does is create a matrix of shape (vocabulary_size, embedding_size); supplying an initializer makes the generated matrix follow a chosen distribution, e.g. uniform or normal. A lookup then maps each token ID in encoder_inputs (or decoder_inputs) to its corresponding row of the embedding matrix.
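A tiny lookup demonstration (hedged: a hand-built 5x3 constant stands in for the learned embedding variable):

import numpy as np
import tensorflow as tf

vocabulary_size, embedding_size = 5, 3
# rows 0..4 play the role of the learned embeddings of 5 vocabulary symbols
embedding = tf.constant(np.arange(15, dtype=np.float32).reshape(5, 3))
ids = tf.constant([0, 2, 4])
embedded = tf.nn.embedding_lookup(embedding, ids)  # shape [3, 3]

with tf.Session() as sess:
    print(sess.run(embedded))
# [[ 0.  1.  2.]
#  [ 6.  7.  8.]
#  [12. 13. 14.]]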

Back in embedding_attention_seq2seq: once the encoder_cell is defined, encoder_cell and encoder_inputs are fed into the RNN for computation:

encoder_outputs, encoder_state = rnn.static_rnn(encoder_cell, encoder_inputs, dtype=dtype)
top_states = [array_ops.reshape(e, [-1, 1, cell.output_size]) for e in encoder_outputs]

        If one entry of the reshape target is -1, the length of that dimension is inferred from the remaining dimensions. [-1, 1, cell.output_size] therefore means the first dimension is inferred, the second dimension is 1, and the third dimension holds cell.output_size elements, so each encoder output of shape [batch_size, output_size] becomes [batch_size, 1, output_size].

        Example:

import numpy as np

image = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 1, 1], [1, 1, 1]]])
print(image.shape)
print(image.reshape([-1, 2, 3]))

output:

(2, 2, 3)
[[[1 2 3]
  [4 5 6]]

 [[1 1 1]
  [1 1 1]]]

print(image.reshape([-1, 1, 2]))

output:

[[[1 2]]

 [[3 4]]

 [[5 6]]

 [[1 1]]

 [[1 1]]

 [[1 1]]]

attention_states = array_ops.concat(top_states, 1)
All of the RNN's outputs are then concatenated along dimension 1 (the time axis), giving attention_states of shape [batch_size, seq_len, output_size]; this is what the decoder's attention mechanism attends over. A quick shape check follows.
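Verifying the shapes of these two lines under assumed sizes (batch_size=4, seq_len=7, output_size=16):

import tensorflow as tf

batch_size, seq_len, output_size = 4, 7, 16
# stand-in for static_rnn's output: a list of seq_len tensors,
# each of shape [batch_size, output_size]
encoder_outputs = [tf.zeros([batch_size, output_size]) for _ in range(seq_len)]

top_states = [tf.reshape(e, [-1, 1, output_size]) for e in encoder_outputs]
attention_states = tf.concat(top_states, 1)
print(attention_states.shape)  # (4, 7, 16) = [batch_size, seq_len, output_size]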
