TensorFlow--embedding_attention_seq2seq--Encoder Part Study Notes
In a seq2seq model, the embedding_attention_seq2seq function is invoked as follows:
decoder_outputs, _ = tf.contrib.legacy_seq2seq.embedding_attention_seq2seq(
    self.encoder_inputs, self.decoder_inputs, encoCell,
    num_encoder_symbols=source_vocab_size,
    num_decoder_symbols=target_vocab_size,
    embedding_size=hidden_size,
    output_projection=output_projection,
    feed_previous=False)
The parameters are:
encoder_inputs: a list of 1D int32 Tensors of shape [batch_size], the source-side token IDs (one Tensor per time step).
decoder_inputs: a list of 1D int32 Tensors of shape [batch_size], the target-side token IDs (one Tensor per time step).
encoCell: the tf.nn.rnn_cell.RNNCell defining the recurrent unit used by both encoder and decoder.
num_encoder_symbols: size of the source vocabulary.
num_decoder_symbols: size of the target vocabulary.
output_projection: None, or a pair (W, B) of shapes [output_size x num_decoder_symbols] and [num_decoder_symbols] that projects cell outputs onto the vocabulary (a typical construction is sketched below).
feed_previous: if True, only the first of decoder_inputs (the "GO" symbol) is used, and every later decoder input is taken from the previous decoder output (the usual inference setting); if False, decoder_inputs are used as given (teacher forcing during training).
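For reference, output_projection is typically built as a weight/bias pair; a minimal sketch in the style of the classic TF 1.x translate tutorial (the names proj_w and proj_b are illustrative, not from this post):

import tensorflow as tf  # assumes TensorFlow 1.x

hidden_size = 256          # illustrative sizes
target_vocab_size = 40000

# Projection from the cell's output space onto the target vocabulary.
w = tf.get_variable("proj_w", [hidden_size, target_vocab_size])
b = tf.get_variable("proj_b", [target_vocab_size])
output_projection = (w, b)
# Logits for one decoder step would be: tf.matmul(cell_output, w) + b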
The TensorFlow API documents the function interface as follows:
https://tensorflow.google.cn/api_docs/python/tf/contrib/legacy_seq2seq/embedding_attention_seq2seq?hl=zh-cn
tf.contrib.legacy_seq2seq.embedding_attention_seq2seq(
    encoder_inputs,
    decoder_inputs,
    cell,
    num_encoder_symbols,
    num_decoder_symbols,
    embedding_size,
    num_heads=1,
    output_projection=None,
    feed_previous=False,
    dtype=None,
    scope=None,
    initial_state_attention=False
)
The model first embeds encoder_inputs with an embedding matrix of shape [num_encoder_symbols x input_size], then runs an RNN over the embedded sequence to encode it, keeping every RNN output so the attention mechanism can use them later. Likewise, an embedding matrix of shape [num_decoder_symbols x input_size] is created as the word embedding for decoder_inputs, and the encoder's final hidden state initializes an attention decoder, whose results are returned. The return value is a tuple (outputs, state): outputs is a list, the same length as decoder_inputs, of 2D Tensors of shape [batch_size x num_decoder_symbols]; state is the state of each decoder cell at the final time step, a 2D Tensor of shape [batch_size x cell.state_size].
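A minimal usage sketch, assuming TensorFlow 1.x (batch size, sequence length, and vocabulary sizes below are illustrative):

import tensorflow as tf  # assumes TensorFlow 1.x with tf.contrib

batch_size, seq_len = 32, 10
encoder_inputs = [tf.placeholder(tf.int32, [batch_size]) for _ in range(seq_len)]
decoder_inputs = [tf.placeholder(tf.int32, [batch_size]) for _ in range(seq_len)]
cell = tf.nn.rnn_cell.GRUCell(128)

outputs, state = tf.contrib.legacy_seq2seq.embedding_attention_seq2seq(
    encoder_inputs, decoder_inputs, cell,
    num_encoder_symbols=10000, num_decoder_symbols=10000,
    embedding_size=128, feed_previous=False)

# outputs: list of seq_len Tensors, each of shape [batch_size, num_decoder_symbols]
# state:   final decoder state, shape [batch_size, cell.state_size]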
The source of embedding_attention_seq2seq is shown below (this version is a variant extended with beam_search and beam_size parameters; the stock API signature above does not have them):
def embedding_attention_seq2seq(encoder_inputs,
                                decoder_inputs,
                                cell,
                                num_encoder_symbols,
                                num_decoder_symbols,
                                embedding_size,
                                num_heads=1,
                                output_projection=None,
                                feed_previous=False,
                                dtype=None,
                                scope=None,
                                initial_state_attention=False,
                                beam_search=True,
                                beam_size=10):
  with variable_scope.variable_scope(
      scope or "embedding_attention_seq2seq", dtype=dtype) as scope:
    dtype = scope.dtype
    # Encoder.
    encoder_cell = copy.deepcopy(cell)
    encoder_cell = core_rnn_cell.EmbeddingWrapper(
        encoder_cell,
        embedding_classes=num_encoder_symbols,
        embedding_size=embedding_size)
    encoder_outputs, encoder_state = rnn.static_rnn(
        encoder_cell, encoder_inputs, dtype=dtype)
    # First calculate a concatenation of encoder outputs to put attention on.
    top_states = [
        array_ops.reshape(e, [-1, 1, cell.output_size]) for e in encoder_outputs
    ]
    attention_states = array_ops.concat(top_states, 1)
    # Decoder.
    output_size = None
    if output_projection is None:
      cell = core_rnn_cell.OutputProjectionWrapper(cell, num_decoder_symbols)
      output_size = num_decoder_symbols
    return embedding_attention_decoder(
        decoder_inputs,
        encoder_state,
        attention_states,
        cell,
        num_decoder_symbols,
        embedding_size,
        num_heads=num_heads,
        output_size=output_size,
        output_projection=output_projection,
        feed_previous=feed_previous,
        initial_state_attention=initial_state_attention,
        beam_search=beam_search,
        beam_size=beam_size)
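Note the Decoder branch: when no output_projection is supplied, the cell is wrapped in OutputProjectionWrapper so that every step's output is linearly projected from cell.output_size to num_decoder_symbols. Conceptually the wrapper does nothing more than the following sketch (a NumPy illustration, not the actual wrapper code; sizes are illustrative):

import numpy as np

def project_output(cell_output, W, b):
    # cell_output: [batch_size, cell_output_size]
    # W: [cell_output_size, num_decoder_symbols], b: [num_decoder_symbols]
    return cell_output @ W + b  # logits over the target vocabulary

cell_output = np.random.randn(32, 256)      # one decoder step
W = np.random.randn(256, 40000) * 0.01
b = np.zeros(40000)
logits = project_output(cell_output, W, b)  # shape (32, 40000)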
Python copy() vs. copy.deepcopy()
Python's copy module distinguishes shallow copies from deep copies. copy.deepcopy() allocates new memory for the copied object and, recursively, for everything it contains, so later changes to the original cannot affect the copy. copy.copy() creates a new top-level object, but its nested members still reference the same underlying objects, so mutating a shared member through the original is also visible in the shallow copy. For this reason embedding_attention_seq2seq uses copy.deepcopy() to duplicate the cell that was passed in: the deep copy serves as the encoder_cell, while the original cell is used (after wrapping) for the decoder, ensuring the two do not share state.
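A quick demonstration of the difference:

import copy

original = {"weights": [1, 2, 3]}
shallow = copy.copy(original)
deep = copy.deepcopy(original)

original["weights"].append(4)  # mutate a nested object

print(shallow["weights"])      # [1, 2, 3, 4]  shared with original
print(deep["weights"])         # [1, 2, 3]     independent copy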
The EmbeddingWrapper class:
EmbeddingWrapper inherits from RNNCell; its job is to embed the integer inputs and feed the resulting vectors to the wrapped cell.
class EmbeddingWrapper(RNNCell):
  """Operator adding input embedding to the given cell.

  Note: in many cases it may be more efficient to not use this wrapper,
  but instead concatenate the whole sequence of your inputs in time,
  do the embedding on this batch-concatenated sequence, then split it and
  feed into your RNN.
  """

  def __init__(self,
               cell,
               embedding_classes,
               embedding_size,
               initializer=None,
               reuse=None):
    """Create a cell with an added input embedding.

    Args:
      cell: an RNNCell, an embedding will be put before its inputs.
      embedding_classes: integer, how many symbols will be embedded.
      embedding_size: integer, the size of the vectors we embed into.
      initializer: an initializer to use when creating the embedding;
        if None, the initializer from variable scope or a default one is used.
      reuse: (optional) Python boolean describing whether to reuse variables
        in an existing scope. If not `True`, and the existing scope already has
        the given variables, an error is raised.

    Raises:
      TypeError: if cell is not an RNNCell.
      ValueError: if embedding_classes is not positive.
    """
    super(EmbeddingWrapper, self).__init__(_reuse=reuse)
    if not _like_rnncell(cell):
      raise TypeError("The parameter cell is not RNNCell.")
    if embedding_classes <= 0 or embedding_size <= 0:
      raise ValueError("Both embedding_classes and embedding_size must be > 0: "
                       "%d, %d." % (embedding_classes, embedding_size))
    self._cell = cell
    self._embedding_classes = embedding_classes
    self._embedding_size = embedding_size
    self._initializer = initializer

  @property
  def state_size(self):
    return self._cell.state_size

  @property
  def output_size(self):
    return self._cell.output_size

  def zero_state(self, batch_size, dtype):
    with ops.name_scope(type(self).__name__ + "ZeroState", values=[batch_size]):
      return self._cell.zero_state(batch_size, dtype)

  def call(self, inputs, state):
    """Run the cell on embedded inputs."""
    with ops.device("/cpu:0"):
      if self._initializer:
        initializer = self._initializer
      elif vs.get_variable_scope().initializer:
        initializer = vs.get_variable_scope().initializer
      else:
        # Default initializer for embeddings should have variance=1.
        sqrt3 = math.sqrt(3)  # Uniform(-sqrt(3), sqrt(3)) has variance=1.
        initializer = init_ops.random_uniform_initializer(-sqrt3, sqrt3)
      if isinstance(state, tuple):
        data_type = state[0].dtype
      else:
        data_type = state.dtype
      embedding = vs.get_variable(
          "embedding", [self._embedding_classes, self._embedding_size],
          initializer=initializer,
          dtype=data_type)
      embedded = embedding_ops.embedding_lookup(embedding,
                                                array_ops.reshape(inputs, [-1]))
      return self._cell(embedded, state)
Constructor arguments:
cell: the RNNCell to wrap (here, the previously defined encoder_cell).
embedding_classes: int, the size of the vocabulary to be embedded.
embedding_size: int, the dimensionality of the generated word embeddings.
initializer: the initializer used when creating the embedding matrix; defaults to None.
reuse: a Python boolean describing whether to reuse variables in an existing scope; defaults to None.
The key lines in call() are:
embedding = vs.get_variable(
    "embedding", [self._embedding_classes, self._embedding_size],
    initializer=initializer,
    dtype=data_type)
embedded = embedding_ops.embedding_lookup(embedding,
                                          array_ops.reshape(inputs, [-1]))
return self._cell(embedded, state)
Looking at the class as a whole, all it really does is create a matrix of shape (vocabulary_size, embedding_size). Supplying an initializer makes the matrix follow a chosen distribution (for example a normal distribution; the default here is Uniform(-sqrt(3), sqrt(3)), which has variance 1). embedding_lookup then maps each token ID in encoder_inputs (or decoder_inputs) to its row of the matrix, producing the embedded inputs for the cell.
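A minimal NumPy sketch of what the class effectively does (vocabulary size, embedding size, and token IDs below are illustrative):

import numpy as np

vocabulary_size, embedding_size = 10, 4
rng = np.random.default_rng(0)

# The (vocabulary_size, embedding_size) matrix EmbeddingWrapper creates,
# drawn from Uniform(-sqrt(3), sqrt(3)) like the default initializer.
sqrt3 = np.sqrt(3.0)
embedding = rng.uniform(-sqrt3, sqrt3, size=(vocabulary_size, embedding_size))

token_ids = np.array([2, 7, 7, 0])  # one batch of input IDs
embedded = embedding[token_ids]     # the "lookup": shape (4, 4)
print(embedded.shape)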
Back in embedding_attention_seq2seq: with encoder_cell defined, it is passed together with encoder_inputs into the RNN:
encoder_outputs, encoder_state = rnn.static_rnn(encoder_cell, encoder_inputs, dtype=dtype)
top_states = [array_ops.reshape(e, [-1, 1, cell.output_size]) for e in encoder_outputs]
When a reshape dimension is given as -1, NumPy (and TensorFlow) infers it from the remaining dimensions. [-1, 1, cell.output_size] therefore turns each encoder output of shape [batch_size, output_size] into a [batch_size, 1, output_size] tensor: an inferred number of rows, each containing a single group of cell.output_size elements.
Example:
import numpy as np
image = np.array([[[1,2,3], [4,5,6]], [[1,1,1], [1,1,1]]])
print(image.shape)
output:
(2, 2, 3)
print(image.reshape([-1,2,3]))
output:
[[[1 2 3]
  [4 5 6]]

 [[1 1 1]
  [1 1 1]]]
print(image.reshape([-1,1,2]))
output:
[[[1 2]]

 [[3 4]]

 [[5 6]]

 [[1 1]]

 [[1 1]]

 [[1 1]]]
attention_states = array_ops.concat(top_states, 1)
All the RNN outputs are then concatenated along dimension 1 (the time axis), yielding attention_states of shape [batch_size, sequence_length, output_size]; this is what the attention mechanism in the decoder will attend over.
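Putting the reshape and concat together, a minimal NumPy sketch of how attention_states is assembled (batch size, sequence length, and output size below are illustrative):

import numpy as np

batch_size, seq_len, output_size = 32, 5, 128
rng = np.random.default_rng(0)

# encoder_outputs: a list of seq_len arrays, each [batch_size, output_size]
encoder_outputs = [rng.standard_normal((batch_size, output_size))
                   for _ in range(seq_len)]

top_states = [e.reshape(-1, 1, output_size) for e in encoder_outputs]
attention_states = np.concatenate(top_states, axis=1)
print(attention_states.shape)  # (32, 5, 128)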