0. TensorFlow kept running without using the GPU...
Embarrassing: the job was running, but the GPU sat idle while the CPU was maxed out. Turned out I had installed the wrong package; it should be tensorflow-gpu.
Code to test whether TensorFlow is using the GPU: https://stackoverflow.com/questions/38009682/how-to-tell-if-tensorflow-is-using-gpu-acceleration-from-inside-python-shell
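A minimal check in the spirit of that answer (a sketch, assuming TF 1.x): create a session with device placement logging and see whether ops land on /gpu:0.
import tensorflow as tf

a = tf.constant([1.0, 2.0], name="a")
b = tf.constant([3.0, 4.0], name="b")
c = a + b
# With tensorflow-gpu installed, the placement log should show /gpu:0 devices.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
print(sess.run(c))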
1. tf.app.run()的疑惑
http://stackoverflow.com/questions/33703624/how-does-tf-app-run-work
tf.app is roughly TensorFlow's counterpart of Python's argparse: tf.app.run() parses the command-line flags into FLAGS and then calls main().
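A small sketch of the usual pattern (TF 1.x flags; the flag name is made up):
import tensorflow as tf

tf.app.flags.DEFINE_integer("batch_size", 64, "Batch size for training.")
FLAGS = tf.app.flags.FLAGS

def main(_):
    # By the time main() runs, tf.app.run() has already parsed the command line.
    print("batch_size =", FLAGS.batch_size)

if __name__ == "__main__":
    tf.app.run()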
2. variable scope 和 name scope
Variable Scope mechanism: https://www.tensorflow.org/programmers_guide/variable_scope
http://stackoverflow.com/questions/35919020/whats-the-difference-of-name-scope-and-a-variable-scope-in-tensorflow
Key point: Name scopes can be opened in addition to a variable scope, and then they will only affect the names of the ops, but not of variables.
with tf.variable_scope("foo"):
with tf.name_scope("bar"):
v = tf.get_variable("v", [1])
x = 1.0 + v
assert v.name == "foo/v:0"
assert x.op.name == "foo/bar/add"
The difference between scope.original_name_scope and scope.name:
http://stackoverflow.com/questions/41756054/tensorflow-variablescope-original-name-scope-vs-name
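A sketch of the distinction discussed there (assuming TF 1.x): reopening a variable scope with the same string keeps scope.name stable, but each opening gets its own underlying name scope, which original_name_scope records.
import tensorflow as tf

with tf.variable_scope("foo") as scope1:
    pass
with tf.variable_scope("foo") as scope2:
    pass
print(scope1.name, scope1.original_name_scope)   # foo foo/
print(scope2.name, scope2.original_name_scope)   # foo foo_1/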
3. Python 2 vs. Python 3 differences
While modifying data_utils.py (https://github.com/tensorflow/models/blob/master/tutorials/rnn/translate/data_utils.py):
I didn't pay attention to the version differences: in Python 3 the print statement is gone, replaced by the print() function.
Also, when running with Python 2:
with gfile.GFile(data_path, mode="rb") as f:
counter = 0
for line in f:
Here line still contains its trailing '\n'; after splitting, the newline stays glued to the last word and goes into the list with it. As a result, the file written from the list no longer matches len(list).
For example, write li = ['a', 'b\n'] to a file: the '\n' forces a line break, so the written file ends up with 3 lines.
On the next line-by-line read, the empty string ('') gets counted as a new word.
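A minimal sketch of the fix (plain Python; the file name is hypothetical): strip the newline before splitting, so no token carries a trailing '\n'.
words = []
with open("corpus.txt", "rb") as f:
    for line in f:
        # strip() removes the trailing '\n' that would otherwise
        # stay glued to the last word on the line
        words.extend(line.strip().split())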
4. The TensorFlow Saver class https://www.tensorflow.org/api_docs/python/tf/train/Saver
http://blog.csdn.net/u011500062/article/details/51728830
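A minimal save/restore sketch (TF 1.x; paths and step number are made up), assuming sess holds a built graph:
saver = tf.train.Saver()
# save() writes translate.ckpt-16.data/.index/.meta plus a 'checkpoint' file
saver.save(sess, "./models/translate.ckpt", global_step=16)
# restore() takes the checkpoint prefix, not any individual file
saver.restore(sess, "./models/translate.ckpt-16")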
5. Saving the Seq2Seq model
The saved model files are:
translate.ckpt-16.data-00000-of-00001
translate.ckpt-16.index
translate.ckpt-16.meta
Explanations of these files:
https://groups.google.com/a/tensorflow.org/forum/#!topic/discuss/Y4mzbDAUSec
http://stackoverflow.com/questions/36195454/what-is-the-tensorflow-checkpoint-meta-file
6. RNN examples
https://uqer.io/community/share/58a9332bf1973300597ae209
http://r2rt.com/recurrent-neural-networks-in-tensorflow-ii.html
7. List of tensors to a tensor
http://stackoverflow.com/questions/35730161/how-to-convert-a-list-of-tensors-of-dim-n-to-a-tensor-of-dim-n1
http://blog.csdn.net/sherry_up/article/details/52169318
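The usual answer from those links, as a sketch: tf.stack (called tf.pack before TF 1.0) turns a list of rank-n tensors into a single rank-(n+1) tensor.
import tensorflow as tf

tensors = [tf.ones([3, 4]) for _ in range(5)]   # five rank-2 tensors
stacked = tf.stack(tensors)                     # one rank-3 tensor, shape [5, 3, 4]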
8. The batch_matmul problem
The operation I wanted (quoting the GitHub issue): "suppose I have a T x n x k and want to multiply it by a k x k2, and then do a max pool over T and then a mean pool over n. To do this now, I think you need to reshape, do the matmul() and then undo the reshape and then do the pooling."
https://github.com/tensorflow/tensorflow/issues/216
https://www.tensorflow.org/versions/r0.10/api_docs/python/math_ops/matrix_math_functions#batch_matmul
Using it raised: AttributeError: 'module' object has no attribute 'batch_matmul'. It turns out batch_matmul was removed in TF 1.0; use tf.matmul with the appropriate arguments instead.
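A sketch of the TF 1.0 replacements (shapes are the hypothetical ones from the issue): tf.matmul batches over matching leading dimensions, and the old adj_x/adj_y flags map to adjoint_a/adjoint_b; the rank-3 times rank-2 case still needs the reshape trick.
import tensorflow as tf

# batch x batch: what used to be tf.batch_matmul(a, b)
a = tf.random_normal([10, 5, 3])
b = tf.random_normal([10, 3, 7])
ab = tf.matmul(a, b)                 # shape [10, 5, 7]

# rank-3 x rank-2 (T x n x k times k x k2): reshape, matmul, reshape back
x = tf.random_normal([8, 5, 3])
w = tf.random_normal([3, 7])
y = tf.reshape(tf.matmul(tf.reshape(x, [-1, 3]), w), [8, 5, 7])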
9. Cannot feed value of shape (XX) for Tensor u'target/Y:0', which has shape '(YY)'?
First time hitting this. Google said it means the shape of some fed value doesn't match its placeholder. But the failure was on the fifth variable (mask5), which made me assume it wasn't a feed problem (if it were, why would it only blow up on the 5th one?). After a lot of flailing it turned out to be the input data after all...
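A quick way to localize this kind of error (a sketch; feed_value stands for whatever array is being fed): compare the placeholder's declared shape against the shape of the fed value before calling sess.run.
import numpy as np

y_placeholder = sess.graph.get_tensor_by_name("target/Y:0")
print(y_placeholder.get_shape())       # what the graph expects
print(np.asarray(feed_value).shape)    # what is actually being fed (feed_value is hypothetical)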
10. Loading an existing model
I used to be able to load the existing model from a given directory, then it suddenly stopped working. Several hours later I noticed the directory used to contain a checkpoint file, which tells the loader two paths:
model_checkpoint_path: "translate.ckpt-101000"
all_model_checkpoint_paths: "translate.ckpt-101000"
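The standard loading pattern depends on that file (TF 1.x sketch; saver and sess as in section 4, directory name made up): tf.train.get_checkpoint_state reads 'checkpoint' and returns the recorded paths, and returns None when the file is missing.
ckpt = tf.train.get_checkpoint_state("./model_dir")
if ckpt and ckpt.model_checkpoint_path:
    saver.restore(sess, ckpt.model_checkpoint_path)
else:
    print("No 'checkpoint' file found; nothing to restore.")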
11. Reading parameter values inside a model
https://www.tensorflow.org/programmers_guide/variables#checkpoint_files:
When you create a Saver object, you can optionally choose names for the variables in the checkpoint files. By default, it uses the value of the tf.Variable.name property for each variable.
To understand what variables are in a checkpoint, you can use the inspect_checkpoint library, and in particular, the print_tensors_in_checkpoint_file function.
*https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/inspect_checkpoint.py
Usage:
python inspect_checkpoint.py --file_name=./alpha_easy_nmt/valid_model/translate.ckpt-625000
Output:
Decoder/trg_lookup_table/embedding (DT_FLOAT) [16000,620]
Decoder/trg_lookup_table/embedding/Adadelta (DT_FLOAT) [16000,620]
Decoder/trg_lookup_table/embedding/Adadelta_1 (DT_FLOAT) [16000,620]
Usage:
python inspect_checkpoint.py --file_name=./alpha_easy_nmt/valid_model/translate.ckpt-625000 --tensor_name=Decoder/W_sf
Output:
tensor_name: Decoder/W_sf
[[ -4.55709170e-07 -9.10816539e-07 4.44753543e-02 ..., -2.58049741e-02
4.26506670e-03 -3.64431571e-07]
[ 7.86067460e-07 7.86348721e-07 1.29140466e-02 ..., 7.92008177e-06
5.49392325e-07 6.99410566e-06]
[ -5.86683996e-07 5.51591484e-08 9.70983803e-02 ..., 2.75615434e-07
-4.86231060e-04 1.23817983e-07]
...,
[ -1.40239194e-06 -1.00237912e-06 -1.44313052e-01 ..., -1.33047411e-06
-1.17946070e-06 -2.41477892e-07]
[ 1.19242941e-06 -9.48488719e-08 -2.48298571e-02 ..., 1.00101170e-03
-3.03782895e-03 1.45507602e-06]
[ -1.27071712e-06 -1.27975386e-06 -2.31240150e-02 ..., -7.33333752e-02
2.30671745e-03 -5.72958811e-07]]
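The same inspection can also be done from Python instead of the command line (a sketch; the exact argument list of this internal tool has shifted across TF versions):
from tensorflow.python.tools.inspect_checkpoint import print_tensors_in_checkpoint_file

# An empty tensor_name lists all tensors; a specific name prints its values.
print_tensors_in_checkpoint_file(
    "./alpha_easy_nmt/valid_model/translate.ckpt-625000", tensor_name="")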
12. The default initializer of tf.get_variable
https://www.tensorflow.org/api_docs/python/tf/get_variable:
If initializer is None (the default), the default initializer passed in the variable scope will be used. If that one is None too, a glorot_uniform_initializer will be used. The initializer can also be a Tensor, in which case the variable is initialized to this value and shape.
Oddly, glorot_uniform_initializer is not mentioned in any documentation I could find; someone did ask about it on GitHub, but nobody answered: https://github.com/tensorflow/tensorflow/issues/7791.
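For reference, Glorot (Xavier) uniform samples from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)); a quick empirical sanity check of the default (a sketch, TF 1.x):
import math
import tensorflow as tf

v = tf.get_variable("w_check", [200, 300])   # no initializer supplied
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    vals = sess.run(v)
limit = math.sqrt(6.0 / (200 + 300))
print(vals.min(), vals.max(), "expected within +/-", limit)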
13. System log
13.1 The system's BLEU on NIST06 used to get stuck around 20 (the theano version and my junior colleague's version both reach 34). Printing every hypothesis in the beam during beam search showed a lot of over-translation. Digging in, I found a small bug in my system: in beam search I had forgotten to apply the mask to the source annotations.
Vocab = 16k, Batch = 50; BLEU is measured every 3000 batches for both optimizers.
Adam:
72000 BLEU score = 0.2412 BEST BLEU is 0
75000 BLEU score = 0.2377 BEST BLEU is 0.2412
78000 BLEU score = 0.2380 BEST BLEU is 0.2412
81000 BLEU score = 0.2513 BEST BLEU is 0.2412
84000 BLEU score = 0.2231 BEST BLEU is 0.2513
87000 BLEU score = 0.2527 BEST BLEU is 0.2513
90000 BLEU score = 0.2314 BEST BLEU is 0.2527
93000 BLEU score = 0.2498 BEST BLEU is 0.2527
96000 BLEU score = 0.2445 BEST BLEU is 0.2527
99000 BLEU score = 0.2487 BEST BLEU is 0.2527
102000 BLEU score = 0.2497 BEST BLEU is 0.2527
105000 BLEU score = 0.2523 BEST BLEU is 0.2527
108000 BLEU score = 0.2375 BEST BLEU is 0.2527
111000 BLEU score = 0.2380 BEST BLEU is 0.2527
114000 BLEU score = 0.2457 BEST BLEU is 0.2527
117000 BLEU score = 0.2525 BEST BLEU is 0.2527
120000 BLEU score = 0.2519 BEST BLEU is 0.2527
123000 BLEU score = 0.2491 BEST BLEU is 0.2527
126000 BLEU score = 0.2391 BEST BLEU is 0.2527
129000 BLEU score = 0.2304 BEST BLEU is 0.2527
132000 BLEU score = 0.2618 BEST BLEU is 0.2527
135000 BLEU score = 0.2489 BEST BLEU is 0.2618
138000 BLEU score = 0.2458 BEST BLEU is 0.2618
141000 BLEU score = 0.2505 BEST BLEU is 0.2618
144000 BLEU score = 0.2558 BEST BLEU is 0.2618
147000 BLEU score = 0.2492 BEST BLEU is 0.2618
150000 BLEU score = 0.2463 BEST BLEU is 0.2618
153000 BLEU score = 0.2586 BEST BLEU is 0.2618
156000 BLEU score = 0.2495 BEST BLEU is 0.2618
159000 BLEU score = 0.2568 BEST BLEU is 0.2618
162000 BLEU score = 0.2571 BEST BLEU is 0.2618
165000 BLEU score = 0.2611 BEST BLEU is 0.2618
168000 BLEU score = 0.2508 BEST BLEU is 0.2618
171000 BLEU score = 0.2450 BEST BLEU is 0.2618
174000 BLEU score = 0.2459 BEST BLEU is 0.2618
177000 BLEU score = 0.2579 BEST BLEU is 0.2618
180000 BLEU score = 0.2580 BEST BLEU is 0.2618
183000 BLEU score = 0.2520 BEST BLEU is 0.2618
186000 BLEU score = 0.2730 BEST BLEU is 0.2618
189000 BLEU score = 0.2430 BEST BLEU is 0.273
192000 BLEU score = 0.2571 BEST BLEU is 0.273
195000 BLEU score = 0.2541 BEST BLEU is 0.273
198000 BLEU score = 0.2471 BEST BLEU is 0.273
201000 BLEU score = 0.2491 BEST BLEU is 0.273
204000 BLEU score = 0.2589 BEST BLEU is 0.273
207000 BLEU score = 0.2523 BEST BLEU is 0.273
210000 BLEU score = 0.2536 BEST BLEU is 0.273
213000 BLEU score = 0.2557 BEST BLEU is 0.273
216000 BLEU score = 0.2457 BEST BLEU is 0.273
219000 BLEU score = 0.2661 BEST BLEU is 0.273
222000 BLEU score = 0.2515 BEST BLEU is 0.273
225000 BLEU score = 0.2644 BEST BLEU is 0.273
228000 BLEU score = 0.2616 BEST BLEU is 0.273
231000 BLEU score = 0.2554 BEST BLEU is 0.273
234000 BLEU score = 0.2621 BEST BLEU is 0.273
237000 BLEU score = 0.2519 BEST BLEU is 0.273
240000 BLEU score = 0.2440 BEST BLEU is 0.273
243000 BLEU score = 0.2572 BEST BLEU is 0.273
246000 BLEU score = 0.2488 BEST BLEU is 0.273
249000 BLEU score = 0.2631 BEST BLEU is 0.273
252000 BLEU score = 0.2584 BEST BLEU is 0.273
255000 BLEU score = 0.2570 BEST BLEU is 0.273
258000 BLEU score = 0.2581 BEST BLEU is 0.273
261000 BLEU score = 0.2510 BEST BLEU is 0.273
264000 BLEU score = 0.2476 BEST BLEU is 0.273
267000 BLEU score = 0.2667 BEST BLEU is 0.273
270000 BLEU score = 0.2689 BEST BLEU is 0.273
273000 BLEU score = 0.2596 BEST BLEU is 0.273
276000 BLEU score = 0.2592 BEST BLEU is 0.273
279000 BLEU score = 0.2617 BEST BLEU is 0.273
282000 BLEU score = 0.2652 BEST BLEU is 0.273
285000 BLEU score = 0.2651 BEST BLEU is 0.273
288000 BLEU score = 0.2732 BEST BLEU is 0.273
291000 BLEU score = 0.2505 BEST BLEU is 0.2732
294000 BLEU score = 0.2545 BEST BLEU is 0.2732
297000 BLEU score = 0.2737 BEST BLEU is 0.2732
300000 BLEU score = 0.2662 BEST BLEU is 0.2737
Adadelta:
72000 BLEU score = 0.1732 BEST BLEU is 0
75000 BLEU score = 0.1752 BEST BLEU is 0.1732
78000 BLEU score = 0.1888 BEST BLEU is 0.1752
81000 BLEU score = 0.1771 BEST BLEU is 0.1888
84000 BLEU score = 0.1876 BEST BLEU is 0.1888
87000 BLEU score = 0.1968 BEST BLEU is 0.1888
90000 BLEU score = 0.1664 BEST BLEU is 0.1968
93000 BLEU score = 0.2059 BEST BLEU is 0.1968
96000 BLEU score = 0.1816 BEST BLEU is 0.2059
99000 BLEU score = 0.2098 BEST BLEU is 0.2059
102000 BLEU score = 0.2086 BEST BLEU is 0.2098
105000 BLEU score = 0.2029 BEST BLEU is 0.2098
108000 BLEU score = 0.2222 BEST BLEU is 0.2098
111000 BLEU score = 0.1929 BEST BLEU is 0.2222
114000 BLEU score = 0.1951 BEST BLEU is 0.2222
117000 BLEU score = 0.2212 BEST BLEU is 0.2222
120000 BLEU score = 0.2111 BEST BLEU is 0.2222
123000 BLEU score = 0.1981 BEST BLEU is 0.2222
126000 BLEU score = 0.2054 BEST BLEU is 0.2222
129000 BLEU score = 0.2228 BEST BLEU is 0.2222
132000 BLEU score = 0.2250 BEST BLEU is 0.2228
135000 BLEU score = 0.2061 BEST BLEU is 0.225
138000 BLEU score = 0.2333 BEST BLEU is 0.225
141000 BLEU score = 0.2236 BEST BLEU is 0.2333
144000 BLEU score = 0.2123 BEST BLEU is 0.2333
147000 BLEU score = 0.2242 BEST BLEU is 0.2333
150000 BLEU score = 0.2120 BEST BLEU is 0.2333
153000 BLEU score = 0.2404 BEST BLEU is 0.2333
156000 BLEU score = 0.2348 BEST BLEU is 0.2404
159000 BLEU score = 0.2195 BEST BLEU is 0.2404
162000 BLEU score = 0.2383 BEST BLEU is 0.2404
165000 BLEU score = 0.2192 BEST BLEU is 0.2404
168000 BLEU score = 0.2240 BEST BLEU is 0.2404
171000 BLEU score = 0.2265 BEST BLEU is 0.2404
174000 BLEU score = 0.2211 BEST BLEU is 0.2404
177000 BLEU score = 0.2302 BEST BLEU is 0.2404
180000 BLEU score = 0.2360 BEST BLEU is 0.2404
183000 BLEU score = 0.2161 BEST BLEU is 0.2404
186000 BLEU score = 0.2316 BEST BLEU is 0.2404
189000 BLEU score = 0.2298 BEST BLEU is 0.2404
192000 BLEU score = 0.2316 BEST BLEU is 0.2404
195000 BLEU score = 0.2166 BEST BLEU is 0.2404
198000 BLEU score = 0.2350 BEST BLEU is 0.2404
201000 BLEU score = 0.2295 BEST BLEU is 0.2404
204000 BLEU score = 0.2456 BEST BLEU is 0.2404
207000 BLEU score = 0.2392 BEST BLEU is 0.2456
210000 BLEU score = 0.2311 BEST BLEU is 0.2456
213000 BLEU score = 0.2113 BEST BLEU is 0.2456
216000 BLEU score = 0.2223 BEST BLEU is 0.2456
219000 BLEU score = 0.2258 BEST BLEU is 0.2456
222000 BLEU score = 0.2304 BEST BLEU is 0.2456
225000 BLEU score = 0.2165 BEST BLEU is 0.2456
228000 BLEU score = 0.2336 BEST BLEU is 0.2456
231000 BLEU score = 0.2345 BEST BLEU is 0.2456
234000 BLEU score = 0.2444 BEST BLEU is 0.2456
237000 BLEU score = 0.2310 BEST BLEU is 0.2456
240000 BLEU score = 0.2406 BEST BLEU is 0.2456
243000 BLEU score = 0.2294 BEST BLEU is 0.2456
246000 BLEU score = 0.2469 BEST BLEU is 0.2456
249000 BLEU score = 0.2479 BEST BLEU is 0.2469
252000 BLEU score = 0.2464 BEST BLEU is 0.2479
255000 BLEU score = 0.2490 BEST BLEU is 0.2479
258000 BLEU score = 0.2406 BEST BLEU is 0.249
261000 BLEU score = 0.2477 BEST BLEU is 0.249
264000 BLEU score = 0.2392 BEST BLEU is 0.249
267000 BLEU score = 0.2516 BEST BLEU is 0.249
270000 BLEU score = 0.2521 BEST BLEU is 0.2516
273000 BLEU score = 0.2370 BEST BLEU is 0.2521
276000 BLEU score = 0.2431 BEST BLEU is 0.2521
279000 BLEU score = 0.2548 BEST BLEU is 0.2521
282000 BLEU score = 0.2605 BEST BLEU is 0.2548
285000 BLEU score = 0.2421 BEST BLEU is 0.2605
288000 BLEU score = 0.2446 BEST BLEU is 0.2605
291000 BLEU score = 0.2521 BEST BLEU is 0.2605
294000 BLEU score = 0.2529 BEST BLEU is 0.2605
297000 BLEU score = 0.2453 BEST BLEU is 0.2605
300000 BLEU score = 0.2361 BEST BLEU is 0.2605
13.2 System performance was still below my junior colleague's. Further checking turned up a problem with the initialization of the init_context variable. For convenience I had used the last source annotation as the init context, while my colleague and the earlier theano version use mean pooling. Nematus: a Toolkit for Neural Machine Translation also mentions this point: "We initialize the decoder hidden state with the mean of the source annotation, rather than the annotation at the last position of the encoder backward RNN." So my shortcut does seem to be wrong.
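A sketch of the mean-pooling initialization (annotations, src_mask, and W_init are placeholders from my own code, not TF API): average the source annotations under the mask, then feed the mean through the usual tanh projection.
# annotations: [batch, src_len, hidden]; src_mask: [batch, src_len], 1.0 on real tokens
masked_sum = tf.reduce_sum(annotations * tf.expand_dims(src_mask, -1), axis=1)
mean_ctx = masked_sum / tf.reduce_sum(src_mask, axis=1, keep_dims=True)
init_state = tf.tanh(tf.matmul(mean_ctx, W_init))   # W_init: [hidden, dec_hidden]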
14. Dropout issues
http://blog.csdn.net/wangxinginnlp/article/details/72649820
*. Tracing GRU + Embedding
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/rnn_cell_impl.py
class _RNNCell(object):
"""Abstract object representing an RNN cell.
Every `RNNCell` must have the properties below and implement `__call__` with
the following signature.
This definition of cell differs from the definition used in the literature.
In the literature, 'cell' refers to an object with a single scalar output.
This definition refers to a horizontal array of such units.
An RNN cell, in the most abstract setting, is anything that has
a state and performs some operation that takes a matrix of inputs.
This operation results in an output matrix with `self.output_size` columns.
If `self.state_size` is an integer, this operation also results in a new
state matrix with `self.state_size` columns. If `self.state_size` is a
tuple of integers, then it results in a tuple of `len(state_size)` state
matrices, each with a column size corresponding to values in `state_size`.
"""
def __call__(self, inputs, state, scope=None):
"""Run this RNN cell on inputs, starting from the given state.
Args:
inputs: `2-D` tensor with shape `[batch_size x input_size]`.
state: if `self.state_size` is an integer, this should be a `2-D Tensor`
with shape `[batch_size x self.state_size]`. Otherwise, if
`self.state_size` is a tuple of integers, this should be a tuple
with shapes `[batch_size x s] for s in self.state_size`.
scope: VariableScope for the created subgraph; defaults to class name.
Returns:
A pair containing:
- Output: A `2-D` tensor with shape `[batch_size x self.output_size]`.
- New state: Either a single `2-D` tensor, or a tuple of tensors matching
the arity and shapes of `state`.
"""
raise NotImplementedError("Abstract method")
@property
def state_size(self):
"""size(s) of state(s) used by this cell.
It can be represented by an Integer, a TensorShape or a tuple of Integers
or TensorShapes.
"""
raise NotImplementedError("Abstract method")
@property
def output_size(self):
"""Integer or TensorShape: size of outputs produced by this cell."""
raise NotImplementedError("Abstract method")
def zero_state(self, batch_size, dtype):
"""Return zero-filled state tensor(s).
Args:
batch_size: int, float, or unit Tensor representing the batch size.
dtype: the data type to use for the state.
Returns:
If `state_size` is an int or TensorShape, then the return value is a
`N-D` tensor of shape `[batch_size x state_size]` filled with zeros.
If `state_size` is a nested list or tuple, then the return value is
a nested list or tuple (of the same structure) of `2-D` tensors with
the shapes `[batch_size x s]` for each s in `state_size`.
"""
with ops.name_scope(type(self).__name__ + "ZeroState", values=[batch_size]):
state_size = self.state_size
return _zero_state_tensors(state_size, batch_size, dtype)
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/rnn/python/ops/core_rnn_cell_impl.py
_BIAS_VARIABLE_NAME = "biases"
_WEIGHTS_VARIABLE_NAME = "weights"
'''
matmul(): Multiplies matrix `a` by matrix `b`, producing `a` * `b`.
concat(): Concatenates tensors along one dimension.
'''
def _linear(args, output_size, bias, bias_start=0.0):
"""Linear map: sum_i(args[i] * W[i]), where W[i] is a variable.
Args:
args: a 2D Tensor or a list of 2D, batch x n, Tensors.
output_size: int, second dimension of W[i].
bias: boolean, whether to add a bias term or not.
bias_start: starting value to initialize the bias; 0 by default.
Returns:
A 2D Tensor with shape [batch x output_size] equal to
sum_i(args[i] * W[i]), where W[i]s are newly created matrices.
Raises:
ValueError: if some of the arguments has unspecified or wrong shape.
"""
if args is None or (nest.is_sequence(args) and not args):
raise ValueError("`args` must be specified")
if not nest.is_sequence(args):
args = [args]
# Calculate the total size of arguments on dimension 1.
total_arg_size = 0
shapes = [a.get_shape() for a in args]
for shape in shapes:
if shape.ndims != 2:
raise ValueError("linear is expecting 2D arguments: %s" % shapes)
if shape[1].value is None:
raise ValueError("linear expects shape[1] to be provided for shape %s, "
"but saw %s" % (shape, shape[1]))
else:
total_arg_size += shape[1].value
dtype = [a.dtype for a in args][0]
# Now the computation.
scope = vs.get_variable_scope()
with vs.variable_scope(scope) as outer_scope:
weights = vs.get_variable(
_WEIGHTS_VARIABLE_NAME, [total_arg_size, output_size], dtype=dtype)
if len(args) == 1:
res = math_ops.matmul(args[0], weights)
else:
res = math_ops.matmul(array_ops.concat(args, 1), weights)
if not bias:
return res
with vs.variable_scope(outer_scope) as inner_scope:
inner_scope.set_partitioner(None)
biases = vs.get_variable(
_BIAS_VARIABLE_NAME, [output_size],
dtype=dtype,
initializer=init_ops.constant_initializer(bias_start, dtype=dtype))
return nn_ops.bias_add(res, biases)
class GRUCell(RNNCell):
"""Gated Recurrent Unit cell (cf. http://arxiv.org/abs/1406.1078)."""
def __init__(self, num_units, input_size=None, activation=tanh, reuse=None):
if input_size is not None:
logging.warn("%s: The input_size parameter is deprecated.", self)
self._num_units = num_units
self._activation = activation
self._reuse = reuse
@property
def state_size(self):
return self._num_units
@property
def output_size(self):
return self._num_units
def __call__(self, inputs, state, scope=None):
"""Gated recurrent unit (GRU) with nunits cells."""
with _checked_scope(self, scope or "gru_cell", reuse=self._reuse):
with vs.variable_scope("gates"): # Reset gate and update gate.
# We start with bias of 1.0 to not reset and not update.
value = sigmoid(_linear(
[inputs, state], 2 * self._num_units, True, 1.0))
r, u = array_ops.split(
value=value,
num_or_size_splits=2,
axis=1)
with vs.variable_scope("candidate"):
c = self._activation(_linear([inputs, r * state],
self._num_units, True))
new_h = u * state + (1 - u) * c
return new_h, new_h
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/rnn/python/ops/core_rnn_cell_impl.py
class EmbeddingWrapper(RNNCell):
"""Operator adding input embedding to the given cell.
Note: in many cases it may be more efficient to not use this wrapper,
but instead concatenate the whole sequence of your inputs in time,
do the embedding on this batch-concatenated sequence, then split it and
feed into your RNN.
"""
def __init__(self, cell, embedding_classes, embedding_size, initializer=None,
reuse=None):
"""Create a cell with an added input embedding.
Args:
cell: an RNNCell, an embedding will be put before its inputs.
embedding_classes: integer, how many symbols will be embedded.
embedding_size: integer, the size of the vectors we embed into.
initializer: an initializer to use when creating the embedding;
if None, the initializer from variable scope or a default one is used.
reuse: (optional) Python boolean describing whether to reuse variables
in an existing scope. If not `True`, and the existing scope already has
the given variables, an error is raised.
Raises:
TypeError: if cell is not an RNNCell.
ValueError: if embedding_classes is not positive.
"""
if not isinstance(cell, RNNCell):
raise TypeError("The parameter cell is not RNNCell.")
if embedding_classes <= 0 or embedding_size <= 0:
raise ValueError("Both embedding_classes and embedding_size must be > 0: "
"%d, %d." % (embedding_classes, embedding_size))
self._cell = cell
self._embedding_classes = embedding_classes
self._embedding_size = embedding_size
self._initializer = initializer
self._reuse = reuse
@property
def state_size(self):
return self._cell.state_size
@property
def output_size(self):
return self._cell.output_size
def zero_state(self, batch_size, dtype):
with ops.name_scope(type(self).__name__ + "ZeroState", values=[batch_size]):
return self._cell.zero_state(batch_size, dtype)
def __call__(self, inputs, state, scope=None):
"""Run the cell on embedded inputs."""
with _checked_scope(self, scope or "embedding_wrapper", reuse=self._reuse):
with ops.device("/cpu:0"):
if self._initializer:
initializer = self._initializer
elif vs.get_variable_scope().initializer:
initializer = vs.get_variable_scope().initializer
else:
# Default initializer for embeddings should have variance=1.
sqrt3 = math.sqrt(3) # Uniform(-sqrt(3), sqrt(3)) has variance=1.
initializer = init_ops.random_uniform_initializer(-sqrt3, sqrt3)
if type(state) is tuple:
data_type = state[0].dtype
else:
data_type = state.dtype
embedding = vs.get_variable(
"embedding", [self._embedding_classes, self._embedding_size],
initializer=initializer,
dtype=data_type)
embedded = embedding_ops.embedding_lookup(
embedding, array_ops.reshape(inputs, [-1]))
return self._cell(embedded, state)
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/rnn/python/ops/core_rnn.py
def static_rnn(cell, inputs, initial_state=None, dtype=None,
sequence_length=None, scope=None):
"""Creates a recurrent neural network specified by RNNCell `cell`.
The simplest form of RNN network generated is:
```python
state = cell.zero_state(...)
outputs = []
for input_ in inputs:
output, state = cell(input_, state)
outputs.append(output)
return (outputs, state)
```
However, a few other options are available:
An initial state can be provided.
If the sequence_length vector is provided, dynamic calculation is performed.
This method of calculation does not compute the RNN steps past the maximum
sequence length of the minibatch (thus saving computational time),
and properly propagates the state at an example's sequence length
to the final state output.
The dynamic calculation performed is, at time `t` for batch row `b`,
```python
(output, state)(b, t) =
(t >= sequence_length(b))
? (zeros(cell.output_size), states(b, sequence_length(b) - 1))
: cell(input(b, t), state(b, t - 1))
```
Args:
cell: An instance of RNNCell.
inputs: A length T list of inputs, each a `Tensor` of shape
`[batch_size, input_size]`, or a nested tuple of such elements.
initial_state: (optional) An initial state for the RNN.
If `cell.state_size` is an integer, this must be
a `Tensor` of appropriate type and shape `[batch_size, cell.state_size]`.
If `cell.state_size` is a tuple, this should be a tuple of
tensors having shapes `[batch_size, s] for s in cell.state_size`.
dtype: (optional) The data type for the initial state and expected output.
Required if initial_state is not provided or RNN state has a heterogeneous
dtype.
sequence_length: Specifies the length of each sequence in inputs.
An int32 or int64 vector (tensor) size `[batch_size]`, values in `[0, T)`.
scope: VariableScope for the created subgraph; defaults to "rnn".
Returns:
A pair (outputs, state) where:
- outputs is a length T list of outputs (one for each input), or a nested
tuple of such elements.
- state is the final state
Raises:
TypeError: If `cell` is not an instance of RNNCell.
ValueError: If `inputs` is `None` or an empty list, or if the input depth
(column size) cannot be inferred from inputs via shape inference.
"""
if not isinstance(cell, core_rnn_cell.RNNCell):
raise TypeError("cell must be an instance of RNNCell")
if not nest.is_sequence(inputs):
raise TypeError("inputs must be a sequence")
if not inputs:
raise ValueError("inputs must not be empty")
outputs = []
# Create a new scope in which the caching device is either
# determined by the parent scope, or is set to place the cached
# Variable using the same placement as for the rest of the RNN.
with vs.variable_scope(scope or "rnn") as varscope:
if varscope.caching_device is None:
varscope.set_caching_device(lambda op: op.device)
# Obtain the first sequence of the input
first_input = inputs
while nest.is_sequence(first_input):
first_input = first_input[0]
# Temporarily avoid EmbeddingWrapper and seq2seq badness
# TODO(lukaszkaiser): remove EmbeddingWrapper
if first_input.get_shape().ndims != 1:
input_shape = first_input.get_shape().with_rank_at_least(2)
fixed_batch_size = input_shape[0]
flat_inputs = nest.flatten(inputs)
for flat_input in flat_inputs:
input_shape = flat_input.get_shape().with_rank_at_least(2)
batch_size, input_size = input_shape[0], input_shape[1:]
fixed_batch_size.merge_with(batch_size)
for i, size in enumerate(input_size):
if size.value is None:
raise ValueError(
"Input size (dimension %d of inputs) must be accessible via "
"shape inference, but saw value None." % i)
else:
fixed_batch_size = first_input.get_shape().with_rank_at_least(1)[0]
if fixed_batch_size.value:
batch_size = fixed_batch_size.value
else:
batch_size = array_ops.shape(first_input)[0]
if initial_state is not None:
state = initial_state
else:
if not dtype:
raise ValueError("If no initial_state is provided, "
"dtype must be specified")
state = cell.zero_state(batch_size, dtype)
if sequence_length is not None: # Prepare variables
sequence_length = ops.convert_to_tensor(
sequence_length, name="sequence_length")
if sequence_length.get_shape().ndims not in (None, 1):
raise ValueError(
"sequence_length must be a vector of length batch_size")
def _create_zero_output(output_size):
# convert int to TensorShape if necessary
size = _state_size_with_prefix(output_size, prefix=[batch_size])
output = array_ops.zeros(
array_ops.stack(size), _infer_state_dtype(dtype, state))
shape = _state_size_with_prefix(
output_size, prefix=[fixed_batch_size.value])
output.set_shape(tensor_shape.TensorShape(shape))
return output
output_size = cell.output_size
flat_output_size = nest.flatten(output_size)
flat_zero_output = tuple(
_create_zero_output(size) for size in flat_output_size)
zero_output = nest.pack_sequence_as(structure=output_size,
flat_sequence=flat_zero_output)
sequence_length = math_ops.to_int32(sequence_length)
min_sequence_length = math_ops.reduce_min(sequence_length)
max_sequence_length = math_ops.reduce_max(sequence_length)
for time, input_ in enumerate(inputs):
if time > 0: varscope.reuse_variables()
# pylint: disable=cell-var-from-loop
call_cell = lambda: cell(input_, state)
# pylint: enable=cell-var-from-loop
if sequence_length is not None:
(output, state) = _rnn_step(
time=time,
sequence_length=sequence_length,
min_sequence_length=min_sequence_length,
max_sequence_length=max_sequence_length,
zero_output=zero_output,
state=state,
call_cell=call_cell,
state_size=cell.state_size)
else:
(output, state) = call_cell()
outputs.append(output)
return (outputs, state)
*. Tracing embedding_attention_seq2seq
embedding_attention_decoder -> attention_decoder
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/legacy_seq2seq/python/ops/seq2seq.py
def attention_decoder(decoder_inputs,
initial_state,
attention_states,
cell,
output_size=None,
num_heads=1,
loop_function=None,
dtype=None,
scope=None,
initial_state_attention=False):
"""RNN decoder with attention for the sequence-to-sequence model.
In this context "attention" means that, during decoding, the RNN can look up
information in the additional tensor attention_states, and it does this by
focusing on a few entries from the tensor. This model has proven to yield
especially good results in a number of sequence-to-sequence tasks. This
implementation is based on http://arxiv.org/abs/1412.7449 (see below for
details). It is recommended for complex sequence-to-sequence tasks.
Args:
decoder_inputs: A list of 2D Tensors [batch_size x input_size].
initial_state: 2D Tensor [batch_size x cell.state_size].
attention_states: 3D Tensor [batch_size x attn_length x attn_size].
cell: core_rnn_cell.RNNCell defining the cell function and size.
output_size: Size of the output vectors; if None, we use cell.output_size.
num_heads: Number of attention heads that read from attention_states.
loop_function: If not None, this function will be applied to i-th output
in order to generate i+1-th input, and decoder_inputs will be ignored,
except for the first element ("GO" symbol). This can be used for decoding,
but also for training to emulate http://arxiv.org/abs/1506.03099.
Signature -- loop_function(prev, i) = next
* prev is a 2D Tensor of shape [batch_size x output_size],
* i is an integer, the step number (when advanced control is needed),
* next is a 2D Tensor of shape [batch_size x input_size].
dtype: The dtype to use for the RNN initial state (default: tf.float32).
scope: VariableScope for the created subgraph; default: "attention_decoder".
initial_state_attention: If False (default), initial attentions are zero.
If True, initialize the attentions from the initial state and attention
states -- useful when we wish to resume decoding from a previously
stored decoder state and attention states.
Returns:
A tuple of the form (outputs, state), where:
outputs: A list of the same length as decoder_inputs of 2D Tensors of
shape [batch_size x output_size]. These represent the generated outputs.
Output i is computed from input i (which is either the i-th element
of decoder_inputs or loop_function(output {i-1}, i)) as follows.
First, we run the cell on a combination of the input and previous
attention masks:
cell_output, new_state = cell(linear(input, prev_attn), prev_state).
Then, we calculate new attention masks:
new_attn = softmax(V^T * tanh(W * attention_states + U * new_state))
and then we calculate the output:
output = linear(cell_output, new_attn).
state: The state of each decoder cell the final time-step.
It is a 2D Tensor of shape [batch_size x cell.state_size].
Raises:
ValueError: when num_heads is not positive, there are no inputs, shapes
of attention_states are not set, or input size cannot be inferred
from the input.
"""
if not decoder_inputs:
raise ValueError("Must provide at least 1 input to attention decoder.")
if num_heads < 1:
raise ValueError("With less than 1 heads, use a non-attention decoder.")
if attention_states.get_shape()[2].value is None:
raise ValueError("Shape[2] of attention_states must be known: %s" %
attention_states.get_shape())
if output_size is None:
output_size = cell.output_size
with variable_scope.variable_scope(
scope or "attention_decoder", dtype=dtype) as scope:
dtype = scope.dtype
batch_size = array_ops.shape(decoder_inputs[0])[0] # Needed for reshaping.
attn_length = attention_states.get_shape()[1].value
if attn_length is None:
attn_length = array_ops.shape(attention_states)[1]
attn_size = attention_states.get_shape()[2].value
# To calculate W1 * h_t we use a 1-by-1 convolution, need to reshape before.
hidden = array_ops.reshape(attention_states,
[-1, attn_length, 1, attn_size])
hidden_features = []
v = []
attention_vec_size = attn_size # Size of query vectors for attention.
for a in xrange(num_heads):
k = variable_scope.get_variable("AttnW_%d" % a,
[1, 1, attn_size, attention_vec_size])
hidden_features.append(nn_ops.conv2d(hidden, k, [1, 1, 1, 1], "SAME"))
v.append(
variable_scope.get_variable("AttnV_%d" % a, [attention_vec_size]))
state = initial_state
def attention(query):
"""Put attention masks on hidden using hidden_features and query."""
ds = [] # Results of attention reads will be stored here.
if nest.is_sequence(query): # If the query is a tuple, flatten it.
query_list = nest.flatten(query)
for q in query_list: # Check that ndims == 2 if specified.
ndims = q.get_shape().ndims
if ndims:
assert ndims == 2
query = array_ops.concat(query_list, 1)
for a in xrange(num_heads):
with variable_scope.variable_scope("Attention_%d" % a):
y = linear(query, attention_vec_size, True)
y = array_ops.reshape(y, [-1, 1, 1, attention_vec_size])
# Attention mask is a softmax of v^T * tanh(...).
s = math_ops.reduce_sum(v[a] * math_ops.tanh(hidden_features[a] + y),
[2, 3])
a = nn_ops.softmax(s)
# Now calculate the attention-weighted vector d.
d = math_ops.reduce_sum(
array_ops.reshape(a, [-1, attn_length, 1, 1]) * hidden, [1, 2])
ds.append(array_ops.reshape(d, [-1, attn_size]))
return ds
outputs = []
prev = None
batch_attn_size = array_ops.stack([batch_size, attn_size])
attns = [
array_ops.zeros(
batch_attn_size, dtype=dtype) for _ in xrange(num_heads)
]
for a in attns: # Ensure the second shape of attention vectors is set.
a.set_shape([None, attn_size])
if initial_state_attention:
attns = attention(initial_state)
for i, inp in enumerate(decoder_inputs):
if i > 0:
variable_scope.get_variable_scope().reuse_variables()
# If loop_function is set, we use it instead of decoder_inputs.
if loop_function is not None and prev is not None:
with variable_scope.variable_scope("loop_function", reuse=True):
inp = loop_function(prev, i)
# Merge input and previous attentions into one vector of the right size.
input_size = inp.get_shape().with_rank(2)[1]
if input_size.value is None:
raise ValueError("Could not infer input size from input: %s" % inp.name)
x = linear([inp] + attns, input_size, True)
# Run the RNN.
cell_output, state = cell(x, state)
# Run the attention mechanism.
if i == 0 and initial_state_attention:
with variable_scope.variable_scope(
variable_scope.get_variable_scope(), reuse=True):
attns = attention(state)
else:
attns = attention(state)
with variable_scope.variable_scope("AttnOutputProjection"):
output = linear([cell_output] + attns, output_size, True)
if loop_function is not None:
prev = output
outputs.append(output)
return outputs, state
def embedding_attention_decoder(decoder_inputs,
initial_state,
attention_states,
cell,
num_symbols,
embedding_size,
num_heads=1,
output_size=None,
output_projection=None,
feed_previous=False,
update_embedding_for_previous=True,
dtype=None,
scope=None,
initial_state_attention=False):
"""RNN decoder with embedding and attention and a pure-decoding option.
Args:
decoder_inputs: A list of 1D batch-sized int32 Tensors (decoder inputs).
initial_state: 2D Tensor [batch_size x cell.state_size].
attention_states: 3D Tensor [batch_size x attn_length x attn_size].
cell: core_rnn_cell.RNNCell defining the cell function.
num_symbols: Integer, how many symbols come into the embedding.
embedding_size: Integer, the length of the embedding vector for each symbol.
num_heads: Number of attention heads that read from attention_states.
output_size: Size of the output vectors; if None, use output_size.
output_projection: None or a pair (W, B) of output projection weights and
biases; W has shape [output_size x num_symbols] and B has shape
[num_symbols]; if provided and feed_previous=True, each fed previous
output will first be multiplied by W and added B.
feed_previous: Boolean; if True, only the first of decoder_inputs will be
used (the "GO" symbol), and all other decoder inputs will be generated by:
next = embedding_lookup(embedding, argmax(previous_output)),
In effect, this implements a greedy decoder. It can also be used
during training to emulate http://arxiv.org/abs/1506.03099.
If False, decoder_inputs are used as given (the standard decoder case).
update_embedding_for_previous: Boolean; if False and feed_previous=True,
only the embedding for the first symbol of decoder_inputs (the "GO"
symbol) will be updated by back propagation. Embeddings for the symbols
generated from the decoder itself remain unchanged. This parameter has
no effect if feed_previous=False.
dtype: The dtype to use for the RNN initial states (default: tf.float32).
scope: VariableScope for the created subgraph; defaults to
"embedding_attention_decoder".
initial_state_attention: If False (default), initial attentions are zero.
If True, initialize the attentions from the initial state and attention
states -- useful when we wish to resume decoding from a previously
stored decoder state and attention states.
Returns:
A tuple of the form (outputs, state), where:
outputs: A list of the same length as decoder_inputs of 2D Tensors with
shape [batch_size x output_size] containing the generated outputs.
state: The state of each decoder cell at the final time-step.
It is a 2D Tensor of shape [batch_size x cell.state_size].
Raises:
ValueError: When output_projection has the wrong shape.
"""
if output_size is None:
output_size = cell.output_size
if output_projection is not None:
proj_biases = ops.convert_to_tensor(output_projection[1], dtype=dtype)
proj_biases.get_shape().assert_is_compatible_with([num_symbols])
with variable_scope.variable_scope(
scope or "embedding_attention_decoder", dtype=dtype) as scope:
embedding = variable_scope.get_variable("embedding",
[num_symbols, embedding_size])
loop_function = _extract_argmax_and_embed(
embedding, output_projection,
update_embedding_for_previous) if feed_previous else None
emb_inp = [
embedding_ops.embedding_lookup(embedding, i) for i in decoder_inputs
]
return attention_decoder(
emb_inp,
initial_state,
attention_states,
cell,
output_size=output_size,
num_heads=num_heads,
loop_function=loop_function,
initial_state_attention=initial_state_attention)
def embedding_attention_seq2seq(encoder_inputs,
decoder_inputs,
cell,
num_encoder_symbols,
num_decoder_symbols,
embedding_size,
num_heads=1,
output_projection=None,
feed_previous=False,
dtype=None,
scope=None,
initial_state_attention=False):
"""Embedding sequence-to-sequence model with attention.
This model first embeds encoder_inputs by a newly created embedding (of shape
[num_encoder_symbols x input_size]). Then it runs an RNN to encode
embedded encoder_inputs into a state vector. It keeps the outputs of this
RNN at every step to use for attention later. Next, it embeds decoder_inputs
by another newly created embedding (of shape [num_decoder_symbols x
input_size]). Then it runs attention decoder, initialized with the last
encoder state, on embedded decoder_inputs and attending to encoder outputs.
Warning: when output_projection is None, the size of the attention vectors
and variables will be made proportional to num_decoder_symbols, can be large.
Args:
encoder_inputs: A list of 1D int32 Tensors of shape [batch_size].
decoder_inputs: A list of 1D int32 Tensors of shape [batch_size].
cell: core_rnn_cell.RNNCell defining the cell function and size.
num_encoder_symbols: Integer; number of symbols on the encoder side.
num_decoder_symbols: Integer; number of symbols on the decoder side.
embedding_size: Integer, the length of the embedding vector for each symbol.
num_heads: Number of attention heads that read from attention_states.
output_projection: None or a pair (W, B) of output projection weights and
biases; W has shape [output_size x num_decoder_symbols] and B has
shape [num_decoder_symbols]; if provided and feed_previous=True, each
fed previous output will first be multiplied by W and added B.
feed_previous: Boolean or scalar Boolean Tensor; if True, only the first
of decoder_inputs will be used (the "GO" symbol), and all other decoder
inputs will be taken from previous outputs (as in embedding_rnn_decoder).
If False, decoder_inputs are used as given (the standard decoder case).
dtype: The dtype of the initial RNN state (default: tf.float32).
scope: VariableScope for the created subgraph; defaults to
"embedding_attention_seq2seq".
initial_state_attention: If False (default), initial attentions are zero.
If True, initialize the attentions from the initial state and attention
states.
Returns:
A tuple of the form (outputs, state), where:
outputs: A list of the same length as decoder_inputs of 2D Tensors with
shape [batch_size x num_decoder_symbols] containing the generated
outputs.
state: The state of each decoder cell at the final time-step.
It is a 2D Tensor of shape [batch_size x cell.state_size].
"""
with variable_scope.variable_scope(
scope or "embedding_attention_seq2seq", dtype=dtype) as scope:
dtype = scope.dtype
# Encoder.
encoder_cell = copy.deepcopy(cell)
encoder_cell = core_rnn_cell.EmbeddingWrapper(
encoder_cell,
embedding_classes=num_encoder_symbols,
embedding_size=embedding_size)
encoder_outputs, encoder_state = core_rnn.static_rnn(
encoder_cell, encoder_inputs, dtype=dtype)
# First calculate a concatenation of encoder outputs to put attention on.
top_states = [
array_ops.reshape(e, [-1, 1, cell.output_size]) for e in encoder_outputs
]
attention_states = array_ops.concat(top_states, 1)
# Decoder.
output_size = None
if output_projection is None:
cell = core_rnn_cell.OutputProjectionWrapper(cell, num_decoder_symbols)
output_size = num_decoder_symbols
if isinstance(feed_previous, bool):
return embedding_attention_decoder(
decoder_inputs,
encoder_state,
attention_states,
cell,
num_decoder_symbols,
embedding_size,
num_heads=num_heads,
output_size=output_size,
output_projection=output_projection,
feed_previous=feed_previous,
initial_state_attention=initial_state_attention)
# If feed_previous is a Tensor, we construct 2 graphs and use cond.
def decoder(feed_previous_bool):
reuse = None if feed_previous_bool else True
with variable_scope.variable_scope(
variable_scope.get_variable_scope(), reuse=reuse):
outputs, state = embedding_attention_decoder(
decoder_inputs,
encoder_state,
attention_states,
cell,
num_decoder_symbols,
embedding_size,
num_heads=num_heads,
output_size=output_size,
output_projection=output_projection,
feed_previous=feed_previous_bool,
update_embedding_for_previous=False,
initial_state_attention=initial_state_attention)
state_list = [state]
if nest.is_sequence(state):
state_list = nest.flatten(state)
return outputs + state_list
outputs_and_state = control_flow_ops.cond(feed_previous,
lambda: decoder(True),
lambda: decoder(False))
outputs_len = len(decoder_inputs) # Outputs length same as decoder inputs.
state_list = outputs_and_state[outputs_len:]
state = state_list[0]
if nest.is_sequence(encoder_state):
state = nest.pack_sequence_as(
structure=encoder_state, flat_sequence=state_list)
return outputs_and_state[:outputs_len], state
*. Tracing model_with_buckets
def sequence_loss_by_example(logits,
targets,
weights,
average_across_timesteps=True,
softmax_loss_function=None,
name=None):
"""Weighted cross-entropy loss for a sequence of logits (per example).
Args:
logits: List of 2D Tensors of shape [batch_size x num_decoder_symbols].
targets: List of 1D batch-sized int32 Tensors of the same length as logits.
weights: List of 1D batch-sized float-Tensors of the same length as logits.
average_across_timesteps: If set, divide the returned cost by the total
label weight.
softmax_loss_function: Function (labels, logits) -> loss-batch
to be used instead of the standard softmax (the default if this is None).
**Note that to avoid confusion, it is required for the function to accept
named arguments.**
name: Optional name for this operation, default: "sequence_loss_by_example".
Returns:
1D batch-sized float Tensor: The log-perplexity for each sequence.
Raises:
ValueError: If len(logits) is different from len(targets) or len(weights).
"""
if len(targets) != len(logits) or len(weights) != len(logits):
raise ValueError("Lengths of logits, weights, and targets must be the same "
"%d, %d, %d." % (len(logits), len(weights), len(targets)))
with ops.name_scope(name, "sequence_loss_by_example",
logits + targets + weights):
log_perp_list = []
for logit, target, weight in zip(logits, targets, weights):
if softmax_loss_function is None:
# TODO(irving,ebrevdo): This reshape is needed because
# sequence_loss_by_example is called with scalars sometimes, which
# violates our general scalar strictness policy.
target = array_ops.reshape(target, [-1])
crossent = nn_ops.sparse_softmax_cross_entropy_with_logits(
labels=target, logits=logit)
else:
crossent = softmax_loss_function(labels=target, logits=logit)
log_perp_list.append(crossent * weight)
log_perps = math_ops.add_n(log_perp_list)
if average_across_timesteps:
total_size = math_ops.add_n(weights)
total_size += 1e-12 # Just to avoid division by 0 for all-0 weights.
log_perps /= total_size
return log_perps
def sequence_loss(logits,
targets,
weights,
average_across_timesteps=True,
average_across_batch=True,
softmax_loss_function=None,
name=None):
"""Weighted cross-entropy loss for a sequence of logits, batch-collapsed.
Args:
logits: List of 2D Tensors of shape [batch_size x num_decoder_symbols].
targets: List of 1D batch-sized int32 Tensors of the same length as logits.
weights: List of 1D batch-sized float-Tensors of the same length as logits.
average_across_timesteps: If set, divide the returned cost by the total
label weight.
average_across_batch: If set, divide the returned cost by the batch size.
softmax_loss_function: Function (labels, logits) -> loss-batch
to be used instead of the standard softmax (the default if this is None).
**Note that to avoid confusion, it is required for the function to accept
named arguments.**
name: Optional name for this operation, defaults to "sequence_loss".
Returns:
A scalar float Tensor: The average log-perplexity per symbol (weighted).
Raises:
ValueError: If len(logits) is different from len(targets) or len(weights).
"""
with ops.name_scope(name, "sequence_loss", logits + targets + weights):
cost = math_ops.reduce_sum(
sequence_loss_by_example(
logits,
targets,
weights,
average_across_timesteps=average_across_timesteps,
softmax_loss_function=softmax_loss_function))
if average_across_batch:
batch_size = array_ops.shape(targets[0])[0]
return cost / math_ops.cast(batch_size, cost.dtype)
else:
return cost
def model_with_buckets(encoder_inputs,
decoder_inputs,
targets,
weights,
buckets,
seq2seq,
softmax_loss_function=None,
per_example_loss=False,
name=None):
"""Create a sequence-to-sequence model with support for bucketing.
The seq2seq argument is a function that defines a sequence-to-sequence model,
e.g., seq2seq = lambda x, y: basic_rnn_seq2seq(
x, y, core_rnn_cell.GRUCell(24))
Args:
encoder_inputs: A list of Tensors to feed the encoder; first seq2seq input.
decoder_inputs: A list of Tensors to feed the decoder; second seq2seq input.
targets: A list of 1D batch-sized int32 Tensors (desired output sequence).
weights: List of 1D batch-sized float-Tensors to weight the targets.
buckets: A list of pairs of (input size, output size) for each bucket.
seq2seq: A sequence-to-sequence model function; it takes 2 input that
agree with encoder_inputs and decoder_inputs, and returns a pair
consisting of outputs and states (as, e.g., basic_rnn_seq2seq).
softmax_loss_function: Function (labels, logits) -> loss-batch
to be used instead of the standard softmax (the default if this is None).
**Note that to avoid confusion, it is required for the function to accept
named arguments.**
per_example_loss: Boolean. If set, the returned loss will be a batch-sized
tensor of losses for each sequence in the batch. If unset, it will be
a scalar with the averaged loss from all examples.
name: Optional name for this operation, defaults to "model_with_buckets".
Returns:
A tuple of the form (outputs, losses), where:
outputs: The outputs for each bucket. Its j'th element consists of a list
of 2D Tensors. The shape of output tensors can be either
[batch_size x output_size] or [batch_size x num_decoder_symbols]
depending on the seq2seq model used.
losses: List of scalar Tensors, representing losses for each bucket, or,
if per_example_loss is set, a list of 1D batch-sized float Tensors.
Raises:
ValueError: If length of encoder_inputs, targets, or weights is smaller
than the largest (last) bucket.
"""
if len(encoder_inputs) < buckets[-1][0]:
raise ValueError("Length of encoder_inputs (%d) must be at least that of la"
"st bucket (%d)." % (len(encoder_inputs), buckets[-1][0]))
if len(targets) < buckets[-1][1]:
raise ValueError("Length of targets (%d) must be at least that of last"
"bucket (%d)." % (len(targets), buckets[-1][1]))
if len(weights) < buckets[-1][1]:
raise ValueError("Length of weights (%d) must be at least that of last"
"bucket (%d)." % (len(weights), buckets[-1][1]))
all_inputs = encoder_inputs + decoder_inputs + targets + weights
losses = []
outputs = []
with ops.name_scope(name, "model_with_buckets", all_inputs):
for j, bucket in enumerate(buckets):
with variable_scope.variable_scope(
variable_scope.get_variable_scope(), reuse=True if j > 0 else None):
bucket_outputs, _ = seq2seq(encoder_inputs[:bucket[0]],
decoder_inputs[:bucket[1]])
outputs.append(bucket_outputs)
if per_example_loss:
losses.append(
sequence_loss_by_example(
outputs[-1],
targets[:bucket[1]],
weights[:bucket[1]],
softmax_loss_function=softmax_loss_function))
else:
losses.append(
sequence_loss(
outputs[-1],
targets[:bucket[1]],
weights[:bucket[1]],
softmax_loss_function=softmax_loss_function))
return outputs, losses