版权声明:本文为博主原创文章,未经博主允许不得转载。 https://blog.csdn.net/xierhacker/article/details/78772560
参考:
Module: tf.contrib.cudnn_rnn
Module: tf.contrib.rnn
更新:
2017.12.25
增加了tf.nn.embedding_lookup 来进行embedding的内容
2018.1.14
增加tf.sequence_mask和tf.boolean_mask 来对于序列的padding进行去除的内容.
2018.3.13
增加了手动调用call 函数实现的LSTM的网络.
LSTM的理论就不多讲了,对于理论不是很熟悉的童鞋转到:
深度学习笔记七:循环神经网络RNN(基本理论)
深度学习笔记八:长短时记忆网络LSTM(基本理论)
来复习一下基本的理论.这里要知道的是,在深度学习的实践里面,必须先把理论给弄懂了才方便写代码的,LSTM更加是,所以务必把基础打好.不然在代码中很多地方为什么那么写都不知道.
理论搞定之后,很重要的一点就是实践上面怎么使用LSTM了,估计很多人在使用tensorflow写LSTM的时候走了弯路.花了很多时间才弄清楚一点.不是因为LSTM有多难(在时序和多层次上考虑,其实也还是有点抽象的),而是不知道常见的结构可以怎么定义怎么写出来.对于初学者是很不友好的.
所以,本文先直接给出API文档里面常用的几个类和函数,然后写一些玩具案例,虽然案例是玩具,但是全部消化的话,不说精通,入门绝对是够了.
一.重要函数和类
这节主要就是说一下tensorflow里面在LSTM中比较常用的API了,毕竟是砖头,弄清楚肯定是有益处的.
这里先列一下
tensorflow.contrib.rnn.BasicLSTMCell
tensorflow.contrib.rnn.MultiRNNCell
tf.nn.dynamic_rnn()
tf.nn.bidirectional_dynamic_rnn()
tf.sequence_mask()
tf.boolean_mask()
既然说到这里,那这里还说一个与词向量有关的常见函数,后面一并讲解.
tf.nn.embedding_lookup()
这里只说最基本的够用的. 当然还有几个这里没有列出来的,可以在最开始列出来的文档参考,等到以后升到高阶,也许会用得到.
Ⅰ.tensorflow.contrib.rnn.BasicLSTMCell
文档:tensorflow.contrib.rnn.BasicLSTMCell
BasicLSTMCell是比较基本的创建LSTM cell的一个类,首先来看一下使用的时候怎么创建一个对象吧,
构造函数为:
__init__(num_units,forget_bias=1.0,state_is_tuple=True,activation=None,reuse=None)
参数:
num_units: int类型,LSTM cell里面使用多少个units.其实就可以理解为节点数量.
forget_bias: float类型, 遗忘门加上一个bias. 为了减少在训练早期的遗忘尺度.
state_is_tuple: 要是为True 的话, 接受和返回的states都是一个tuples,其中成员是和返回的状态都是一个2元元组,成员分别为 c_state and m_state. 这里只看这个参数是True情况,因为将来将只有这种方式来使用state.
activation: 内部的激活函数,默认是tanh
举个例子,比如你想定义一个内部节点数为128的一个Cell,就可以用下面的语句,
import tensorflow as tf
import tensorflow.contrib.rnn as rnn
cell=rnn.BasicLSTMCell(num_units=128, forget_bias=1.0, state_is_tuple=True)
你会发现这个构造函数里面居然没有基本的输入信息! 但是不用担心,关于输入的一些细节底层都做好了,只要在后面的环节里面给进去输入就行了.后面会继续讲到.
这里还要了解一点,相对于别人把这128叫做隐藏层的节点,其实我更倾向于理解为在Cell中的128个节点,每个节点接受同样的输入向量,然后得到一个值,128个节点合起来,输出的话就是一个128维的向量.
知道怎么创建之后,这里提一下这个类比较重要的两个属性(当然,这个类不止这两个属性).分别是:output_size和state_size.
看名字就能够猜到,output_size和state_size 分别表示的LSTM的输出状态和state信息的.这里就举几个例子来看一下这两个属性到底在各种情况下怎么表示.
例1:
import tensorflow as tf
import numpy as np
lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units=128)
print("output_size:",lstm_cell.output_size)
print("state_size:",lstm_cell.state_size)
print(lstm_cell.state_size.h)
print(lstm_cell.state_size.c)
结果:
其实这里num_units就已经决定了输出的尺度.128个units决定了output_size就是128维的,这个很简单.这里的重点是state的格式,这里发现他是一个LSTMStateTuple的类型,别管那么多,直接当做一个tuple看待就行.之所以是一个tuple,是因为state包含了h和c两方面的内容(这里需要知道一些LSTM的原理),更加详细的后面会讲到.
然后这个类还有一个很重要的函数,如下
zero_state(batch_size,dtype)
作用:
这个函数主要是用来进行填零的初始化.注意,这里是初始化一个state,不是初始化整个LSTM.
参数:
batch_size: 批大小
dtype: state使用的类型.
返回一个填零的状态state
要是state_size is an int or TensorShape, then the return value is a N-D tensor of shape [batch_size x state_size] filled with zeros.
要是state_size 是一个tuple,那么返回的值是同样结构的tuple,其中每个元素都是一个2-D的tensor,他们的形状为[batch_size x s] 其中的s是state_size中各自的s .
__call__(inputs,state,scope=None)
作用:在给定的状态(state)和输入上运行RNN cell,这个方法是在一个”时间点”上面运行一次RNN的方法,是比较偏底层的一个函数,对于理解RNN的运行过程非常有帮助,后面将会讲到的tf.nn.dynamic_rnn() 等接口就是更加高层的接口,直接把所有的运行过程都得到了.
参数:
inputs: 2-D tensor,形状为[batch_size x input_size].在实际使用的时候,你会先把数据整理成为[batch_size,time_steps_size,input_size] 的形状,所以假如当前时刻是i,那么使用的时候,直接使用[:,i,:] 作为数据传入就行了.
state: 要是self.state_size 是一个整形 ,那么这个参数应该当是一个形状为 [batch_size x self.state_size] 的tensor,否则,要是self.state_size 是一个整数的元组,那么这个应当是一个形状为[batch_size x s] 的元组,其中s在self.state_size 中.
scope: 这个创建的子图的变量域(VariableScope),默认是类名.
返回值:
A pair containing:
Output: 一个形状为[batch_size x self.output_size] 的2-D tensor
New state:新的state,和之前的state结构一样.
然后还有一些其他的方法和属性,这里不讲了,在真的有需求的时候可以参照官方文档.
Ⅱ.tensorflow.contrib.rnn.MultiRNNCell
前面的类可以定义一个一层的LSTM,那么怎么定义多层的LSTM类呢? 这个类主要的作用是把前面的单层LSTM结合为多层的LSTM.
首先来看他的构造函数是怎样的.
__init__( cells,state_is_tuple=True)
参数:
cells:一个列表,里面是你想叠起来的RNNCells,
state_is_tuple:要是是True 的话, 接受和返回的state都是n-tuple,其中n = len(cells).
然后还有一些其他的函数和属性都和前面的BasicLSTMCell差不多.但是这里还是要说一下在这里,他的两个属性output_size和state_size 会变成怎么样的形式.下面举一个例子:
import tensorflow as tf
import numpy as np
lstm_cell_1 = tf.contrib.rnn.BasicLSTMCell(num_units=128)
lstm_cell_2 = tf.contrib.rnn.BasicLSTMCell(num_units=256)
lstm_cell_3 = tf.contrib.rnn.BasicLSTMCell(num_units=512)
#多层lstm_cell
lstm_cell=tf.contrib.rnn.MultiRNNCell(cells=[lstm_cell_1,lstm_cell_2,lstm_cell_3])
print("output_size:",lstm_cell.output_size)
print("state_size:",lstm_cell.state_size)
结果:
这里首先建立了3层LSTM,然后使用MultiRNNCell 的构造函数把他们堆叠在一起,所以结果中的属性output_size为512,就是最后那层的units数量了.这个比较简单,重要的是state_size的样式,可以看到是一个tuple里面,然后又有3个LSTMStateTuple对象,其实这里也可以看出来了,就是每层的LSTMStateTuple属性放到了一个大的tuple里面.这里还是非常重要的.之后各种需要state 的地方可能涉及到state的转换.要是这里不清楚,到时候就不好转换了.
Ⅲ.tf.nn.dynamic_rnn()
这个函数的作用就是通过指定的RNN Cell来展开计算神经网络.
他的构造函数如下:
dynamic_rnn(cell,inputs,sequence_length=None,initial_state=None,dtype=None,parallel_iterations=None,swap_memory=False,time_major=False,scope=None)
对于dynamic_rnn来说每个batch的序列长度都是一样的(不足的话自己要去padding),这个函数会根据 sequence_length 中止计算.同时dynamic_rnn是动态生成graph的
参数:
cell: RNNCell的对象.
inputs: RNN的输入,当time_major == False (default) 的时候,必须是形状为 [batch_size, max_time, ...] 的tensor, 要是 time_major == True 的话, 必须是形状为 [max_time, batch_size, ...] 的tensor. 前面两个维度应该在所有的输入里面都应该匹配.
sequence_length: 可选,一个int32/int64类型的vector,他的尺寸是[batch_size]. 对于最后结果的正确性,这个还是非常有用的.因为给他具体每一个序列的长度,能够精确的得到结果,排除了之前为了把所有的序列弄成一样的长度padding造成的不准确.
initial_state: 可选,RNN的初始状态. 要是cell.state_size 是一个整形,那么这个参数必须是一个形状为 [batch_size, cell.state_size] 的tensor. 要是cell.state_size 是一个tuple, 那么这个参数必须是一个tuple,其中元素为形状为[batch_size, s] 的tensor,s为cell.state_size 中的各个相应size.
dtype: 可选,表示输入的数据类型和期望输出的数据类型.当初始状态没有被提供或者RNN的状态由多种形式构成的时候需要显示指定.
parallel_iterations: 默认是32,表示的是并行运行的迭代数量(Default: 32). 有一些没有任何时间依赖的操作能够并行计算,实际上就是空间换时间和时间换空间的折中,当value远大于1的时候,会使用的更多的内存但是能够减少时间,当这个value值很小的时候,会使用小一点的内存,但是会花更多的时间.
swap_memory: Transparently swap the tensors produced in forward inference but needed for back prop from GPU to CPU. This allows training RNNs which would typically not fit on a single GPU, with very minimal (or no) performance penalty.
time_major: 规定了输入和输出tensor的数据组织格式,如果 true, tensor的形状需要是[max_time, batch_size, depth]. 若是false, 那么tensor的形状为[batch_size, max_time, depth]. 要是使用time_major = True 的话,会更加高效率一点,因为避免了在RNN计算的开始和结束的时候对于矩阵的转置 ,然而,大多数的tensorflow数据格式都是采用的以batch为主的格式,所以这里也默认采用以batch为主的格式.
scope: 子图的scope名称,默认是”rnn”
返回:
重点内容返回(outputs, state)形式的结果对,其中
outputs: 表示RNN的输出tensor,要是time_major == False (default),那么这个tensor的形状为[batch_size, max_time, cell.output_size],要是time_major == True, 这个Tensor的形状为[max_time, batch_size, cell.output_size]. 这里有个地方需要注意的就是,要是cell.output_size 是一个tuple的话,那么outputs 也会是一个和 cell.output_size 相同构造的tuple,其中包含和cell.output_size 中对应形状相同的tensors.
state: 最终state,要是cell.state_size 是一个int类型,那么这个state是一个形状为[batch_size, cell.state_size].的tensor,要是为一个tuple,那么这个state也会是一个tuple并且有相应的形状. If cells are LSTMCells state will be a tuple containing a LSTMStateTuple for each cell.
例1:单层lstm
import tensorflow as tf
import numpy as np
inputs = tf.placeholder(np.float32, shape=(32,40,5)) # 32 是 batch_size
lstm_cell_1 = tf.contrib.rnn.BasicLSTMCell(num_units=128)
#lstm_cell_2 = tf.contrib.rnn.BasicLSTMCell(num_units=256)
#lstm_cell_3 = tf.contrib.rnn.BasicLSTMCell(num_units=512)
#多层lstm_cell
#lstm_cell=tf.contrib.rnn.MultiRNNCell(cells=[lstm_cell_1,lstm_cell_2,lstm_cell_3])
print("output_size:",lstm_cell_1.output_size)
print("state_size:",lstm_cell_1.state_size)
#print(lstm_cell.state_size.h)
#print(lstm_cell.state_size.c)
output,state=tf.nn.dynamic_rnn(
cell=lstm_cell_1,
inputs=inputs,
dtype=tf.float32
)
print("output.shape:",output.shape)
print("len of state tuple",len(state))
print("state.h.shape:",state.h.shape)
print("state.c.shape:",state.c.shape)
结果:
例二:多层LSTM
import tensorflow as tf
import numpy as np
inputs = tf.placeholder(np.float32, shape=(32,40,5)) # 32 是 batch_size
lstm_cell_1 = tf.contrib.rnn.BasicLSTMCell(num_units=128)
lstm_cell_2 = tf.contrib.rnn.BasicLSTMCell(num_units=256)
lstm_cell_3 = tf.contrib.rnn.BasicLSTMCell(num_units=512)
#多层lstm_cell
lstm_cell=tf.contrib.rnn.MultiRNNCell(cells=[lstm_cell_1,lstm_cell_2,lstm_cell_3])
print("output_size:",lstm_cell.output_size)
print("state_size:",lstm_cell.state_size)
#print(lstm_cell.state_size.h)
#print(lstm_cell.state_size.c)
output,state=tf.nn.dynamic_rnn(
cell=lstm_cell,
inputs=inputs,
dtype=tf.float32
)
print("output.shape:",output.shape)
print("len of state tuple",len(state))
结果:
多层的state就是一个tuple,而tuple的每一个元素都是每一层的state.
Ⅳ tf.nn.bidirectional_dynamic_rnn
bidirectional_dynamic_rnn(cell_fw,cell_bw,inputs,sequence_length=None,initial_state_fw=None,initial_state_bw=None,dtype=None,parallel_iterations=None,swap_memory=False,time_major=False,scope=None)
参数:
cell_fw:RNNCell的一个实例,用于正向。
cell_bw:RNNCell的一个实例,用于反向。
inputs:RNN输入。如果time_major == False(默认),则它必须是形状为 [batch_size, max_time, ...]的tensor,或者这些元素的嵌套元组。如果time_major == True,则它必须是形状为[max_time, batch_size, ...]的tensor ,或者是这些元素的嵌套元组。
sequence_length:(可选)一个int32 / int64向量,大小[batch_size],包含批处理中每个序列的实际长度。如果未提供,则所有批次条目均假定为完整序列; 并且时间反转从时间0到max_time每个序列被应用。
initial_state_fw:(可选)前向RNN的初始状态。这必须是适当类型和形状的张量[batch_size, cell_fw.state_size]。如果cell_fw.state_size是一个元组,这应该是一个具有形状的张量的元组[batch_size, s] for s in cell_fw.state_size。
initial_state_bw:(可选)与之相同initial_state_fw,但使用相应的属性cell_bw。
dtype:(可选)初始状态和预期输出的数据类型。如果未提供initial_states或者RNN状态具有异构dtype,则为必需。
parallel_iterations:(默认:32)。并行运行的迭代次数。那些没有任何时间依赖性并且可以并行运行的操作将会是。此参数用于空间换算时间。值>> 1使用更多的内存,但花费更少的时间,而更小的值使用更少的内存,但计算需要更长的时间。
swap_memory:透明地交换前向推理中产生的张量,但是从GPU到后端支持所需的张量。这允许训练通常不适合单个GPU的RNN,而且性能损失非常小(或不)。
time_major:inputs和outputs张量的形状格式。如果为True的话,这些都Tensors的形状为[max_time, batch_size, depth]。如果为False的话,这些Tensors的形状是[batch_size, max_time, depth]。
scope:创建子图的VariableScope; 默认为“bidirectional_rnn”
返回:
元组(outputs,output_states) 其中
outputs:包含正向和反向rnn输出的元组(output_fw,output_bw)。
如果time_major == False(默认值),则output_fw将是一个形状为[batch_size, max_time, cell_fw.output_size] 的tensor,output_bw将是一个形状为[batch_size, max_time, cell_bw.output_size]的tensor.
如果time_major == True,则output_fw将为一个形状为[max_time, batch_size, cell_fw.output_size] 的tensor, output_bw将是一个形状为[max_time, batch_size, cell_bw.output_size] 的tensor.
output_state,也是一个tuple,内容是(output_state_fw, output_state_bw) 也就是说,前向的state和后向的state放到了一个元组里面.
这里举一个例子:
import tensorflow as tf
import numpy as np
inputs = tf.placeholder(np.float32, shape=(32,40,5)) # 32 是 batch_size
lstm_cell_fw = tf.contrib.rnn.BasicLSTMCell(num_units=128)
lstm_cell_bw = tf.contrib.rnn.BasicLSTMCell(num_units=128)
#多层lstm_cell
#lstm_cell=tf.contrib.rnn.MultiRNNCell(cells=[lstm_cell_1,lstm_cell_2,lstm_cell_3])
print("output_fw_size:",lstm_cell_fw.output_size)
print("state_fw_size:",lstm_cell_fw.state_size)
print("output_bw_size:",lstm_cell_bw.output_size)
print("state_bw_size:",lstm_cell_bw.state_size)
#print(lstm_cell.state_size.h)
#print(lstm_cell.state_size.c)
output,state=tf.nn.bidirectional_dynamic_rnn(
cell_fw=lstm_cell_fw,
cell_bw=lstm_cell_bw,
inputs=inputs,
dtype=tf.float32
)
output_fw=output[0]
output_bw=output[1]
state_fw=state[0]
state_bw=state[1]
print("output_fw.shape:",output_fw.shape)
print("output_bw.shape:",output_bw.shape)
print("len of state tuple",len(state_fw))
print("state_fw:",state_fw)
print("state_bw:",state_bw)
#print("state.h.shape:",state.h.shape)
#print("state.c.shape:",state.c.shape)
#state_concat=tf.concat(values=[state_fw,state_fw],axis=1)
#print(state_concat)
state_h_concat=tf.concat(values=[state_fw.h,state_bw.h],axis=1)
print("state_fw_h_concat.shape",state_h_concat.shape)
state_c_concat=tf.concat(values=[state_fw.c,state_bw.c],axis=1)
print("state_fw_h_concat.shape",state_c_concat.shape)
state_concat=tf.contrib.rnn.LSTMStateTuple(c=state_c_concat,h=state_h_concat)
print(state_concat)
结果:
output_fw_size: 128
state_fw_size: LSTMStateTuple(c=128, h=128)
output_bw_size: 128
state_bw_size: LSTMStateTuple(c=128, h=128)
output_fw.shape: (32, 40, 128)
output_bw.shape: (32, 40, 128)
len of state tuple 2
state_fw: LSTMStateTuple(c=
state_bw: LSTMStateTuple(c=
state_fw_h_concat.shape (32, 256)
state_fw_h_concat.shape (32, 256)
LSTMStateTuple(c=
在这个例子里面,还用到了一个拼接state的例子,可以作为自己初始化state或者拼接state的例子.
Ⅴ.tf.nn.embedding_lookup()
embedding_lookup(params,ids,partition_strategy=’mod’,name=None,max_norm=None)
这个函数主要是在任务内进行embeddings的时候使用的一个函数,通过这个函数来把一个字或者词映射到对应维度的词向量上面去. 要是设置为可训练的Variable的话,可以在进行任务的时候同时对于词向量进行训练.
params: 表示完整的嵌入张量,或者除了第一维度之外具有相同形状的P个张量的列表,表示经分割的嵌入张量。
ids: 一个类型为int32或int64的Tensor,包含要在params中查找的id
partition_strategy: 指定分区策略的字符串,如果len(params)> 1,则相关。当前支持“div”和“mod”。 默认为“mod”
name: 操作名称(可选)
max_norm: 如果不是None,嵌入值将被l2归一化为max_norm的值
Ⅵ. tf.sequence_mask()
sequence_mask(lengths,maxlen=None,dtype=tf.bool,name=None)
作用:返回一个mask tensor表示每个序列的前N个位置.
If lengths has shape [d_1, d_2, …, d_n] the resulting tensor mask has dtype dtype and shape [d_1, d_2, …, d_n, maxlen], with
mask[i_1, i_2, …, i_n, j] = (j < lengths[i_1, i_2, …, i_n])
参数:
lengths: 整形的tensor, 他的所有的值都要小于或等于maxlen.
maxlen: scalar integer tensor, size of last dimension of returned tensor. Default is the maximum value in lengths.
dtype: output type of the resulting tensor.
name: op名称
返回:
A mask tensor of shape lengths.shape + (maxlen,), cast to specified dtype.
例子:
tf.sequence_mask([1, 3, 2], 5) # [[True, False, False, False, False],
# [True, True, True, False, False],
# [True, True, False, False, False]]
tf.sequence_mask([[1, 3],[2,0]]) # [[[True, False, False],
# [True, True, True]],
# [[True, True, False],
# [False, False, False]]]
Ⅶ.tf.boolean_mask()
boolean_mask(tensor,mask,name=’boolean_mask’)
把boolean类型的mask值应用到tensor上面,可以和numpy里面的tensor[mask] 类比.
参数:
tensor: N-D tensor.
mask: K-D boolean tensor, K <= N同时K必须是已知的
name: 可选,操作名
返回:
一个(N-K+1)维tensor.相应的值对应mask tensor中的True.
上面这些API是对于要使用的东西就基本的了解.接下来就开始讲例子了.
二.实例
如开头所说,接下来讲的几个例子都是一些玩具示例,但是对于新手是绝对友好的,这些简单例子涵盖了进行LSTM编程需要的一些基本思想和手段,通过消化这些简单例子可以快速上手,构建出后面适合自己的更加复杂的网络结构.
接下来从最基本的例子一个一个来讲,每个例子都可以直接作为脚本直接跑起来.
Ⅰ.预测sin函数
代码:
import numpy as np
import tensorflow as tf
import tensorflow.contrib.rnn as rnn
import matplotlib.pyplot as plt
TIME_STEPS=10
BATCH_SIZE=128
HIDDEN_UNITS=1
LEARNING_RATE=0.001
EPOCH=150
TRAIN_EXAMPLES=11000
TEST_EXAMPLES=1100
#------------------------------------Generate Data-----------------------------------------------#
#generate data
def generate(seq):
X=[]
y=[]
for i in range(len(seq)-TIME_STEPS):
X.append([seq[i:i+TIME_STEPS]])
y.append([seq[i+TIME_STEPS]])
return np.array(X,dtype=np.float32),np.array(y,dtype=np.float32)
#s=[i for i in range(30)]
#X,y=generate(s)
#print(X)
#print(y)
seq_train=np.sin(np.linspace(start=0,stop=100,num=TRAIN_EXAMPLES,dtype=np.float32))
seq_test=np.sin(np.linspace(start=100,stop=110,num=TEST_EXAMPLES,dtype=np.float32))
#plt.plot(np.linspace(start=0,stop=100,num=10000,dtype=np.float32),seq_train)
#plt.plot(np.linspace(start=100,stop=110,num=1000,dtype=np.float32),seq_test)
#plt.show()
X_train,y_train=generate(seq_train)
#print(X_train.shape,y_train.shape)
X_test,y_test=generate(seq_test)
#reshape to (batch,time_steps,input_size)
X_train=np.reshape(X_train,newshape=(-1,TIME_STEPS,1))
X_test=np.reshape(X_test,newshape=(-1,TIME_STEPS,1))
#draw y_test
plt.plot(range(1000),y_test[:1000,0],"r*")
#print(X_train.shape)
#print(X_test.shape)
#-----------------------------------------------------------------------------------------------------#
#--------------------------------------Define Graph---------------------------------------------------#
graph=tf.Graph()
with graph.as_default():
#------------------------------------construct LSTM------------------------------------------#
#place hoder
X_p=tf.placeholder(dtype=tf.float32,shape=(None,TIME_STEPS,1),name="input_placeholder")
y_p=tf.placeholder(dtype=tf.float32,shape=(None,1),name="pred_placeholder")
#lstm instance
lstm_cell=rnn.BasicLSTMCell(num_units=HIDDEN_UNITS)
#initialize to zero
init_state=lstm_cell.zero_state(batch_size=BATCH_SIZE,dtype=tf.float32)
#dynamic rnn
outputs,states=tf.nn.dynamic_rnn(cell=lstm_cell,inputs=X_p,initial_state=init_state,dtype=tf.float32)
#print(outputs.shape)
h=outputs[:,-1,:]
#print(h.shape)
#--------------------------------------------------------------------------------------------#
#---------------------------------define loss and optimizer----------------------------------#
mse=tf.losses.mean_squared_error(labels=y_p,predictions=h)
#print(loss.shape)
optimizer=tf.train.AdamOptimizer(LEARNING_RATE).minimize(loss=mse)
init=tf.global_variables_initializer()
#-------------------------------------------Define Session---------------------------------------#
with tf.Session(graph=graph) as sess:
sess.run(init)
for epoch in range(1,EPOCH+1):
results = np.zeros(shape=(TEST_EXAMPLES, 1))
train_losses=[]
test_losses=[]
print("epoch:",epoch)
for j in range(TRAIN_EXAMPLES//BATCH_SIZE):
_,train_loss=sess.run(
fetches=(optimizer,mse),
feed_dict={
X_p:X_train[j*BATCH_SIZE:(j+1)*BATCH_SIZE],
y_p:y_train[j*BATCH_SIZE:(j+1)*BATCH_SIZE]
}
)
train_losses.append(train_loss)
print("average training loss:", sum(train_losses) / len(train_losses))
for j in range(TEST_EXAMPLES//BATCH_SIZE):
result,test_loss=sess.run(
fetches=(h,mse),
feed_dict={
X_p:X_test[j*BATCH_SIZE:(j+1)*BATCH_SIZE],
y_p:y_test[j*BATCH_SIZE:(j+1)*BATCH_SIZE]
}
)
results[j*BATCH_SIZE:(j+1)*BATCH_SIZE]=result
test_losses.append(test_loss)
print("average test loss:", sum(test_losses) / len(test_losses))
plt.plot(range(1000),results[:1000,0])
plt.show()
结果:
图中红色粗线是真实值,可以看到,在迭代150个epoch之后,我们的结果越来越接近真实值了.
Ⅱ.预测sin函数多层版
代码:
import numpy as np
import tensorflow as tf
import tensorflow.contrib.rnn as rnn
import matplotlib.pyplot as plt
import matplotlib.pyplot as plt
TIME_STEPS=10
BATCH_SIZE=128
HIDDEN_UNITS1=30
HIDDEN_UNITS=1
LEARNING_RATE=0.001
EPOCH=50
TRAIN_EXAMPLES=11000
TEST_EXAMPLES=1100
#------------------------------------Generate Data-----------------------------------------------#
#generate data
def generate(seq):
X=[]
y=[]
for i in range(len(seq)-TIME_STEPS):
X.append([seq[i:i+TIME_STEPS]])
y.append([seq[i+TIME_STEPS]])
return np.array(X,dtype=np.float32),np.array(y,dtype=np.float32)
#s=[i for i in range(30)]
#X,y=generate(s)
#print(X)
#print(y)
seq_train=np.sin(np.linspace(start=0,stop=100,num=TRAIN_EXAMPLES,dtype=np.float32))
seq_test=np.sin(np.linspace(start=100,stop=110,num=TEST_EXAMPLES,dtype=np.float32))
#plt.plot(np.linspace(start=0,stop=100,num=10000,dtype=np.float32),seq_train)
#plt.plot(np.linspace(start=100,stop=110,num=1000,dtype=np.float32),seq_test)
#plt.show()
X_train,y_train=generate(seq_train)
#print(X_train.shape,y_train.shape)
X_test,y_test=generate(seq_test)
#reshape to (batch,time_steps,input_size)
X_train=np.reshape(X_train,newshape=(-1,TIME_STEPS,1))
X_test=np.reshape(X_test,newshape=(-1,TIME_STEPS,1))
#draw y_test
plt.plot(range(1000),y_test[:1000,0],"r*")
#print(X_train.shape)
#print(X_test.shape)
#-----------------------------------------------------------------------------------------------------#
#--------------------------------------Define Graph---------------------------------------------------#
graph=tf.Graph()
with graph.as_default():
#------------------------------------construct LSTM------------------------------------------#
#place hoder
X_p=tf.placeholder(dtype=tf.float32,shape=(None,TIME_STEPS,1),name="input_placeholder")
y_p=tf.placeholder(dtype=tf.float32,shape=(None,1),name="pred_placeholder")
#lstm instance
lstm_cell1=rnn.BasicLSTMCell(num_units=HIDDEN_UNITS1)
lstm_cell=rnn.BasicLSTMCell(num_units=HIDDEN_UNITS)
multi_lstm=rnn.MultiRNNCell(cells=[lstm_cell1,lstm_cell])
#initialize to zero
init_state=multi_lstm.zero_state(batch_size=BATCH_SIZE,dtype=tf.float32)
#dynamic rnn
outputs,states=tf.nn.dynamic_rnn(cell=multi_lstm,inputs=X_p,initial_state=init_state,dtype=tf.float32)
#print(outputs.shape)
h=outputs[:,-1,:]
#print(h.shape)
#--------------------------------------------------------------------------------------------#
#---------------------------------define loss and optimizer----------------------------------#
mse=tf.losses.mean_squared_error(labels=y_p,predictions=h)
#print(loss.shape)
optimizer=tf.train.AdamOptimizer(LEARNING_RATE).minimize(loss=mse)
init=tf.global_variables_initializer()
#-------------------------------------------Define Session---------------------------------------#
with tf.Session(graph=graph) as sess:
sess.run(init)
for epoch in range(1,EPOCH+1):
results = np.zeros(shape=(TEST_EXAMPLES, 1))
train_losses=[]
test_losses=[]
print("epoch:",epoch)
for j in range(TRAIN_EXAMPLES//BATCH_SIZE):
_,train_loss=sess.run(
fetches=(optimizer,mse),
feed_dict={
X_p:X_train[j*BATCH_SIZE:(j+1)*BATCH_SIZE],
y_p:y_train[j*BATCH_SIZE:(j+1)*BATCH_SIZE]
}
)
train_losses.append(train_loss)
print("average training loss:", sum(train_losses) / len(train_losses))
for j in range(TEST_EXAMPLES//BATCH_SIZE):
result,test_loss=sess.run(
fetches=(h,mse),
feed_dict={
X_p:X_test[j*BATCH_SIZE:(j+1)*BATCH_SIZE],
y_p:y_test[j*BATCH_SIZE:(j+1)*BATCH_SIZE]
}
)
results[j*BATCH_SIZE:(j+1)*BATCH_SIZE]=result
test_losses.append(test_loss)
print("average test loss:", sum(test_losses) / len(test_losses))
plt.plot(range(1000),results[:1000,0])
plt.show()
结果:
在这里,我们发现仅仅是50个epoch之后,得到的效果就要明显好于前面第一个的结果.
预测sin函数手写版
import numpy as np
import tensorflow as tf
import tensorflow.contrib.rnn as rnn
import matplotlib.pyplot as plt
TIME_STEPS=10
BATCH_SIZE=128
HIDDEN_UNITS1=30
HIDDEN_UNITS=1
LEARNING_RATE=0.001
EPOCH=50
TRAIN_EXAMPLES=11000
TEST_EXAMPLES=1100
#------------------------------------Generate Data-----------------------------------------------#
#generate data
def generate(seq):
X=[]
y=[]
for i in range(len(seq)-TIME_STEPS):
X.append([seq[i:i+TIME_STEPS]])
y.append([seq[i+TIME_STEPS]])
return np.array(X,dtype=np.float32),np.array(y,dtype=np.float32)
#s=[i for i in range(30)]
#X,y=generate(s)
#print(X)
#print(y)
seq_train=np.sin(np.linspace(start=0,stop=100,num=TRAIN_EXAMPLES,dtype=np.float32))
seq_test=np.sin(np.linspace(start=100,stop=110,num=TEST_EXAMPLES,dtype=np.float32))
#plt.plot(np.linspace(start=0,stop=100,num=10000,dtype=np.float32),seq_train)
#plt.plot(np.linspace(start=100,stop=110,num=1000,dtype=np.float32),seq_test)
#plt.show()
X_train,y_train=generate(seq_train)
#print(X_train.shape,y_train.shape)
X_test,y_test=generate(seq_test)
#reshape to (batch,time_steps,input_size)
X_train=np.reshape(X_train,newshape=(-1,TIME_STEPS,1))
X_test=np.reshape(X_test,newshape=(-1,TIME_STEPS,1))
#draw y_test
plt.plot(range(1000),y_test[:1000,0],"r*")
#print(X_train.shape)
#print(X_test.shape)
#-----------------------------------------------------------------------------------------------------#
#--------------------------------------Define Graph---------------------------------------------------#
graph=tf.Graph()
with graph.as_default():
#------------------------------------construct LSTM------------------------------------------#
#place hoder
X_p=tf.placeholder(dtype=tf.float32,shape=(None,TIME_STEPS,1),name="input_placeholder")
y_p=tf.placeholder(dtype=tf.float32,shape=(None,1),name="pred_placeholder")
#lstm instance
lstm_cell1=rnn.BasicLSTMCell(num_units=HIDDEN_UNITS1)
lstm_cell=rnn.BasicLSTMCell(num_units=HIDDEN_UNITS)
multi_lstm=rnn.MultiRNNCell(cells=[lstm_cell1,lstm_cell])
#自己初始化state
#第一层state
lstm_layer1_c=tf.zeros(shape=(BATCH_SIZE,HIDDEN_UNITS1))
lstm_layer1_h=tf.zeros(shape=(BATCH_SIZE,HIDDEN_UNITS1))
layer1_state=rnn.LSTMStateTuple(c=lstm_layer1_c,h=lstm_layer1_h)
#第二层state
lstm_layer2_c = tf.zeros(shape=(BATCH_SIZE, HIDDEN_UNITS))
lstm_layer2_h = tf.zeros(shape=(BATCH_SIZE, HIDDEN_UNITS))
layer2_state = rnn.LSTMStateTuple(c=lstm_layer2_c, h=lstm_layer2_h)
init_state=(layer1_state,layer2_state)
print(init_state)
#自己展开RNN计算
outputs = list() #用来接收存储每步的结果
state = init_state
with tf.variable_scope('RNN'):
for timestep in range(TIME_STEPS):
if timestep > 0:
tf.get_variable_scope().reuse_variables()
# 这里的state保存了每一层 LSTM 的状态
(cell_output, state) = multi_lstm(X_p[:, timestep, :], state)
outputs.append(cell_output)
h = outputs[-1]
#---------------------------------define loss and optimizer----------------------------------#
mse=tf.losses.mean_squared_error(labels=y_p,predictions=h)
#print(loss.shape)
optimizer=tf.train.AdamOptimizer(LEARNING_RATE).minimize(loss=mse)
init=tf.global_variables_initializer()
#-------------------------------------------Define Session---------------------------------------#
with tf.Session(graph=graph) as sess:
sess.run(init)
for epoch in range(1,EPOCH+1):
results = np.zeros(shape=(TEST_EXAMPLES, 1))
train_losses=[]
test_losses=[]
print("epoch:",epoch)
for j in range(TRAIN_EXAMPLES//BATCH_SIZE):
_,train_loss=sess.run(
fetches=(optimizer,mse),
feed_dict={
X_p:X_train[j*BATCH_SIZE:(j+1)*BATCH_SIZE],
y_p:y_train[j*BATCH_SIZE:(j+1)*BATCH_SIZE]
}
)
train_losses.append(train_loss)
print("average training loss:", sum(train_losses) / len(train_losses))
for j in range(TEST_EXAMPLES//BATCH_SIZE):
result,test_loss=sess.run(
fetches=(h,mse),
feed_dict={
X_p:X_test[j*BATCH_SIZE:(j+1)*BATCH_SIZE],
y_p:y_test[j*BATCH_SIZE:(j+1)*BATCH_SIZE]
}
)
results[j*BATCH_SIZE:(j+1)*BATCH_SIZE]=result
test_losses.append(test_loss)
print("average test loss:", sum(test_losses) / len(test_losses))
plt.plot(range(1000),results[:1000,0])
plt.show()
这个例子和上面的Ⅱ是一模一样的,唯一的区别就是使用了自定义的初始状态,从这个例子可以看一下怎么自定义一个状态. 然后就是之前自动展开的,
这里变成了手动展开计算,这里的计算过程
#自己展开RNN计算
outputs = list() #用来接收存储每步的结果
state = init_state
with tf.variable_scope('RNN'):
for timestep in range(TIME_STEPS):
if timestep > 0:
tf.get_variable_scope().reuse_variables()
# 这里的state保存了每一层 LSTM 的状态
(cell_output, state) = multi_lstm(X_p[:, timestep, :], state)
outputs.append(cell_output)
h = outputs[-1]
很有启发意义,有时候需要自己掌控每一步的结果的时候,可以使用这个来展开计算.
Ⅲ.MNIST图像分类
LSTM也可以做图像分类,在这里,思想还是非常简单的,MNIST的图像可以表示为28x28 的形式,
代码:
import numpy as np
import pandas as pd
import tensorflow as tf
import tensorflow.contrib.rnn as rnn
import matplotlib.pyplot as plt
TIME_STEPS=28
BATCH_SIZE=128
HIDDEN_UNITS1=30
HIDDEN_UNITS=10
LEARNING_RATE=0.001
EPOCH=50
TRAIN_EXAMPLES=42000
TEST_EXAMPLES=28000
#------------------------------------Generate Data-----------------------------------------------#
#generate data
train_frame = pd.read_csv("../Mnist/train.csv")
test_frame = pd.read_csv("../Mnist/test.csv")
# pop the labels and one-hot coding
train_labels_frame = train_frame.pop("label")
# get values
# one-hot on labels
X_train = train_frame.astype(np.float32).values
y_train=pd.get_dummies(data=train_labels_frame).values
X_test = test_frame.astype(np.float32).values
#trans the shape to (batch,time_steps,input_size)
X_train=np.reshape(X_train,newshape=(-1,28,28))
X_test=np.reshape(X_test,newshape=(-1,28,28))
#print(X_train.shape)
#print(y_dummy.shape)
#print(X_test.shape)
#-----------------------------------------------------------------------------------------------------#
#--------------------------------------Define Graph---------------------------------------------------#
graph=tf.Graph()
with graph.as_default():
#------------------------------------construct LSTM------------------------------------------#
#place hoder
X_p=tf.placeholder(dtype=tf.float32,shape=(None,TIME_STEPS,28),name="input_placeholder")
y_p=tf.placeholder(dtype=tf.float32,shape=(None,10),name="pred_placeholder")
#lstm instance
lstm_cell1=rnn.BasicLSTMCell(num_units=HIDDEN_UNITS1)
lstm_cell=rnn.BasicLSTMCell(num_units=HIDDEN_UNITS)
multi_lstm=rnn.MultiRNNCell(cells=[lstm_cell1,lstm_cell])
#initialize to zero
init_state=multi_lstm.zero_state(batch_size=BATCH_SIZE,dtype=tf.float32)
#dynamic rnn
outputs,states=tf.nn.dynamic_rnn(cell=multi_lstm,inputs=X_p,initial_state=init_state,dtype=tf.float32)
#print(outputs.shape)
h=outputs[:,-1,:]
#print(h.shape)
#--------------------------------------------------------------------------------------------#
#---------------------------------define loss and optimizer----------------------------------#
cross_loss=tf.losses.softmax_cross_entropy(onehot_labels=y_p,logits=h)
#print(loss.shape)
correct_prediction = tf.equal(tf.argmax(h, 1), tf.argmax(y_p, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
optimizer=tf.train.AdamOptimizer(LEARNING_RATE).minimize(loss=cross_loss)
init=tf.global_variables_initializer()
#-------------------------------------------Define Session---------------------------------------#
with tf.Session(graph=graph) as sess:
sess.run(init)
for epoch in range(1,EPOCH+1):
#results = np.zeros(shape=(TEST_EXAMPLES, 10))
train_losses=[]
accus=[]
#test_losses=[]
print("epoch:",epoch)
for j in range(TRAIN_EXAMPLES//BATCH_SIZE):
_,train_loss,accu=sess.run(
fetches=(optimizer,cross_loss,accuracy),
feed_dict={
X_p:X_train[j*BATCH_SIZE:(j+1)*BATCH_SIZE],
y_p:y_train[j*BATCH_SIZE:(j+1)*BATCH_SIZE]
}
)
train_losses.append(train_loss)
accus.append(accu)
print("average training loss:", sum(train_losses) / len(train_losses))
print("accuracy:",sum(accus)/len(accus))
结果:
Ⅳ.双向LSTM做图像分类
代码:
import numpy as np
import pandas as pd
import tensorflow as tf
import tensorflow.contrib.rnn as rnn
import matplotlib.pyplot as plt
TIME_STEPS=28
BATCH_SIZE=128
HIDDEN_UNITS1=30
HIDDEN_UNITS=10
LEARNING_RATE=0.001
EPOCH=50
TRAIN_EXAMPLES=42000
TEST_EXAMPLES=28000
#------------------------------------Generate Data-----------------------------------------------#
#generate data
train_frame = pd.read_csv("../Mnist/train.csv")
test_frame = pd.read_csv("../Mnist/test.csv")
# pop the labels and one-hot coding
train_labels_frame = train_frame.pop("label")
# get values
# one-hot on labels
X_train = train_frame.astype(np.float32).values
y_train=pd.get_dummies(data=train_labels_frame).values
X_test = test_frame.astype(np.float32).values
#trans the shape to (batch,time_steps,input_size)
X_train=np.reshape(X_train,newshape=(-1,28,28))
X_test=np.reshape(X_test,newshape=(-1,28,28))
#print(X_train.shape)
#print(y_dummy.shape)
#print(X_test.shape)
#-----------------------------------------------------------------------------------------------------#
#--------------------------------------Define Graph---------------------------------------------------#
graph=tf.Graph()
with graph.as_default():
#------------------------------------construct LSTM------------------------------------------#
#place hoder
X_p=tf.placeholder(dtype=tf.float32,shape=(None,TIME_STEPS,28),name="input_placeholder")
y_p=tf.placeholder(dtype=tf.float32,shape=(None,10),name="pred_placeholder")
#lstm instance
lstm_forward=rnn.BasicLSTMCell(num_units=HIDDEN_UNITS)
lstm_backward=rnn.BasicLSTMCell(num_units=HIDDEN_UNITS)
outputs,states=tf.nn.bidirectional_dynamic_rnn(
cell_fw=lstm_forward,
cell_bw=lstm_backward,
inputs=X_p,
dtype=tf.float32
)
outputs_fw=outputs[0]
outputs_bw = outputs[1]
h=outputs_fw[:,-1,:]+outputs_bw[:,-1,:]
# print(h.shape)
#---------------------------------------;-----------------------------------------------------#
#---------------------------------define loss and optimizer----------------------------------#
cross_loss=tf.losses.softmax_cross_entropy(onehot_labels=y_p,logits=h)
#print(loss.shape)
correct_prediction = tf.equal(tf.argmax(h, 1), tf.argmax(y_p, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
optimizer=tf.train.AdamOptimizer(LEARNING_RATE).minimize(loss=cross_loss)
init=tf.global_variables_initializer()
#-------------------------------------------Define Session---------------------------------------#
with tf.Session(graph=graph) as sess:
sess.run(init)
for epoch in range(1,EPOCH+1):
#results = np.zeros(shape=(TEST_EXAMPLES, 10))
train_losses=[]
accus=[]
#test_losses=[]
print("epoch:",epoch)
for j in range(TRAIN_EXAMPLES//BATCH_SIZE):
_,train_loss,accu=sess.run(
fetches=(optimizer,cross_loss,accuracy),
feed_dict={
X_p:X_train[j*BATCH_SIZE:(j+1)*BATCH_SIZE],
y_p:y_train[j*BATCH_SIZE:(j+1)*BATCH_SIZE]
}
)
train_losses.append(train_loss)
accus.append(accu)
print("average training loss:", sum(train_losses) / len(train_losses))
print("accuracy:",sum(accus)/len(accus))
这个例子的结果为:
会发现在后面不管怎么学都学不到东西了.这是因为上面我们只使用了一层双向网络.接下来仅仅需要小小的改动,把上面这个网络改为深层的双向LSTM.
Ⅴ.深层双向LSTM做图像分类
import numpy as np
import pandas as pd
import tensorflow as tf
import tensorflow.contrib.rnn as rnn
import matplotlib.pyplot as plt
TIME_STEPS=28
BATCH_SIZE=128
HIDDEN_UNITS1=30
HIDDEN_UNITS=10
LEARNING_RATE=0.001
EPOCH=50
TRAIN_EXAMPLES=42000
TEST_EXAMPLES=28000
#------------------------------------Generate Data-----------------------------------------------#
#generate data
train_frame = pd.read_csv("../Mnist/train.csv")
test_frame = pd.read_csv("../Mnist/test.csv")
# pop the labels and one-hot coding
train_labels_frame = train_frame.pop("label")
# get values
# one-hot on labels
X_train = train_frame.astype(np.float32).values
y_train=pd.get_dummies(data=train_labels_frame).values
X_test = test_frame.astype(np.float32).values
#trans the shape to (batch,time_steps,input_size)
X_train=np.reshape(X_train,newshape=(-1,28,28))
X_test=np.reshape(X_test,newshape=(-1,28,28))
#print(X_train.shape)
#print(y_dummy.shape)
#print(X_test.shape)
#-----------------------------------------------------------------------------------------------------#
#--------------------------------------Define Graph---------------------------------------------------#
graph=tf.Graph()
with graph.as_default():
#------------------------------------construct LSTM------------------------------------------#
#place hoder
X_p=tf.placeholder(dtype=tf.float32,shape=(None,TIME_STEPS,28),name="input_placeholder")
y_p=tf.placeholder(dtype=tf.float32,shape=(None,10),name="pred_placeholder")
#lstm instance
lstm_forward_1=rnn.BasicLSTMCell(num_units=HIDDEN_UNITS1)
lstm_forward_2=rnn.BasicLSTMCell(num_units=HIDDEN_UNITS)
lstm_forward=rnn.MultiRNNCell(cells=[lstm_forward_1,lstm_forward_2])
lstm_backward_1 = rnn.BasicLSTMCell(num_units=HIDDEN_UNITS1)
lstm_backward_2 = rnn.BasicLSTMCell(num_units=HIDDEN_UNITS)
lstm_backward=rnn.MultiRNNCell(cells=[lstm_backward_1,lstm_backward_2])
outputs,states=tf.nn.bidirectional_dynamic_rnn(
cell_fw=lstm_forward,
cell_bw=lstm_backward,
inputs=X_p,
dtype=tf.float32
)
outputs_fw=outputs[0]
outputs_bw = outputs[1]
h=outputs_fw[:,-1,:]+outputs_bw[:,-1,:]
# print(h.shape)
#---------------------------------------;-----------------------------------------------------#
#---------------------------------define loss and optimizer----------------------------------#
cross_loss=tf.losses.softmax_cross_entropy(onehot_labels=y_p,logits=h)
#print(loss.shape)
correct_prediction = tf.equal(tf.argmax(h, 1), tf.argmax(y_p, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
optimizer=tf.train.AdamOptimizer(LEARNING_RATE).minimize(loss=cross_loss)
init=tf.global_variables_initializer()
#-------------------------------------------Define Session---------------------------------------#
with tf.Session(graph=graph) as sess:
sess.run(init)
for epoch in range(1,EPOCH+1):
#results = np.zeros(shape=(TEST_EXAMPLES, 10))
train_losses=[]
accus=[]
#test_losses=[]
print("epoch:",epoch)
for j in range(TRAIN_EXAMPLES//BATCH_SIZE):
_,train_loss,accu=sess.run(
fetches=(optimizer,cross_loss,accuracy),
feed_dict={
X_p:X_train[j*BATCH_SIZE:(j+1)*BATCH_SIZE],
y_p:y_train[j*BATCH_SIZE:(j+1)*BATCH_SIZE]
}
)
train_losses.append(train_loss)
accus.append(accu)
print("average training loss:", sum(train_losses) / len(train_losses))
print("accuracy:",sum(accus)/len(accus))
相比起上面单层的bilstm,这里才35轮就已经到了95%了,说明在信息抽象的能力上面,多层的架构要好于单层的架构.
---------------------
作者:谢小小XH
来源:CSDN
原文:https://blog.csdn.net/xierhacker/article/details/78772560
版权声明:本文为博主原创文章,转载请附上博文链接!