人工智能与算法学习

TensorFlow 2.0 模型：循环神经网络

文 / 李锡涵，Google Developers Expert

本文节选自《简单粗暴 TensorFlow 2.0》

上一篇文章中，我们介绍了在图像领域中广泛使用的卷积神经网络及其在 TensorFlow 2.0 中的实现。本文继续介绍另一种广泛流行的神经网络结构，即循环神经网络，内容如下：

以文本自动生成任务为例，介绍循环神经网络在 TensorFlow 2.0 中的实现方式；
为深度学习的入门者简介循环神经网络的原理。

使用 Keras 建立基于循环神经网络的文本生成模型

循环神经网络（Recurrent Neural Network, RNN）是一种适宜于处理序列数据的神经网络，被广泛用于语言模型、文本生成、机器翻译等。

基础知识和原理

Recurrent Neural Networks Tutorial, Part 1 – Introduction to RNNs (http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/)

台湾大学李宏毅教授的《机器学习》课程的 Recurrent Neural Network (part 1) (https://www.bilibili.com/video/av10590361/?p=37) Recurrent Neural Network (part 2) (https://www.bilibili.com/video/av10590361/?p=37) 两部分。

LSTM 原理：Understanding LSTM Networks (https://colah.github.io/posts/2015-08-Understanding-LSTMs/)

Graves, Alex. “Generating Sequences With Recurrent Neural Networks.” ArXiv:1308.0850 [Cs], August 4, 2013. (http://arxiv.org/abs/1308.0850).

这里，我们使用 RNN 来进行尼采风格文本的自动生成。[5]

这个任务的本质其实预测一段英文文本的接续字母的概率分布。比如，我们有以下句子:

I am a studen

这个句子（序列）一共有 13 个字符（包含空格）。当我们阅读到这个由 13 个字符组成的序列后，根据我们的经验，我们可以预测出下一个字符很大概率是 “t”。

我们希望建立这样一个模型，逐个输入一段长为 seq_length 的序列，输出这些序列接续的下一个字符的概率分布。我们从下一个字符的概率分布中采样作为预测值，然后滚雪球式地生成下两个字符，下三个字符等等，即可完成文本的生成任务。

首先，还是实现一个简单的 DataLoader 类来读取文本，并以字符为单位进行编码。设字符种类数为 num_chars ，则每种字符赋予一个 0 到 num_chars - 1 之间的唯一整数编号 i 。

 1class DataLoader():
 2    def __init__(self):
 3        path = tf.keras.utils.get_file('nietzsche.txt',
 4            origin='https://s3.amazonaws.com/text-datasets/nietzsche.txt')
 5        with open(path, encoding='utf-8') as f:
 6            self.raw_text = f.read().lower()
 7        self.chars = sorted(list(set(self.raw_text)))
 8        self.char_indices = dict((c, i) for i, c in enumerate(self.chars))
 9        self.indices_char = dict((i, c) for i, c in enumerate(self.chars))
10        self.text = [self.char_indices[c] for c in self.raw_text]
11
12    def get_batch(self, seq_length, batch_size):
13        seq = []
14        next_char = []
15        for i in range(batch_size):
16            index = np.random.randint(0, len(self.text) - seq_length)
17            seq.append(self.text[index:index+seq_length])
18            next_char.append(self.text[index+seq_length])
19        return np.array(seq), np.array(next_char)       # [batch_size, seq_length], [num_batch]

接下来进行模型的实现。在 __init__ 方法中我们实例化一个常用的 LSTMCell 单元，以及一个线性变换用的全连接层，我们首先对序列进行 “One Hot” 操作，即将序列中的每个字符的编码 i 均变换为一个 num_char 维向量，其第 i 位为 1，其余均为 0。变换后的序列张量形状为 [seq_length, num_chars] 。然后，我们初始化 RNN 单元的状态，存入变量 state 中。接下来，将序列从头到尾依次送入 RNN 单元，即在 t 时刻，将上一个时刻 t-1 的 RNN 单元状态 state 和序列的第 t 个元素 inputs[t, :] 送入 RNN 单元，得到当前时刻的输出 output 和 RNN 单元状态。取 RNN 单元最后一次的输出，通过全连接层变换到 num_chars 维，即作为模型的输出。

output, state = self.cell(inputs[:, t, :], state) 图示

RNN 流程图示

具体实现如下：

 1class RNN(tf.keras.Model):
 2    def __init__(self, num_chars, batch_size, seq_length):
 3        super().__init__()
 4        self.num_chars = num_chars
 5        self.seq_length = seq_length
 6        self.batch_size = batch_size
 7        self.cell = tf.keras.layers.LSTMCell(units=256)
 8        self.dense = tf.keras.layers.Dense(units=self.num_chars)
 9
10    def call(self, inputs, from_logits=False):
11        inputs = tf.one_hot(inputs, depth=self.num_chars)       # [batch_size, seq_length, num_chars]
12        state = self.cell.get_initial_state(batch_size=self.batch_size, dtype=tf.float32)
13        for t in range(self.seq_length):
14            output, state = self.cell(inputs[:, t, :], state)
15        logits = self.dense(output)
16        if from_logits:
17            return logits
18        else:
19            return tf.nn.softmax(logits)

定义一些模型超参数：

1    num_batches = 1000
2    seq_length = 40
3    batch_size = 50
4    learning_rate = 1e-3

训练过程与之前的连载文章基本一致，在此复述：

从 DataLoader 中随机取一批训练数据；
将这批数据送入模型，计算出模型的预测值；
将模型预测值与真实值进行比较，计算损失函数（loss）；
计算损失函数关于模型变量的导数；
使用优化器更新模型参数以最小化损失函数。

 1    data_loader = DataLoader()
 2    model = RNN(num_chars=len(data_loader.chars), batch_size=batch_size, seq_length=seq_length)
 3    optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)
 4    for batch_index in range(num_batches):
 5        X, y = data_loader.get_batch(seq_length, batch_size)
 6        with tf.GradientTape() as tape:
 7            y_pred = model(X)
 8            loss = tf.keras.losses.sparse_categorical_crossentropy(y_true=y, y_pred=y_pred)
 9            loss = tf.reduce_mean(loss)
10            print("batch %d: loss %f" % (batch_index, loss.numpy()))
11        grads = tape.gradient(loss, model.variables)
12        optimizer.apply_gradients(grads_and_vars=zip(grads, model.variables))

关于文本生成的过程有一点需要特别注意。之前，我们一直使用 tf.argmax() 函数，将对应概率最大的值作为预测值。然而对于文本生成而言，这样的预测方式过于绝对，会使得生成的文本失去丰富性。

于是，我们使用 np.random.choice() 函数按照生成的概率分布取样。这样，即使是对应概率较小的字符，也有机会被取样到。同时，我们加入一个 temperature 参数控制分布的形状，参数值越大则分布越平缓（最大值和最小值的差值越小），生成文本的丰富度越高；参数值越小则分布越陡峭，生成文本的丰富度越低。

1    def predict(self, inputs, temperature=1.):
2        batch_size, _ = tf.shape(inputs)
3        logits = self(inputs, from_logits=True)
4        prob = tf.nn.softmax(logits / temperature).numpy()
5        return np.array([np.random.choice(self.num_chars, p=prob[i, :])
6                         for i in range(batch_size.numpy())])

通过这种方式进行 “滚雪球” 式的连续预测，即可得到生成文本。

1    X_, _ = data_loader.get_batch(seq_length, 1)
2    for diversity in [0.2, 0.5, 1.0, 1.2]:
3        X = X_
4        print("diversity %f:" % diversity)
5        for t in range(400):
6            y_pred = model.predict(X, diversity)
7            print(data_loader.indices_char[y_pred[0]], end='', flush=True)
8            X = np.concatenate([X[:, 1:], np.expand_dims(y_pred, axis=1)], axis=-1)
9        print("
")

生成的文本如下:

 1diversity 0.200000:
 2conserted and conseive to the conterned to it is a self--and seast and the selfes as a seast the expecience and and and the self--and the sered is a the enderself and the sersed and as a the concertion of the series of the self in the self--and the serse and and the seried enes and seast and the sense and the eadure to the self and the present and as a to the self--and the seligious and the enders
 3
 4diversity 0.500000:
 5can is reast to as a seligut and the complesed
 6has fool which the self as it is a the beasing and us immery and seese for entoured underself of the seless and the sired a mears and everyther to out every sone thes and reapres and seralise as a streed liees of the serse to pease the cersess of the selung the elie one of the were as we and man one were perser has persines and conceity of all self-el
 7
 8diversity 1.000000:
 9entoles by
10their lisevers de weltaale, arh pesylmered, and so jejurted count have foursies as is
11descinty iamo; to semplization refold, we dancey or theicks-welf--atolitious on his
12such which
13here
14oth idey of pire master, ie gerw their endwit in ids, is an trees constenved mase commars is leed mad decemshime to the mor the elige. the fedies (byun their ope wopperfitious--antile and the it as the f
15
16diversity 1.200000:
17cain, elvotidue, madehoublesily
18inselfy!--ie the rads incults of to prusely le]enfes patuateded:.--a coud--theiritibaior "nrallysengleswout peessparify oonsgoscess teemind thenry ansken suprerial mus, cigitioum: 4reas. whouph: who
19eved
20arn inneves to sya" natorne. hag open reals whicame oderedte,[fingo is
21zisternethta simalfule dereeg hesls lang-lyes thas quiin turjentimy; periaspedey tomm--whach

[5] 此处的任务及实现参考了：

https://github.com/keras-team/keras/blob/master/examples/lstm_text_generation.py

循环神经网络的工作过程

循环神经网络是一个处理时间序列数据的神经网络结构，也就是说，我们需要在脑海里有一根时间轴，循环神经网络具有初始状态，在每个时间点迭代对当前时间的输入进行处理，修改自身的状态，并进行输出。

循环神经网络的核心是状态，是一个特定维数的向量，类似于神经网络的 “记忆”。在的初始时刻，被赋予一个初始值（常用的为全 0 向量）。然后，我们用类似于递归的方法来描述循环神经网络的工作过程。即在时刻，我们假设已经求出，关注如何在此基础上求出：

对输入向量通过矩阵进行线性变换，与状态 s 具有相同的维度；

对通过矩阵进行线性变换，与状态 s 具有相同的维度；

将上述得到的两个向量相加并通过激活函数，作为当前状态的值，即。也就是说，当前状态的值是上一个状态的值和当前输入进行某种信息整合而产生的；

对当前状态通过矩阵进行线性变换，得到当前时刻的输出。

RNN 工作过程图示（来自 http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/）

我们假设输入向量、状态和输出向量的维度分别为、、，则、、。

上述为最基础的 RNN 原理介绍。在实际使用时往往使用一些常见的改进型，如 LSTM（长短期记忆神经网络，解决了长序列的梯度消失问题，适用于较长的序列）、GRU 等。

福利 | 问答环节

我们知道在入门一项新的技术时有许多挑战与困难需要克服。如果您有关于 TensorFlow 的相关问题，可在本文后留言，我们的工程师和 GDE 将挑选其中具有代表性的问题在下一期进行回答~

在上一篇文章《TensorFlow 2.0 模型：卷积神经网络》中，我们对于部分具有代表性的问题回答如下：

Q1：能不能讲一下有关模型的保存类型？以前 TensorFlow1.x 保存成 ckpt 再转换成 pb，TensorFlow 2 会怎么保存使用呢?
A：在 TensorFlow 2.0 中，我们主要使用 tf.train.Checkpoint 保存模型参数。使用 SavedModel 导出完整模型。可参考：

https://tf.wiki/zh/basic/tools.html#tf-train-checkpoint
https://tf.wiki/zh/deployment/export.html#savedmodel

Q2：文章的样例代码可否开源？

A：本手册的原文在 GitHub 开源(https://github.com/snowkylin/tensorflow-handbook)，样例代码可访问 (https://github.com/snowkylin/tensorflow-handbook/tree/master/source/_static/code/zh) 获得。

Q3：有讲分布式的吗？感觉tf的分布式好难用啊
A：TensorFlow 2.0 提供了非常友好的分布式API tf.distributed，只需实例化一个MirroredStrategy策略:

1strategy = tf.distribute.MirroredStrategy()

并将模型构建的代码放入 strategy.scope() 的上下文环境中:

1with strategy.scope():
2# 模型构建代码

多机训练的话，将MirroredStrategy 改成 MultiWorkerMirroredStrategy 就好了。
是的，就这么简单！
我们会在之后的连载中专题介绍 TensorFlow 2.0 的分布式计算。更详细的使用方法及示例可参考：https://tf.wiki/zh/appendix/distributed.html

Q4：入门 TensorFlow，请问有没有较好的学习路线。
A：本系列教程《简单粗暴 TensorFlow 2.0》(https://tf.wiki) 即希望为 TensorFlow 初学者提供易于上手的系统指导。另外，TensorFlow的官方教程也经过了大量的更新以改善易读性，可参考：https://tensorflow.google.cn/overview

TensorFlow 2.0 模型：循环神经网络

你可能感兴趣的:(TensorFlow 2.0 模型：循环神经网络)