Welcome to Dinosaur Island! Dinosaurs existed 65 million years ago, and in this assignment they are back. You are in charge of a special task: leading biology researchers are creating new breeds of dinosaurs and bringing them to Earth, and your job is to give names to these dinosaurs. If a dinosaur does not like its name it may go berserk, so choose wisely!
Luckily, you have learned some deep learning, and you will use it to save the day. Your assistant has collected a list of all the dinosaur names they could find and compiled them into this dataset. To create new dinosaur names, you will build a character-level language model that generates new names. Your algorithm will learn the patterns in the existing names and randomly generate new ones. Hopefully this algorithm will keep you and your team safe from the dinosaurs' wrath!
By completing this assignment you will learn how to store and process text data for an RNN, how to synthesize data by sampling predictions at each time step and passing them to the next RNN cell, how to build a character-level text-generation recurrent neural network, and why clipping gradients is important.
import numpy as np
from utils import *
import random
The helper functions provided in utils.py are listed below for reference:
import numpy as np
def softmax(x):
e_x = np.exp(x - np.max(x))
return e_x / e_x.sum(axis=0)
def smooth(loss, cur_loss):
    # Exponentially weighted moving average of the loss, used only to make the printed loss smoother.
    return loss * 0.999 + cur_loss * 0.001
def print_sample(sample_ix, ix_to_char):
txt = ''.join(ix_to_char[ix] for ix in sample_ix)
txt = txt[0].upper() + txt[1:] # capitalize first character
print ('%s' % (txt, ), end='')
def get_initial_loss(vocab_size, seq_length):
return -np.log(1.0/vocab_size)*seq_length
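# Note: get_initial_loss returns the cross-entropy of a model that predicts every character
# uniformly at random. As a rough sanity check (with the values used later in model(),
# vocab_size = 27 and seq_length = dino_names = 7): -log(1/27) * 7 ≈ 3.2958 * 7 ≈ 23.07,
# so the smoothed loss printed at iteration 0 should be close to this value.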
def initialize_parameters(n_a, n_x, n_y):
"""
Initialize parameters with small random values
Returns:
parameters -- python dictionary containing:
Wax -- Weight matrix multiplying the input, numpy array of shape (n_a, n_x)
Waa -- Weight matrix multiplying the hidden state, numpy array of shape (n_a, n_a)
Wya -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a)
b -- Bias, numpy array of shape (n_a, 1)
by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1)
"""
np.random.seed(1)
Wax = np.random.randn(n_a, n_x)*0.01 # input to hidden
Waa = np.random.randn(n_a, n_a)*0.01 # hidden to hidden
Wya = np.random.randn(n_y, n_a)*0.01 # hidden to output
b = np.zeros((n_a, 1)) # hidden bias
by = np.zeros((n_y, 1)) # output bias
parameters = {"Wax": Wax, "Waa": Waa, "Wya": Wya, "b": b,"by": by}
return parameters
def rnn_step_forward(parameters, a_prev, x):
Waa, Wax, Wya, by, b = parameters['Waa'], parameters['Wax'], parameters['Wya'], parameters['by'], parameters['b']
a_next = np.tanh(np.dot(Wax, x) + np.dot(Waa, a_prev) + b) # hidden state
p_t = softmax(np.dot(Wya, a_next) + by) # probabilities for the next character
return a_next, p_t
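# A quick shape check for a single forward step (toy sizes, purely illustrative):
#   params = initialize_parameters(n_a=5, n_x=27, n_y=27)
#   x0 = np.zeros((27, 1)); x0[3] = 1      # one-hot input character
#   a0 = np.zeros((5, 1))                  # initial hidden state
#   a1, p1 = rnn_step_forward(params, a0, x0)
#   a1 has shape (5, 1), p1 has shape (27, 1), and p1 sums to 1 (it is a probability vector).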
def rnn_step_backward(dy, gradients, parameters, x, a, a_prev):
gradients['dWya'] += np.dot(dy, a.T)
gradients['dby'] += dy
da = np.dot(parameters['Wya'].T, dy) + gradients['da_next'] # backprop into h
daraw = (1 - a * a) * da # backprop through tanh nonlinearity
gradients['db'] += daraw
gradients['dWax'] += np.dot(daraw, x.T)
gradients['dWaa'] += np.dot(daraw, a_prev.T)
gradients['da_next'] = np.dot(parameters['Waa'].T, daraw)
return gradients
def update_parameters(parameters, gradients, lr):
parameters['Wax'] += -lr * gradients['dWax']
parameters['Waa'] += -lr * gradients['dWaa']
parameters['Wya'] += -lr * gradients['dWya']
parameters['b'] += -lr * gradients['db']
parameters['by'] += -lr * gradients['dby']
return parameters
def rnn_forward(X, Y, a0, parameters, vocab_size = 27):
# Initialize x, a and y_hat as empty dictionaries
x, a, y_hat = {}, {}, {}
a[-1] = np.copy(a0)
# initialize your loss to 0
loss = 0
for t in range(len(X)):
# Set x[t] to be the one-hot vector representation of the t'th character in X.
# if X[t] == None, we just have x[t]=0. This is used to set the input for the first timestep to the zero vector.
x[t] = np.zeros((vocab_size,1))
if (X[t] != None):
x[t][X[t]] = 1
# Run one step forward of the RNN
a[t], y_hat[t] = rnn_step_forward(parameters, a[t-1], x[t])
# Update the loss by adding the cross-entropy term for this time step (the negative log-probability of the true character).
loss -= np.log(y_hat[t][Y[t],0])
cache = (y_hat, a, x)
return loss, cache
def rnn_backward(X, Y, parameters, cache):
# Initialize gradients as an empty dictionary
gradients = {}
# Retrieve from cache and parameters
(y_hat, a, x) = cache
Waa, Wax, Wya, by, b = parameters['Waa'], parameters['Wax'], parameters['Wya'], parameters['by'], parameters['b']
# each one should be initialized to zeros of the same dimension as its corresponding parameter
gradients['dWax'], gradients['dWaa'], gradients['dWya'] = np.zeros_like(Wax), np.zeros_like(Waa), np.zeros_like(Wya)
gradients['db'], gradients['dby'] = np.zeros_like(b), np.zeros_like(by)
gradients['da_next'] = np.zeros_like(a[0])
### START CODE HERE ###
# Backpropagate through time
for t in reversed(range(len(X))):
dy = np.copy(y_hat[t])
dy[Y[t]] -= 1
gradients = rnn_step_backward(dy, gradients, parameters, x[t], a[t], a[t-1])
### END CODE HERE ###
return gradients, a
Run the cell below to read the dataset of dinosaur names, create a list of the unique characters (such as a-z), and compute the dataset size and the vocabulary size.
data = open('dinos.txt', 'r').read()
data= data.lower()
chars = list(set(data))
data_size, vocab_size = len(data), len(chars)
print('There are %d total characters and %d unique characters in your data.' % (data_size, vocab_size))
# There are 19909 total characters and 27 unique characters in your data.
The characters are a-z (26 characters) plus "\n" (newline), which here marks the end of a dinosaur name rather than the end of a sentence. In the cell below we create a Python dictionary (i.e. a hash table) mapping each character to an index from 0 to 26, and a second Python dictionary mapping each index back to the corresponding character. This will help you figure out which index corresponds to which character in the probability distribution output by the softmax layer. Below, char_to_ix and ix_to_char are these Python dictionaries.
char_to_ix = { ch:i for i,ch in enumerate(sorted(chars)) }
ix_to_char = { i:ch for i,ch in enumerate(sorted(chars)) }
print(ix_to_char)
# {0: '\n', 1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'e', 6: 'f', 7: 'g', 8: 'h', 9: 'i', 10: 'j', 11: 'k', 12: 'l', 13: 'm', 14: 'n', 15: 'o', 16: 'p', 17: 'q', 18: 'r', 19: 's', 20: 't', 21: 'u', 22: 'v', 23: 'w', 24: 'x', 25: 'y', 26: 'z'}
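As a quick sanity check, here is a small sketch (using the char_to_ix mapping defined above) of how a character is turned into the one-hot column vector that the RNN will consume at each time step:
x = np.zeros((vocab_size, 1))
x[char_to_ix['a']] = 1        # one-hot vector representing the character 'a'
print(x[:3].ravel())          # [0. 1. 0.] -- index 0 is '\n', index 1 is 'a'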
Model structure
At each time step, the RNN tries to predict the next character given the sequence of previous characters. For each time step t, the label is the next input character, i.e. y⟨t⟩ = x⟨t+1⟩.
In this part you will build two important building blocks of the model: gradient clipping, to avoid exploding gradients, and sampling, a technique used to generate characters.
You will then use these building blocks to assemble the model.
In this section you will implement the clip function that is called inside the optimization loop. Recall that the overall loop usually consists of forward propagation, computation of the loss, backward propagation, and a parameter update. Before updating the parameters, you perform gradient clipping when needed to make sure the gradients do not "explode", i.e. take on excessively large values.
In the exercise below you will implement a clip function that takes in a dictionary of gradients and, when needed, returns a clipped version of it. There are several ways to clip gradients; here we use a simple element-wise clipping procedure, in which every element of the gradient vector is clipped to lie within the range [-N, N] (values outside the range are set to the boundary value). A tiny illustrative sketch follows below.
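For intuition, here is a minimal sketch of element-wise clipping on a made-up array (the values are chosen purely for illustration):
g = np.array([-12.3, 3.5, 99.0])
np.clip(g, -10, 10, out=g)   # clip in place to the range [-10, 10]
print(g)                     # [-10.    3.5  10. ]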
Exercise: implement the function below to return the clipped version of the gradients dictionary. The function takes a maximum clipping threshold maxValue and returns the clipped gradients.
### GRADED FUNCTION: clip
def clip(gradients, maxValue):
'''
Clips the gradients' values between minimum and maximum.
Arguments:
gradients -- a dictionary containing the gradients "dWaa", "dWax", "dWya", "db", "dby"
maxValue -- everything above this number is set to this number, and everything less than -maxValue is set to -maxValue
Returns:
gradients -- a dictionary with the clipped gradients.
'''
dWaa, dWax, dWya, db, dby = gradients['dWaa'], gradients['dWax'], gradients['dWya'], gradients['db'], gradients['dby']
### START CODE HERE ###
# clip to mitigate exploding gradients, loop over [dWax, dWaa, dWya, db, dby]. (≈2 lines)
for gradient in [dWax, dWaa, dWya, db, dby]:
np.clip(gradient, -maxValue, maxValue, out=gradient)
### END CODE HERE ###
gradients = {"dWaa": dWaa, "dWax": dWax, "dWya": dWya, "db": db, "dby": dby}
return gradients
######################################################
np.random.seed(3)
dWax = np.random.randn(5,3)*10
dWaa = np.random.randn(5,5)*10
dWya = np.random.randn(2,5)*10
db = np.random.randn(5,1)*10
dby = np.random.randn(2,1)*10
gradients = {"dWax": dWax, "dWaa": dWaa, "dWya": dWya, "db": db, "dby": dby}
gradients = clip(gradients, 10)
print("gradients[\"dWaa\"][1][2] =", gradients["dWaa"][1][2])
print("gradients[\"dWax\"][3][1] =", gradients["dWax"][3][1])
print("gradients[\"dWya\"][1][2] =", gradients["dWya"][1][2])
print("gradients[\"db\"][4] =", gradients["db"][4])
print("gradients[\"dby\"][1] =", gradients["dby"][1])
# gradients["dWaa"][1][2] = 10.0
# gradients["dWax"][3][1] = -10.0
# gradients["dWya"][1][2] = 0.29713815361
# gradients["db"][4] = [ 10.]
# gradients["dby"][1] = [ 8.45833407]
Expected output:
key | value |
---|---|
gradients[“dWaa”][1][2] | 10.0 |
gradients[“dWax”][3][1] | -10.0 |
gradients[“dWya”][1][2] | 0.29713815361 |
gradients[“db”][4] | [ 10.] |
gradients[“dby”][1] | [ 8.45833407] |
Now assume your model is trained, and you want to use it to generate new characters. The generation process is as follows:
1. Feed the network the first "dummy" input x⟨1⟩ = 0⃗ (the zero vector). This is the default input before any characters have been generated. We also set a⟨0⟩ = 0⃗.
2. Run one step of forward propagation to obtain a⟨1⟩ and ŷ⟨1⟩. The equations are:
   a⟨t+1⟩ = tanh(Wax x⟨t⟩ + Waa a⟨t⟩ + b)
   z⟨t+1⟩ = Wya a⟨t+1⟩ + by
   ŷ⟨t+1⟩ = softmax(z⟨t+1⟩)
Note that ŷ⟨t+1⟩ is a (softmax) probability vector: its entries lie between 0 and 1 and sum to 1. ŷᵢ⟨t+1⟩ is the probability that the character with index "i" is the next character. A softmax() function is provided for you.
3. Sampling: pick the index of the next character according to the probability distribution specified by ŷ⟨t+1⟩. This means that if ŷᵢ⟨t+1⟩ = 0.16, you will pick index "i" with 16% probability. You can use np.random.choice to implement this.
Here is an example of how to use np.random.choice():
np.random.seed(0)
p = np.array([0.1, 0.0, 0.7, 0.2])
index = np.random.choice([0, 1, 2, 3], p = p.ravel())
This means you will pick the index according to the distribution:
P(index = 0) = 0.1, P(index = 1) = 0.0, P(index = 2) = 0.7, P(index = 3) = 0.2
4. The last step to implement in sample() is to overwrite the variable x, which currently stores x⟨t⟩, with x⟨t+1⟩: you represent x⟨t+1⟩ as a one-hot vector corresponding to the character you picked as the prediction ŷ⟨t+1⟩. You then feed x⟨t+1⟩ back into the forward propagation of Step 2, and keep repeating the process until you sample a "\n" character, indicating that you have reached the end of the dinosaur name.
# GRADED FUNCTION: sample
def sample(parameters, char_to_ix, seed):
"""
Sample a sequence of characters according to a sequence of probability distributions output of the RNN
Arguments:
parameters -- python dictionary containing the parameters Waa, Wax, Wya, by, and b.
char_to_ix -- python dictionary mapping each character to an index.
seed -- used for grading purposes. Do not worry about it.
Returns:
indices -- a list of length n containing the indices of the sampled characters.
"""
# Retrieve parameters and relevant shapes from "parameters" dictionary
Waa, Wax, Wya, by, b = parameters['Waa'], parameters['Wax'], parameters['Wya'], parameters['by'], parameters['b']
vocab_size = by.shape[0]
n_a = Waa.shape[1]
### START CODE HERE ###
# Step 1: Create the one-hot vector x for the first character (initializing the sequence generation). (≈1 line)
x = np.zeros((vocab_size, 1))
# Step 1': Initialize a_prev as zeros (≈1 line)
a_prev = np.zeros((n_a, 1))
# Create an empty list of indices, this is the list which will contain the list of indices of the characters to generate (≈1 line)
indices = []
# Idx is a flag to detect a newline character, we initialize it to -1
idx = -1
# Loop over time-steps t. At each time-step, sample a character from a probability distribution and append
# its index to "indices". We'll stop if we reach 50 characters (which should be very unlikely with a well
# trained model), which helps debugging and prevents entering an infinite loop.
counter = 0
newline_character = char_to_ix['\n']
while (idx != newline_character and counter != 50):
# Step 2: Forward propagate x using the equations (1), (2) and (3)
a = np.tanh(np.dot(Wax, x) + np.dot(Waa, a_prev) + b)
z = np.dot(Wya, a) + by
y = softmax(z)
# for grading purposes
np.random.seed(counter+seed)
# Step 3: Sample the index of a character within the vocabulary from the probability distribution y
idx = np.random.choice(range(len(y)), p = y.ravel())
# Append the index to "indices"
indices.append(idx)
# Step 4: Overwrite the input character as the one corresponding to the sampled index.
x = np.zeros((vocab_size, 1))
x[idx] = 1
# Update "a_prev" to be "a"
a_prev = a
# for grading purposes
seed += 1
counter +=1
### END CODE HERE ###
if (counter == 50):
indices.append(char_to_ix['\n'])
return indices
#########################################################
np.random.seed(2)
_, n_a = 20, 100
Wax, Waa, Wya = np.random.randn(n_a, vocab_size), np.random.randn(n_a, n_a), np.random.randn(vocab_size, n_a)
b, by = np.random.randn(n_a, 1), np.random.randn(vocab_size, 1)
parameters = {"Wax": Wax, "Waa": Waa, "Wya": Wya, "b": b, "by": by}
indices = sample(parameters, char_to_ix, 0)
print("Sampling:")
print("list of sampled indices:", indices)
print("list of sampled characters:", [ix_to_char[i] for i in indices])
# Sampling:
# list of sampled indices: [12, 17, 24, 14, 13, 9, 10, 22, 24, 6, 13, 11, 12, 6, 21, 15, 21, 14, 3, 2, 1, 21, 18, 24, 7, 25, 6, 25, 18, 10, 16, 2, 3, 8, 15, 12, 11, 7, 1, 12, 10, 2, 7, 7, 11, 5, 6, 12, 25, 0, 0]
# list of sampled characters: ['l', 'q', 'x', 'n', 'm', 'i', 'j', 'v', 'x', 'f', 'm', 'k', 'l', 'f', 'u', 'o', 'u', 'n', 'c', 'b', 'a', 'u', 'r', 'x', 'g', 'y', 'f', 'y', 'r', 'j', 'p', 'b', 'c', 'h', 'o', 'l', 'k', 'g', 'a', 'l', 'j', 'b', 'g', 'g', 'k', 'e', 'f', 'l', 'y', '\n', '\n']
Expected output:
key | value |
---|---|
list of sampled indices: | [12, 17, 24, 14, 13, 9, 10, 22, 24, 6, 13, 11, 12, 6, 21, 15, 21, 14, 3, 2, 1, 21, 18, 24, 7, 25, 6, 25, 18, 10, 16, 2, 3, 8, 15, 12, 11, 7, 1, 12, 10, 2, 7, 7, 11, 5, 6, 12, 25, 0, 0] |
list of sampled characters: | [‘l’, ‘q’, ‘x’, ‘n’, ‘m’, ‘i’, ‘j’, ‘v’, ‘x’, ‘f’, ‘m’, ‘k’, ‘l’, ‘f’, ‘u’, ‘o’, ‘u’, ‘n’, ‘c’, ‘b’, ‘a’, ‘u’, ‘r’, ‘x’, ‘g’, ‘y’, ‘f’, ‘y’, ‘r’, ‘j’, ‘p’, ‘b’, ‘c’, ‘h’, ‘o’, ‘l’, ‘k’, ‘g’, ‘a’, ‘l’, ‘j’, ‘b’, ‘g’, ‘g’, ‘k’, ‘e’, ‘f’, ‘l’, ‘y’, ‘\n’, ‘\n’] |
It is now time to build the character-level language model for text generation.
In this section you will implement a function that performs one step of stochastic gradient descent (with clipped gradients). You will go through the training examples one at a time, which is why the optimization algorithm is stochastic gradient descent. As a reminder, here are the steps of a common optimization loop for an RNN: forward propagate through the RNN to compute the loss; backward propagate through time to compute the gradients of the loss with respect to the parameters; clip the gradients if necessary; update the parameters using gradient descent.
Functions that are already provided:
def rnn_forward(X, Y, a_prev, parameters):
""" Performs the forward propagation through the RNN and computes the cross-entropy loss.
It returns the loss' value as well as a "cache" storing values to be used in the backpropagation."""
....
return loss, cache
def rnn_backward(X, Y, parameters, cache):
""" Performs the backward propagation through time to compute the gradients of the loss with respect
to the parameters. It returns also all the hidden states."""
...
return gradients, a
def update_parameters(parameters, gradients, learning_rate):
""" Updates parameters using the Gradient Descent Update Rule."""
...
return parameters
Code:
# GRADED FUNCTION: optimize
def optimize(X, Y, a_prev, parameters, learning_rate = 0.01):
"""
Execute one step of the optimization to train the model.
Arguments:
X -- list of integers, where each integer is a number that maps to a character in the vocabulary.
Y -- list of integers, exactly the same as X but shifted one index to the left.
a_prev -- previous hidden state.
parameters -- python dictionary containing:
Wax -- Weight matrix multiplying the input, numpy array of shape (n_a, n_x)
Waa -- Weight matrix multiplying the hidden state, numpy array of shape (n_a, n_a)
Wya -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a)
b -- Bias, numpy array of shape (n_a, 1)
by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1)
learning_rate -- learning rate for the model.
Returns:
loss -- value of the loss function (cross-entropy)
gradients -- python dictionary containing:
dWax -- Gradients of input-to-hidden weights, of shape (n_a, n_x)
dWaa -- Gradients of hidden-to-hidden weights, of shape (n_a, n_a)
dWya -- Gradients of hidden-to-output weights, of shape (n_y, n_a)
db -- Gradients of bias vector, of shape (n_a, 1)
dby -- Gradients of output bias vector, of shape (n_y, 1)
a[len(X)-1] -- the last hidden state, of shape (n_a, 1)
"""
### START CODE HERE ###
# Forward propagate through time (≈1 line)
loss, cache = rnn_forward(X, Y, a_prev, parameters)
# Backpropagate through time (≈1 line)
gradients, a = rnn_backward(X, Y, parameters, cache)
# Clip your gradients between -5 (min) and 5 (max) (≈1 line)
gradients = clip(gradients, 5)
# Update parameters (≈1 line)
parameters = update_parameters(parameters, gradients, learning_rate)
### END CODE HERE ###
return loss, gradients, a[len(X)-1]
########################################################
np.random.seed(1)
vocab_size, n_a = 27, 100
a_prev = np.random.randn(n_a, 1)
Wax, Waa, Wya = np.random.randn(n_a, vocab_size), np.random.randn(n_a, n_a), np.random.randn(vocab_size, n_a)
b, by = np.random.randn(n_a, 1), np.random.randn(vocab_size, 1)
parameters = {"Wax": Wax, "Waa": Waa, "Wya": Wya, "b": b, "by": by}
X = [12,3,5,11,22,3]
Y = [4,14,11,22,25, 26]
loss, gradients, a_last = optimize(X, Y, a_prev, parameters, learning_rate = 0.01)
print("Loss =", loss)
print("gradients[\"dWaa\"][1][2] =", gradients["dWaa"][1][2])
print("np.argmax(gradients[\"dWax\"]) =", np.argmax(gradients["dWax"]))
print("gradients[\"dWya\"][1][2] =", gradients["dWya"][1][2])
print("gradients[\"db\"][4] =", gradients["db"][4])
print("gradients[\"dby\"][1] =", gradients["dby"][1])
print("a_last[4] =", a_last[4])
# Loss = 126.503975722
# gradients["dWaa"][1][2] = 0.194709315347
# np.argmax(gradients["dWax"]) = 93
# gradients["dWya"][1][2] = -0.007773876032
# gradients["db"][4] = [-0.06809825]
# gradients["dby"][1] = [ 0.01538192]
# a_last[4] = [-1.]
Expected output:
key | value |
---|---|
Loss | 126.503975722 |
gradients[“dWaa”][1][2] | 0.194709315347 |
np.argmax(gradients[“dWax”]) | 93 |
gradients[“dWya”][1][2] | -0.007773876032 |
gradients[“db”][4] | [-0.06809825] |
gradients[“dby”][1] | [ 0.01538192] |
a_last[4] | [-1.] |
Using the dataset of dinosaur names, each line of the dataset (one name) is used as one training example. Every 100 steps of stochastic gradient descent, you will sample 10 randomly generated names to see how the algorithm is doing. Remember to shuffle the dataset, so that stochastic gradient descent visits the examples in random order.
When examples[index] contains one dinosaur name (a string), we can create an (X, Y) training example as follows:
index = j % len(examples)
X = [None] + [char_to_ix[ch] for ch in examples[index]]
Y = X[1:] + [char_to_ix["\n"]]
Note: we use index = j % len(examples), where j = 1, ..., num_iterations, to make sure the index never goes out of the range of examples.
The first entry of X is None (which rnn_forward() interprets as x⟨0⟩ = 0⃗), Y is the same as X but shifted one step to the left, and Y ends with "\n". A hypothetical example follows right after this note.
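For example, if examples[index] were the (hypothetical) name "turi", the resulting training pair would be:
X = [None, char_to_ix['t'], char_to_ix['u'], char_to_ix['r'], char_to_ix['i']]              # [None, 20, 21, 18, 9]
Y = [char_to_ix['t'], char_to_ix['u'], char_to_ix['r'], char_to_ix['i'], char_to_ix['\n']]  # [20, 21, 18, 9, 0]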
# GRADED FUNCTION: model
def model(data, ix_to_char, char_to_ix, num_iterations = 35000, n_a = 50, dino_names = 7, vocab_size = 27):
"""
Trains the model and generates dinosaur names.
Arguments:
data -- text corpus
ix_to_char -- dictionary that maps the index to a character
char_to_ix -- dictionary that maps a character to an index
num_iterations -- number of iterations to train the model for
n_a -- number of units of the RNN cell
dino_names -- number of dinosaur names you want to sample at each iteration.
vocab_size -- number of unique characters found in the text, size of the vocabulary
Returns:
parameters -- learned parameters
"""
# Retrieve n_x and n_y from vocab_size
n_x, n_y = vocab_size, vocab_size
# Initialize parameters
parameters = initialize_parameters(n_a, n_x, n_y)
# Initialize loss (this is required because we want to smooth our loss, don't worry about it)
loss = get_initial_loss(vocab_size, dino_names)
# Build list of all dinosaur names (training examples).
with open("dinos.txt") as f:
examples = f.readlines()
examples = [x.lower().strip() for x in examples]
# Shuffle list of all dinosaur names
np.random.seed(0)
np.random.shuffle(examples)
# Initialize the hidden state of your RNN
a_prev = np.zeros((n_a, 1))
# Optimization loop
for j in range(num_iterations):
### START CODE HERE ###
# Use the hint above to define one training example (X,Y) (≈ 2 lines)
index = j % len(examples)
X = [None] + [char_to_ix[ch] for ch in examples[index]]
Y = X[1:] + [char_to_ix["\n"]]
# Perform one optimization step: Forward-prop -> Backward-prop -> Clip -> Update parameters
# Choose a learning rate of 0.01
curr_loss, gradients, a_prev = optimize(X, Y, a_prev, parameters, learning_rate = 0.01)
### END CODE HERE ###
# Use a latency trick to keep the loss smooth. It happens here to accelerate the training.
loss = smooth(loss, curr_loss)
# Every 2000 Iteration, generate "n" characters thanks to sample() to check if the model is learning properly
if j % 2000 == 0:
print('Iteration: %d, Loss: %f' % (j, loss) + '\n')
# The number of dinosaur names to print
seed = 0
for name in range(dino_names):
# Sample indices and print them
sampled_indices = sample(parameters, char_to_ix, seed)
print_sample(sampled_indices, ix_to_char)
seed += 1 # To get the same result for grading purposed, increment the seed by one.
print('\n')
return parameters
######################################################
parameters = model(data, ix_to_char, char_to_ix)
At the beginning of training the generated strings look like random characters; after a few thousand iterations the model learns to generate reasonable names.
You can see that towards the end your algorithm is able to generate plausible dinosaur names. At first it generated random strings, but later it produces cool names with sensible endings. If you want better results, train for longer or tune the hyperparameters. Our implementation did generate some pretty cool names such as maconucon, marloralus and macingsersaurus. The results also show that dinosaur names typically end in saurus, don, aura, tor, and so on.
If your model generates some names that are not cool enough, do not blame the model entirely: not all real dinosaur names sound cool. (For example, dromaeosauroides is a real dinosaur name, and it is in the training set.) But this model gives you a set of candidates from which you can pick the coolest ones.
This assignment used a relatively small dataset so that you could train the RNN quickly on a CPU. Training a language model for English requires a much larger dataset and usually far more computation over a much longer time. Our current favourite name is the great, the undefeatable, the fierce: Mangosaurus!
The following part is optional, but we hope you will do it anyway, since it is quite informative and a lot of fun.
A similar (but more complicated) task is to generate Shakespearean poems. Instead of learning from a dataset of dinosaur names, you can use a collection of Shakespeare's poems. Using LSTM cells, you can learn dependencies that span many characters in the text; this matters less for dinosaur names, since the names are quite short.
We have implemented a Shakespeare poem generator with Keras.
from __future__ import print_function
from keras.callbacks import LambdaCallback
from keras.models import Model, load_model, Sequential
from keras.layers import Dense, Activation, Dropout, Input, Masking
from keras.layers import LSTM
from keras.utils.data_utils import get_file
from keras.preprocessing.sequence import pad_sequences
from shakespeare_utils import *
import sys
import io
# Using TensorFlow backend.
#
# Loading text data...
# Creating training set...
# number of training examples: 31412
# Vectorizing training set...
# Loading model...
Useful functions from shakespeare_utils:
# Load Packages
from __future__ import print_function
from keras.callbacks import LambdaCallback
from keras.models import Model, load_model, Sequential
from keras.layers import Dense, Activation, Dropout, Input, Masking
from keras.layers import LSTM
from keras.utils.data_utils import get_file
from keras.preprocessing.sequence import pad_sequences
import numpy as np
import random
import sys
import io
def build_data(text, Tx = 40, stride = 3):
"""
Create a training set by scanning a window of size Tx over the text corpus, with stride 3.
Arguments:
text -- string, corpus of Shakespearian poem
Tx -- sequence length, number of time-steps (or characters) in one training example
stride -- how much the window shifts itself while scanning
Returns:
X -- list of training examples
Y -- list of training labels
"""
X = []
Y = []
### START CODE HERE ### (≈ 3 lines)
for i in range(0, len(text) - Tx, stride):
X.append(text[i: i + Tx])
Y.append(text[i + Tx])
### END CODE HERE ###
print('number of training examples:', len(X))
return X, Y
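# For example (on a toy string, not the real corpus), build_data("abcdefghij", Tx=4, stride=3)
# scans windows starting at i = 0 and i = 3 and produces
#   X = ["abcd", "defg"]   and   Y = ["e", "h"],
# i.e. each label is the single character that immediately follows its Tx-character window.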
def vectorization(X, Y, n_x, char_indices, Tx = 40):
"""
Convert X and Y (lists) into arrays to be given to a recurrent neural network.
Arguments:
X -- list of training examples (strings of length Tx)
Y -- list of training labels (the single character that follows each training example)
Tx -- integer, sequence length
Returns:
x -- array of shape (m, Tx, len(chars))
y -- array of shape (m, len(chars))
"""
m = len(X)
x = np.zeros((m, Tx, n_x), dtype=bool)
y = np.zeros((m, n_x), dtype=bool)
for i, sentence in enumerate(X):
for t, char in enumerate(sentence):
x[i, t, char_indices[char]] = 1
y[i, char_indices[Y[i]]] = 1
return x, y
def sample(preds, temperature=1.0):
# helper function to sample an index from a probability array
preds = np.asarray(preds).astype('float64')
preds = np.log(preds) / temperature
exp_preds = np.exp(preds)
preds = exp_preds / np.sum(exp_preds)
probas = np.random.multinomial(1, preds, 1)
out = np.random.choice(range(len(chars)), p = probas.ravel())
return out
#return np.argmax(probas)
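# Note on temperature: dividing the log-probabilities by the temperature before re-normalising
# reshapes the distribution. For example, with preds = [0.1, 0.2, 0.7]:
#   temperature close to 0 -> the distribution sharpens towards the argmax (index 2 almost surely),
#   temperature = 1.0      -> the distribution is left (approximately) unchanged,
#   temperature > 1        -> the distribution flattens, so sampling becomes more random.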
def on_epoch_end(epoch, logs):
    # Invoked at the end of each epoch. Kept as a no-op placeholder here;
    # text generation is done separately with generate_output() below.
    None
print("Loading text data...")
text = io.open('shakespeare.txt', encoding='utf-8').read().lower()
#print('corpus length:', len(text))
Tx = 40
chars = sorted(list(set(text)))
char_indices = dict((c, i) for i, c in enumerate(chars))
indices_char = dict((i, c) for i, c in enumerate(chars))
#print('number of unique characters in the corpus:', len(chars))
print("Creating training set...")
X, Y = build_data(text, Tx, stride = 3)
print("Vectorizing training set...")
x, y = vectorization(X, Y, n_x = len(chars), char_indices = char_indices)
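# x has shape (number of examples, Tx, len(chars)) and y has shape (number of examples, len(chars)):
# one one-hot-encoded character window and one one-hot label per training example.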
print("Loading model...")
model = load_model('models/model_shakespeare_kiank_350_epoch.h5')
def generate_output():
generated = ''
#sentence = text[start_index: start_index + Tx]
#sentence = '0'*Tx
usr_input = input("Write the beginning of your poem, the Shakespeare machine will complete it. Your input is: ")
# zero pad the sentence to Tx characters.
sentence = ('{0:0>' + str(Tx) + '}').format(usr_input).lower()
generated += usr_input
sys.stdout.write("\n\nHere is your poem: \n\n")
sys.stdout.write(usr_input)
for i in range(400):
x_pred = np.zeros((1, Tx, len(chars)))
for t, char in enumerate(sentence):
if char != '0':
x_pred[0, t, char_indices[char]] = 1.
preds = model.predict(x_pred, verbose=0)[0]
next_index = sample(preds, temperature = 1.0)
next_char = indices_char[next_index]
generated += next_char
sentence = sentence[1:] + next_char
sys.stdout.write(next_char)
sys.stdout.flush()
if next_char == '\n':
continue
To save you some time, we have already trained a model for about 1,000 epochs on a collection of Shakespearean poems called "The Sonnets".
Let's train the model for one more epoch. When it finishes (this also takes a few minutes), you can run generate_output, which prompts you for an input sentence (fewer than 40 characters). The poem will start with your sentence, and our RNN Shakespeare will complete the rest of the poem for you! For example, try "Forsooth this maketh no sense" (do not type the quotation marks). Depending on whether you end your input with a space, your results might also differ; try it both ways, and try other inputs as well.
print_callback = LambdaCallback(on_epoch_end=on_epoch_end)
model.fit(x, y, batch_size=128, epochs=1, callbacks=[print_callback])
# Epoch 1/1
# 31412/31412 [==============================] - 205s - loss: 2.5645
Output:
# Run this cell to try with different inputs without having to re-train the model
generate_output()
The RNN-Shakespeare model is very similar to the one you built for the dinosaur names. The main differences are that it uses LSTM cells instead of a basic RNN to capture longer-range dependencies, the model is deeper (stacked recurrent layers), and it is written with Keras instead of raw numpy.
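For reference, here is a minimal sketch of how such a character-level LSTM model could be defined in Keras. The layer sizes and the optimizer are assumptions made only for illustration; they are not the actual architecture stored in models/model_shakespeare_kiank_350_epoch.h5:
from keras.models import Sequential
from keras.layers import LSTM, Dense

sketch_model = Sequential()
sketch_model.add(LSTM(128, input_shape=(Tx, len(chars)), return_sequences=True))  # first LSTM layer (assumed size)
sketch_model.add(LSTM(128))                                                       # second, stacked LSTM layer
sketch_model.add(Dense(len(chars), activation='softmax'))                         # one probability per character
sketch_model.compile(loss='categorical_crossentropy', optimizer='adam')
Training such a sketch on the (x, y) arrays produced by vectorization() would use the same model.fit(x, y, ...) call shown above.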
If you want to learn more, you can check out the Keras team's text generation implementation on GitHub: https://github.com/keras-team/keras/blob/master/examples/lstm_text_generation.py.
Congratulations on finishing this assignment!