
by Megan Risdal


An applied introduction to LSTMs for text generation, using Keras and GPU-enabled Kaggle Kernels

Kaggle recently gave data scientists the ability to add a GPU to Kernels (Kaggle’s cloud-based hosted notebook platform). I knew this would be the perfect opportunity for me to learn how to build and train more computationally intensive models.


With Kaggle Learn, the Keras documentation, and cool natural language data from freeCodeCamp, I had everything I needed to advance from random forests to recurrent neural networks.

In this blog post, I’ll show you how I used text from freeCodeCamp’s Gitter chat logs dataset published on Kaggle Datasets to train an LSTM network which generates novel text output.


You can find all of my reproducible code in this Python notebook kernel.


Now that you can use GPUs in Kernels (Kaggle’s cloud-based hosted notebook platform) with 6 hours of run time, you can train much more computationally intensive models than ever before on Kaggle.

I’ll use a GPU to train the model in this notebook. (You can request a GPU for your session by clicking on the “Settings” tab from a kernel editor.)


    import tensorflow as tf
    print(tf.test.gpu_device_name())
    # See https://www.tensorflow.org/tutorials/using_gpu#allowing_gpu_memory_growth
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True

I’ll use text from one of the channel’s most prolific user ids as the training data. There are two parts to this notebook:


  1. Reading in, exploring, and preparing the data

  2. Training the LSTM on a single user id’s chat logs and generating novel text as output


You can follow along by simply reading the notebook or you can fork it (click “Fork notebook”) and run the cells yourself to learn what each part is doing interactively. By the end, you’ll learn how to format text data as input to a character-level LSTM model implemented in Keras and in turn use the model’s character-level predictions to generate novel sequences of text.


Before I get into the code, what is an LSTM (“Long Short-Term Memory”) network anyway?


In this tutorial, we’ll take a hands-on approach to implementing this flavor of recurrent neural network, which is especially equipped to handle longer-distance dependencies (including the ones you get with language), in Keras, a deep learning framework.

If you want to review more of the theoretical underpinnings, I recommend that you check out this excellent blog post, Understanding LSTM Networks by Christopher Olah.


Part one: Data Preparation

In part one, I’ll first read in the data and try to explore it enough to give you a sense of what we’re working with. One of my frustrations with following non-interactive tutorials (such as static code shared on GitHub) is that it’s often hard to know how the data you want to work with differs from the code sample. You have to download it and compare it locally which is a pain.


The two nice things about following this tutorial using Kernels are that a) I’ll try to give you glimpses into the data at every significant step; and b) you can always fork this notebook and, boom, you’ve got a copy of my environment, data, Docker image, and all, with no downloads or installs necessary whatsoever. Especially if you have experience installing CUDA to use GPUs for deep learning, you’ll appreciate how wonderful it is to have an environment already set up for you.

Read in the data

    import pandas as pd
    import numpy as np

    # Read in only the two columns we need
    chat = pd.read_csv('../input/freecodecamp_casual_chatroom.csv', usecols=['fromUser.id', 'text'])

    # Removing user id for CamperBot
    chat = chat[chat['fromUser.id'] != '55b977f00fc9f982beab7883']
    chat.head()

Looks good!


Explore the data

In my plot below, you can see the number of posts from the top ten most active chat participants by their user id in freeCodeCamp’s Gitter:


    import matplotlib.pyplot as plt

    plt.style.use('fivethirtyeight')
    f, g = plt.subplots(figsize=(12, 9))
    # Draw the bar chart on the axes we just created
    chat['fromUser.id'].value_counts().head(10).plot.bar(color="green", ax=g)
    g.set_xticklabels(g.get_xticklabels(), rotation=25)
    plt.title("Most active users in freeCodeCamp's Gitter channel")
    plt.show()

So, userid 55a7c9e08a7b72f55c3f991e is the most active user in the channel with over 140,000 messages. We'll use their messages to train the LSTM to generate novel 55a7c9e08a7b72f55c3f991e-like sentences. But first, let's take a look at the first few messages from 55a7c9e08a7b72f55c3f991e to get a sense for what they're chatting about:


chat[chat['fromUser.id'] == "55a7c9e08a7b72f55c3f991e"].text.head(20)

I see words and phrases like “documentation”, “pair coding”, “BASH”, “Bootstrap”, “CSS”, etc. And I can only assume the sentence starting “With all of the various frameworks…” is referring to JavaScript. Yep, sounds like they’re on-topic as far as freeCodeCamp goes. So we’ll expect our novel sentences to look roughly like this if we’re successful.


Prepare sequence data for input to LSTM

Right now we have a dataframe with columns corresponding to user ids and message text, where each row corresponds to a single message sent. This is pretty far from the 3D shape the input layer of our LSTM network requires. Keras expects input of shape (samples, time_steps, features), and the layer itself is declared as model.add(LSTM(units, input_shape=(time_steps, features))), where units is the size of the LSTM’s hidden state, samples is the number of sequences, time_steps is the number of observations in each sequence, and features is the number of possible observable features (i.e., characters in our case). The number of sequences processed per weight update, batch_size, is passed separately when fitting the model.

So how do we get from a dataframe to sequence data in the correct shape? I’ll break it into three steps:


  1. Subset the data to form a corpus

  2. Format the corpus from #1 into arrays of semi-overlapping sequences of uniform length and next characters

  3. Represent the sequence data from #2 as sparse boolean tensors


Subset the data to form a corpus

In the next two cells, we’ll grab only messages from 55a7c9e08a7b72f55c3f991e ('fromUser.id' == '55a7c9e08a7b72f55c3f991e') to subset the data and collapse the vector of strings into a single string. Since we don't care if our model generates text with correct capitalization, we use lower(). This gives the model one less dimension to learn.

I’m also just going to use the first 20% of the data as a sample since we don’t need more than that to generate halfway decent text. You can try forking this kernel and experimenting with more (or less) data if you want.


    user = chat[chat['fromUser.id'] == '55a7c9e08a7b72f55c3f991e'].text

    n_messages = len(user)
    n_chars = len(' '.join(map(str, user)))

    print("55a7c9e08a7b72f55c3f991e accounts for %d messages" % n_messages)
    print("Their messages add up to %d characters" % n_chars)

    sample_size = int(len(user) * 0.2)

    user = user[:sample_size]
    user = ' '.join(map(str, user)).lower()
    user[:100]  # Show first 100 characters
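The subsetting steps can be tried on a toy list of messages. This is only a minimal sketch in plain Python: the function name build_corpus, the example messages, and the 40% fraction are made up for illustration (the notebook itself works on a pandas Series and keeps the first 20%).

```python
def build_corpus(messages, fraction):
    """Keep the first `fraction` of messages and join them into one lowercase string."""
    sample_size = int(len(messages) * fraction)
    kept = messages[:sample_size]
    # map(str, ...) guards against non-string entries such as missing values
    return " ".join(map(str, kept)).lower()


# Hypothetical messages, not taken from the dataset
toy_messages = ["Hi Folks", "BASH tips", "CSS help", "see you", "bye"]
toy_corpus = build_corpus(toy_messages, 0.4)
print(toy_corpus)
```

Lowercasing happens after joining, so the whole corpus ends up as a single string over a smaller character set.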

Format the corpus into arrays of semi-overlapping sequences of uniform length and next characters

The rest of the code used here is adapted from this example script, originally written by François Chollet (author of Keras and a Kaggler), to prepare the data in the correct format for training an LSTM. Since we’re training a character-level model, we relate unique characters (such as “a”, “b”, “c”, …) to numeric indices in the cell below. If you rerun this code yourself by clicking “Fork Notebook” you can print out all of the characters used.

    chars = sorted(list(set(user)))
    print('Count of unique characters (i.e., features):', len(chars))

    char_indices = dict((c, i) for i, c in enumerate(chars))
    indices_char = dict((i, c) for i, c in enumerate(chars))
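To see what the two lookup tables do, here is a small sketch on a toy string. The names prefixed with toy_ and the text "hello world" are illustrative assumptions; the construction mirrors the chars, char_indices, and indices_char variables from the notebook.

```python
toy_text = "hello world"

# Sorted list of the unique characters: this is the model's "vocabulary"
toy_chars = sorted(set(toy_text))

# Forward map (character -> index) and reverse map (index -> character)
toy_char_indices = {c: i for i, c in enumerate(toy_chars)}
toy_indices_char = {i: c for i, c in enumerate(toy_chars)}

# Encode the text as integers, then decode it back to check the round trip
encoded = [toy_char_indices[c] for c in toy_text]
decoded = "".join(toy_indices_char[i] for i in encoded)

print(toy_chars)
print(encoded)
print(decoded)
```

The forward map is what turns text into model input; the reverse map is what turns a predicted index back into a printable character.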

This next cell gives us an array, sentences, made up of sequences of maxlen (40) characters, chunked in steps of 3 characters from our corpus user, and next_chars, an array of single characters from user at i + maxlen for each i. I've printed out the first 10 strings in the array so you can see we're chunking the corpus into partially overlapping, equal-length "sentences."

    maxlen = 40
    step = 3
    sentences = []
    next_chars = []
    for i in range(0, len(user) - maxlen, step):
        sentences.append(user[i: i + maxlen])
        next_chars.append(user[i + maxlen])

    print('Number of sequences:', len(sentences), "\n")
    print(sentences[:10], "\n")
    print(next_chars[:10])

You can see how the next character following the first sequence 'hi folks. just doing the new signee stuf' is the character f to finish the word "stuff". And the next character following the sequence 'folks. just doing the new signee stuff. ' is the character h to start the word "hello". In this way, it should be clear now how next_chars is the "data labels" or ground truth for our sequences in sentences and our model trained on this labeled data will be able to generate new next characters as predictions given sequence input.

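The windowing is easier to follow on a string short enough to check by hand. The sketch below wraps the same loop in a hypothetical helper, make_sequences, and uses a toy alphabet string with maxlen=4 and step=3 instead of the notebook's 40 and 3.

```python
def make_sequences(text, maxlen, step):
    """Slice `text` into overlapping windows and record the character after each window."""
    sequences, next_chars = [], []
    for i in range(0, len(text) - maxlen, step):
        sequences.append(text[i:i + maxlen])   # the input window
        next_chars.append(text[i + maxlen])    # the label: the character that follows it
    return sequences, next_chars


toy_sequences, toy_next = make_sequences("abcdefghij", maxlen=4, step=3)
print(toy_sequences)
print(toy_next)
```

Because step is smaller than maxlen, consecutive windows share characters, which is what makes the sequences "semi-overlapping".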

Represent the sequence data as sparse boolean tensors

The next cell will take a few seconds to run if you’re following along interactively in the kernel. We’re creating sparse boolean tensors x and y, encoding character-level features from sentences and next_chars, to use as inputs to the model we train. The shape we end up with will be: input_shape=(maxlen, len(chars)) where maxlen is 40 and len(chars) is the number of features (i.e., unique count of characters from our corpus).

    # np.bool was removed from recent NumPy releases; the built-in bool is equivalent here
    x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
    y = np.zeros((len(sentences), len(chars)), dtype=bool)
    for i, sentence in enumerate(sentences):
        for t, char in enumerate(sentence):
            x[i, t, char_indices[char]] = 1
        y[i, char_indices[next_chars[i]]] = 1
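As a sanity check on the encoding, the following sketch builds the same kind of one-hot tensors for two tiny three-character "sentences" and then decodes them back. The toy data and the names x_small, y_small, vocab, and index_of are assumptions made for this example only.

```python
import numpy as np

# Toy data: two sequences of length 3 over the vocabulary {a, b, c}
toy_sentences = ["abc", "bca"]
toy_next_chars = ["a", "b"]
vocab = sorted(set("".join(toy_sentences)))
index_of = {c: i for i, c in enumerate(vocab)}

# Shape (samples, time_steps, features) for inputs, (samples, features) for labels
x_small = np.zeros((len(toy_sentences), 3, len(vocab)), dtype=bool)
y_small = np.zeros((len(toy_sentences), len(vocab)), dtype=bool)
for i, sentence in enumerate(toy_sentences):
    for t, char in enumerate(sentence):
        x_small[i, t, index_of[char]] = True
    y_small[i, index_of[toy_next_chars[i]]] = True

# Decoding: at each time step exactly one feature is True, so argmax recovers it
recovered = ["".join(vocab[k] for k in row.argmax(axis=1)) for row in x_small]
print(x_small.shape, y_small.shape)
print(recovered)
```

Each time step carries exactly one True value, which is why the tensors are called sparse: almost every entry is False.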

Part two: Modeling

In part two, we do the actual model training and text generation. We’ve explored the data and reshaped it correctly so that we can use it as an input to our LSTM model. There are two sections to this part:

  1. Defining an LSTM network model

  2. Training the model and generating predictions


Defining an LSTM network model

Let’s start by reading in our libraries. I’m using Keras which is a popular and easy-to-use interface to a TensorFlow backend. Read more about why to use Keras as a deep learning framework here. Below you can see the models, layers, optimizers, and callbacks we’ll be using.


    from keras.models import Sequential
    from keras.layers import Dense, Activation
    from keras.layers import LSTM
    from keras.optimizers import RMSprop
    from keras.callbacks import LambdaCallback, ModelCheckpoint
    import random
    import sys
    import io

In the cell below, we define the model. We start with a sequential model and add an LSTM as an input layer. The shape we define for our input matches the shape of our data at this point, which is exactly what we need. I’ve given the LSTM layer 128 units (the size of its hidden state); separately, when we fit the model later, I’ll use a batch_size of 128, which is the number of samples, or sequences, our model looks at during training before updating its weights. You can experiment with different numbers here if you want. I'm also adding a dense output layer. Finally, we'll add an activation layer with softmax as our activation function, as we're in essence doing multiclass classification to predict the next character in a sequence.

    model = Sequential()
    model.add(LSTM(128, input_shape=(maxlen, len(chars))))
    model.add(Dense(len(chars)))
    model.add(Activation('softmax'))

Now we can compile our model. We’ll use RMSprop with a learning rate of 0.01 to optimize the weights in our model (you can experiment with different learning rates here) and categorical_crossentropy as our loss function. Cross entropy is the same as log loss, commonly used as the evaluation metric in binary classification competitions on Kaggle (except in our case there are more than two possible outcomes).

    # Newer Keras releases name this argument learning_rate instead of lr
    optimizer = RMSprop(lr=0.01)
    model.compile(loss='categorical_crossentropy', optimizer=optimizer)
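The link between categorical cross-entropy and binary log loss can be checked numerically. This is a plain-Python sketch for a single example, not the Keras implementation; the function names and the probabilities are made up for illustration.

```python
import math


def categorical_crossentropy(y_true, y_pred):
    """Cross-entropy between a one-hot target and a vector of predicted probabilities."""
    # Only the term for the true class is non-zero, so this is -log(p_true)
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t)


def binary_log_loss(label, p_positive):
    """The familiar two-class log loss for one example."""
    return -(label * math.log(p_positive) + (1 - label) * math.log(1 - p_positive))


three_class = categorical_crossentropy([0, 1, 0], [0.2, 0.7, 0.1])
two_class = categorical_crossentropy([0, 1], [0.3, 0.7])
binary = binary_log_loss(1, 0.7)
print(three_class, two_class, binary)
```

With two classes the categorical form and the binary form give the same number; with more classes the loss is still just the negative log of the probability assigned to the correct next character.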

Now our model is ready. Before we feed it any data, the cell below defines a couple of helper functions with code modified from this script. The first one, sample(), samples an index from an array of probabilities with some temperature. Quick pause to ask, what is temperature exactly?


Temperature is a scaling factor applied to the outputs of our dense layer before applying the softmax activation function. In a nutshell, it defines how conservative or "creative" the model's guesses are for the next character in a sequence. Lower values of temperature (e.g., 0.2) will generate "safe" guesses whereas values of temperature above 1.0 will start to generate "riskier" guesses. Think of it as the amount of surprise you'd have at seeing an English word start with "st" versus "sg". When temperature is low, we may get lots of "the"s and "and"s; when temperature is high, things get more unpredictable.
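To get a feel for temperature, here is a small sketch that rescales a made-up three-way probability distribution the same way the notebook's sample() helper does (take logs, divide by temperature, exponentiate, renormalize), but without the random draw. The function name apply_temperature and the example probabilities are assumptions for illustration.

```python
import math


def apply_temperature(probs, temperature):
    """Rescale a probability distribution: log, divide by temperature, exp, renormalize."""
    scaled = [math.exp(math.log(p) / temperature) for p in probs]
    total = sum(scaled)
    return [s / total for s in scaled]


base = [0.6, 0.3, 0.1]                   # hypothetical model output for three characters
cool = apply_temperature(base, 0.2)      # low temperature sharpens the distribution
neutral = apply_temperature(base, 1.0)   # temperature 1.0 leaves it unchanged
warm = apply_temperature(base, 1.2)      # temperature above 1.0 flattens it

for name, dist in [("cool", cool), ("neutral", neutral), ("warm", warm)]:
    print(name, [round(p, 3) for p in dist])
```

At low temperature almost all of the probability mass moves onto the most likely character, which is why the generated text becomes repetitive; at higher temperature unlikely characters are drawn more often.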

Anyway, the second helper defines a callback function to print out predicted text generated by our trained LSTM after the first epoch and again after the fifteenth (final) epoch, with four different settings of temperature each time (see the line for diversity in [0.2, 0.5, 1.0, 1.2]: for the values of temperature; feel free to tweak these, too!). This way we can fiddle with the temperature knob to see what gets us the best generated text ranging from conservative to creative. Note that we're using our model to predict based on a random sequence, or "seed", from our original subsetted data, user: start_index = random.randint(0, len(user) - maxlen - 1).

Finally, we name our callback function generate_text which we'll add to the list of callbacks when we fit our model in the cell after this one.


    def sample(preds, temperature=1.0):
        # helper function to sample an index from a probability array
        preds = np.asarray(preds).astype('float64')
        preds = np.log(preds) / temperature
        exp_preds = np.exp(preds)
        preds = exp_preds / np.sum(exp_preds)
        probas = np.random.multinomial(1, preds, 1)
        return np.argmax(probas)

    def on_epoch_end(epoch, logs):
        # Function invoked for specified epochs. Prints generated text.
        # Using epoch+1 to be consistent with the training epochs printed by Keras
        if epoch+1 == 1 or epoch+1 == 15:
            print()
            print('----- Generating text after Epoch: %d' % (epoch + 1))

            start_index = random.randint(0, len(user) - maxlen - 1)
            for diversity in [0.2, 0.5, 1.0, 1.2]:
                print('----- diversity:', diversity)

                generated = ''
                sentence = user[start_index: start_index + maxlen]
                generated += sentence
                print('----- Generating with seed: "' + sentence + '"')
                sys.stdout.write(generated)

                for i in range(400):
                    # One-hot encode the current 40-character window
                    x_pred = np.zeros((1, maxlen, len(chars)))
                    for t, char in enumerate(sentence):
                        x_pred[0, t, char_indices[char]] = 1.

                    preds = model.predict(x_pred, verbose=0)[0]
                    next_index = sample(preds, diversity)
                    next_char = indices_char[next_index]

                    # Append the new character and slide the window forward by one
                    generated += next_char
                    sentence = sentence[1:] + next_char

                    sys.stdout.write(next_char)
                    sys.stdout.flush()
                print()
        else:
            print()
            print('----- Not generating text after Epoch: %d' % (epoch + 1))

    generate_text = LambdaCallback(on_epoch_end=on_epoch_end)
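The heart of the callback is a sliding-window loop: predict a character, append it, drop the oldest character from the window, and repeat. The sketch below isolates that loop with stand-ins for the model. The names generate, cyclic_predict, and greedy_sample are hypothetical, and cyclic_predict is a deterministic toy that replaces model.predict, not a trained network.

```python
def generate(seed, n_chars, predict_fn, sample_fn):
    """Extend `seed` by n_chars, feeding a fixed-length window back into predict_fn."""
    window = seed
    generated = seed
    for _ in range(n_chars):
        probs = predict_fn(window)            # distribution over the next character
        next_char = sample_fn(probs)          # choose one character from it
        generated += next_char
        window = window[1:] + next_char       # slide the window forward by one
    return generated


alphabet = "abc"


def cyclic_predict(window):
    # Stand-in for model.predict: all probability on the letter after the last one
    nxt = (alphabet.index(window[-1]) + 1) % len(alphabet)
    return [1.0 if i == nxt else 0.0 for i in range(len(alphabet))]


def greedy_sample(probs):
    # Stand-in for sample(): always pick the most likely character
    return alphabet[probs.index(max(probs))]


text = generate("ab", 4, cyclic_predict, greedy_sample)
print(text)
```

Swapping greedy_sample for a temperature-based sampler is what introduces the variety between the different diversity settings.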

Training the model and generating predictions

Finally we’ve made it! Our data is ready (x for sequences, y for next characters), we've chosen a batch_size of 128, and we've defined a callback function which will print generated text using model.predict() at the end of the first epoch and again after the fifteenth epoch, with four different temperature settings each time. We have another callback, ModelCheckpoint, which will save the model at the end of an epoch whenever our loss value has improved (find the saved weights file weights.hdf5 in the "Output" tab of the kernel).

Let’s fit our model with these specifications and epochs = 15 for the number of epochs to train. And of course, let's not forget to put our GPU to use! This will make training/prediction much faster than if we used a CPU. In any case, you will still want to grab some lunch or go for a walk while you wait for the model to train and generate predictions if you're running this code interactively.


P.S. If you’re running this interactively in your own notebook on Kaggle, you can click the blue square “Stop” button next to the console at the bottom of your screen to interrupt the model training.


    # define the checkpoint
    filepath = "weights.hdf5"
    checkpoint = ModelCheckpoint(filepath,
                                 monitor='loss',
                                 verbose=1,
                                 save_best_only=True,
                                 mode='min')

    # fit model using our gpu
    with tf.device('/gpu:0'):
        model.fit(x, y,
                  batch_size=128,
                  epochs=15,
                  verbose=2,
                  callbacks=[generate_text, checkpoint])
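The effect of save_best_only=True with mode='min' can be pictured with a few lines of plain Python. This is only an illustration of the idea, not Keras's code; the helper name epochs_that_would_save and the loss values are invented.

```python
def epochs_that_would_save(losses):
    """Return the 1-based epochs at which a 'save only if the loss improved' rule fires."""
    best = float("inf")
    saved = []
    for epoch, loss in enumerate(losses, start=1):
        if loss < best:          # strictly lower than anything seen so far
            best = loss
            saved.append(epoch)
    return saved


# Hypothetical per-epoch training losses
saved_epochs = epochs_that_would_save([2.1, 1.8, 1.9, 1.7, 1.7])
print(saved_epochs)
```

Under this rule the weights file always holds the lowest-loss model seen so far, even if a later epoch gets worse.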

Conclusion

And there you have it! If you ran this notebook in Kaggle Kernels, you hopefully caught the model printing out generated text character-by-character to dramatic effect.


I hope you’ve enjoyed learning how to go from a dataframe containing rows of text to using an LSTM model, implemented with Keras in Kernels, to generate novel sentences thanks to the power of GPUs. You can see how our model improved from the first epoch to the last. The text generated by the model’s predictions in the first epoch didn’t really resemble English at all. And overall, lower levels of diversity generate text with a lot of repetitions, whereas higher levels of diversity correspond to more gobbledegook.

Can you tweak the model or its hyperparameters to generate even better text? Try it out for yourself by forking this notebook kernel (click “Fork Notebook” at the top).


Inspiration for next steps

Here are just a few ideas for how to take what you learned here and expand it:


  1. Experiment with different (hyper)-parameters like the amount of training data, number of epochs or batch sizes, temperature, etc.


  2. Try out the same code with different data; fork this notebook, go to the “Data” tab and remove the freeCodeCamp data source, then add a different dataset (good examples here).


  3. Try out more complicated network architectures like adding dropout layers.

  4. Learn more about deep learning on Kaggle Learn, a series of videos and hands-on notebook tutorials in Kernels.


  5. Use weights.hdf5 from the "Output" tab in a new kernel to predict on different data and see what it would be like if the user in this tutorial completed someone else's sentences.

  6. Compare the speed-up effect of using a CPU versus a GPU on a minimal example.


Translated from: https://www.freecodecamp.org/news/applied-introduction-to-lstms-for-text-generation-380158b29fb3/
