Simple RNN model
Import necessary modules
from __future__ import print_function
from keras.layers import Dense, Activation, SimpleRNN
from keras.models import Sequential
from keras.utils import plot_model  # optional: draws the model graph (needs pydot and graphviz)
import numpy as np
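def process_txt(open_path):
    """Read a text file and return its non-empty lines, lowercased, joined into one string."""
    lines = []
    with open(open_path, 'rb') as f:
        for line in f:
            line = line.strip().lower()
            line = line.decode("ascii", "ignore")  # drop any non-ASCII bytes
            if 0 == len(line):
                continue  # skip blank lines
            lines.append(line)
    text = ' '.join(lines)
    return text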
text = process_txt('test.txt')
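The cleaning steps inside `process_txt` can be checked without a file on disk. The sketch below applies the same per-line logic (strip, lowercase, drop non-ASCII bytes, skip blank lines, join with spaces) to an in-memory list of byte strings; `clean_byte_lines` is a hypothetical helper, not part of the original notebook.

```python
def clean_byte_lines(byte_lines):
    """Hypothetical helper mirroring process_txt's per-line cleaning."""
    lines = []
    for line in byte_lines:
        line = line.strip().lower()            # trim surrounding whitespace, lowercase ASCII letters
        line = line.decode("ascii", "ignore")  # silently drop non-ASCII bytes
        if len(line) == 0:                     # skip blank lines
            continue
        lines.append(line)
    return " ".join(lines)                     # one long string, lines separated by spaces


# The last line ends with a UTF-8 encoded dash, which the ASCII decode removes.
sample = [b"Alice was beginning\n", b"\n", b"to get VERY tired\xe2\x80\x94\n"]
print(clean_byte_lines(sample))  # alice was beginning to get very tired
```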
Building a character-level RNN
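# map each distinct character to an integer index, and back
chars = sorted(set(text))  # sorted so the indices are identical on every run
chars_count = len(chars)
char2index = dict((c, i) for i, c in enumerate(chars))
index2char = dict((i, c) for i, c in enumerate(chars))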
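The two dictionaries are inverse mappings between characters and integer ids. A minimal sketch on a toy string, wrapped in a hypothetical `build_vocab` function so it does not overwrite the notebook's variables; the character set is sorted here so the ids are reproducible (iterating a plain `set` of strings can yield a different order in each Python process because of hash randomization).

```python
def build_vocab(text):
    """Return (char2index, index2char) for the distinct characters of `text`."""
    chars = sorted(set(text))  # unique characters in a fixed, sorted order
    char2index = {c: i for i, c in enumerate(chars)}
    index2char = {i: c for i, c in enumerate(chars)}
    return char2index, index2char


c2i, i2c = build_vocab("hello world")
encoded = [c2i[c] for c in "hello world"]
print(encoded)                           # [3, 2, 4, 4, 5, 0, 7, 5, 6, 4, 1]
print("".join(i2c[i] for i in encoded))  # hello world
```

Decoding with `index2char` is exactly what the prediction loop does with the index returned by `np.argmax`.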
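* STEP: the stride of the sliding window, i.e. how many characters the input window moves along the whole text (the time series) between consecutive samples
* SEQLEN: the length of each input sequence
Example with STEP = 1 and SEQLEN = 5 (input sequence -> label):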
>abcde -> f
>bcdef -> g
>cdefg -> h
>defgh -> i
>efghi -> j
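The sliding-window scheme above can be sketched as a small function; `make_windows` is a hypothetical name, and its loop has the same header as the one that builds `input_chars` and `label_chars`. Each window of `seqlen` characters is an input, and the character right after it is the label.

```python
def make_windows(text, seqlen, step):
    """Slide a window of length `seqlen` over `text`, moving `step` characters at a time."""
    inputs, labels = [], []
    for i in range(0, len(text) - seqlen, step):
        inputs.append(text[i:i + seqlen])  # seqlen characters of context
        labels.append(text[i + seqlen])    # the character that follows the window
    return inputs, labels


inputs, labels = make_windows("abcdefghij", seqlen=5, step=1)
for x, y in zip(inputs, labels):
    print(x, "->", y)
# abcde -> f
# bcdef -> g
# cdefg -> h
# defgh -> i
# efghi -> j
```

With `step=2` the window skips every other start position, so `make_windows("abcdefghij", 5, 2)` yields only `abcde -> f`, `cdefg -> h` and `efghi -> j`.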
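Create input and label texts
SEQLEN = 10  # length of each input sequence
STEP = 1     # slide the window one character at a time
input_chars = []
label_chars = []
for i in range(0, len(text) - SEQLEN, STEP):
    input_chars.append(text[i:i + SEQLEN])  # SEQLEN characters of context
    label_chars.append(text[i + SEQLEN])    # the character that follows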
Vectorize input and label texts (one-hot encoding)
# X[i, j, k] is True when character j of sample i has index k;
# Y[i, k] is True when the label of sample i has index k.
X = np.zeros((len(input_chars), SEQLEN, chars_count), dtype=bool)
Y = np.zeros((len(input_chars), chars_count), dtype=bool)
for i, input_char in enumerate(input_chars):
    for j, c in enumerate(input_char):
        X[i, j, char2index[c]] = 1
    Y[i, char2index[label_chars[i]]] = 1
X.shape, Y.shape
((1264, 10, 33), (1264, 33))
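The shapes follow from the settings: with STEP = 1 the window loop runs `len(text) - SEQLEN` times, so 1264 samples with a sequence length of 10 imply a cleaned text of 1274 characters, and the last dimension, 33, is `chars_count`. The sketch below one-hot encodes a single window with a toy vocabulary (names prefixed `toy_` are hypothetical, chosen so they do not clash with the notebook's variables) and decodes it back with `argmax`, the same decoding used at prediction time.

```python
import numpy as np

toy_vocab = sorted(set("abcdef"))                  # ['a', 'b', 'c', 'd', 'e', 'f']
toy_c2i = {c: i for i, c in enumerate(toy_vocab)}
toy_i2c = {i: c for i, c in enumerate(toy_vocab)}

window = "cafe"
# one sample of shape (sequence length, vocabulary size), like X[i] above
x = np.zeros((len(window), len(toy_vocab)), dtype=bool)
for j, c in enumerate(window):
    x[j, toy_c2i[c]] = True  # exactly one True per time step

print(x.shape)                                             # (4, 6)
print(x.sum(axis=1))                                       # [1 1 1 1]
print("".join(toy_i2c[int(k)] for k in x.argmax(axis=1)))  # cafe
```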
Build Model
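NUM_PREDS_PER_EPOCH: the number of new characters the model generates after each training iteration
HIDDEN_SIZE, BATCH_SIZE = 128, 128  # assumed values; not given in the original
NUM_ITERATIONS, NUM_EPOCHS_PER_ITERATION, NUM_PREDS_PER_EPOCH = 25, 1, 100  # as in the log below
model = Sequential()
model.add(SimpleRNN(HIDDEN_SIZE, return_sequences=False,
                    input_shape=(SEQLEN, chars_count), unroll=True))
model.add(Dense(chars_count))     # one score per character
model.add(Activation("softmax"))  # scores -> next-character probabilities
model.compile(loss="categorical_crossentropy", optimizer="rmsprop")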
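To make the model's computation concrete, here is a NumPy sketch of the forward pass for one input sequence. Keras' `SimpleRNN` (default `tanh` activation) updates its hidden state as h_t = tanh(x_t W + h_{t-1} U + b) and, with `return_sequences=False`, hands only the final state to the next layer; a `Dense` layer with softmax over the `chars_count` classes is assumed as the output layer. The weights are random placeholders rather than trained values, and the sizes (including a hidden size of 128) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
seqlen, n_chars, hidden = 10, 33, 128  # illustrative sizes

# Random placeholder weights; a trained model would learn these.
W = rng.normal(scale=0.1, size=(n_chars, hidden))  # input -> hidden
U = rng.normal(scale=0.1, size=(hidden, hidden))   # hidden -> hidden (recurrent)
b = np.zeros(hidden)
V = rng.normal(scale=0.1, size=(hidden, n_chars))  # Dense layer weights
c = np.zeros(n_chars)


def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


def forward(x):
    """x: one-hot array of shape (seqlen, n_chars); returns next-character probabilities."""
    h = np.zeros(hidden)                   # initial state h_0 = 0
    for t in range(x.shape[0]):
        h = np.tanh(x[t] @ W + h @ U + b)  # h_t = tanh(x_t W + h_{t-1} U + b)
    return softmax(h @ V + c)              # only the last state feeds the Dense layer


x = np.zeros((seqlen, n_chars))
x[np.arange(seqlen), rng.integers(0, n_chars, size=seqlen)] = 1  # random one-hot sequence
p = forward(x)
print(p.shape, np.isclose(p.sum(), 1.0))  # (33,) True
```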
Training and prediction
for iteration in range(NUM_ITERATIONS):
    print('Iteration : %d' % iteration)
    model.fit(X, Y, batch_size=BATCH_SIZE, epochs=NUM_EPOCHS_PER_ITERATION)
    # after each training iteration, generate text once from a random seed
    test_idx = np.random.randint(len(input_chars))
    test_chars = input_chars[test_idx]
    print('test seed is : %s' % test_chars)
    print(test_chars, end='')
    for i in range(NUM_PREDS_PER_EPOCH):
        # one-hot encode the current test sequence
        vec_test = np.zeros((1, SEQLEN, chars_count))
        for j, ch in enumerate(test_chars):
            vec_test[0, j, char2index[ch]] = 1
        pred = model.predict(vec_test, verbose=0)[0]
        pred_char = index2char[np.argmax(pred)]  # greedy: most likely next character
        print(pred_char, end='')
        # drop the oldest character and append the new one to form the next input sequence
        test_chars = test_chars[1:] + pred_char
    print()
Iteration : 0
Epoch 1/1
1264/1264 [==============================] - 0s 100us/step - loss: 0.5441
test seed is : d text, wh
d text, where we set the labelas t oh puaciof iin a ie mrcterelice toat word in a text givel the the oaxu is
Iteration : 1
Epoch 1/1
1264/1264 [==============================] - 0s 85us/step - loss: 0.5206
test seed is : based lang
based language model olxt pice live sois aoslainl u cpnedeto anfed tonguert of purdii toreabile ions baelling
Iteration : 2
Epoch 1/1
1264/1264 [==============================] - 0s 87us/step - loss: 0.4940
test seed is : ut x t+1 a
ut x t+1 at time t+1. for our first example of using keras for building ranguage mrdels arclampleta toompstio
Iteration : 3
Epoch 1/1
1264/1264 [==============================] - 0s 90us/step - loss: 0.4797
test seed is : s instead
s instead of words. we till fo to s ahi a tens an uampir t iomsaiie fomelertoor mpreti s omobiin aoesile t
Iteration : 4
Epoch 1/1
1264/1264 [==============================] - 0s 87us/step - loss: 0.4601
test seed is : rate some
rate some text in the same at lsing f nper1bimedeto tfewoed tood thtra te tuaite t od aois cherabtens asetuu-e
Iteration : 5
Epoch 1/1
1264/1264 [==============================] - 0s 94us/step - loss: 0.4285
test seed is : lling corr
lling correction, and so on ta winut irdlle e tha oathris t ha preeicte horrdst s wllla a aseccerneutod the
Iteration : 6
Epoch 1/1
1264/1264 [==============================] - 0s 88us/step - loss: 0.4158
test seed is : typically
typically a sequence of predicted words. the trained model to the tate that atinss uf lire oxe iacs fom ratba
Iteration : 7
Epoch 1/1
1264/1264 [==============================] - 0s 89us/step - loss: 0.4083
test seed is : various h
various higang aereretilus t r sees in sodea ged in taedtin t adrline s aovengexee toxe il t uis tr wards ts
Iteration : 8
Epoch 1/1
1264/1264 [==============================] - 0s 89us/step - loss: 0.3825
test seed is : g a word-b
g a word-based language model anrewe ue toxte word ie a text as aamivel ofsares. anmaaee tooctalini n adele e
Iteration : 9
Epoch 1/1
1264/1264 [==============================] - 0s 106us/step - loss: 0.3636
test seed is : ulary and
ulary and trains quicker. the idea is the same ity toe mufprt tfxo pseboiis to t illate s s aus d rarali
Iteration : 10
Epoch 1/1
1264/1264 [==============================] - 0s 104us/step - loss: 0.3598
test seed is : nguage mod
nguage models. a language models. a language models. a language models. a language models. a language models.
Iteration : 11
Epoch 1/1
1264/1264 [==============================] - 0s 90us/step - loss: 0.3374
test seed is : o on. a si
o on. a side effect of the ability to predict the next charactersbised aocheracters. ss eae afme ttht so er bu
Iteration : 12
Epoch 1/1
1264/1264 [==============================] - 0s 88us/step - loss: 0.3261
test seed is : instead o
instead of words. we will then use the training data used is enilten oo tuex in ther sme thtu arerabaeil thes
Iteration : 13
Epoch 1/1
1264/1264 [==============================] - 0s 90us/step - loss: 0.3152
test seed is : enerate te
enerate text by ans sevurute th waralt areoatte tho sa inelu wo legtand te tlathe l meael toael the s tuaes to
Iteration : 14
Epoch 1/1
1264/1264 [==============================] - 0s 107us/step - loss: 0.3047
test seed is : (nlp) com
(nlp) community for various applications. one such applications. one such applications. one such applications
Iteration : 15
Epoch 1/1
1264/1264 [==============================] - 0s 108us/step - loss: 0.2944
test seed is : nd the out
nd the output probability of a word in a text given the previous words. language models arelimpo tan for ha i
Iteration : 16
Epoch 1/1
1264/1264 [==============================] - 0s 97us/step - loss: 0.2748
test seed is : ter-based
ter-based model here because it has arsmallertvacabulariens b cherdene s tha term os the tome io t ingtren vro
Iteration : 17
Epoch 1/1
1264/1264 [==============================] - 0s 91us/step - loss: 0.2810
test seed is : wonderlan
wonderland to predict the next character based language models arc amporta thor see iis ay tamllnet. as uach
Iteration : 18
Epoch 1/1
1264/1264 [==============================] - 0s 89us/step - loss: 0.2672
test seed is : te some te
te some text in the same gen whres ten taet axy taralilen uane aoe sifoauel mooe a toc tingta te m1ol
Iteration : 19
Epoch 1/1
1264/1264 [==============================] - 0s 87us/step - loss: 0.2591
test seed is : 1. for our
1. for our first example of using keras for building lnng, ge medel eexe text fivram.min aftf woed bose iod u
Iteration : 20
Epoch 1/1
1264/1264 [==============================] - 0s 86us/step - loss: 0.2424
test seed is : o build a
o build a characters isetead of aoids. whe tuainse moaslli aom abe rd nedtut ve thls ahelasue s oprebivin thed
Iteration : 21
Epoch 1/1
1264/1264 [==============================] - 0s 97us/step - loss: 0.2420
test seed is : on, and so
on, and so on. a side effect of the ability to predict the next character bised language model onxt proeabivit
Iteration : 22
Epoch 1/1
1264/1264 [==============================] - 0s 88us/step - loss: 0.2327
test seed is : is typical
is typically a sequence of predicted words. the trainsd model to tenlrate t mentext of utice tedutaet the tate
Iteration : 23
Epoch 1/1
1264/1264 [==============================] - 0s 87us/step - loss: 0.2186
test seed is : wonderland
wonderland to predict the nrxt chrrdcter biled lang, ge mode praceer ioceaedctoresiton .nn sreg are wordliana
Iteration : 24
Epoch 1/1
1264/1264 [==============================] - 0s 84us/step - loss: 0.2183
test seed is : se charact
se character bised langlage model on the text of alice in wonderland to predict the probabilities. in language
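The text above is generated greedily: `np.argmax` always picks the single most likely next character, which tends to produce loops such as "a language models. a language models." in iteration 10. A common alternative, not used in the original notebook, is to sample from the predicted distribution with a temperature. A minimal sketch; `sample_with_temperature` is a hypothetical helper that could stand in for `np.argmax(pred)` in the prediction loop.

```python
import numpy as np


def sample_with_temperature(pred, temperature=1.0, rng=None):
    """Draw an index from the probability vector `pred`.

    temperature < 1 sharpens the distribution (closer to argmax);
    temperature > 1 flattens it (more varied output, more mistakes).
    """
    rng = np.random.default_rng() if rng is None else rng
    logits = np.log(np.asarray(pred, dtype=np.float64) + 1e-12) / temperature
    probs = np.exp(logits - logits.max())  # rescale, then renormalize
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))


pred = np.array([0.7, 0.2, 0.1])
rng = np.random.default_rng(42)
draws = [sample_with_temperature(pred, 0.5, rng) for _ in range(1000)]
print(np.bincount(draws, minlength=3) / 1000)  # mostly index 0, occasionally 1 or 2
```

In the loop, this would read `pred_char = index2char[sample_with_temperature(pred, 0.5)]`; at temperature 0.5 the probabilities are proportional to `pred**2`, so index 0 is chosen about 91% of the time instead of always.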