Keras Examples Directory
Annotated Code
'''This example demonstrates the use of fastText for text classification.
Based on the paper by Joulin et al.:
"Bags of Tricks for Efficient Text Classification"
https://arxiv.org/abs/1607.01759
Results on the IMDB dataset with uni-gram and bi-gram embeddings:
Uni-gram (single words): 0.8813 test accuracy after 5 epochs. 8 s/epoch on an i7 CPU.
Bi-gram (pairs of adjacent words): 0.9056 test accuracy after 5 epochs. 2 s/epoch on a GTX 980M GPU.
'''
from __future__ import print_function
import numpy as np
from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Embedding
from keras.layers import GlobalAveragePooling1D
from keras.datasets import imdb
def create_ngram_set(input_list, ngram_value=2):
    """
    Extract a set of n-grams from a list of integers.

    Each n-gram is a tuple of ngram_value consecutive tokens. For example, the
    bi-grams of [1, 4, 9, 4, 1, 4] are (1, 4), (4, 9), (9, 4), (4, 1) and
    (1, 4), which as a set gives the four tuples below (with ngram_value=3,
    each tuple holds three consecutive tokens instead).

    >>> create_ngram_set([1, 4, 9, 4, 1, 4], ngram_value=2)
    {(4, 9), (4, 1), (1, 4), (9, 4)}
    >>> create_ngram_set([1, 4, 9, 4, 1, 4], ngram_value=3)
    [(1, 4, 9), (4, 9, 4), (9, 4, 1), (4, 1, 4)]
    """
    return set(zip(*[input_list[i:] for i in range(ngram_value)]))
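# Added walk-through (not part of the original example): range(ngram_value)
# builds the shifted views input_list[0:], input_list[1:], ..., and zip()
# pairs them element-wise, so each tuple contains ngram_value consecutive
# tokens. A minimal equivalent loop, assuming the same integer-list input:
#     ngrams = set()
#     for i in range(len(input_list) - ngram_value + 1):
#         ngrams.add(tuple(input_list[i:i + ngram_value]))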
def add_ngram(sequences, token_indice, ngram_range=2):
    """
    Augment the input list of lists (sequences) by appending n-gram values.

    Example: adding bi-grams (indices for 2-word combinations)
    >>> sequences = [[1, 3, 4, 5], [1, 3, 7, 9, 2]]
    >>> token_indice = {(1, 3): 1337, (9, 2): 42, (4, 5): 2017}
    >>> add_ngram(sequences, token_indice, ngram_range=2)
    [[1, 3, 4, 5, 1337, 2017], [1, 3, 7, 9, 2, 1337, 42]]

    Example: adding tri-grams (indices for 3-word combinations)
    >>> sequences = [[1, 3, 4, 5], [1, 3, 7, 9, 2]]
    >>> token_indice = {(1, 3): 1337, (9, 2): 42, (4, 5): 2017, (7, 9, 2): 2018}
    >>> add_ngram(sequences, token_indice, ngram_range=3)
    [[1, 3, 4, 5, 1337], [1, 3, 7, 9, 2, 1337, 2018]]
    """
new_sequences = []
for input_list in sequences:
new_list = input_list[:]
for i in range(len(new_list) - ngram_range + 1):
for ngram_value in range(2, ngram_range + 1):
ngram = tuple(new_list[i:i + ngram_value])
if ngram in token_indice:
new_list.append(token_indice[ngram])
new_sequences.append(new_list)
return new_sequences
# Set parameters:
# ngram_range = 2 will add bi-gram features
ngram_range = 1
max_features = 20000  # vocabulary size: number of distinct words kept from the corpus
maxlen = 400          # maximum length (in tokens) of each padded sequence
batch_size = 32       # number of samples per training batch
embedding_dims = 50   # dimensionality of each word-embedding vector
epochs = 5            # number of training epochs
print('Loading data...')
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
print(len(x_train), 'train sequences')
print(len(x_test), 'test sequences')
print('Average train sequence length: {}'.format(np.mean(list(map(len, x_train)), dtype=int)))
print('Average test sequence length: {}'.format(np.mean(list(map(len, x_test)), dtype=int)))
if ngram_range > 1:
print('Adding {}-gram features'.format(ngram_range))
    # Create a set of unique n-grams from the training set.
ngram_set = set()
for input_list in x_train:
for i in range(2, ngram_range + 1):
set_of_ngram = create_ngram_set(input_list, ngram_value=i)
ngram_set.update(set_of_ngram)
    # Dictionary mapping each n-gram token to a unique integer.
    # Integer values are greater than max_features in order
    # to avoid collision with existing features.
start_index = max_features + 1
token_indice = {v: k + start_index for k, v in enumerate(ngram_set)}
indice_token = {token_indice[k]: k for k in token_indice}
    # max_features is now the highest integer that can appear in the dataset.
max_features = np.max(list(indice_token.keys())) + 1
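    # Illustration with hypothetical values (not taken from the actual run):
    # if ngram_set were {(42, 7), (7, 99)} and max_features started at 20000,
    # token_indice could be {(42, 7): 20001, (7, 99): 20002}, indice_token the
    # reverse mapping, and max_features would become 20003 so the Embedding
    # layer is sized to cover every n-gram id.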
    # Augment x_train and x_test with the n-gram features.
x_train = add_ngram(x_train, token_indice, ngram_range)
x_test = add_ngram(x_test, token_indice, ngram_range)
print('Average train sequence length: {}'.format(np.mean(list(map(len, x_train)), dtype=int)))
print('Average test sequence length: {}'.format(np.mean(list(map(len, x_test)), dtype=int)))
print('Pad sequences (samples x time)')
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)
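# Added note: pad_sequences pads each sequence with zeros at the beginning
# (padding='pre' by default) and truncates sequences longer than maxlen, so
# both arrays end up with shape (num_samples, maxlen).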
print('x_train shape:', x_train.shape)
print('x_test shape:', x_test.shape)
print('Build model...')
model = Sequential()
# We start off with an efficient embedding layer which maps
# our vocabulary indices into embedding_dims dimensions.
model.add(Embedding(max_features,
embedding_dims,
input_length=maxlen))
# We add a GlobalAveragePooling1D, which will average the embeddings
# of all words in the document.
model.add(GlobalAveragePooling1D())
# We project onto a single-unit output layer and squash it with a sigmoid:
model.add(Dense(1, activation='sigmoid'))
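# Added note: taken together this is the fastText-style classifier from the
# paper above: an embedding lookup, an average over all token embeddings, and
# a single sigmoid unit, i.e. a linear classifier over averaged
# bag-of-n-gram features.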
model.compile(loss='binary_crossentropy',
optimizer='adam',
metrics=['accuracy'])
model.fit(x_train, y_train,
batch_size=batch_size,
epochs=epochs,
validation_data=(x_test, y_test))
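A short optional follow-up, not part of the original script (the file name below is an arbitrary assumption): once fit() returns, the trained model can be evaluated on the test set and saved to disk.
score, acc = model.evaluate(x_test, y_test, batch_size=batch_size)  # final test loss and accuracy
print('Test accuracy:', acc)
model.save('imdb_fasttext_model.h5')  # hypothetical file name; stores architecture and weights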
Execution Output
C:\ProgramData\Anaconda3\python.exe E:/keras-master/examples/imdb_fasttext.py
Testing started at 10:01 ...
Loading data...
25000 train sequences
25000 test sequences
Average train sequence length: 238
Average test sequence length: 230
Pad sequences (samples x time)
x_train shape: (25000, 400)
x_test shape: (25000, 400)
Build model...
Train on 25000 samples, validate on 25000 samples
Epoch 1/5
32/25000 [..............................] - ETA: 16:25 - loss: 0.6936 - acc: 0.4062
  224/25000 [..............................] - ETA: 2:25 - loss: 0.6934 - acc: 0.4330
 6016/25000 [======>.......................] - ETA: 9s - loss: 0.6860 - acc: 0.6268
12576/25000 [==============>...............] - ETA: 4s - loss: 0.6666 - acc: 0.6910
18944/25000 [=====================>........] - ETA: 2s - loss: 0.6403 - acc: 0.7218
24800/25000 [============================>.] - ETA: 0s - loss: 0.6131 - acc: 0.7429
25000/25000 [==============================] - 10s 389us/step - loss: 0.6123 - acc: 0.7436 - val_loss: 0.5048 - val_acc: 0.8227
Epoch 2/5
32/25000 [..............................] - ETA: 6s - loss: 0.4693 - acc: 0.7812
224/25000 [..............................] - ETA: 6s - loss: 0.5006 - acc: 0.8080
24864/25000 [============================>.] - ETA: 0s - loss: 0.2209 - acc: 0.9246
25000/25000 [==============================] - 8s 334us/step - loss: 0.2206 - acc: 0.9248 - val_loss: 0.2843 - val_acc: 0.8879
Failure
Expected :{(4, 9), (4, 1), (1, 4), (9, 4)}
Actual :{(4, 1), (9, 4), (4, 9), (1, 4)}
**********************************************************************
File "E:/keras-master/examples/imdb_fasttext.py", line 28, in imdb_fasttext.create_ngram_set
Failed example:
create_ngram_set([1, 4, 9, 4, 1, 4], ngram_value=2)
Expected:
{(4, 9), (4, 1), (1, 4), (9, 4)}
Got:
{(4, 1), (9, 4), (4, 9), (1, 4)}
Failure
Expected :[(1, 4, 9), (4, 9, 4), (9, 4, 1), (4, 1, 4)]
Actual :{(4, 1, 4), (4, 9, 4), (1, 4, 9), (9, 4, 1)}
**********************************************************************
File "E:/keras-master/examples/imdb_fasttext.py", line 31, in imdb_fasttext.create_ngram_set
Failed example:
create_ngram_set([1, 4, 9, 4, 1, 4], ngram_value=3)
Expected:
[(1, 4, 9), (4, 9, 4), (9, 4, 1), (4, 1, 4)]
Got:
{(4, 1, 4), (4, 9, 4), (1, 4, 9), (9, 4, 1)}
Process finished with exit code 0
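The two doctest failures above are not training errors. create_ngram_set returns a Python set, and the printed element order of a set is not guaranteed, so the expected output for ngram_value=2 can differ from the actual output even though the contents match; for ngram_value=3 the docstring additionally writes the expected result as a list although the function returns a set. A minimal way to make these doctests deterministic (an illustrative change, not part of the published example) is to compare sorted lists:
>>> sorted(create_ngram_set([1, 4, 9, 4, 1, 4], ngram_value=2))
[(1, 4), (4, 1), (4, 9), (9, 4)]
>>> sorted(create_ngram_set([1, 4, 9, 4, 1, 4], ngram_value=3))
[(1, 4, 9), (4, 1, 4), (4, 9, 4), (9, 4, 1)]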
More about Keras
English: https://keras.io/
Chinese: http://keras-cn.readthedocs.io/en/latest/
Example downloads
https://github.com/keras-team/keras
https://github.com/keras-team/keras/tree/master/examples
Full project download
If you have no download credits, add QQ 452205574 to access the shared folder.
It includes the code, datasets (images), pre-trained models, library installers, and more.