Text classification with 1D convolution on IMDB (Internet Movie Database)
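Keras package directory
Keras examples directory
Dataset download: imdb.npz (a copy that works when the official download is unreachable because of network problems)
IMDB in detail
About IMDB, the Internet Movie Database (recommended reading)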
https://blog.csdn.net/zubin006/article/details/2245716
IMDB 5000 Movie Dataset (a dataset of 5000 movies from IMDB)
https://blog.csdn.net/greenlight_74110/article/details/77898307
IMDB movie review sentiment dataset, positive/negative (three versions: imdb_full.pkl, imdb.pkl, and imdb.npz)
https://download.csdn.net/download/baoyan2015/10213321
http://keras-cn.readthedocs.io/en/latest/other/datasets/
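As a rough illustration of the format: in imdb.npz each review is stored as a list of integer word indices (words are indexed by frequency, so a smaller index means a more common word), paired with a binary sentiment label. The pure-Python sketch below mimics how the `num_words` argument of `imdb.load_data` caps the vocabulary. It assumes the Keras default of replacing out-of-range indices with `oov_char=2`; `cap_vocabulary` and the toy review are hypothetical and not part of Keras.

```python
def cap_vocabulary(review, num_words, oov_char=2):
    """Replace every index >= num_words with oov_char (hypothetical helper)."""
    return [w if w < num_words else oov_char for w in review]


# Toy review with made-up indices; 1 is the start marker in the Keras encoding.
toy_review = [1, 14, 22, 7500, 43, 5000, 4999]

# With num_words=5000, only indices 0..4999 survive; the rest become oov_char.
capped = cap_vocabulary(toy_review, num_words=5000)
print(capped)
```

This is why the embedding layer in the script needs only `max_features` rows: after capping, no index can reach 5000.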
Annotated code
imdb_cnn.py (click to view the original)
'''This example demonstrates the use of Convolution1D for text classification.
Gets to 0.89 test accuracy after 2 epochs.
90s/epoch on Intel i5 2.4Ghz CPU.
10s/epoch on Tesla K40 GPU.
'''
from __future__ import print_function
from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.layers import Embedding
from keras.layers import Conv1D, GlobalMaxPooling1D
from keras.datasets import imdb
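# set parameters:
max_features = 5000   # vocabulary cap: keep only word indices below 5000
maxlen = 400          # pad/truncate every review to 400 tokens
batch_size = 32       # samples per gradient update
embedding_dims = 50   # length of each word embedding vector
filters = 250         # number of Conv1D filters
kernel_size = 3       # each filter spans 3 consecutive words
hidden_dims = 250     # units in the hidden Dense layer
epochs = 2            # passes over the training set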
print('Loading data...')
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
print(len(x_train), 'train sequences')
print(len(x_test), 'test sequences')
print('Pad sequences (samples x time)')
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)
print('x_train shape:', x_train.shape)
print('x_test shape:', x_test.shape)
print('Build model...')
model = Sequential()
# we start off with an efficient embedding layer which maps
# our vocab indices into embedding_dims dimensions
model.add(Embedding(max_features,
                    embedding_dims,
                    input_length=maxlen))
model.add(Dropout(0.2))
# we add a Conv1D layer, which will learn `filters`
# word-group filters, each of size kernel_size:
model.add(Conv1D(filters,
                 kernel_size,
                 padding='valid',
                 activation='relu',
                 strides=1))
# we use global max pooling (the maximum of each filter over all positions):
model.add(GlobalMaxPooling1D())
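# We add a vanilla hidden layer:
# ("vanilla" here means a plain fully connected layer; multilayer perceptrons
# are sometimes colloquially called "vanilla" neural networks, see
# https://en.wikipedia.org/wiki/Multilayer_perceptron)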
model.add(Dense(hidden_dims))
model.add(Dropout(0.2))
model.add(Activation('relu'))
# We project onto a single unit output layer, and squash it with a sigmoid,
# so the output is a value between 0 and 1:
model.add(Dense(1))
model.add(Activation('sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          validation_data=(x_test, y_test))
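To make the data flow through the model easier to follow, here is a small pure-Python sketch that works out the per-sample output shape and the trainable parameter count of each layer from the hyperparameters in the script. It does not call Keras; `conv1d_valid_length` is a hypothetical helper, and the parameter formulas are the standard ones for embedding, convolution and dense layers with biases. Dropout and Activation layers are left out because they change neither the shape nor the parameter count.

```python
# Hyperparameters copied from the script above.
max_features, maxlen = 5000, 400
embedding_dims, filters, kernel_size, hidden_dims = 50, 250, 3, 250


def conv1d_valid_length(steps, kernel, stride=1):
    """Output length of a 1D convolution with padding='valid'."""
    return (steps - kernel) // stride + 1


conv_steps = conv1d_valid_length(maxlen, kernel_size)

# (layer, output shape per sample, trainable parameters)
layers = [
    ("Embedding", (maxlen, embedding_dims), max_features * embedding_dims),
    ("Conv1D", (conv_steps, filters),
     kernel_size * embedding_dims * filters + filters),
    ("GlobalMaxPooling1D", (filters,), 0),
    ("Dense(hidden_dims)", (hidden_dims,), filters * hidden_dims + hidden_dims),
    ("Dense(1)", (1,), hidden_dims + 1),
]

total_params = sum(params for _, _, params in layers)
for name, shape, params in layers:
    print(f"{name:<20} {str(shape):<12} {params:>8}")
print("total trainable parameters:", total_params)
```

With `padding='valid'` the convolution shortens the sequence from 400 to 398 positions, and global max pooling then keeps one value per filter, so the dense layers see a fixed-size vector of 250 features regardless of review length.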
Running the code
C:\ProgramData\Anaconda3\python.exe E:/keras-master/examples/imdb_cnn.py
Using TensorFlow backend.
Loading data...
25000 train sequences
25000 test sequences
Pad sequences (samples x time)
x_train shape: (25000, 400)
x_test shape: (25000, 400)
Build model...
Train on 25000 samples, validate on 25000 samples
Epoch 1/2
32/25000 [..............................] - ETA: 39:26 - loss: 0.6986 - acc: 0.4688
160/25000 [..............................] - ETA: 8:00 - loss: 0.6962 - acc: 0.4250
24480/25000 [============================>.] - ETA: 0s - loss: 0.4068 - acc: 0.7977
24608/25000 [============================>.] - ETA: 0s - loss: 0.4057 - acc: 0.7984
24704/25000 [============================>.] - ETA: 0s - loss: 0.4053 - acc: 0.7987
24832/25000 [============================>.] - ETA: 0s - loss: 0.4048 - acc: 0.7991
24960/25000 [============================>.] - ETA: 0s - loss: 0.4044 - acc: 0.7993
25000/25000 [==============================] - 19s 754us/step - loss: 0.4044 - acc: 0.7994 - val_loss: 0.3281 - val_acc: 0.8565
Epoch 2/2
32/25000 [..............................] - ETA: 11s - loss: 0.2567 - acc: 0.9062
160/25000 [..............................] - ETA: 11s - loss: 0.2439 - acc: 0.9125
288/25000 [..............................] - ETA: 11s - loss: 0.2417 - acc: 0.9097
416/25000 [..............................] - ETA: 11s - loss: 0.2196 - acc: 0.9135
23616/25000 [===========================>..] - ETA: 0s - loss: 0.2307 - acc: 0.9086
23712/25000 [===========================>..] - ETA: 0s - loss: 0.2308 - acc: 0.9085
23840/25000 [===========================>..] - ETA: 0s - loss: 0.2304 - acc: 0.9086
23968/25000 [===========================>..] - ETA: 0s - loss: 0.2302 - acc: 0.9086
24096/25000 [===========================>..] - ETA: 0s - loss: 0.2299 - acc: 0.9087
24224/25000 [============================>.] - ETA: 0s - loss: 0.2300 - acc: 0.9087
24320/25000 [============================>.] - ETA: 0s - loss: 0.2299 - acc: 0.9087
24448/25000 [============================>.] - ETA: 0s - loss: 0.2298 - acc: 0.9086
24576/25000 [============================>.] - ETA: 0s - loss: 0.2299 - acc: 0.9087
24704/25000 [============================>.] - ETA: 0s - loss: 0.2299 - acc: 0.9088
24832/25000 [============================>.] - ETA: 0s - loss: 0.2300 - acc: 0.9086
24960/25000 [============================>.] - ETA: 0s - loss: 0.2306 - acc: 0.9083
25000/25000 [==============================] - 15s 606us/step - loss: 0.2306 - acc: 0.9083 - val_loss: 0.2987 - val_acc: 0.8752
Process finished with exit code 0
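The end-of-epoch lines in this log carry the numbers worth comparing. As a minimal sketch, the snippet below extracts them with a regular expression and prints the gap between training and validation accuracy; `parse_epoch_line` is a hypothetical helper, and the two log lines are copied verbatim from the run above. It also computes how many batches of 32 make up one epoch of 25000 samples.

```python
import math
import re

# The two end-of-epoch lines copied from the run above.
log = """\
25000/25000 [==============================] - 19s 754us/step - loss: 0.4044 - acc: 0.7994 - val_loss: 0.3281 - val_acc: 0.8565
25000/25000 [==============================] - 15s 606us/step - loss: 0.2306 - acc: 0.9083 - val_loss: 0.2987 - val_acc: 0.8752
"""

# Matches "name: 0.1234" pairs such as "loss: 0.4044" or "val_acc: 0.8565".
METRIC = re.compile(r"(\w+): (\d+\.\d+)")


def parse_epoch_line(line):
    """Return a dict of metric name -> value for one log line (hypothetical helper)."""
    return {name: float(value) for name, value in METRIC.findall(line)}


epoch_metrics = [parse_epoch_line(line) for line in log.splitlines()]
for i, m in enumerate(epoch_metrics, start=1):
    gap = m["acc"] - m["val_acc"]
    print(f"epoch {i}: val_acc={m['val_acc']:.4f}, train-val gap={gap:+.4f}")

# 25000 samples in batches of 32; the last batch is a partial one.
batches_per_epoch = math.ceil(25000 / 32)
print("batches per epoch:", batches_per_epoch)
```

In this run the validation accuracy after two epochs is 0.8752, a little below the 0.89 quoted in the script's docstring, and in the second epoch the training accuracy is already higher than the validation accuracy.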
Keras resources
English: https://keras.io/
Chinese: http://keras-cn.readthedocs.io/en/latest/
Example downloads
https://github.com/keras-team/keras
https://github.com/keras-team/keras/tree/master/examples
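Full project download
For readers without download points, add QQ 452205574 to get access to the shared folder.
It includes the code, the datasets (images), trained models, library installation files, and so on.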