[My Python Machine Learning Journey, Part 5] A Summary of Several Common Questions, and Movie Review Classification

GitHub repository for this diary series:

https://github.com/zhengyuv/MyPyMLRoad

Follows and stars are welcome.

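1. A small difference between how TensorFlow and Keras load datasets

In Keras, load_data() returns a tuple of two (data, labels) tuples, so the unpacking targets should be grouped with parentheses, as in (train_data, train_labels), (test_data, test_labels) = imdb.load_data(...).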
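
As a minimal sketch of why the parentheses matter, the snippet below uses fake_load_data(), a hypothetical stand-in that only mimics the nested-tuple shape returned by a Keras load_data() call, so it runs without downloading any dataset.

```python
import numpy as np

def fake_load_data():
    """Hypothetical stand-in that mimics the return shape of a Keras load_data()."""
    train = (np.array([[1, 2], [3, 4]]), np.array([0, 1]))
    test = (np.array([[5, 6]]), np.array([1]))
    return train, test

# Nested parentheses unpack both inner tuples in a single statement.
(train_data, train_labels), (test_data, test_labels) = fake_load_data()

# Without the inner parentheses, each name receives a whole (data, labels) tuple.
train, test = fake_load_data()

print(train_data.shape, train_labels.shape)
print(type(train).__name__, len(train))
```

Both forms are valid Python; the nested form simply saves a second unpacking step.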

2. np.array() vs. np.asarray()

Reference: https://blog.csdn.net/JNingWei/article/details/78811259
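
The practical difference can be sketched in a few lines: when the input is already an ndarray, np.array() copies it by default, while np.asarray() returns the same object as long as no dtype conversion is requested. The variable names here are only for illustration.

```python
import numpy as np

source = np.array([1, 2, 3])

copied = np.array(source)     # makes an independent copy by default
aliased = np.asarray(source)  # no copy: the input is already an ndarray

# Modify the original and see which result follows the change.
source[0] = 99

print(copied[0])          # unaffected by the change to source
print(aliased[0])         # reflects the change, since it is the same array
print(aliased is source)
```

For inputs that are not arrays yet, such as a Python list, both functions have to build a new array.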

3. dtype vs. astype() for NumPy arrays

Reference: https://blog.csdn.net/A632189007/article/details/77989287
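
A short sketch of the distinction: dtype is an attribute that reports an array's element type, while astype() returns a new array converted to the requested type. The label values below are made up for illustration.

```python
import numpy as np

labels = np.asarray([0, 1, 1, 0])

# dtype only describes the existing data; the exact integer type is platform-dependent.
print(labels.dtype)

# astype() builds a converted copy and leaves the original array untouched.
as_float = labels.astype('float32')
print(as_float.dtype)
print(labels.dtype == as_float.dtype)
```

This is the same conversion the classification script applies to its labels with np.asarray(train_labels).astype('float32').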

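4. In Keras, calling model.fit() returns a History object. This object has a history attribute, a dictionary that records the loss and metric values for every training epoch (plus the validation values when validation data is supplied).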
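
One practical caveat is that the accuracy keys in this dictionary depend on the Keras version: the version used in this post reports 'acc' and 'val_acc', while newer releases use 'accuracy' and 'val_accuracy'. The sketch below shows one way to read either spelling. pick_metric() is a hypothetical helper, and example_history is a made-up dictionary shaped like History.history, not real training output.

```python
def pick_metric(history_dict, *candidate_keys):
    """Return the first per-epoch series stored under any of the candidate keys."""
    for key in candidate_keys:
        if key in history_dict:
            return history_dict[key]
    raise KeyError(f"none of {candidate_keys} found; available: {sorted(history_dict)}")

# Hypothetical dictionary shaped like History.history after two epochs.
example_history = {
    'loss': [0.51, 0.30],
    'accuracy': [0.78, 0.90],
    'val_loss': [0.38, 0.30],
    'val_accuracy': [0.87, 0.89],
}

print(sorted(example_history))
acc = pick_metric(example_history, 'acc', 'accuracy')
val_acc = pick_metric(example_history, 'val_acc', 'val_accuracy')
print(acc, val_acc)
```

Printing the sorted keys first is a quick way to see which names a given installation actually produces.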

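5. legend() when plotting with plt

Calling plt.legend() displays the legend, built from the label passed to each plt.plot() call.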
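
A minimal sketch of how labels and the legend fit together is shown below. The data points are invented, and the Agg backend is an assumption made only so the snippet runs without a display; in an interactive session plt.show() would be used instead.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, assumed so no display is needed
import matplotlib.pyplot as plt

epochs = [1, 2, 3]

plt.figure()
# Each label given here becomes one entry in the legend.
plt.plot(epochs, [0.5, 0.3, 0.2], 'bo', label='training loss')
plt.plot(epochs, [0.4, 0.3, 0.3], 'r', label='validation loss')

# legend() collects the labelled lines and returns the Legend object.
legend = plt.legend()
legend_labels = [text.get_text() for text in legend.get_texts()]
print(legend_labels)
```

Lines plotted without a label are left out of the legend.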

6. Movie review classification (a binary classification problem)

Code:

# -*- coding: utf-8 -*-
"""
Created on Mon Nov 26 16:34:29 2018

@author: zhengyuv
"""

from keras.datasets import imdb
import numpy as np
from keras import models
from keras import layers
import matplotlib.pyplot as plt

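# Read the dataset, keeping only the 10,000 most frequent words.
# A raw string keeps the backslashes in the Windows path literal.
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(path=r'D:\datasets\imdb.npz', num_words=10000)
#print(train_data[0])
#print(len(train_data))
#print(len(test_data))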

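# Vectorize the data: multi-hot encode each review as a vector of length `dimension`
def vectorize_sequences(sequences, dimension=10000):
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        # Set every word index that appears in the review to 1
        results[i, sequence] = 1
    return results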

x_train = vectorize_sequences(train_data)
x_test = vectorize_sequences(test_data)

y_train = np.asarray(train_labels).astype('float32')
y_test = np.asarray(test_labels).astype('float32')

#model definition
model = models.Sequential()
model.add(layers.Dense(16,activation='relu',input_shape=(10000,)))
model.add(layers.Dense(16,activation='relu'))
model.add(layers.Dense(1,activation='sigmoid'))

#model compile
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# Split dataset: hold out the first 10,000 training samples for validation
x_val = x_train[:10000]
x_train = x_train[10000:]
y_val = y_train[:10000]
y_train = y_train[10000:]

#train model
history = model.fit(x_train,
                    y_train,
                    epochs=10,
                    batch_size=512,
                    validation_data=(x_val,y_val))

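# Plot
# Note: the keys 'acc' and 'val_acc' match the Keras version used here (see the
# log output); newer Keras releases name them 'accuracy' and 'val_accuracy'.
history_dict = history.history
loss = history_dict['loss']
val_loss = history_dict['val_loss']
acc = history_dict['acc']
val_acc = history_dict['val_acc']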

epochs = range(1, len(loss)+1)

plt.plot(epochs, loss, 'bo', label='training loss')
plt.plot(epochs, val_loss, 'r', label='validation loss')
plt.title('training and validation loss')
plt.xlabel('epochs')
plt.ylabel('loss')
plt.legend()
plt.show()

plt.clf()
plt.plot(epochs, acc,'bo', label='training accuracy')
plt.plot(epochs, val_acc, 'r', label='validation accuracy')
plt.title('training and validation accuracy')
plt.xlabel('epochs')
plt.ylabel('accuracy')
plt.legend()
plt.show()
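
The multi-hot encoding performed by vectorize_sequences() in the script above is easier to see on a toy input. The sketch below repeats the function and applies it to two made-up sequences with dimension=5 instead of 10000; the toy values are assumptions for illustration only.

```python
import numpy as np

def vectorize_sequences(sequences, dimension=10000):
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        # Fancy indexing sets every listed column of row i to 1 at once.
        results[i, sequence] = 1
    return results

# Two made-up "reviews" expressed as word indices; index 1 repeats in the second.
toy_sequences = [[0, 2], [1, 1, 3]]
encoded = vectorize_sequences(toy_sequences, dimension=5)
print(encoded)
```

A repeated index still yields a single 1, so the encoding records which words are present, not how often they occur.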

Output of the training script:

Using TensorFlow backend.
Train on 15000 samples, validate on 10000 samples
Epoch 1/10
15000/15000 [==============================] - 8s 554us/step - loss: 0.5084 - acc: 0.7813 - val_loss: 0.3797 - val_acc: 0.8684
Epoch 2/10
15000/15000 [==============================] - 3s 212us/step - loss: 0.3004 - acc: 0.9047 - val_loss: 0.3004 - val_acc: 0.8897
Epoch 3/10
15000/15000 [==============================] - 3s 216us/step - loss: 0.2179 - acc: 0.9285 - val_loss: 0.3085 - val_acc: 0.8711
Epoch 4/10
15000/15000 [==============================] - 3s 205us/step - loss: 0.1750 - acc: 0.9437 - val_loss: 0.2840 - val_acc: 0.8832
Epoch 5/10
15000/15000 [==============================] - 3s 206us/step - loss: 0.1427 - acc: 0.9543 - val_loss: 0.2841 - val_acc: 0.8872
Epoch 6/10
15000/15000 [==============================] - 3s 205us/step - loss: 0.1150 - acc: 0.9650 - val_loss: 0.3166 - val_acc: 0.8772
Epoch 7/10
15000/15000 [==============================] - 3s 204us/step - loss: 0.0980 - acc: 0.9705 - val_loss: 0.3127 - val_acc: 0.8846
Epoch 8/10
15000/15000 [==============================] - 3s 211us/step - loss: 0.0807 - acc: 0.9763 - val_loss: 0.3859 - val_acc: 0.8649
Epoch 9/10
15000/15000 [==============================] - 3s 207us/step - loss: 0.0661 - acc: 0.9821 - val_loss: 0.3635 - val_acc: 0.8782
Epoch 10/10
15000/15000 [==============================] - 3s 226us/step - loss: 0.0561 - acc: 0.9853 - val_loss: 0.3843 - val_acc: 0.8792

[Figure: training and validation loss]

[Figure: training and validation accuracy]

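The loss plot shows that the model starts to overfit after roughly the 5th epoch: validation loss bottoms out around epochs 4 and 5 and then rises, while training loss keeps falling.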
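
The same reading can be checked numerically. The sketch below copies the val_loss values from the training log above and finds the epoch where validation loss is lowest; treating that minimum as the point where overfitting begins is a simplifying assumption, not a rule.

```python
import numpy as np

# val_loss values copied from the training log, epochs 1 through 10.
val_loss = [0.3797, 0.3004, 0.3085, 0.2840, 0.2841,
            0.3166, 0.3127, 0.3859, 0.3635, 0.3843]

# argmin is 0-based, while epochs are counted from 1.
best_epoch = int(np.argmin(val_loss)) + 1
print(best_epoch, val_loss[best_epoch - 1])
```

The minimum falls at epoch 4 (0.2840), with epoch 5 nearly identical (0.2841), which agrees with the plot-based estimate.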

 
