In "Convolutional Neural Networks" we explored how to use two-dimensional CNNs to process 2D image data. In earlier language-model and text-classification tasks we treated text as a one-dimensional time series and, quite naturally, used recurrent neural networks to represent such data. In fact, we can also treat text as a one-dimensional image and use a one-dimensional CNN to capture associations between neighboring words.
TextCNN is an algorithm that uses convolutional neural networks to classify text, proposed by Yoon Kim in "Convolutional Neural Networks for Sentence Classification".
TextCNN architecture diagram:
The first layer embeds the words into low-dimensional vectors. The next layer runs convolutions over the embedded word vectors with several filter sizes, e.g. sliding over 3, 4, or 5 words at a time. The convolution outputs are then max-pooled into one long feature vector, dropout regularization is applied, and the result is classified with a softmax. Compared with a CNN on images, TextCNN changes nothing in the network structure (it is even simpler): as the figure shows, TextCNN has only one convolution layer and one max-pooling layer, and the output is fed into a softmax for n-way classification.
Compared with CNNs for images, the biggest difference in TextCNN lies in the input data:
As shown in the figure below, TextCNN first segments the sentence "今天天气很好,出来玩" into "今天/天气/很好/,/出来/玩", then maps each word to a 5-dimensional word vector (the dimensionality is up to you) via an embedding method such as word2vec or GloVe, e.g. "今天" -> [0,0,0,0,1], "天气" -> [0,0,0,1,0], "很好" -> [0,0,1,0,0], and so on.
The main benefit is that natural language is turned into numbers, which makes downstream processing easier. This also shows that different embedding choices can have a huge impact on the final result; one of the hottest research directions in NLP is how to map natural language into better word vectors. After building the word vectors, we stack them into a 6*5 two-dimensional matrix, which serves as the initial input.
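As a minimal sketch of this step (using made-up one-hot-style vectors instead of real word2vec/GloVe embeddings; the vectors for ",", "出来" and "玩" are inferred from the arithmetic in the example below), stacking the six word vectors row by row yields the 6*5 input matrix:

import numpy as np

# hypothetical 5-dim word vectors; a real system would look these up
# from a trained word2vec / GloVe table instead
word_vectors = {
    "今天": [0, 0, 0, 0, 1],
    "天气": [0, 0, 0, 1, 0],
    "很好": [0, 0, 1, 0, 0],
    ",":    [0, 1, 0, 0, 0],
    "出来": [1, 0, 0, 0, 0],
    "玩":   [0, 0, 0, 1, 1],
}

sentence = ["今天", "天气", "很好", ",", "出来", "玩"]
X = np.array([word_vectors[w] for w in sentence])  # shape (6, 5): 6 words x 5-dim embedding
print(X.shape)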
Convolution is a mathematical operator. Let's illustrate it with a simple example.
Step 1: take the 4*5 matrix corresponding to "今天"/"天气"/"很好"/"," and multiply it element-wise (point-wise) with the convolution kernel, then sum everything up; that is one convolution step:

feature_map[0] = 0*1 + 0*0 + 0*1 + 0*0 + 1*0 +   // (row 1)
                 0*0 + 0*0 + 0*0 + 1*0 + 0*0 +   // (row 2)
                 0*1 + 0*0 + 1*1 + 0*0 + 0*0 +   // (row 3)
                 0*1 + 1*0 + 0*1 + 0*0 + 0*0     // (row 4)
               = 1
Step 2: slide the window down one row (the stride can be set as you like); take the 4*5 matrix corresponding to "天气"/"很好"/","/"出来" and again do the point-wise multiplication with the kernel (whose weights stay the same), then sum:

feature_map[1] = 0*1 + 0*0 + 0*1 + 1*0 + 0*0 +   // (row 1)
                 0*0 + 0*0 + 1*0 + 0*0 + 0*0 +   // (row 2)
                 0*1 + 1*0 + 0*1 + 0*0 + 0*0 +   // (row 3)
                 1*1 + 0*0 + 0*1 + 0*0 + 0*0     // (row 4)
               = 1
Step 3: slide the window down one more row; take the 4*5 matrix corresponding to "很好"/","/"出来"/"玩" and do the point-wise multiplication with the kernel (weights unchanged) and sum:

feature_map[2] = 0*1 + 0*0 + 1*1 + 1*0 + 0*0 +   // (row 1)
                 0*0 + 1*0 + 0*0 + 0*0 + 0*0 +   // (row 2)
                 1*1 + 0*0 + 0*1 + 0*0 + 0*0 +   // (row 3)
                 0*1 + 0*0 + 0*1 + 1*0 + 1*0     // (row 4)
               = 2
feature_map is the output of the convolution: the convolution maps the 6*5 input matrix to a 3*1 matrix. Because this mapping behaves very much like feature extraction, the output is called a feature map. A convolution is normally followed by an activation function; to keep the example simple we use the identity activation f(x) = x here.
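The whole sliding-window computation above can be reproduced with a few lines of NumPy. This is only a sketch: the 4*5 kernel below is reverse-engineered from the products in the three steps, not a trained kernel.

import numpy as np

# the same 6x5 input matrix as in the embedding sketch above
X = np.array([[0, 0, 0, 0, 1],   # 今天
              [0, 0, 0, 1, 0],   # 天气
              [0, 0, 1, 0, 0],   # 很好
              [0, 1, 0, 0, 0],   # ,
              [1, 0, 0, 0, 0],   # 出来
              [0, 0, 0, 1, 1]])  # 玩

# hypothetical 4x5 convolution kernel (covers 4 words at a time)
K = np.array([[1, 0, 1, 0, 0],
              [0, 0, 0, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 0, 1, 0, 0]])

h = K.shape[0]                   # window height: 4 words
steps = X.shape[0] - h + 1       # 6 - 4 + 1 = 3 window positions
feature_map = np.array([(X[i:i + h] * K).sum() for i in range(steps)])
print(feature_map)               # -> [1 1 2], matching the three steps above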
The word "channel" comes up often in CNNs. In Figure 3, the dark-red matrix and the light-red matrix form two channels that together make up one convolution kernel; the figure also shows that the channels need not be identical. Each 4*5 matrix is convolved with the input matrix to produce one feature map. In computer vision, a color image has the three colors R, G and B, and each color is one channel.
According to the author of the original paper, channels were introduced in the hope of preventing overfitting (by keeping the learned vectors from drifting too far from the input) and thereby beating a single channel on small datasets; it later turned out that simply applying regularization works better.
Still, compared with a single channel, a multi-channel setup lets each channel use a different word embedding; for example, a non-static channel (through which gradients are back-propagated) can fine-tune the word vectors so that they fit the current task better.
As for whether channels actually help TextCNN: judging from the experiments in the paper, multiple channels do not clearly improve classification; on five of the seven datasets the single-channel TextCNN outperforms the multi-channel one.
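As an illustration of the static / non-static two-channel idea (a sketch only; embedding_matrix, vocab_size and the other sizes below are hypothetical placeholders), two Embedding layers can share the same pretrained weights, with one frozen and one fine-tuned, and then be stacked as two channels:

import numpy as np
import tensorflow.keras as keras
import tensorflow.keras.layers as layers

vocab_size, embed_dim, seq_len = 30001, 50, 200                      # made-up sizes
embedding_matrix = np.random.rand(vocab_size, embed_dim).astype('float32')  # stands in for real pretrained vectors

inputs = keras.Input(shape=(seq_len,), dtype='int32')
# static channel: pretrained vectors, frozen (no gradient updates)
static = layers.Embedding(vocab_size, embed_dim,
                          weights=[embedding_matrix], trainable=False)(inputs)
# non-static channel: same initialization, but fine-tuned during training
non_static = layers.Embedding(vocab_size, embed_dim,
                              weights=[embedding_matrix], trainable=True)(inputs)
# stack the two (seq_len, embed_dim) maps into a 2-channel "image" for Conv2D
static = layers.Reshape((seq_len, embed_dim, 1))(static)
non_static = layers.Reshape((seq_len, embed_dim, 1))(non_static)
x = layers.Concatenate(axis=-1)([static, non_static])                # shape (seq_len, embed_dim, 2)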
Having obtained feature_map = [1, 1, 2], we pick the maximum value, 2, as the output; that is max-pooling. Max-pooling keeps the most salient feature while greatly reducing the number of values to handle; as Figure 5 shows, the feature map shrinks from three values to one. Two points about pooling are worth noting:
Pooling itself does not provide translation invariance (the property that a letter "A" in an image is recognized by the CNN no matter where it appears); it is the weight sharing of the convolution kernel that does.
Max-pooling merely takes the maximum of several values and cannot achieve this on its own. A CNN is translation-invariant because the kernel weights stay fixed (weight sharing) while the kernel slides over the input: if a kernel has been trained to recognize the letter "A", it will pick up every "A" in the image as it slides across it.
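In code, this pooling step is nothing more than taking the maximum of each feature map, which also gives a fixed-size output regardless of sentence length:

import numpy as np

feature_map = np.array([1, 1, 2])   # output of the convolution sketch above
pooled = feature_map.max()          # -> 2: keep only this filter's strongest response
print(pooled)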
As shown in the figure above, the max-pooling results are concatenated and fed into a softmax, which gives the probability of each class, e.g. the probability of label 1 and the probability of label -1. For prediction, the TextCNN pipeline ends here.
For training, the loss is computed from the predicted and the true labels, the gradients are back-propagated through the softmax layer, the max-pooling, the activation function and the convolution, and the trainable parameters (the weights of the softmax/fully connected layer and the convolution kernels; pooling and the activation have no parameters of their own) are updated, completing one round of training.
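As a toy illustration of the classification step (the weight matrix below is made up), the pooled feature is passed through a fully connected layer and a softmax to get the class probabilities:

import numpy as np

pooled = np.array([2.0])                          # pooled feature from max-pooling
W = np.array([[0.5], [-0.5]])                     # hypothetical weights: one row per class (label 1 / label -1)
logits = W @ pooled                               # shape (2,)
probs = np.exp(logits) / np.exp(logits).sum()     # softmax
print(probs)                                      # approx. [0.88, 0.12]: probability of each label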
The TextCNN introduced here is a text-classification model that applies a CNN to text.
Comparing RNNs and CNNs (Seq2Seq-style models)
CNNs can be parallelized, i.e. both training and inference can run in parallel across positions, whereas an RNN cannot, since each step depends on the previous one.
By stacking several layers, a CNN can still capture long-range dependencies comparable to an RNN.
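A quick back-of-the-envelope check of that claim: with stride 1 and no dilation, stacking L convolution layers of kernel size k gives a receptive field of L*(k-1)+1 words, so a handful of layers already covers a fairly long span.

def receptive_field(num_layers, kernel_size):
    # stride-1, non-dilated stacked convolutions
    return num_layers * (kernel_size - 1) + 1

print(receptive_field(1, 3))    # 3 words
print(receptive_field(6, 3))    # 13 words
print(receptive_field(12, 3))   # 25 words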
Building a convolutional neural network for sentiment analysis with TensorFlow Keras
import tensorflow.keras as keras
import tensorflow.keras.layers as layers
import tensorflow as tf
import os
import numpy as np

max_word = 400

def get_dataset():
    (x_train, y_train), (x_test, y_test) = keras.datasets.imdb.load_data()
    print(len(x_train[0]))
    print(x_train[0])
    print(x_train.shape, ' ', y_train.shape)
    print(x_test.shape, ' ', y_test.shape)
    # pad / truncate every review to max_word tokens
    X_train = keras.preprocessing.sequence.pad_sequences(x_train, maxlen=max_word, padding='post')
    X_test = keras.preprocessing.sequence.pad_sequences(x_test, maxlen=max_word, padding='post')
    print(X_train[0])
    print(np.max(X_train[0]))
    print(X_train.shape, ' ', y_train.shape)
    print(X_test.shape, ' ', y_test.shape)
    return X_train, y_train, X_test, y_test

def my_model(X_train):
    vocab_size = np.max([np.max(X_train[i]) for i in range(X_train.shape[0])]) + 1  # +1 because index 0 is reserved for padding
    print(vocab_size)
    inputs = keras.Input(shape=(max_word,), name='img')
    x = keras.layers.Embedding(input_dim=vocab_size, output_dim=64, input_length=max_word)(inputs)
    # treat the (400, 64) embedding map as a one-channel "image" for Conv2D
    x = keras.layers.Reshape((max_word, 64, 1))(x)
    # three parallel convolutions over windows of 5 / 4 / 3 words, each followed by global max-pooling
    x1 = keras.layers.Conv2D(filters=32, kernel_size=(5, 64), padding='valid', activation='relu')(x)
    x1 = layers.BatchNormalization()(x1)
    x1 = keras.layers.GlobalMaxPool2D()(x1)
    x2 = keras.layers.Conv2D(filters=32, kernel_size=(4, 64), padding='valid', activation='relu')(x)
    x2 = layers.BatchNormalization()(x2)
    x2 = keras.layers.GlobalMaxPool2D()(x2)
    x3 = keras.layers.Conv2D(filters=32, kernel_size=(3, 64), padding='valid', activation='relu')(x)
    x3 = layers.BatchNormalization()(x3)
    x3 = keras.layers.GlobalMaxPool2D()(x3)
    x = layers.Concatenate(axis=1)([x1, x2, x3])
    outputs = layers.Dense(2, activation='softmax')(x)
    model = keras.Model(inputs, outputs, name='txt_cnn')
    model.compile(optimizer=keras.optimizers.Adam(),
                  loss=keras.losses.SparseCategoricalCrossentropy(),
                  # metrics=['accuracy'])
                  metrics=[keras.metrics.SparseCategoricalAccuracy()])
    # model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
    print(model.summary())
    return model

def train_my_model(model, X_train, y_train):
    if os.path.isfile('./weights_cnn2/model.h5'):
        print('load weight')
        model.load_weights('./weights_cnn2/model.h5')

    def save_weight(epoch, logs):
        print('save_weight', epoch, logs)
        model.save_weights('./weights_cnn2/model.h5')

    batch_print_callback = keras.callbacks.LambdaCallback(
        on_epoch_end=save_weight
    )
    callbacks = [
        tf.keras.callbacks.EarlyStopping(patience=4, monitor='loss'),
        batch_print_callback,
        # keras.callbacks.ModelCheckpoint('./weights/model.h5', save_best_only=True),
        tf.keras.callbacks.TensorBoard(log_dir='logs_cnn2')
    ]
    history = model.fit(X_train, y_train, batch_size=100, epochs=20, validation_split=0.1, callbacks=callbacks)

    import matplotlib.pyplot as plt
    plt.plot(history.history['sparse_categorical_accuracy'])
    plt.plot(history.history['val_sparse_categorical_accuracy'])
    plt.legend(['training', 'validation'], loc='upper left')
    plt.show()

def test_my_module(model, X_test, y_test):
    if os.path.isfile('./weights_cnn2/model.h5'):
        print('load weight')
        model.load_weights('./weights_cnn2/model.h5')
    scores = model.evaluate(X_test, y_test)
    print(scores)

if __name__ == '__main__':
    X_train, y_train, x_test, y_test = get_dataset()
    model = my_model(X_train)
    train_my_model(model, X_train, y_train)
    # print(x_test.shape)
    # print(y_test.shape)
    test_my_module(model, x_test, y_test)
Training-set accuracy reaches 100%; validation-set accuracy is 89.16%.
22200/22500 [============================>.] - ETA: 0s - loss: 3.9943e-05 - sparse_categorical_accuracy: 1.0000
22300/22500 [============================>.] - ETA: 0s - loss: 3.9917e-05 - sparse_categorical_accuracy: 1.0000
22400/22500 [============================>.] - ETA: 0s - loss: 3.9868e-05 - sparse_categorical_accuracy: 1.0000save_weight 19 {
'loss': 3.9842186953238625e-05, 'sparse_categorical_accuracy': 1.0, 'val_loss': 0.4204959559440613, 'val_sparse_categorical_accuracy': 0.8916}
22500/22500 [==============================] - 73s 3ms/sample - loss: 3.9842e-05 - sparse_categorical_accuracy: 1.0000 - val_loss: 0.4205 - val_sparse_categorical_accuracy: 0.8916
The whole training process can be inspected with the following command:
tensorboard --logdir=logs_cnn2
Accuracy on the test set is 89%.
24768/25000 [============================>.] - ETA: 0s - loss: 0.4107 - sparse_categorical_accuracy: 0.8901
24960/25000 [============================>.] - ETA: 0s - loss: 0.4110 - sparse_categorical_accuracy: 0.8901
25000/25000 [==============================] - 8s 311us/sample - loss: 0.4109 - sparse_categorical_accuracy: 0.8902
[0.4108552382296324, 0.89016]
As for how this training dataset is generated, I described it in an earlier post; see the blog below.
[深度学习TF2][RNN-LSTM]文本情感分析包含(数据预处理-训练-预测)
import tensorflow.keras as keras
import tensorflow.keras.layers as layers
import tensorflow as tf
import os
import numpy as np

root_folder = './cnn3'  # forward slashes also work on Windows

def get_dataset():
    # pre-processed data produced by the earlier blog post
    train_set = np.load('./train_data_new1/train.npz')
    X_train = train_set['x']
    y_train = train_set['y']
    test_set = np.load('./train_data_new1/test.npz')
    X_test = test_set['x']
    y_test = test_set['y']
    print("X_train:", X_train.shape)
    print("y_train:", y_train.shape)
    print("X_test:", X_test.shape)
    print("y_test:", y_test.shape)
    return X_train, y_train, X_test, y_test

def my_model():
    # pretrained word vectors, shape (30001, 50)
    embedding_matrix = np.load('./train_data_new1/embedding_matrix.npy')
    inputs = keras.Input(shape=(200,))
    x = layers.Embedding(input_dim=30001, output_dim=50, input_length=200, weights=[embedding_matrix])(inputs)
    # treat the (200, 50) embedding map as a one-channel "image" for Conv2D
    x = keras.layers.Reshape((200, 50, 1))(x)
    x1 = keras.layers.Conv2D(filters=32, kernel_size=(5, 50), padding='valid', activation='relu')(x)
    x1 = layers.BatchNormalization()(x1)
    x1 = keras.layers.GlobalMaxPool2D()(x1)
    x2 = keras.layers.Conv2D(filters=32, kernel_size=(4, 50), padding='valid', activation='relu')(x)
    x2 = layers.BatchNormalization()(x2)
    x2 = keras.layers.GlobalMaxPool2D()(x2)
    x3 = keras.layers.Conv2D(filters=32, kernel_size=(3, 50), padding='valid', activation='relu')(x)
    x3 = layers.BatchNormalization()(x3)
    x3 = keras.layers.GlobalMaxPool2D()(x3)
    x = layers.Concatenate(axis=1)([x1, x2, x3])
    outputs = layers.Dense(2, activation='softmax')(x)
    model = keras.Model(inputs, outputs, name='txt_cnn')
    model.compile(optimizer=keras.optimizers.Adam(),
                  loss=keras.losses.SparseCategoricalCrossentropy(),
                  # metrics=['accuracy'])
                  metrics=[keras.metrics.SparseCategoricalAccuracy()])
    # keras.utils.plot_model(model, 'textcnn.png', show_shapes=True)
    # model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
    print(model.summary())
    return model

def train_my_model(model, X_train, y_train):
    if os.path.isfile(root_folder + '/model.h5'):
        print('load weight')
        model.load_weights(root_folder + '/model.h5')

    def save_weight(epoch, logs):
        print('save_weight', epoch, logs)
        model.save_weights(root_folder + '/model.h5')

    batch_print_callback = keras.callbacks.LambdaCallback(
        on_epoch_end=save_weight
    )
    callbacks = [
        tf.keras.callbacks.EarlyStopping(patience=4, monitor='loss'),
        batch_print_callback,
        # keras.callbacks.ModelCheckpoint('./weights/model.h5', save_best_only=True),
        tf.keras.callbacks.TensorBoard(log_dir=root_folder + '/logs')
    ]
    history = model.fit(X_train, y_train, batch_size=100, epochs=20, validation_split=0.1, callbacks=callbacks)

    import matplotlib.pyplot as plt
    plt.plot(history.history['sparse_categorical_accuracy'])
    plt.plot(history.history['val_sparse_categorical_accuracy'])
    plt.legend(['training_accuracy', 'validation_accuracy'], loc='upper left')
    plt.show()

def test_my_module(model, X_test, y_test):
    if os.path.isfile(root_folder + '/model.h5'):
        print('load weight')
        model.load_weights(root_folder + '/model.h5')
    scores = model.evaluate(X_test, y_test)
    print(scores)

def predict_my_module(model):
    # word -> index mapping saved by the preprocessing step
    small_word_index = np.load('./train_data_new1/small_word_index.npy', allow_pickle=True)
    review_index = np.zeros((1, 200), dtype=int)
    review = "I don't like it"
    # review = "this is bad movie "
    # review = "This is good movie"
    # review = "This isn't good movie"
    # review = "i think this is bad movie"
    counter = 0
    for word in review.split():
        try:
            print(word, small_word_index.item()[word])
            review_index[0][counter] = small_word_index.item()[word]
            counter = counter + 1
        except Exception:
            print('Word error', word)
    print(review_index.shape)
    s = model.predict(x=review_index)
    print(s)

if __name__ == '__main__':
    X_train, y_train, x_test, y_test = get_dataset()
    model = my_model()
    train_my_model(model, X_train, y_train)
    # print(x_test.shape)
    # print(y_test.shape)
    test_my_module(model, x_test, y_test)
    # predict_my_module(model)
Results on the training set:
Accuracy on the test set is 86.7%.
14112/15000 [===========================>..] - ETA: 0s - loss: 0.7871 - sparse_categorical_accuracy: 0.8673
15000/15000 [==============================] - 1s 76us/sample - loss: 0.7860 - sparse_categorical_accuracy: 0.8675
[0.7860029630283515, 0.8675333]
For comparison, here is a simpler model on the same IMDB data: instead of the parallel multi-size filters above, it stacks two Conv1D + max-pooling blocks and ends in a sigmoid unit with binary cross-entropy.

import tensorflow.keras as keras
import tensorflow.keras.layers as layers
import tensorflow as tf
import os
import numpy as np

max_word = 400

def get_dataset():
    (x_train, y_train), (x_test, y_test) = keras.datasets.imdb.load_data()
    print(len(x_train[0]))
    print(x_train[0])
    print(x_train.shape, ' ', y_train.shape)
    print(x_test.shape, ' ', y_test.shape)
    X_train = keras.preprocessing.sequence.pad_sequences(x_train, maxlen=max_word, padding='post')
    X_test = keras.preprocessing.sequence.pad_sequences(x_test, maxlen=max_word, padding='post')
    print(X_train[0])
    print(np.max(X_train[0]))
    print(X_train.shape, ' ', y_train.shape)
    print(X_test.shape, ' ', y_test.shape)
    return X_train, y_train, X_test, y_test

def my_model(X_train):
    vocab_size = np.max([np.max(X_train[i]) for i in range(X_train.shape[0])]) + 1  # +1 because index 0 is reserved for padding
    print(vocab_size)
    model = keras.Sequential([
        layers.Embedding(input_dim=vocab_size, output_dim=64, input_length=max_word),
        keras.layers.Conv1D(filters=64, kernel_size=3, padding='same', activation='relu'),
        keras.layers.MaxPooling1D(pool_size=2),
        keras.layers.Dropout(0.25),
        keras.layers.Conv1D(filters=128, kernel_size=3, padding='same', activation='relu'),
        keras.layers.MaxPooling1D(pool_size=2),
        keras.layers.Dropout(0.25),
        keras.layers.Flatten(),
        keras.layers.Dense(64, activation='relu'),
        keras.layers.Dense(32, activation='relu'),
        keras.layers.Dense(1, activation='sigmoid')
    ])
    model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
    # print(model.summary())
    return model

def train_my_model(model, X_train, y_train):
    if os.path.isfile('./weights_cnn/model.h5'):
        print('load weight')
        model.load_weights('./weights_cnn/model.h5')

    def save_weight(epoch, logs):
        print('save_weight', epoch, logs)
        model.save_weights('./weights_cnn/model.h5')

    batch_print_callback = keras.callbacks.LambdaCallback(
        on_epoch_end=save_weight
    )
    callbacks = [
        tf.keras.callbacks.EarlyStopping(patience=4, monitor='loss'),
        batch_print_callback,
        # keras.callbacks.ModelCheckpoint('./weights/model.h5', save_best_only=True),
        tf.keras.callbacks.TensorBoard(log_dir='logs_cnn')
    ]
    history = model.fit(X_train, y_train, batch_size=100, epochs=20, validation_split=0.1, callbacks=callbacks)

    import matplotlib.pyplot as plt
    plt.plot(history.history['accuracy'])
    plt.plot(history.history['val_accuracy'])
    plt.legend(['training', 'validation'], loc='upper left')
    plt.show()

def test_my_module(model, X_test, y_test):
    if os.path.isfile('./weights_cnn/model.h5'):
        print('load weight')
        model.load_weights('./weights_cnn/model.h5')
    scores = model.evaluate(X_test, y_test)
    print(scores)

if __name__ == '__main__':
    X_train, y_train, x_test, y_test = get_dataset()
    model = my_model(X_train)
    # train_my_model(model, X_train, y_train)
    print(x_test.shape)
    print(y_test.shape)
    test_my_module(model, x_test, y_test)
References
https://www.biaodianfu.com/textcnn.html
https://zh.gluon.ai/chapter_natural-language-processing/sentiment-analysis-cnn.html
https://www.cnblogs.com/ModifyRong/p/11319301.html