This is my first blog post and my first write-up, so please bear with me; I have tried to make sure everything here is substantive.
Model implementation code:
# TextCNN in Keras, fed with BERT word embeddings (embedding_word); the dataset is SST-2
import numpy as np
from keras.utils import to_categorical
from keras.models import Model
from keras.layers import Dense, Input
from keras.layers import Conv1D, Flatten, Dropout, MaxPooling1D
from keras.layers import concatenate
from keras.callbacks import EarlyStopping

x_train = np.load('SST2/train_embedding_word.npy')
x_test = np.load('SST2/test_embedding_word.npy')
y_train = np.load('SST2/train_label.npy')
y_test = np.load('SST2/test_label.npy')

# Build the TextCNN model.
# Architecture: embedding_word input -> 3 parallel convolution+max-pooling branches
# -> concatenate -> flatten -> dropout -> dense softmax output
def TextCNN(x_train, y_train, x_test, y_test, max_len, embedding_len):
    main_input = Input(shape=(max_len, embedding_len), dtype='float32')
    # Three branches with word-window (kernel) sizes 3, 4 and 5.
    # With padding='valid' each conv output has length max_len - kernel_size + 1,
    # so pool_size = max_len - kernel_size + 1 max-pools over the whole sequence.
    cnn1 = Conv1D(filters=max_len * 2, kernel_size=3, padding='valid', strides=1, activation='tanh')(main_input)
    cnn1 = MaxPooling1D(pool_size=68)(cnn1)
    cnn2 = Conv1D(filters=max_len * 2, kernel_size=4, padding='valid', strides=1, activation='tanh')(main_input)
    cnn2 = MaxPooling1D(pool_size=67)(cnn2)
    cnn3 = Conv1D(filters=max_len * 2, kernel_size=5, padding='valid', strides=1, activation='tanh')(main_input)
    cnn3 = MaxPooling1D(pool_size=66)(cnn3)
    # Concatenate the output vectors of the three branches
    cnn = concatenate([cnn1, cnn2, cnn3], axis=-1)
    flat = Flatten()(cnn)
    drop = Dropout(0.4)(flat)
    main_output = Dense(2, activation='softmax')(drop)
    model = Model(inputs=main_input, outputs=main_output)
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    model.summary()

    print("Start training...")
    train_labels = to_categorical(y_train, num_classes=2)
    test_labels = to_categorical(y_test, num_classes=2)
    early_stopping = EarlyStopping(monitor='val_acc', patience=50, mode='max')
    history1 = model.fit(x_train, train_labels, batch_size=20, epochs=200, verbose=1,
                         validation_data=(x_test, test_labels), callbacks=[early_stopping])
    score = model.evaluate(x_test, test_labels, verbose=2)  # [loss, accuracy]
    print("Test loss and accuracy:", score)
    print('val_acc history:', history1.history['val_acc'])

TextCNN(x_train, y_train, x_test, y_test, 70, 768)
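A note on those hard-coded pool sizes: because pool_size = max_len - kernel_size + 1 covers the entire conv output, each MaxPooling1D above is effectively a global max over time. If you prefer not to compute the sizes by hand, Keras' GlobalMaxPooling1D expresses the same branch; a minimal sketch of one branch written that way (conv_branch is a hypothetical helper, not part of the code above):

```python
from keras.layers import Conv1D, GlobalMaxPooling1D

def conv_branch(x, kernel_size, filters):
    """One TextCNN branch: Conv1D followed by a global max over the time axis.
    Equivalent to MaxPooling1D(pool_size=max_len - kernel_size + 1) above,
    but independent of max_len."""
    h = Conv1D(filters=filters, kernel_size=kernel_size,
               padding='valid', strides=1, activation='tanh')(x)
    return GlobalMaxPooling1D()(h)  # shape: (batch, filters), already flat
```

With this variant the concatenated branches are already 2-D, so the Flatten() layer becomes unnecessary.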
1. A dropout rate of 0.5 is the usual default recommendation; here 0.4 was chosen based on this dataset, and the results were acceptable. Adjust it to suit your own data.
2. The BERT model is the fine-tuned uncased BERT-Base with 12 layers. max_len=70 is the average SST-2 text length; longer texts are truncated and shorter ones padded, producing a (70, 768) word-embedding matrix per sentence (see the extraction sketch after these notes). Also note that BERT embeds Chinese at the character level, so Chinese BERT has no word-level vectors.
3. Sentence vectors for SST-2 were also generated, using the reduce_mean method: the two-dimensional word-embedding matrix is averaged column by column over the tokens to yield one sentence vector (illustrated in the NumPy sketch after these notes); reduce_max and reduce_min are alternative choices.
4. Through later experiments I found that adding a branch with a 2-gram convolution improves the results:
cnn0 = Conv1D(filters=max_len * 2, kernel_size=2, padding='valid', strides=1, activation='tanh')(main_input)
cnn0 = MaxPooling1D(pool_size=69)(cnn0)
(the merge then becomes concatenate([cnn0, cnn1, cnn2, cnn3], axis=-1)). This suggests that English benefits from finer-grained convolutional feature extraction. In practice, choose convolution granularities that fit the characteristics of your text rather than copying the paper unchanged.
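The post does not show the pipeline that produced the .npy embedding files, but the reduce_mean / reduce_max / reduce_min naming in notes 2 and 3 matches bert-as-service's pooling strategies, so here is a hedged sketch under that assumption (paths and sentences are placeholders, not the actual setup):

```python
# Assumed server command (run separately), with NONE pooling to get per-token vectors:
#   bert-serving-start -model_dir /path/to/uncased_L-12_H-768_A-12 \
#       -max_seq_len 70 -pooling_strategy NONE
import numpy as np
from bert_serving.client import BertClient

bc = BertClient()                                  # connect to the running server
sentences = ["a gripping movie", "a dull movie"]   # toy examples, not the real SST-2 data
emb = bc.encode(sentences)                         # shape (2, 70, 768): one (70, 768) matrix per sentence
np.save('SST2/train_embedding_word.npy', emb)      # same format the training script loads
```

Switching -pooling_strategy to REDUCE_MEAN or REDUCE_MAX makes the server return pooled sentence vectors directly instead of per-token matrices.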
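To make note 3 concrete: reduce_mean over a (70, 768) word-embedding matrix is simply a mean along the token axis, collapsing the 70 token vectors into one 768-dimensional sentence vector. A tiny NumPy illustration with a random stand-in matrix:

```python
import numpy as np

emb = np.random.rand(70, 768).astype(np.float32)  # stand-in for one sentence's (70, 768) word embeddings

sent_mean = emb.mean(axis=0)   # reduce_mean: average each of the 768 columns over the 70 tokens
sent_max = emb.max(axis=0)     # reduce_max alternative
sent_min = emb.min(axis=0)     # reduce_min alternative
print(sent_mean.shape)         # (768,)
```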
## Baidu Netdisk link
Link: https://pan.baidu.com/s/1Mk2FENIeKnDB5pAbY3ap6w
Extraction code: ohno