What is Slot Filling?
Slot filling is a fundamental problem in natural language understanding and a deliberately simplified treatment of meaning. The idea is close to frame-based approaches in linguistics: a set of typed semantic slots is defined in advance, the words of the input are filled into those slots one by one, and the meaning of an utterance is later recovered by retrieving and looking up the slot contents. Our task here is to take utterances expressing the speech act of booking a flight (the ATIS dataset) and fill them into semantic slots of the appropriate types.
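To make this concrete, here is a toy illustration of the BIO labeling scheme used in ATIS. The sentence and slot names are shortened from the training sample printed further below; the snippet is only for illustration and is not part of the pipeline.
# Each word of the query is assigned exactly one slot label.
sentence = ['i', 'want', 'to', 'fly', 'from', 'boston', 'to', 'denver']
slots    = ['O', 'O', 'O', 'O', 'O', 'B-fromloc.city_name', 'O', 'B-toloc.city_name']
for word, slot in zip(sentence, slots):
    print('{:8s} -> {}'.format(word, slot))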
Why use SimpleRNN?
Slot filling is a sequence-labeling use of RNNs: once the model is trained, every input word is assigned one slot label, i.e. filled into the appropriate slot.
RNNs differ from ordinary neural networks in that the output at time t depends not only on the current input and the weights but also on the inputs that came before it, whereas other network types treat the input and output at each step as independent of one another. For language understanding this matters: language unfolds linearly in time, and a word is clearly influenced by the words that precede it, which is why we choose an RNN for this problem.
We pick SimpleRNN because it is the simplest recurrent layer and is enough to get familiar with the framework; afterwards the model can be improved with more capable recurrent layers such as LSTMs.
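To make the recurrence explicit, here is a minimal numpy sketch of a single SimpleRNN step, assuming the default tanh activation used by Keras; the weight names W, U, b and all shapes are illustrative only.
import numpy as np

def simple_rnn_step(x_t, h_prev, W, U, b):
    # The hidden state at time t depends on the current input x_t AND the
    # previous hidden state h_prev, which summarizes all earlier words.
    return np.tanh(x_t @ W + h_prev @ U + b)

# Illustrative shapes: 100-dimensional word vectors, 100 hidden units.
rng = np.random.default_rng(0)
W, U, b = rng.normal(size=(100, 100)), rng.normal(size=(100, 100)), np.zeros(100)
h = np.zeros(100)
for x_t in rng.normal(size=(18, 100)):   # one 18-word sentence
    h = simple_rnn_step(x_t, h, W, U, b)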
Overview of the approach:
- Load the data, using chsasank's modified version of mesnilgr's load.py.
- Define the model, built with the Keras Sequential API. First, a 100-dimensional word-embedding layer maps each input word to a vector in a high-dimensional space (where words that are close in meaning and syntactic role end up close together); a Dropout layer then guards against overfitting, followed by a SimpleRNN layer and a TimeDistributed wrapper so that the same Dense classifier is applied at every time step. Finally the layers are assembled and the optimizer and loss function are chosen: rmsprop, whose adaptive updates keep finding good directions late in training, and categorical_crossentropy, which matches the multi-class nature of the slot labels.
- Train the model. To economize on compute, models are usually trained in mini-batches, but our data consists of independent sentences, and splitting them into fixed-size batches would introduce spurious connections between unrelated sentences. We therefore treat one sentence as one batch for training, validation and prediction, and compute the average loss of each epoch by hand (a small shape sketch follows this list).
- Evaluate the model and predict. We judge the model by its validation loss and by the F1 score of its predictions, computed with signsmile's conlleval.py.
- Save the model.
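As a quick sketch of the per-sentence batching mentioned above: each sentence of length T becomes an input of shape (1, T) and a one-hot label tensor of shape (1, T, n_classes). The encoded values below are taken from the first training sentence shown later; only the shapes matter here.
import numpy as np
from keras.utils import to_categorical

n_classes = 127                        # number of slot labels in ATIS
sent  = np.array([232, 542, 502, 196]) # an encoded 4-word prefix of a sentence
label = np.array([126, 126, 126, 126]) # the corresponding encoded 'O' labels

x = sent[np.newaxis, :]                                           # shape (1, 4)
y = to_categorical(label, num_classes=n_classes)[np.newaxis, :]   # shape (1, 4, 127)
print(x.shape, y.shape)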
import numpy as np
import pickle
from keras.models import Sequential
from keras.layers.embeddings import Embedding
from keras.layers.recurrent import SimpleRNN
from keras.layers.core import Dense,Dropout
from keras.utils import to_categorical
from keras.layers.wrappers import TimeDistributed
from matplotlib import pyplot as plt
import data.load                       # chsasank's modification of mesnilgr's ATIS loader
from metrics.accuracy import evaluate  # conlleval-style precision/recall/F1 (signsmile's conlleval.py)
Using TensorFlow backend.
Load Data
train_set,valid_set,dicts = data.load.atisfull()
# print(train_set[:1])
# dicts = {'labels2idx':{},'words2idx':{},'table2idx':{}}
w2idx,labels2idx = dicts['words2idx'],dicts['labels2idx']
train_x,_,train_label = train_set
val_x,_,val_label = valid_set
idx2w = {w2idx[i]:i for i in w2idx}
idx2lab = {labels2idx[i]:i for i in labels2idx}
n_classes = len(idx2lab)
n_vocab = len(idx2w)
words_train = [[idx2w[i] for i in w[:]] for w in train_x]
labels_train = [[idx2lab[i] for i in w[:]] for w in train_label]
words_val = [[idx2w[i] for i in w[:]] for w in val_x]
# labels_val = [[idx2lab[i] for i in w[:]] for w in val_label]
labels_val = []
for w in val_label:
    for i in w[:]:
        labels_val.append(idx2lab[i])
print('Real Sentence : {}'.format(words_train[0]))
print('Encoded Form : {}'.format(train_x[0]))
print('='*40)
print('Real Label : {}'.format(labels_train[0]))
print('Encoded Form : {}'.format(train_label[0]))
Real Sentence : ['i', 'want', 'to', 'fly', 'from', 'boston', 'at', 'DIGITDIGITDIGIT', 'am', 'and', 'arrive', 'in', 'denver', 'at', 'DIGITDIGITDIGITDIGIT', 'in', 'the', 'morning']
Encoded Form : [232 542 502 196 208 77 62 10 35 40 58 234 137 62 11 234 481 321]
========================================
Real Label : ['O', 'O', 'O', 'O', 'O', 'B-fromloc.city_name', 'O', 'B-depart_time.time', 'I-depart_time.time', 'O', 'O', 'O', 'B-toloc.city_name', 'O', 'B-arrive_time.time', 'O', 'O', 'B-arrive_time.period_of_day']
Encoded Form : [126 126 126 126 126 48 126 35 99 126 126 126 78 126 14 126 126 12]
Define and Compile the model
model = Sequential()
model.add(Embedding(n_vocab,100))
model.add(Dropout(0.25))
model.add(SimpleRNN(100,return_sequences=True))
model.add(TimeDistributed(Dense(n_classes,activation='softmax')))
model.compile(optimizer = 'rmsprop',loss = 'categorical_crossentropy')
model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding_1 (Embedding) (None, None, 100) 57200
_________________________________________________________________
dropout_1 (Dropout) (None, None, 100) 0
_________________________________________________________________
simple_rnn_1 (SimpleRNN) (None, None, 100) 20100
_________________________________________________________________
time_distributed_1 (TimeDist (None, None, 127) 12827
=================================================================
Total params: 90,127
Trainable params: 90,127
Non-trainable params: 0
_________________________________________________________________
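The parameter counts in the summary can be checked by hand from the layer sizes (the vocabulary size of 572 is implied by the embedding parameter count; 100-dimensional embeddings, 100 hidden units, 127 slot classes):
n_vocab, emb_dim, hidden, n_classes = 572, 100, 100, 127

embedding_params = n_vocab * emb_dim                   # 57,200
rnn_params = emb_dim*hidden + hidden*hidden + hidden   # input + recurrent + bias = 20,100
dense_params = hidden*n_classes + n_classes            # weights + bias = 12,827

print(embedding_params + rnn_params + dense_params)    # 90,127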
Train the model
def train_the_model(n_epochs,train_x,train_label,val_x,val_label):
    epoch,train_avgloss,val_avgloss,f1s = [],[],[],[]
    for i in range(1,n_epochs+1):
        epoch.append(i)
        ## training
        train_avg_loss = 0
        for n_batch,sent in enumerate(train_x):
            label = train_label[n_batch]
            # label to one-hot
            label = to_categorical(label,num_classes=n_classes)[np.newaxis,:]
            sent = sent[np.newaxis,:]
            loss = model.train_on_batch(sent,label)
            train_avg_loss += loss
        train_avg_loss = train_avg_loss/n_batch
        train_avgloss.append(train_avg_loss)
        ## evaluate & predict
        val_pred_label,pred_label_val,val_avg_loss = [],[],0
        for n_batch,sent in enumerate(val_x):
            label = val_label[n_batch]
            label = to_categorical(label,num_classes=n_classes)[np.newaxis,:]
            sent = sent[np.newaxis,:]
            loss = model.test_on_batch(sent,label)
            val_avg_loss += loss
            pred = model.predict_on_batch(sent)
            pred = np.argmax(pred,-1)[0]
            val_pred_label.append(pred)
        val_avg_loss = val_avg_loss/n_batch
        val_avgloss.append(val_avg_loss)
        for w in val_pred_label:
            for k in w[:]:
                pred_label_val.append(idx2lab[k])
        prec, rec, f1 = evaluate(labels_val,pred_label_val, verbose=False)
        print('Training epoch {}\t train_avg_loss = {} \t val_avg_loss = {}'.format(i,train_avg_loss,val_avg_loss))
        print('precision: {:.2f}% \t recall: {:.2f}% \t f1 :{:.2f}%'.format(prec,rec,f1))
        print('-'*60)
        f1s.append(f1)
    # return epoch,pred_label_train,train_avgloss,pred_label_val,val_avgloss
    return epoch,f1s,val_avgloss,train_avgloss

epoch,f1s,val_avgloss,train_avgloss = train_the_model(40,train_x,train_label,val_x,val_label)
Output:
Training epoch 1 train_avg_loss = 0.5546463992293973 val_avg_loss = 0.4345020865901363
precision: 84.79% recall: 80.79% f1 :82.74%
------------------------------------------------------------
Training epoch 2 train_avg_loss = 0.2575569036037627 val_avg_loss = 0.36228470020366654
precision: 86.64% recall: 83.86% f1 :85.22%
------------------------------------------------------------
Training epoch 3 train_avg_loss = 0.2238766908014994 val_avg_loss = 0.33974187403771694
precision: 88.03% recall: 85.55% f1 :86.77%
------------------------------------------------------------
……
------------------------------------------------------------
Training epoch 40 train_avg_loss = 0.09190682124901069 val_avg_loss = 0.2697056618613356
precision: 92.51% recall: 91.47% f1 :91.99%
------------------------------------------------------------
Visualization
Plot the training and validation loss to pick a suitable number of epochs.
%matplotlib inline
plt.xlabel('epoch')
plt.ylabel('loss')
plt.plot(epoch,train_avgloss,'b',label='training error')
plt.plot(epoch,val_avgloss,'r',label='validation error')
plt.legend()
plt.show()
print('Best F1: {:.2f}%'.format(max(f1s)))
Best F1: 92.56%
Save the model
model.save('slot_filling_with_simpleRNN.h5')
Analysis of the results
With SimpleRNN the final F1 score is 92.56%, still well short of the 95.47% achieved by a senior labmate. The gap comes mainly from the choice of model: a SimpleRNN only feeds information from preceding words into each prediction, yet in language the following words also influence how a word should be labeled. The model can therefore be improved by choosing a more powerful architecture or by adding layers that capture information from subsequent words.
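As one possible next step along those lines, here is an untested sketch that swaps the SimpleRNN layer for a bidirectional LSTM, so that each time step also sees the words that follow it. It reuses n_vocab and n_classes from the data-loading step above; the other hyperparameters are kept unchanged and are not tuned.
from keras.models import Sequential
from keras.layers import Embedding, Dropout, Dense, LSTM, Bidirectional, TimeDistributed

# Same overall architecture as before, but the Bidirectional wrapper runs an
# LSTM forward and backward over the sentence and concatenates both states.
bi_model = Sequential()
bi_model.add(Embedding(n_vocab, 100))
bi_model.add(Dropout(0.25))
bi_model.add(Bidirectional(LSTM(100, return_sequences=True)))
bi_model.add(TimeDistributed(Dense(n_classes, activation='softmax')))
bi_model.compile(optimizer='rmsprop', loss='categorical_crossentropy')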
References
- Keras Tutorial - Spoken Language Understanding
- pytorch-slot-filling
- liu946 AtisSlotLabeling
- [Keras sentiment classification] A summary of problems encountered during training (【Keras情感分类】训练过程中出现的问题汇总)
- keras-SimpleRNN
- Ways to deal with overfitting in machine learning (机器学习中过拟合的解决办法)