2018-NAACL: A Bi-model based RNN Semantic Frame Parsing Model for Intent Detection and Slot Filling

In this paper, new Bi-model based RNN semantic frame parsing network structures are designed to perform the intent detection and slot filling tasks jointly, by considering their cross-impact on each other using two correlated bidirectional LSTMs (BLSTMs).

Abstract

Intent detection and slot filling are two main tasks for building a spoken language understanding (SLU) system.

Multiple deep learning based models have demonstrated good results on these tasks.

The most effective algorithms are based on the structures of sequence to sequence models (or "encoder-decoder" models), and generate the intents and semantic tags either using separate models (Yao et al., 2014; Mesnil et al., 2015; Peng and Yao, 2015; Kurata et al., 2016; Hahn et al., 2011) or a joint model (Liu and Lane, 2016a; Hakkani-Tür et al., 2016; Guo et al., 2014).

Most of the previous studies, however, either treat the intent detection and slot filling as two separate parallel tasks, or use a sequence to sequence model to generate both semantic tags and intent.

Most of these approaches use one (joint) NN based model (including the encoder-decoder structure) to model the two tasks, and hence may not fully take advantage of the cross-impact between them.

In this paper, new Bi-model based RNN semantic frame parsing network structures are designed to perform the intent detection and slot filling tasks jointly, by considering their cross-impact on each other using two correlated bidirectional LSTMs (BLSTMs).

Our Bi-model structure with a decoder achieves state-of-the-art results on the benchmark ATIS data (Hemphill et al., 1990; Tur et al., 2010), with about 0.5% intent accuracy improvement and 0.9% slot filling improvement.

1 Introduction

The research on spoken language understanding (SLU) system has progressed extremely fast during the past decades.

Two important tasks in an SLU system are intent detection and slot filling.

These two tasks are normally considered as parallel tasks but may have cross-impact on each other.

Intent detection is treated as an utterance classification problem, which can be modeled using conventional classifiers including regression, support vector machines (SVMs), or even deep neural networks (Haffner et al., 2003; Sarikaya et al., 2011).

The slot filling task can be formulated as a sequence labeling problem, and the most popular approaches with good performance use conditional random fields (CRFs) and recurrent neural networks (RNNs), as in recent work (Xu and Sarikaya, 2013).

Some works also suggest using one joint RNN model to generate the results of the two tasks together, by taking advantage of the sequence to sequence (Sutskever et al., 2014) (or encoder-decoder) model, which also gives decent results in the literature (Liu and Lane, 2016a).

In this paper, Bi-model based RNN structures are proposed to take the cross-impact between the two tasks into account, and hence further improve the performance of modeling an SLU system.

These models can generate the intent and semantic tags concurrently for each utterance.

In our Bi-model structures, two task-networks are built for the purpose of intent detection and slot filling.

Each task-network includes one BLSTM with or without a LSTM decoder (Hochreiter and Schmidhuber, 1997; Graves and Schmidhuber, 2005).

The paper is organized as follows: in section 2, a brief overview of existing deep learning approaches for intent detection and slot filling is given.

The newly proposed Bi-model based RNN approach is illustrated in detail in section 3.

In section 4, two experiments on different datasets will be given.

One is performed on the ATIS benchmark dataset, in order to demonstrate a state-of-the-art result for both semantic parsing tasks.

The other experiment is performed on our internal multi-domain dataset, comparing our new algorithm with the best performing RNN based joint model in the literature for intent detection and slot filling.

2 Background

In this section, a brief background overview on using deep learning and RNN based approaches to perform intent detection and slot filling tasks is given.

The joint model algorithm is also discussed for further comparison purposes.

2.1 Deep neural network for intent detection

Using deep neural networks for intent detection is similar to a standard classification problem; the only difference is that this classifier is trained on data from a specific domain.

For example, all data in the ATIS dataset belongs to the flight reservation domain, with 18 different intent labels.

There are mainly two types of models that can be used: one is a feed-forward model that takes the average of all word vectors in an utterance as its input; the other uses a recurrent neural network that reads in each word of an utterance as a vector, one by one (Xu and Sarikaya, 2014).
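To make the two options concrete, here is a minimal PyTorch-style sketch of both classifier types (the class names, layer sizes, and the choice of an LSTM for the recurrent variant are illustrative assumptions, not details from the paper):

```python
import torch.nn as nn

class AvgBowIntentClassifier(nn.Module):
    """Feed-forward intent classifier over the average of all word vectors."""
    def __init__(self, vocab_size, embed_dim, num_intents):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.fc = nn.Linear(embed_dim, num_intents)

    def forward(self, word_ids):                # word_ids: (batch, seq_len)
        avg = self.embed(word_ids).mean(dim=1)  # average the word vectors of the utterance
        return self.fc(avg)                     # intent logits

class RnnIntentClassifier(nn.Module):
    """Recurrent intent classifier that reads the utterance word by word."""
    def __init__(self, vocab_size, embed_dim, hidden_dim, num_intents):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_intents)

    def forward(self, word_ids):
        outputs, _ = self.rnn(self.embed(word_ids))
        return self.fc(outputs[:, -1])          # classify from the last hidden state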

2.2 Recurrent neural network for slot filling

The slot filling task is a bit different from intent detection, as there are multiple outputs for the task; hence only an RNN model is a feasible approach for this scenario.

The most straightforward way is to use a single RNN model that generates multiple semantic tags sequentially by reading in each word one by one (Liu and Lane, 2015; Mesnil et al., 2015; Peng and Yao, 2015).

This approach has the constraint that the number of slot tags generated should be the same as the number of words in the utterance.
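As an illustration of this straightforward single-RNN tagger, where the number of output tags equals the number of input words, a minimal sketch (names and sizes are illustrative assumptions):

```python
import torch.nn as nn

class RnnSlotTagger(nn.Module):
    """Single-RNN slot filling: exactly one semantic tag per input word."""
    def __init__(self, vocab_size, embed_dim, hidden_dim, num_tags):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_tags)

    def forward(self, word_ids):                  # (batch, seq_len)
        outputs, _ = self.rnn(self.embed(word_ids))
        return self.fc(outputs)                   # (batch, seq_len, num_tags): one tag per word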

One way to overcome this limitation is to use an encoder-decoder model containing two RNN models, one as an encoder for the input and the other as a decoder for the output (Liu and Lane, 2016a).

The advantage of doing this is that it gives the system the capability of matching an input utterance and output slot tags of different lengths without the need for alignment. Besides using RNNs, it is also possible to use a convolutional neural network (CNN) together with a conditional random field (CRF) to achieve the slot filling task (Xu and Sarikaya, 2013).

2.3 Joint model for two tasks
It is also possible to use one joint model for intent detection and slot filling (Guo et al., 2014; Liu and Lane, 2016a,b; Zhang and Wang, 2016; Hakkani-Tür et al., 2016). One way is to use one encoder with two decoders: the first decoder generates sequential semantic tags and the second decoder generates the intent.

Another approach consolidates the hidden state information from an RNN slot filling model, then generates the intent using an attention model (Liu and Lane, 2016a).

Both approaches demonstrate very good results on the ATIS dataset.

3 Bi-model RNN structures for joint semantic frame parsing

Despite the success of RNN based sequence to sequence (or encoder-decoder) models on both tasks, most of the approaches in the literature still use one single RNN model for each task or for both tasks.

They treat the intent detection and slot filling as two separate tasks.

In this section, two new Bi-model structures are proposed to take this cross-impact into account and hence further improve performance on both tasks.

One structure takes advantage of a decoder structure and the other does not.

An asynchronous training approach based on two models’ cost functions is designed to adapt to these new structures.

3.1 Bi-model RNN Structures

A graphical illustration of two Bi-model structures with and without a decoder is shown in Figure 1.


[Figure 1: Bi-model structures (a) with a decoder and (b) without a decoder]

The two structures are quite similar to each other, except that Figure 1a contains an LSTM based decoder; hence there is an extra decoder state $s_t$ to be cascaded besides the encoder state $h_t$.

Remarks:
The concept of using information from multiple models / multiple modalities to achieve better performance has been widely used in deep learning (Dean et al., 2012; Wang, 2017; Ngiam et al., 2011; Srivastava and Salakhutdinov, 2012), system identification (Murray-Smith and Johansen, 1997; Narendra et al., 2014, 2015), and recently also in the reinforcement learning field (Narendra et al., 2016; Wang and Jin, 2018).

Instead of using collective information, our work in this paper introduces a new approach of training multiple neural networks asynchronously while sharing their internal state information.

3.1.1 Bi-model structure with a decoder

The Bi-model structure with a decoder is shown in Figure 1a.

There are two inter-connected bidirectional LSTMs (BLSTMs) in the structure: one is for intent detection and the other is for slot filling.

Each BLSTM reads in the input utterance sequence forward and backward, and generates two sequences of hidden states $\overrightarrow{h}_t^i$ and $\overleftarrow{h}_t^i$.

A concatenation of $\overrightarrow{h}_t^i$ and $\overleftarrow{h}_t^i$ forms a final BLSTM state $h_t^i = [\overrightarrow{h}_t^i, \overleftarrow{h}_t^i]$ at time step $t$.

Hence, our bidirectional LSTM $f_i(\cdot)$ generates a sequence of hidden states $(h_1^i, h_2^i, \dots, h_n^i)$, where $i = 1$ corresponds to the network for the intent detection task and $i = 2$ to the network for the slot filling task.

In order to detect the intent, hidden state $h_{t-1}^1$ is combined together with $h_{t-1}^2$ from the other bidirectional LSTM $f_2(\cdot)$ in the slot filling task-network to generate the state of $g_1(\cdot)$, $s_t^1$, at time step $t$:

$$s_t^1 = \phi(s_{t-1}^1, h_{t-1}^1, h_{t-1}^2)$$

$$y_{intent}^1 = \arg\max_{\hat{y}_n^1} P(\hat{y}_n^1 \mid s_{n-1}^1, h_{n-1}^1, h_{n-1}^2)$$

where $\hat{y}_n^1$ contains the predicted probabilities for all intent labels at the last time step $n$.

For the slot filling task, a similar network structure is constructed with a BLSTM $f_2(\cdot)$ and an LSTM decoder $g_2(\cdot)$. $f_2(\cdot)$ is the same as $f_1(\cdot)$, reading in the word sequence as its input.

The difference is that there will be an output $y_t^2$ at each time step $t$ for $g_2(\cdot)$, as it is a sequence labeling problem.

At each time step $t$:

$$s_t^2 = \psi(h_{t-1}^2, h_{t-1}^1, s_{t-1}^2, y_{t-1}^2)$$

$$y_t^2 = \arg\max_{\hat{y}_t^2} P(\hat{y}_t^2 \mid h_{t-1}^1, h_{t-1}^2, s_{t-1}^2, y_{t-1}^2)$$

where $y_t^2$ is the predicted semantic tag at time step $t$.
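A minimal PyTorch-style sketch of how this decoder variant could be wired is given below. It only illustrates the forward pass; the exact feature combinations, layer sizes, and the feedback of the predicted tag distribution are assumptions for illustration, and in the paper the two task-networks are trained with separate cost functions as described in section 3.1.3 rather than through one joint loss.

```python
import torch
import torch.nn as nn

class BiModelWithDecoder(nn.Module):
    """Sketch: two BLSTM encoders share hidden states, each task has its own LSTMCell decoder."""
    def __init__(self, vocab_size, embed_dim, hidden_dim, num_intents, num_tags):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # f1: intent encoder, f2: slot encoder (both bidirectional)
        self.f1 = nn.LSTM(embed_dim, hidden_dim, bidirectional=True, batch_first=True)
        self.f2 = nn.LSTM(embed_dim, hidden_dim, bidirectional=True, batch_first=True)
        enc_dim = 2 * hidden_dim                      # forward + backward concatenation
        # g1/g2: decoders; each reads its previous state plus both h^1 and h^2
        self.g1 = nn.LSTMCell(2 * enc_dim, enc_dim)
        self.g2 = nn.LSTMCell(2 * enc_dim + num_tags, enc_dim)
        self.intent_out = nn.Linear(enc_dim, num_intents)
        self.slot_out = nn.Linear(enc_dim, num_tags)

    def forward(self, word_ids):
        x = self.embed(word_ids)                      # (batch, n, embed_dim)
        h1, _ = self.f1(x)                            # h^1_t: (batch, n, 2*hidden_dim)
        h2, _ = self.f2(x)                            # h^2_t
        batch, n, enc_dim = h1.shape
        s1 = c1 = h1.new_zeros(batch, enc_dim)        # decoder state for intent
        s2 = c2 = h1.new_zeros(batch, enc_dim)        # decoder state for slots
        y_prev = h1.new_zeros(batch, self.slot_out.out_features)
        slot_logits = []
        for t in range(n):
            shared = torch.cat([h1[:, t], h2[:, t]], dim=-1)              # share h^1 and h^2
            s1, c1 = self.g1(shared, (s1, c1))                            # intent decoder step
            s2, c2 = self.g2(torch.cat([shared, y_prev], -1), (s2, c2))   # slot decoder step
            step_logits = self.slot_out(s2)
            slot_logits.append(step_logits)
            y_prev = step_logits.softmax(dim=-1)      # feed predicted tag distribution back
        intent_logits = self.intent_out(s1)           # intent read at the last time step
        return intent_logits, torch.stack(slot_logits, dim=1)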

3.1.2 Bi-model structure without a decoder

The Bi-model structure without a decoder is shown in Figure 1b.

In this model, there is no LSTM decoder as in the previous model.

For the intent task, only one predicted output label $y_{intent}^1$ is generated from the BLSTM $f_1(\cdot)$ at the last time step $n$, where $n$ is the length of the utterance.

Similarly, the state value $h_t^1$ and the output intent label are generated as:

$$h_t^1 = \phi(h_{t-1}^1, h_{t-1}^2)$$

$$y_{intent}^1 = \arg\max_{\hat{y}_n^1} P(\hat{y}_n^1 \mid h_{n-1}^1, h_{n-1}^2)$$

For the slot filling task, the basic structure of the BLSTM $f_2(\cdot)$ is similar to that for the intent detection task $f_1(\cdot)$, except that there is one slot tag label $y_t^2$ generated at each time step $t$.

It also takes the hidden states from the two BLSTMs $f_1(\cdot)$ and $f_2(\cdot)$, i.e. $h_{t-1}^1$ and $h_{t-1}^2$, plus the output tag $y_{t-1}^2$, to generate its next state value $h_t^2$ and also the slot tag $y_t^2$. To represent this as a function mathematically:

$$h_t^2 = \psi(h_{t-1}^2, h_{t-1}^1, y_{t-1}^2)$$

$$y_t^2 = \arg\max_{\hat{y}_t^2} P(\hat{y}_t^2 \mid h_{t-1}^1, h_{t-1}^2, y_{t-1}^2)$$
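A corresponding sketch of the decoder-free variant, where the two BLSTM state sequences are combined directly through output layers (the previous-tag feedback $y_{t-1}^2$ is omitted here for brevity; class names and sizes are illustrative assumptions):

```python
import torch
import torch.nn as nn

class BiModelNoDecoder(nn.Module):
    """Sketch of the decoder-free Bi-model: no extra LSTM decoders, just shared BLSTM states."""
    def __init__(self, vocab_size, embed_dim, hidden_dim, num_intents, num_tags):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.f1 = nn.LSTM(embed_dim, hidden_dim, bidirectional=True, batch_first=True)
        self.f2 = nn.LSTM(embed_dim, hidden_dim, bidirectional=True, batch_first=True)
        self.intent_out = nn.Linear(4 * hidden_dim, num_intents)   # reads [h^1; h^2] at step n
        self.slot_out = nn.Linear(4 * hidden_dim, num_tags)        # reads [h^1; h^2] at each step t

    def forward(self, word_ids):
        x = self.embed(word_ids)
        h1, _ = self.f1(x)                       # intent task-network states h^1_t
        h2, _ = self.f2(x)                       # slot task-network states h^2_t
        shared = torch.cat([h1, h2], dim=-1)     # share hidden states across the two networks
        intent_logits = self.intent_out(shared[:, -1])   # intent from the last time step
        slot_logits = self.slot_out(shared)               # one tag distribution per word
        return intent_logits, slot_logits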


3.1.3 Asynchronous training
One of the major differences in the Bi-model structure is its asynchronous training, which trains two task-networks based on their own cost functions in an asynchronous manner.

The loss function for the intent detection task-network is $\mathcal{L}_1$, and the one for slot filling is $\mathcal{L}_2$. $\mathcal{L}_1$ and $\mathcal{L}_2$ are defined using cross entropy as:

$$\mathcal{L}_1 = -\sum_{i=1}^{k} y_{intent}^{1,i} \log\left(\hat{y}_{intent}^{1,i}\right)$$

and

$$\mathcal{L}_2 = -\sum_{j=1}^{n} \sum_{i=1}^{m} y_j^{2,i} \log\left(\hat{y}_j^{2,i}\right)$$

where $\hat{y}$ denotes a predicted probability and $y$ the corresponding ground-truth label, $k$ is the number of intent label types, $m$ is the number of semantic tag types, and $n$ is the number of words in a word sequence.

In each training iteration, the intent detection and slot filling networks each generate a group of hidden states $h_t^1$ and $h_t^2$ from the models in the previous iteration.

The intent detection task-network reads in a batch of input data and the hidden states $h_t^2$ shared by the slot filling task-network, and generates the estimated intent labels $\hat{y}_{intent}^1$.

The intent detection task-network computes its cost based on the function $\mathcal{L}_1$ and is trained on that.

Then the same batch of data is fed into the slot filling task-network together with the hidden states $h_t^1$ from the intent task-network, which further generates a batch of outputs $\hat{y}_t^2$ for each time step.

Its cost value is then computed based on the cost function $\mathcal{L}_2$, and the network is further trained on that.
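A minimal sketch of this asynchronous loop, assuming two hypothetical task-network modules `intent_net` and `slot_net` that each expose an `encode()` method returning their own BLSTM states and a forward pass accepting the other network's states (none of these names or interfaces come from the paper):

```python
import torch
import torch.nn.functional as F

intent_opt = torch.optim.Adam(intent_net.parameters())
slot_opt = torch.optim.Adam(slot_net.parameters())

for word_ids, intent_gold, slot_gold in train_loader:
    # Step 1: train the intent task-network on L1, reading h^2 from the slot network
    # as fixed (detached) context, since its parameters come from the previous update.
    with torch.no_grad():
        h2 = slot_net.encode(word_ids)
    intent_logits, _ = intent_net(word_ids, h2)
    loss1 = F.cross_entropy(intent_logits, intent_gold)
    intent_opt.zero_grad()
    loss1.backward()
    intent_opt.step()

    # Step 2: train the slot task-network on L2 with the same batch,
    # now reading h^1 from the freshly updated intent network.
    with torch.no_grad():
        h1 = intent_net.encode(word_ids)
    slot_logits, _ = slot_net(word_ids, h1)            # (batch, seq_len, num_tags)
    loss2 = F.cross_entropy(slot_logits.transpose(1, 2), slot_gold)
    slot_opt.zero_grad()
    loss2.backward()
    slot_opt.step()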

The reason for using the asynchronous training approach is the importance of keeping two separate cost functions for the different tasks. Doing this has two main advantages:

  1. It filters the negative impact between the two tasks, in comparison to using only one joint model, by capturing more useful information and overcoming the structural limitation of one model.

  2. The cross-impact between the two tasks can only be learned by sharing the hidden states of the two models, which are trained using two cost functions separately.
