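Semantic understanding (NLU):
1. Domain classification: restaurants, travel, etc.
2. Intent detection: the outputs of an LSTM/bi-LSTM are relatively independent of one another, so a CRF layer can be added on top.
3. Slot-value extraction / entity extraction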

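Dialogue management:
1. State management: the input is a state and the output is a state
2. Dialogue policy
Reward design: give a negative reward once the dialogue exceeds a certain number of turns; give a negative reward when the user expresses negative sentiment.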
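
The two reward rules above can be sketched as a small function. This is only a minimal sketch: the turn limit `MAX_TURNS`, the penalty values, and the `user_is_negative` flag (which would have to come from some sentiment detector) are hypothetical and are not specified in these notes.

```python
# Hypothetical constants; the notes do not give concrete values.
MAX_TURNS = 10
TURN_LIMIT_PENALTY = -1.0
NEGATIVE_SENTIMENT_PENALTY = -1.0


def turn_reward(turn_index: int, user_is_negative: bool) -> float:
    """Return the reward for one dialogue turn under the two rules above."""
    reward = 0.0
    # Rule 1: the dialogue has run past the allowed number of turns.
    if turn_index > MAX_TURNS:
        reward += TURN_LIMIT_PENALTY
    # Rule 2: the user expressed negative sentiment in this turn.
    if user_is_negative:
        reward += NEGATIVE_SENTIMENT_PENALTY
    return reward
```

In practice a positive reward for task success would probably be added as well, but that is outside what these notes state.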

Reading notes on "A Network-based End-to-End Trainable Task-oriented Dialogue System"

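One difficulty with task-oriented dialogue systems is that they are domain-specific and training data is limited. To address this, recent machine learning approaches to dialogue systems treat the
problem as a POMDP, with the aim of using RL to train the dialogue policy through interaction with real users. However, the NLU and NLG modules still rely on supervised learning and need training data. Moreover, to make RL
tractable, the state and action spaces must be designed carefully, which can limit the expressive power and learnability of the model. In addition, the reward is hard to design and hard to compute at run time.
seq2seq has been fairly successful in end-to-end non-task-oriented dialogue systems; one drawback is that it cannot easily interact with a knowledge base to answer domain-specific questions.
This paper balances the strengths and weaknesses of the two approaches: the model is end-to-end and does not model the user goal directly, but completes the task by providing a relevant and appropriate response at every turn.
The user intent is given a distributed representation, and answers are retrieved from the knowledge base.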

steps

dialogue history (modelled by a set of belief trackers)
At each turn, the system takes a sequence of tokens from the user as input and converts it into two internal representations:
a distributed representation generated by an intent network and a probability distribution over slot-value pairs called the belief state (Young et al., 2013) generated by a set of belief trackers.
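Then the most probable value of each slot in the belief state is selected to form a query to the database. The retrieved result is combined with the outputs of the intent network and the belief trackers in the policy network, and the result is fed into the generation
network to produce a response.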
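
The query-forming step can be illustrated with a toy example. This is a sketch under assumptions: the slot names, the probabilities, the restaurant entries and the `dontcare` value are made up for illustration and are not taken from the paper's data.

```python
from typing import Dict, List

# Hypothetical belief state: one probability distribution per informable slot.
belief_state: Dict[str, Dict[str, float]] = {
    "food": {"chinese": 0.7, "italian": 0.2, "dontcare": 0.1},
    "pricerange": {"cheap": 0.15, "moderate": 0.8, "dontcare": 0.05},
}

# Hypothetical toy database of restaurant entities.
database: List[Dict[str, str]] = [
    {"name": "Golden Wok", "food": "chinese", "pricerange": "moderate"},
    {"name": "Pasta Bar", "food": "italian", "pricerange": "moderate"},
    {"name": "Noodle Hut", "food": "chinese", "pricerange": "cheap"},
]


def form_query(belief: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Pick the most probable value of every slot."""
    return {slot: max(dist, key=dist.get) for slot, dist in belief.items()}


def search(db: List[Dict[str, str]], query: Dict[str, str]) -> List[Dict[str, str]]:
    """Return entities matching every constraint; 'dontcare' matches anything."""
    return [
        entity
        for entity in db
        if all(
            value == "dontcare" or entity.get(slot) == value
            for slot, value in query.items()
        )
    ]


query = form_query(belief_state)
matches = search(database, query)
print(query)
print([entity["name"] for entity in matches])
```

The matching entities are what the policy network would then receive as the database result.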

Intent Network

job: encode a sequence of input tokens into a distributed vector representation at every turn.
way: 1. use an LSTM and take the hidden state at the last time step as the representation; 2. use a CNN (Kim, 2014). The paper investigates both.
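
Option 1 can be sketched in plain Python. This is a minimal, untrained sketch of a standard single-layer LSTM: the dimensions, the random weights, the toy vocabulary and the helper names (`affine`, `lstm_encode`, `random_matrix`) are all hypothetical, and the paper's actual implementation may differ.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, U, b, x, h):
    """Compute W x + U h + b using plain Python lists."""
    return [
        sum(w * xi for w, xi in zip(w_row, x))
        + sum(u * hi for u, hi in zip(u_row, h))
        + bias
        for w_row, u_row, bias in zip(W, U, b)
    ]


def lstm_encode(embeddings, params, d_hid):
    """Run an LSTM over the input vectors and return the last hidden state."""
    h = [0.0] * d_hid
    c = [0.0] * d_hid
    for x in embeddings:
        pre = {name: affine(W, U, b, x, h) for name, (W, U, b) in params.items()}
        i = [sigmoid(v) for v in pre["input"]]
        f = [sigmoid(v) for v in pre["forget"]]
        o = [sigmoid(v) for v in pre["output"]]
        g = [math.tanh(v) for v in pre["cell"]]
        # Update the cell state, then the hidden state.
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d_in, d_hid = 4, 3
params = {}
for gate in ("input", "forget", "output", "cell"):
    params[gate] = (
        random_matrix(d_hid, d_in, rng),   # input-to-hidden weights
        random_matrix(d_hid, d_hid, rng),  # hidden-to-hidden weights
        [0.0] * d_hid,                     # bias
    )

utterance = ["i", "want", "cheap", "chinese", "food"]
embedding = {}
for token in utterance:
    embedding[token] = [rng.uniform(-1.0, 1.0) for _ in range(d_in)]

# z is the distributed representation of the whole utterance.
z = lstm_encode([embedding[t] for t in utterance], params, d_hid)
```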

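Belief Trackers (Dialogue State Tracking)

Current state-of-the-art belief trackers use discriminative models such as recurrent neural networks (RNN).
job: maintain a multinomial distribution for each informable slot and a binary distribution for each requestable slot; an informable slot has multiple values. Each slot in the ontology has its own tracker, and each tracker is an RNN that uses a CNN for feature extraction.
A requestable slot does not need a value to be tracked, i.e. it is not filled; only whether the user has requested it is recorded.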
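
The two kinds of output distribution can be illustrated as follows. This sketch shows only the output layer: the raw scores are hypothetical stand-ins for what the RNN/CNN tracker would compute, and the slot and value names are made up.

```python
import math


def softmax(scores):
    """Turn raw scores into a multinomial distribution."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


# Informable slot "food": one probability per candidate value.
food_values = ["chinese", "italian", "dontcare"]
food_scores = [2.0, 0.5, -1.0]  # hypothetical tracker scores
p_food = dict(zip(food_values, softmax(food_scores)))

# Requestable slot "address": a single probability that the user asked for it.
p_address_requested = sigmoid(1.2)  # hypothetical tracker score
```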

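The user utterance of the current turn and the system response of the previous turn are concatenated and encoded together, in order to model the context of each turn.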

Database Operator

Policy Network

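It can be viewed as the glue that binds the system's modules together. Its output is a vector representing the system action, and its input is made up of the intent network output, the belief state and the DB result.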
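
One way to picture this combination is a single tanh layer over summed linear projections of the three inputs. This is a sketch under that assumption (the paper should be checked for the exact form); the dimensions, weights and input vectors below are hypothetical.

```python
import math
import random


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def policy_network(z, p_hat, x_hat, W_z, W_p, W_x):
    """Combine intent vector z, belief summary p_hat and DB vector x_hat."""
    projected = zip(matvec(W_z, z), matvec(W_p, p_hat), matvec(W_x, x_hat))
    return [math.tanh(a + b + c) for a, b, c in projected]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d_out = 5
z = [0.3, -0.1, 0.7, 0.2]   # hypothetical intent representation
p_hat = [0.7, 0.2, 0.1]     # hypothetical belief state summary
x_hat = [0.0, 1.0, 0.0]     # hypothetical encoding of the DB result

W_z = random_matrix(d_out, len(z), rng)
W_p = random_matrix(d_out, len(p_hat), rng)
W_x = random_matrix(d_out, len(x_hat), rng)

# The action vector that would condition the generation network.
action = policy_network(z, p_hat, x_hat, W_z, W_p, W_x)
```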

Generation Network

An LSTM generates a template-like sentence, and the slot and value placeholders in that sentence are then replaced with the actual ones.
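
The replacement step can be sketched as a simple substitution. This is a minimal sketch: the placeholder format (`<v.name>` and so on), the template sentence and the entity values are assumptions chosen for illustration, and the function name `lexicalise` is hypothetical.

```python
import re


def lexicalise(template: str, values: dict) -> str:
    """Replace generic slot/value tokens with actual values.

    Tokens with no known value are left unchanged.
    """
    def substitute(match):
        token = match.group(0)
        return values.get(token, token)

    return re.sub(r"<[sv]\.[a-z]+>", substitute, template)


# Hypothetical template as the LSTM generator might emit it.
template = "<v.name> is a nice <v.food> restaurant in the <v.area> of town ."
# Hypothetical values taken from the selected database entity.
entity = {"<v.name>": "Golden Wok", "<v.food>": "chinese", "<v.area>": "centre"}

response = lexicalise(template, entity)
print(response)
```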

Corpus

The informable slots are in the file CamRestOTGY.json, which contains every slot-value pair.
Requestable slots

The dialogue corpus was collected by crowd-sourcing, based on a novel pipe-lined Wizard-of-Oz framework. There are 680 dialogues in total, stored in
CamRest676.json.

Detailed computation

The query uses a one-hot representation,

Dialog system study notes and understanding

https://www.csie.ntu.edu.tw/~yvchen/doc/DeepDialogue_Tutorial.pdf

https://web.stanford.edu/~jurafsky/slp3/30.pdf
The dialog state tracker maintains the current state of the dialog (which includes the user’s most recent dialog act, plus the entire set of slot-filler constraints the user has expressed so far), and
the dialog policy decides what the system should do or say next.
dialog act: a tag which represents the interactive function of the sentence being tagged.

SimpleDS https://arxiv.org/pdf/1601.04574.pdf
Deep reinforcement learning performs feature learning and policy learning jointly. Almost two decades ago, the (spoken) dialogue systems community adopted the Reinforcement Learning (RL) paradigm
since it offered the possibility to treat dialogue design as an optimisation problem, and because RL-based systems can improve their performance over time with experience.

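Papers from 2017

Recently there have been many papers based on latent variables. I had not paid much attention to them before, so I sort them out here. A trend in dialogue and question-answering systems is to make machines emotionally expressive.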

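Minlie Huang (黄民烈), Emotional conversation generation with internal and external memory
Affective Neural Response Generation (the references cited by this paper deserve a careful read)

Current neural dialogue models all work at the lexico-syntactic level and ignore emotional content. This paper proposes three methods for incorporating emotion into the encoder-decoder architecture, and also compares against Minlie Huang's work.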

Latent Intention Dialogue Models

Serban, 2017. A hierarchical latent variable encoder-decoder model for generating dialogues.

Serban et al. (2017) have introduced latent variables to the dialogue modelling framework, to model the underlying distribution over possible responses directly.

Kris Cao, Latent Variable Dialogue Models and their Diversity

End-to-End Optimization of Task-Oriented Dialogue Model with Deep Reinforcement Learning
LEARNING END-TO-END GOAL-ORIENTED DIALOG
An End-to-End Trainable Neural Network Model with Belief Tracking for Task-Oriented Dialog
ITERATIVE POLICY LEARNING IN END-TO-END TRAINABLE TASK-ORIENTED NEURAL DIALOG MODELS
http://www.sohu.com/a/135962876_465975
ADVERSARIAL ADVANTAGE ACTOR-CRITIC MODEL FOR TASK-COMPLETION DIALOGUE POLICY LEARNING

Joint learning, including NLU with dialogue management, and intent detection with slot filling

END-TO-END JOINT LEARNING OF NATURAL LANGUAGE UNDERSTANDING AND DIALOGUE MANAGER
Attention-Based Recurrent Neural Network Models for Joint Intent Detection and Slot Filling
A Joint Model of Intent Determination and Slot Filling for Spoken Language Understanding

Multi-domain dialogue management

Research groups

https://aps.arxiv.org/find/cs/1/au:+Liu_B/0/1/0/all/0/1
deng li: https://scholar.google.com/citations?hl=zh-CN&user=GQWTo4MAAAAJ&view_op=list_works&sortby=pubdate
http://alborz-geramifard.com/workshops/nips17-Conversational-AI/Main.html
Cambridge dialogue group http://dialogue.mi.eng.cam.ac.uk/

Experimental evaluation

On how to evaluate: How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation

References

simpleDS
A Network-based End-to-End Trainable Task-oriented Dialogue System http://mi.eng.cam.ac.uk/~sjy/papers/wgmv17.pdf
https://web.stanford.edu/~jurafsky/slp3/30.pdf
THE FIFTH DIALOG STATE TRACKING CHALLENGE

Dialogue