Reading Notes on "A Survey on Conversational Recommender Systems" (2021)

1. CRS Architecture

[Image: CRS architecture diagram]

1.1 Dialogue Management System:

This is the core component.
Because a CRS carries out multi-turn dialogue, this component can be said to implement some form of dialogue state management, either explicitly or implicitly.

①Input: It receives the processed inputs, e.g., the recognized intents, entities, and preferences.

②Operation: It correspondingly updates the dialogue state and user model.

③Output: Using a recommendation and reasoning engine and background knowledge, it determines the next action and returns appropriate content like a recommendation list, an explanation, or a question to the output generation component.
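The input, operation, and output steps above can be sketched as a single function. This is only an illustrative sketch, not the survey's design: the `DialogueState` class, the intent names, the `ITEMS` toy catalog, and the rule "recommend once at most two items match" are all assumptions made for the example.

```python
from dataclasses import dataclass, field

# Toy item database (hypothetical data).
ITEMS = [
    {"title": "A", "genre": "sci-fi", "year": 2010},
    {"title": "B", "genre": "sci-fi", "year": 2014},
    {"title": "C", "genre": "drama", "year": 2010},
]


@dataclass
class DialogueState:
    """Hypothetical dialogue state: turn counter plus elicited preferences."""
    turn: int = 0
    preferences: dict = field(default_factory=dict)  # doubles as a minimal user model


def next_action(state, intent, entities, items):
    """One dialogue-management step: update the state, then pick the next action."""
    # Operation: update the dialogue state and the user model.
    state.turn += 1
    state.preferences.update(entities)

    # Output: decide what to hand to the output generation component.
    if intent == "ask_explanation":
        return ("explain", f"Recommended because it matches: {state.preferences}")
    matches = [it for it in items
               if all(it.get(k) == v for k, v in state.preferences.items())]
    if state.preferences and 0 < len(matches) <= 2:
        return ("recommend", [it["title"] for it in matches])
    # Otherwise ask about an attribute that is still unknown.
    return ("ask", "genre" if "genre" not in state.preferences else "year")
```

Calling `next_action` first with no entities and then with `{"genre": "sci-fi"}` moves the toy system from asking a question to returning a recommendation list, which mirrors the question/recommendation outputs listed above.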

1.2 User Modeling System:

The user modeling system may or may not be a component of its own; it tends to be a separate component in particular when there are long-term user preferences to be considered. In some cases, the current preference profile is implicitly part of the dialogue system.

1.3 Recommendation and Reasoning Engine

This component is responsible for retrieving a set of recommendations, given the current dialogue state and preference model. It might also implement other complex reasoning functionality, e.g., to generate explanations or to compute a query relaxation.
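Query relaxation, mentioned above, means loosening the user's constraints when no item satisfies all of them. The sketch below shows one simple way this could work, assuming exact-match attribute constraints and a brute-force search that drops as few constraints as possible; the function names and the item format are hypothetical, and real systems may use more refined strategies.

```python
from itertools import combinations


def search(items, constraints):
    """Return the items that satisfy every attribute constraint exactly."""
    return [it for it in items
            if all(it.get(k) == v for k, v in constraints.items())]


def relax_query(items, constraints):
    """Return (results, dropped) after dropping as few constraints as possible."""
    keys = list(constraints)
    # Try dropping 0 constraints first, then 1, then 2, and so on.
    for n_dropped in range(len(keys) + 1):
        for dropped in combinations(keys, n_dropped):
            kept = {k: v for k, v in constraints.items() if k not in dropped}
            results = search(items, kept)
            if results:
                return results, list(dropped)
    return [], keys  # only reached when the catalog itself is empty
```

The list of dropped constraints can be passed on to the output component, so the system can tell the user which requirement it had to give up.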

1.4 Knowledge Elements.

①Item Database
The Item Database is something that is present in almost all solutions, representing the set of recommendable items, sometimes including details about their attributes.
②Different types of Domain and Background Knowledge
Many approaches explicitly encode dialogue knowledge in different ways, e.g., in the form of pre-defined dialogue states, supported user intents, and the possible transitions between the states.

2. Some Thoughts on Section 4 of the Paper, “UNDERLYING KNOWLEDGE AND DATA”

2.1 User intents and dialogue states

In practice, the user intents that a CRS supports may include:

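[Image: table of user intents that a CRS may support]
With this many intents, and with the need to sustain a multi-turn dialogue until the recommendation goal is reached, the dialogue states have to be managed. This may be done with state machines, dialogue grammars, or tools such as Google’s DialogFlow, and the transitions between states may be either predefined or learned through training.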
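A state machine with predefined transitions, one of the options named above, can be sketched in a few lines. The state names, the intent names, and the transition table here are hypothetical and chosen only for illustration; a real system would define its own.

```python
# Hypothetical (state, intent) -> next state table.
TRANSITIONS = {
    ("greeting", "provide_preference"): "eliciting",
    ("eliciting", "provide_preference"): "eliciting",
    ("eliciting", "ask_recommendation"): "recommending",
    ("recommending", "reject"): "eliciting",
    ("recommending", "accept"): "done",
    ("recommending", "ask_explanation"): "explaining",
    ("explaining", "accept"): "done",
    ("explaining", "reject"): "eliciting",
}


def step(state, intent):
    """Follow a predefined transition; stay in place on an unsupported intent."""
    if intent == "restart":
        return "greeting"  # restart is allowed from every state
    return TRANSITIONS.get((state, intent), state)


def run(intents, state="greeting"):
    """Feed a sequence of recognized intents through the state machine."""
    for intent in intents:
        state = step(state, intent)
    return state
```

In a learned variant, the lookup in `step` would be replaced by a trained model that predicts the next state or action from the dialogue history.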

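The papers I have read so far, however, are mostly NLP-based and trained end-to-end, so dialogue state management is modeled implicitly inside the final network.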

“In some NLP-based conversational preference elicitation systems such as References [29, 172], there are mainly two phases: asking questions, in this case in an adaptive way, and presenting a recommendation list.” In the NLP-based end-to-end learning CRS proposed in Reference [75], the dialogue states are in some ways also modeled implicitly, but in a different way. This system is based on a corpus of recorded human conversations (between crowdworkers) centered around movie recommendations. This corpus is used to train a complex neural model, which is then used to react to utterances by users. Looking at the conversation examples, these conversations, besides some chit-chat, mainly consist of interactions where one communication partner asks the other if she or he likes a certain movie. The sentiment of the answer of the movie seeker is then analyzed to make another recommendation, again mostly in the form of a question. The dialogue model is therefore relatively simple and encoded in the neural model. It seemingly does not support many other types of intents or information requests that do not contain movie names (e.g., “I would like to see a sci-fi movie”).

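2.2 User Modeling

That is, “the interactive elicitation of the user’s current preferences or needs and constraints.”
The ultimate goal is to represent the user’s preferences in an explicit form within a user profile.

This can take many forms and be done in many ways. The two main ways of expressing user preferences are:
[Image: the two main ways of expressing user preferences]
Some works additionally take the user’s long-term preferences into account.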

2.3 Background Knowledge

①Item-related Information.
Such a database can contain item ratings, metadata that can be presented to the user (e.g., the genre of a movie or the director), community-provided tags, or extracted keyphrases.

(When I have time, I should look at datasets of this kind, such as MovieLens or Netflix.)

②Dialogue Corpora Created to Build CRS
NLP-based dialogue systems are usually based on training data that consist of recorded and often annotated conversations between humans (interaction histories). Note that in some cases when building a CRS, these dialogue corpora are combined with other knowledge bases.

③Logged Interaction Histories
④Lexicons and World Knowledge.
Researchers often use additional knowledge bases to support the entity recognition process in NLP-based systems.

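2.4 Summary

CRS can be knowledge-intensive or data-intensive systems.
(That is, a CRS can be designed around a large amount of prior knowledge, such as which intents are supported and which dialogue states exist, or it can be trained on large amounts of data.)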

In CRS approaches that use forms and buttons as the only interaction mechanism, the interaction flow is typically pre-defined in the form of the possible dialogue states, the set of supported user intents, and the user profile attributes to acquire. NLP-based systems, in contrast, are usually more dynamic in terms of the dialogue flow, and they rely on additional knowledge sources like dialogue corpora and answer templates as well as lexicon and world knowledge bases. Nonetheless, these systems typically require the manual definition of additional background knowledge, e.g., with respect to the supported user intents. Pure “end-to-end” learning only from recorded dialogues seems challenging. In most existing approaches the set of supported interaction patterns is implicitly or explicitly predefined, e.g., in the form of “user provides preferences, system recommends.” To a certain extent, also the collection of human-to-human dialogues can be designed to support possible system responses like in Reference [75], where the crowdworkers were given specific instructions regarding the expected dialogues. As a result, the range of supported dialogue utterances can be relatively narrow. The system presented in Reference [75], for example, cannot handle a query like “good sci-fi movie please.”

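In short, 2.1, 2.2, and 2.3 are the basic aspects to consider when designing a CRS model. The few works I read earlier were mostly pure end-to-end, so these aspects were probably only contained implicitly in the overall model.
When reading papers in the future, I should pay particular attention to how the authors handle each of these aspects.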

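3. A Summary of Section 5 of the Paper, “COMPUTATIONAL TASKS”

The main tasks are summarized below:

3.1 Request

That is, eliciting user preferences, for example with the earlier slot-filling methods.
I focused on the recent deep-learning-based methods, but I still feel that this task is mainly used in task-oriented CRS, to decide “which attribute to ask about, which item to recommend, and when to recommend an item versus ask about an attribute.”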

Instead of using heuristics for attribute selection and static dialogue state transition rules, a number of more recent systems rely on learning-based approaches, e.g., using reinforcement learning [86, 139, 146]. In Reference [139], for example, the authors use a deep policy network to decide on the system action. Based on the current dialogue state, as modeled by a belief tracker, the system either makes a request for a pre-defined facet or generates a recommendation to be shown to the user. An alternative learning-based way to determine the question order was proposed in Reference [30]. In their work, the authors design a recommender for YouTube that leverages past watching histories of the user community and a Recurrent Neural Network architecture to rank the questions (topics) that are shown to the user in a conversational step.

An alternative to asking users about attribute-based preferences is to ask them to give feedback on selected items. This can be done either by asking them to rate individual items (e.g., by like/dislike statements) or by asking them to express their preference for item pairs or entire sets of items [81]. The computational task in this context is to determine the most informative item(s) to present to the user. Possible strategies include the selection of popular or diverse items in the cold-start phase, items that are different in terms of their past ratings or attributes, or itemsets that represent a balance of popularity and diversity [17, 93, 101, 120]. However, not only item features might be relevant for the selection of the items. In Reference [17], the authors found that a user’s willingness to give feedback on an item can depend on additional factors.
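To make the "balance of popularity and diversity" strategy concrete, here is a minimal greedy sketch. It is not taken from any of the cited works: the scoring formula, the `weight` parameter, the Jaccard distance on attribute sets, and the item format are all assumptions chosen for illustration.

```python
def jaccard_distance(a, b):
    """Distance between two attribute sets: 1 - |a & b| / |a | b|."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def select_items_to_ask(items, k, weight=0.5):
    """Greedily pick k items, trading off popularity against diversity.

    `items` maps an item id to (popularity in [0, 1], set of attributes).
    `weight` = 1.0 means popularity only, 0.0 means diversity only.
    """
    chosen = []
    candidates = set(items)
    while candidates and len(chosen) < k:
        def score(item_id):
            popularity, attrs = items[item_id]
            if not chosen:
                return popularity  # first pick: most popular item
            # Diversity: distance to the closest item already chosen.
            diversity = min(jaccard_distance(attrs, items[c][1]) for c in chosen)
            return weight * popularity + (1 - weight) * diversity
        best = max(sorted(candidates), key=score)  # sorted() makes ties deterministic
        chosen.append(best)
        candidates.remove(best)
    return chosen
```

With `weight=1.0` the function simply returns the most popular items; lowering `weight` pushes it toward items whose attributes differ from those already selected.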

Relatively recent works for reference:
Daisuke Tsumita and Tomohiro Takagi. 2019. Dialogue based recommender system that flexibly mixes utterances and recommendations. In WI’19. 51–58.
W. Lei, G. Zhang, X. He, et al. 2020. Interactive path reasoning on graph for conversational recommendation. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2073–2083.

3.2 Recommend

Again, I mainly looked at the recent machine-learning-based methods:

More recent works rely on machine learning models and background datasets for the recommendation task. One common approach is to train a model on the traditional user-item interaction matrix, e.g., based on probabilistic matrix factorization [29], and to then combine the user’s current interactions with the trained user and item embeddings. In another approach [4], the authors rely on a content-based method based on item features and the user profile in the cold-start stage, and then switch to a Restricted Boltzmann Machine collaborative filtering method once a sufficient number of preference signals is available. In Reference [172], a hybrid multimemory network with attention mechanism was trained to find suitable recommendations based on item embeddings and the user’s query embedding. Here, the item embedding was based on the item’s textual description, and the user’s query embedding encoded the user’s initial request and the follow-up conversations during the interaction. A hybrid model was also proposed in Reference [139], which used Factorization Machines to combine the dialogue state (represented with an LSTM-based belief tracker for each item facet), user information, and item information to train the recommendation model. In the video recommender system presented in Reference [30], finally, an RNN-based model was built for making recommendations, based on the topics selected by the users and their watching history.
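The idea of combining the user's current interactions with trained item embeddings can be illustrated with a very small sketch. It is a simplification and not the method of Reference [29]: the embeddings below are hand-set toy values standing in for trained ones, and the user vector is assumed to be the plain mean of the embeddings of items liked in the current session.

```python
# Toy 2-dimensional item embeddings (hypothetical values, not trained).
ITEM_EMBEDDINGS = {
    "alien": [0.9, 0.1],
    "blade_runner": [0.8, 0.2],
    "notebook": [0.1, 0.9],
    "titanic": [0.2, 0.8],
}


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def recommend(item_embeddings, liked_in_session, top_n=2):
    """Rank unseen items by dot product with the mean embedding of liked items."""
    user_vec = mean_vector([item_embeddings[i] for i in liked_in_session])
    candidates = [i for i in item_embeddings if i not in liked_in_session]
    ranked = sorted(candidates,
                    key=lambda i: dot(user_vec, item_embeddings[i]),
                    reverse=True)
    return ranked[:top_n]
```

Each new like or dislike in the conversation would change `liked_in_session`, so the ranking can be recomputed after every turn without retraining the embeddings.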

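In short, all sorts of approaches (collaborative, content-based, hybrid) can be used for the recommendation task within a CRS.
Another issue is that most current CRS works rely only on short-term preferences. A somewhat more recent paper that takes long-term preferences into account: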
A. Argal, S. Gupta, A. Modi, P. Pandey, S. Shim, and C. Choo. 2018. Intelligent travel chatbot for predictive recommendation in Echo platform. In CCWC’18. 176–183

3.3 Explain.

Relatively recent literature:
Seungwhan Moon, Pararth Shah, Anuj Kumar, and Rajen Subba. 2019. OpenDialKG: Explainable conversational reasoning with attention-based walks over knowledge graphs. In ACL’19. 845–854

Florian Pecune, Shruti Murali, Vivian Tsai, Yoichi Matsuyama, and Justine Cassell. 2019. A model of social explanations for a conversational movie recommendation system. In HAI’19. 135–143.

3.4 Respond.

This category of tasks is relevant in user-driven or mixed-initiative NLP-based CRS, where the user can actively ask questions to the system, actively make preference statements, or issue commands. The system’s goal is to properly react to user utterances that do not fall in the above-mentioned categories “Recommend” and “Explain.”

One approach is:

One approach—also commonly used in chatbots—is to map the utterances to pre-defined intents, such as the ones mentioned in Table 1, e.g., Obtain Details or Restart. The system’s answers to these pre-defined intents can be implemented in the system with the help of templates.
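The intent-plus-template approach described above can be sketched as follows. The keyword patterns, the template wording, and the item fields are hypothetical; a real chatbot would typically use a trained intent classifier rather than regular expressions.

```python
import re

# Hypothetical keyword patterns for two of the intents named in the text.
INTENT_PATTERNS = {
    "obtain_details": re.compile(r"\b(tell me more|details|who directed)\b", re.I),
    "restart": re.compile(r"\b(start over|restart)\b", re.I),
}

# Hypothetical answer templates, one per intent plus a fallback.
TEMPLATES = {
    "obtain_details": "{title} is a {genre} movie directed by {director}.",
    "restart": "Sure, let's start over. What are you in the mood for?",
    "fallback": "Sorry, I did not understand that.",
}


def detect_intent(utterance):
    """Map an utterance to the first pre-defined intent whose pattern matches."""
    for intent, pattern in INTENT_PATTERNS.items():
        if pattern.search(utterance):
            return intent
    return "fallback"


def respond(utterance, item):
    """Fill the template of the detected intent with the current item's fields."""
    return TEMPLATES[detect_intent(utterance)].format(**item)
```

An utterance outside the pre-defined intents, such as "good sci-fi movie please", falls through to the fallback template here, which illustrates how narrow a fixed intent set can be.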

The other is the learning-based approach:

An alternative technical approach is to select or generate the system’s responses by automatically training a machine learning model from dialogue corpora and other knowledge sources like in “end-to-end” learning systems.

Reference: Liqiang Nie, Wenjie Wang, Richang Hong, Meng Wang, and Qi Tian. 2019. Multimodal dialog system: Generating responses via adaptive decoders. In MM’19. 1098–1106

3.5 Supporting tasks

(1) Intent and entity recognition

Neural networks were used also in other recent intent and entity recognition approaches [105, 146]. For example, a Multilayer Perceptron was used to predict the probability distribution on a set of pre-defined intent categories in Reference [105]. A sequence-to-sequence model was used in Reference [166] to reframe the user’s query (e.g., “How to protect my iphone screen”) into key-words (e.g., “iphone screen protector”) that are then used in the recommendation process to identify candidate items.

Liqiang Nie, Wenjie Wang, Richang Hong, Meng Wang, and Qi Tian. 2019. Multimodal dialog system: Generating responses via adaptive decoders. In MM’19. 1098–1106
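The Multilayer Perceptron intent classifier mentioned above outputs a probability distribution over pre-defined intent categories. The sketch below shows only the forward pass of such a network in plain Python. The intent labels, the layer sizes, and the weights are placeholders chosen by hand, not trained values, and the feature vector is assumed to come from some earlier utterance encoder.

```python
import math

INTENTS = ["provide_preference", "obtain_details", "restart"]  # hypothetical labels


def relu(v):
    return [max(0.0, x) for x in v]


def affine(weights, bias, v):
    """Compute weights @ v + bias for a list-of-rows weight matrix."""
    return [sum(w * x for w, x in zip(row, v)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_intent(features, w1, b1, w2, b2):
    """One hidden layer with ReLU, then a softmax over the intent categories."""
    hidden = relu(affine(w1, b1, features))
    probs = softmax(affine(w2, b2, hidden))
    return dict(zip(INTENTS, probs))


# Hand-set placeholder weights (2 input features, 2 hidden units, 3 intents).
W1, B1 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
W2, B2 = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]], [0.0, 0.0, 0.0]
```

In practice the weights would be learned from annotated utterances, and the intent with the highest probability would be passed on to the dialogue manager.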

(2) Sentiment recognition
Andrea Iovine, Fedelucio Narducci, and Giovanni Semeraro. 2020. Conversational recommender systems and natural language: A study through the ConveRSE framework. Decis. Supp. Syst. 131 (2020), 113250–113260.

Guoshuai Zhao, Hao Fu, Ruihua Song, Tetsuya Sakai, Zhongxia Chen, Xing Xie, and Xueming Qian. 2019. Personalized reason generation for explainable song recommendation. ACM Trans. Intell. Syst. Technol. 10, 4 (2019), 1–21.

Liqiang Nie, Wenjie Wang, Richang Hong, Meng Wang, and Qi Tian. 2019. Multimodal dialog system: Generating responses via adaptive decoders. In MM’19. 1098–1106.

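3.6 Summary

When reading CRS works, pay particular attention to which methods they use to carry out the tasks above.
How is the recommendation made, and what kind of recommendation approach is it based on?
How are responses generated? Which intents are supported in response generation?
Are there additional capabilities, such as explanation or sentiment recognition?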

Finally, from a technical and methodological perspective, we ask: “How far do we get with pure end-to-end learning approaches, i.e., by creating systems where, besides the item database, only a corpus of past conversations serves as input?” Tremendous advances were made in NLP technology in recent years, but it stands to question if today’s learning-based CRS are actually useful, see Reference [59]. In part, the problem of assessing this aspect is tied to how we evaluate such systems. Computational metrics like BLEU can only answer certain aspects of the question. But also the human evaluations in the reviewed papers are sometimes not too insightful, in particular when a newly proposed system is evaluated relative to a previous system by a few human judges. We therefore should revisit our evaluation practice and also investigate what users actually expect from a CRS, how tolerant they are with respect to misunderstandings or poor recommendations, how we can influence these expectations, and how useful the systems are considered on an absolute scale. Technically, combining learning techniques with other sorts of structured knowledge seems to be key to more usable, reliable and also predictable conversational recommender systems in the future.
