Conversational AI: Key Technologies and Challenges - Part 1

Voice interfaces and conversational systems are practical implementations of AI technology in industry. This article explores the basic knowledge and techniques, then extends to the challenges faced in different business use cases.

1. Conversational System

What is a conversational system, or a virtual agent? One of the best-known fictional agents is Jarvis from Iron Man. It can think independently and help Tony do almost anything, including running errands, processing massive data sets, making intelligent suggestions, and providing emotional support. The most impressive feature of Jarvis is his chat capability: you can talk to him like an old friend, and he understands you without ambiguity. The technology behind the scenes is conversational AI.

The core of conversational AI is a smartly designed voice user interface (VUI). Compared with a traditional GUI (graphical user interface), a VUI frees users' hands by allowing them to perform nested queries via simple voice commands (rather than ten clicks on a screen).

However, I have to admit that there’s still a big gap between the perfect virtual agent Jarvis and the existing conversational AI platforms’ capabilities.

Human-machine conversation has received a great deal of attention from academia and industry over the past decade. In the research lab, we have seen the following shifts:

  1. Natural language understanding has moved from manual annotation and linguistic analysis to deep learning and sequence-based language modeling.

  2. The dialog management system has moved from rule-based policies to supervised learning and reinforcement learning.

  3. The language generation engine has moved from pre-defined templates and syntax parsing to end-to-end language transformers and attention mechanisms.

In addition, we have also seen conversational products spring up across markets. All the big players have their signature virtual agent: Siri for Apple, Alexa for Amazon, Cortana for Microsoft, and Dialogflow for Google. (The diagram below is out of date; please treat it as a reference only.)

Report by Recast.AI

2. Key Components of a Conversational System

There are a few main components in a conversational platform: 1) ASR: automatic speech recognition, 2) NLU: natural language understanding, 3) dialog management, 4) NLG: natural language generation, 5) TTS: text to speech. (Additional components could include public APIs, an integration gateway, action fulfillment logic, a language-model training stack, versioning, chat simulation, etc.)

For simplicity, let’s explore the basics now.

Simple Dialog System by Catherine Wang
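
Mirroring the simple dialog flow in the diagram, here is a minimal, hypothetical Python skeleton of how the five components could be wired together for a single turn. Every class and return value below is a placeholder for illustration, not any vendor's actual API.

```python
# Hypothetical skeleton of one conversational turn; all classes are stand-ins.
class ASR:
    def transcribe(self, audio: bytes) -> str:
        return "what is the weather for melbourne tomorrow"   # stubbed transcript

class NLU:
    def parse(self, text: str) -> dict:
        # A real system would run intent classification and slot filling here.
        return {"domain": "weather", "intent": "check_weather",
                "slots": {"city": "Melbourne", "date": "tomorrow"}}

class DialogManager:
    def next_action(self, frame: dict, session: dict) -> dict:
        session.setdefault("history", []).append(frame)        # keep dialog context
        return {"action": "report_weather", "slots": frame["slots"]}

class NLG:
    def render(self, action: dict) -> str:
        return "Tomorrow in {city} it will be sunny.".format(**action["slots"])

class TTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode()           # placeholder for a synthetic-voice waveform

def handle_turn(audio: bytes, session: dict) -> bytes:
    text = ASR().transcribe(audio)                          # 1. speech -> text
    frame = NLU().parse(text)                               # 2. text -> domain/intent/slots
    action = DialogManager().next_action(frame, session)    # 3. state tracking + policy
    reply = NLG().render(action)                            # 4. action -> response text
    return TTS().synthesize(reply)                          # 5. text -> speech

print(handle_turn(b"...", {}))
```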

2.1. ASR: Automatic speech recognition is a model trained on speakers' voice recordings and transcripts, then fine-tuned to recognize unseen voice queries. Most conversational platforms offer this feature as an embedded element, so developers can leverage state-of-the-art ASR in their products (e.g., voice input, voice search, real-time translation, and smart home devices).
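
As a concrete example, the snippet below transcribes a short audio file with the open-source SpeechRecognition package and Google's free web ASR backend. This is only one of many possible engines, and the file name is a placeholder.

```python
# Minimal ASR sketch (pip install SpeechRecognition); "query.wav" is a placeholder file.
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.AudioFile("query.wav") as source:
    audio = recognizer.record(source)          # read the whole audio file

text = recognizer.recognize_google(audio)      # sends audio to Google's free web ASR
print(text)
```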

2.2. NLU: Indisputably the most important part of a conversational system. ASR only transcribes what you have said, but NLU understands what you actually mean. Natural language understanding can be seen as a subset of natural language processing; the relationship can be loosely described as below.

By Sciforce

Both NLP and NLU are broad topics, so instead of going too deep, I will explain the high-level concepts using practical examples from the virtual-agent use case.

Generally speaking, NLU and NLP are structured around the following problems:

  • Tokenisation and Tagging. These are text preprocessing techniques. Tokenisation is the first step and is required by both traditional linguistic analysis and deep learning models. It splits a sentence into words (or n-grams), which are later used to build the vocabulary or train a word embedding algorithm. Tagging is sometimes optional; it labels each token (word) with a lexical category (e.g., ADJ, ADV, NOUN, NN). (A combined spaCy sketch of these NLU building blocks appears after this list.)

  • Dependency and Syntactic Parsing. A popular technique in linguistic analysis that parses a sentence into its grammatical structure. Before the age of deep learning, these syntax trees were used to construct new sentences or word sequences.

From Stanford NLP
  • Named Entity Recognition. NER is used to extract or identify a set of predefined word entities. Its output can sometimes look quite similar to POS tagging, and the results are also stored as Python tuples, e.g. ("US", "GPE"). The main differences are: 1) an NER model can be trained with new annotations to pick up domain-specific entities; 2) NER focuses more on semantic meaning, whereas POS tagging is more about grammatical structure.

  • Phrase and Pattern Matching. The simplest implementation of phrase matching uses rule-based regular expressions. Don't get me wrong, regular expressions are still useful on unlabeled datasets: an adequately defined fuzzy pattern can match hundreds of similar sentences. However, this rule-based method is hard to maintain and scale up. A more advanced approach uses POS tags or dependency labels as the matching sequence, or uses vector distances.

  • Word Vectorization and Embedding. Word embedding marks the dawn of modern NLP, introducing the concept of a distributed representation of a word. Before deep learning, linguists used sparse representations to capture the structure of text and statistical models to understand the relationships; the drawback of that approach is its inability to represent contextual meaning and word inference. Word embedding offers a way to learn, in a higher-dimensional space, the parameters that best represent a word in a particular context. For practical use, you can find pre-trained word embedding models such as Word2Vec or GloVe, or, if needed, fine-tune those models on your own vocabulary and training corpus.


Word Embedding by Catherine Wang
  • Sequence Vectorization and Embedding. A similar concept, but instead of vectorizing every single word, sequence embedding focuses on finding the best representation for a longer chunk of text as a whole. This technique improves NLP tasks that need to understand longer texts, for instance text translation, text generation, reading comprehension, and natural questions with longer answers.

Sequence Modeling by Catherine Wang
  • Sentiment Analysis. The task of analyzing whether an expression is positive or negative (it can be framed as binary classification: 1 positive, 0 negative). It is one of the most common tasks in NLP. In conversational AI, sentiment analysis can give the virtual customer agent a signal to identify the customer's emotion and intention, and then suggest a different emotional response.

  • Topic Modeling. It leverages unsupervised ML techniques to find groups of topics in a broad set of unlabeled documents, helping us quickly understand the themes of an unseen corpus. In conversational AI, topic modeling acts as a first filter that triages user queries into higher-level topics, which are then mapped to more granular intents and actions.

  • Text Classification and Intent Matching. Both tasks use supervised learning, and the quality of the model largely depends on how you prepare the training data. Compared with topic modeling, text classification and intent matching are more granular and deterministic. You can understand the relationship from the image below: when facing an unseen customer query, the conversational AI system uses topic modeling to filter the query into a broad topic, then uses text classification and intent matching to map it to a specific action.

Intent Matching by Catherine Wang
  • Language Modeling. A trendy topic in deep learning and NLP. All the state-of-the-art models you have heard of are based on this concept (the BERT family: ALBERT, RoBERTa; the multitask and few-shot learners: GPT-2). To let machines understand human language better, scientists train them to build vocabularies and statistical models that predict the likelihood of each word in context.

Language Model by Catherine Wang
  • Multi-Turn Dialog Systems. This is an advanced topic in NLU and conversational AI. It refers to techniques that track and identify changes of topic or intent across turns in a conversation: how we can better pick up the information in each dialog turn and infer the overall logic behind the user's compound intent.

Modeling Multi-turn Conversation with Deep Utterance Aggregation
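
To ground several of the building blocks above (tokenisation, POS tagging, dependency parsing, NER, rule-based matching, and word vectors), here is a minimal sketch using spaCy. It assumes the en_core_web_md model is installed and is only one of many possible toolchains.

```python
# NLU basics with spaCy (pip install spacy && python -m spacy download en_core_web_md).
import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_md")
doc = nlp("What is the weather for Melbourne tomorrow?")

# 1. Tokenisation and POS tagging
for token in doc:
    print(token.text, token.pos_, token.tag_)

# 2. Dependency parsing
for token in doc:
    print(token.text, token.dep_, token.head.text)

# 3. Named Entity Recognition
for ent in doc.ents:
    print(ent.text, ent.label_)        # e.g. ("Melbourne", "GPE"), ("tomorrow", "DATE")

# 4. Rule-based phrase/pattern matching on POS tags rather than plain regex
matcher = Matcher(nlp.vocab)
matcher.add("CHECK_WEATHER", [[{"LOWER": "weather"}, {"LOWER": "for"}, {"POS": "PROPN"}]])
for match_id, start, end in matcher(doc):
    print("matched:", doc[start:end].text)

# 5. Word vectors: similarity between two words (needs the md/lg model's vectors)
print(nlp("weather").similarity(nlp("forecast")))
```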

In conversational AI, NLU aims to resolve linguistic confusion and ambiguity, generalize verbal understanding, identify domains and intents in human-to-machine dialog, and then extract the critical semantic information.

Apart from using the key technologies I mentioned above, the AI system needs to find a useful semantic representation of user queries. The most successful one is “Frame Semantics,” which uses Domain, Intent, Entity, and Slot to formulate semantic results.

  • Domain: Can be linked to topic modeling; it groups queries and knowledge resources into different business categories, goals, and corresponding services. For example, “Pre-sale”, “Post-sale”, or “Order and Transaction”.

  • Intent: Can be linked to intent matching and classification. It refers to a particular task or business process within a domain, and is usually written as a verb-object phrase, e.g. “search for songs”, “play the music”, or “favorite the playlist” in the music-player domain.

  • Entity and Slot: Used as parameters to extract critical information within a domain and intent, e.g. “song name”, “singer”.

A sentence such as “What is the weather for Melbourne tomorrow?” can be transposed into the structure below:

- Domain: “ Weather”

- Intent: “ Check the Weather”

- Entity and Slot: (“City”: “Melbourne”, “Date”: “Tomorrow”)

Then the follow-up actions will be fulfilled by parsing the above-structured data.
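
As a rough illustration of how such a frame could be produced, the sketch below pairs a toy TF-IDF + logistic-regression intent classifier with hard-coded slot values. The training sentences, labels, and slot values are invented for illustration only; a real system would fill the slots with NER or a slot-tagging model, as shown earlier.

```python
# Toy intent classifier producing part of a "frame semantics" result.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = [
    "what is the weather tomorrow", "will it rain in sydney",
    "play some jazz music", "play my favourite playlist",
    "order a margherita pizza", "i want to order pizza delivery",
]
train_intents = [
    "check_weather", "check_weather",
    "play_music", "play_music",
    "order_pizza", "order_pizza",
]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(train_texts, train_intents)

query = "What is the weather for Melbourne tomorrow?"
frame = {
    "domain": "weather",                        # from an upstream topic model / router
    "intent": clf.predict([query.lower()])[0],  # -> "check_weather"
    "slots": {"city": "Melbourne", "date": "tomorrow"},  # e.g. from NER
}
print(frame)
```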

2.3. Dialog Management: Another critical part of the conversational AI system. It controls the flow of the dialog between user and agent. In the simplest version, a DM engine remembers the dialog history and context, tracks the state of the current dialog, then applies a dialog policy.

  • Dialog Context: During a user-agent conversation session, all the back-and-forth dialog is remembered in the context. Critical information such as domain, intent, entity, and slot is saved in a message queue for in-memory search and retrieval. After the conversation, the dialog context can be preserved in a database for further analysis.

  • Dialog State Tracking: The dialog state tracker remembers the logical flow of the conversation. It makes the agent more intelligent and flexible by tracking the turning points of the logic across different dialogs, then suggesting a response based on that longer-term memory.

  • Dialog Policy: Based on the context and logical flow of the conversation, the agent needs to prioritize services, trigger certain events, and request fulfillment. Fulfillment actions could include retrieving user information from a database, searching for content in a knowledge-base system, or triggering a third-party API.

For example:

Q: I want to order pizza delivery (intent=order_pizza, entity_time=null, entity_address = null, entity_type=null). A: what type of pizza do you want to order? (slot=type, slot=date, slot=address)

Q: Margherita. (intent=order_pizza, entity_time=null, entity_address=null, entity_type=Margherita) A: What time do you want your pizza to be delivered? (slot=date, slot=address)

Q: ASAP. (intent=order_pizza, entity_time=ASAP, entity_address=null, entity_type=Margherita) A: Is there anything else you would like to order with your {Margherita} pizza? (follow_up_intent: additional_product)

Q: A bottle of Coke. (intent=order_pizza, entity_time=ASAP, entity_address=null, type=Margherita, additional=coke) A: What is the address for us to deliver your pizza? (slot=address)

Q: xx.xxx . (intent=order_pizza, entity_time=ASAP, entity_address=xx.xxx, type=Margherita, additional=coke) A: Thanks, so you ordered a {type} pizza with {*additional}, delivered to {entity_address} {ASAP}. (fulfillment: update_order, call_delivery_services)

As you can see, slots and entities need to be filled during the conversation, a parent intent can trigger a follow_up intent, and action fulfillment is activated based on the state of the conversation.
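
The slot-filling loop behind this dialog can be sketched roughly like the following; the slot names and prompts mirror the hypothetical pizza example above, not any particular framework.

```python
# Rough sketch of the slot-filling dialog state from the pizza example.
REQUIRED_SLOTS = ["type", "time", "address"]
PROMPTS = {
    "type": "What type of pizza do you want to order?",
    "time": "What time do you want your pizza to be delivered?",
    "address": "What is the address for us to deliver your pizza?",
}

def next_agent_turn(state: dict) -> str:
    """Return the next prompt, or fulfil the order once every slot is filled."""
    for slot in REQUIRED_SLOTS:
        if state.get(slot) is None:
            return PROMPTS[slot]
    # All slots filled -> fulfilment step (update_order, call_delivery_services, ...)
    return "Thanks, you ordered a {type} pizza, delivered to {address} {time}.".format(**state)

state = {"intent": "order_pizza", "type": None, "time": None, "address": None}
print(next_agent_turn(state))      # asks for the pizza type
state["type"] = "Margherita"
print(next_agent_turn(state))      # asks for the delivery time
```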

2.4. NLG: The natural language generation engine has a different implementation and technology stack depending on the type of chat system. For a task-oriented, closed-domain conversation system, NLG is implemented via response templates whose replaceable parameters come from the “slots” and “entities” extracted during the conversation session. For an open-domain chat system, text generation would be based on information retrieval, machine comprehension, knowledge graphs, etc.
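
For the closed-domain case, template-based NLG can be as simple as the sketch below; the template strings and action names are illustrative placeholders.

```python
# Sketch of template-based NLG for a closed-domain, task-oriented agent.
TEMPLATES = {
    "report_weather": "The weather in {city} {date} will be {condition}.",
    "confirm_order":  "Thanks, your {type} pizza will be delivered to {address}.",
}

def generate(action: str, slots: dict) -> str:
    return TEMPLATES[action].format(**slots)

print(generate("report_weather", {"city": "Melbourne", "date": "tomorrow", "condition": "sunny"}))
```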

2.5. TTS: The text-to-speech engine performs exactly the opposite task to ASR. It transforms plain text into a voice recording and plays it back to the end-user with a synthetic voice.
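
As one concrete option, the offline pyttsx3 package can synthesize a reply in a couple of lines; cloud or on-device TTS engines work similarly.

```python
# Minimal offline TTS sketch (pip install pyttsx3); relies on the OS speech engine.
import pyttsx3

engine = pyttsx3.init()
engine.say("Tomorrow in Melbourne it will be sunny.")
engine.runAndWait()
```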

Based on the above discussion, the below image offers a more comprehensive and realistic view of the Conversational AI system.

Conversational AI System by Catherine Wang

3. Voice User Interface and User Experience Design

The GUI (graphical user interface) dominates human-machine interaction today. It was the game-changer in the PC world and catalyzed the massive adoption of digital devices in everyday life. We now face screens and interact with them all the time.

But in the next decade, with the advances of AI, human-machine interaction will shift to voice. The voice user interface will be the new entry point for smart and IoT devices. For example, when you say “Hey Google,” a Google Home device wakes up and starts a conversation with you. In this sense, the voice becomes the new mouse and finger.

Photo from Unsplash

In GUI design, all user interactions are pre-defined and guided by a series of clicks or swipes on the screen. In a VUI system, by contrast: first, users' behaviors are unpredictable and can diverge from the main storyline; second, in an open conversation, a user might change the topic at any time, and the request might carry compound intents that all need to be fulfilled; last, voice interaction requires constant attention from users and agents, because both parties need to remember what was said in the previous turns.

The most successful conversational AI systems treat voice and graphics as complementary in their UI and UX design. A mature system should combine both traits to offer end-users a richer, immersive experience.

  1. Modeling Multi-turn Conversation with Deep Utterance Aggregation. arXiv:1806.09102 [cs.CL]

About me: I am based in Melbourne, Australia. I studied computer science and applied statistics. I am passionate about general-purpose technology. I work in a global consulting firm as an AI engineering lead, helping organizations integrate AI solutions and harness their innovation power. See more about me on LinkedIn.

Translated from: https://towardsdatascience.com/conversational-ai-key-technologies-and-challenges-part-1-a08345fc2160
