语义网络的构建_使用语义ML构建语言支持的应用

语义网络的构建

In this post, I’ll show you how to use beginner-friendly ML tools — Semantic Reactor and TensorFlow.js--to build an app that’s powered by natural language.

在本文中,我将向您展示如何使用对初学者友好的ML工具(语义React器和TensorFlow.js)来构建由自然语言提供支持的应用程序。

Update: Semantic Reactor is live! Add it to Google Sheets here.

更新:语义React堆已上线! 在此处 将其添加到Google表格中

Most people are better at describing the world in language than they are at describing the world in code (well… most people). It would be nice, then, if machine learning could help bridge the gap between the two.

与用代码描述世界相比,大多数人更擅长使用语言描述世界(嗯…… 大多数人)。 那么,如果机器学习可以帮助缩小两者之间的差距,那就太好了。

That’s where “Semantic ML” comes in, an umbrella term for machine learning techniques that capture the semantic meaning of words or phrases. In this post, I’ll show you how you can use beginner-friendly tools (Semantic Reactor and Tensorflow.js) to prototype language-powered apps fast.

这就是“语义ML”出现的地方,它是捕获单词或短语的语义含义的机器学习技术的总称。 在本文中,我将向您展示如何使用对初学者友好的工具(语义React堆和Tensorflow.js)来快速构建基于语言的应用程序的原型。

Scroll down to dive straight into the code and tools, or keep reading for some background:

向下滚动以直接研究代码和工具,或者继续阅读一些背景知识:

了解语义ML (Understanding Semantic ML)

什么是嵌入? (What are Embeddings?)

One simple (but powerful) way Semantic ML can help us build natural-language-powered software is through a technique called embeddings.

一种简单(但功能强大)的语义ML可以通过一种称为嵌入的技术来帮助我们构建自然语言支持的软件。

In machine learning, embeddings are a learned way of representing data in space (i.e. points plotted on an n-dimensional grid) such that the distances between points are meaningful. Word vectors are one popular example:

在机器学习中,嵌入是一种学习的方式来表示空间中的数据(即,绘制在n维网格上的点),从而使点之间的距离有意义。 单词向量是一个流行的例子:

语义网络的构建_使用语义ML构建语言支持的应用_第1张图片
Visualization of word embeddings. This image comes from this Medium article . 可视化单词嵌入。 此图像来自 这篇中型文章

The picture above shows words (“England,” “he,” “fast”) plotted in space such that similar words (“dog” and “dogs”, “Italy” and “Rome”, “woman” and “queen”) are near each other. Each word is represented by a set of coordinates (or a vector), so the word “queen” might be represented by the coordinates [0.38, 0.48, 0.99, 0.48, 0.28, 0.38].

上图显示了在空间中绘制的单词(“ England”,“ he”,“ fast”),使得相似的单词(“ dog”和“ dogs”,“ Italy”和“ Rome”,“ woman”和“ queen”)彼此靠近。 每个单词都由一组坐标(或向量)表示,因此单词“ queen”可能由坐标[0.38、0.48、0.99、0.48、0.28、0.38]表示。

Where do these numbers come from? They’re learned by a machine learning model through data. In particular, the model learns which words tend to occur in the same spots in sentences. Consider these two sentences:

这些数字从何而来? 通过机器学习模型通过数据来学习它们。 特别地,该模型学习哪些单词倾向于出现在句子中的相同位置。 考虑以下两个句子:

My mother gave birth to a son.

我母亲生了一个儿子。

My mother gave birth to a daughter.

我母亲生了一个女儿

Because the words “daughter” and “son” are often used in similar contexts, the machine learning model will learn that they should be represented close to each other in space.

由于单词“ daughter”和“ son”通常在相似的上下文中使用,因此机器学习模型将学习它们在空间中应彼此靠近表示。

Word embeddings are extremely useful in natural language processing. They can be used to find synonyms (“semantic similarity”), to do clustering, or as a preprocessing step for a more complicated nlp model.

单词嵌入在自然语言处理中非常有用。 它们可用于查找同义词(“语义相似性”),进行聚类,或用作更复杂的nlp模型的预处理步骤。

嵌入整个句子 (Embedding Whole Sentences)

It turns out that entire sentences (and even short paragraphs) can be effectively embedded in space too, using a type of model called a universal sentence encoder. Using sentence embeddings, we can figure out if two sentences are similar. This is useful, for example, if you’re building a chatbot and want to know if a question a user asked (i.e. “When will you wake me up?”) is semantically similar to a question you-the chatbot programmer-have anticipated and written a response to (“What time is my alarm?”).

事实证明,使用一种称为通用句子编码器的模型,整个句子(甚至是简短的段落)也可以有效地嵌入到空间中。 使用句子嵌入,我们可以找出两个句子是否相似。 这很有用,例如,如果您正在构建聊天机器人,并且想知道用户提出的问题(即“什么时候唤醒我?”)在语义上类似于您(聊天机器人程序员)所预期的问题并写了一封答复(“我的闹钟是几点?”)。

语义React堆:Google表格中具有语义ML的原型 (Semantic Reactor: Prototype with Semantic ML in a Google Sheet)

Alright, now on to the fun part-building things!

好了,现在开始有趣的部分构建工作!

First, some inspiration: I originally became excited by Semantic ML when Anna Kipnis (former Game Designer at Double Fine, now at Stadia/Google) showed me how she used it to automate video game interactions. Using a sentence encoder model, she built a video game world that infers how the environment should react to player inputs using ML. It blew my mind. Check out our chat here:

首先,是一些启发:当安娜·基普尼斯 ( Anna Kipnis ,Double Fine的前游戏设计师,现为Stadia / Google)向我展示她如何使用它自动执行视频游戏互动时,我最初对语义ML感到兴奋。 她使用句子编码器模型构建了一个视频游戏世界,该世界可以推断环境如何使用ML对玩家的输入做出React。 这让我震惊。 在这里查看我们的聊天记录:

In Anna’s game, players interact with a virtual fox by asking any question they think of:

在安娜的游戏中,玩家通过问任何他们想到的问题来与虚拟狐狸互动:

“Fox, can I have some coffee?”

“狐狸,我可以喝杯咖啡吗?”

Then, using Semantic ML, the game engine (or the utility system) considers all of the possible ways the game might respond:

然后,使用语义ML,游戏引擎(或实用程序系统 )考虑游戏可能做出响应的所有可能方式:

Using a sentence encoder model, the game decides what the best response is and executes it (in this case, the best response is Fox brings you mug, so the game animates the Fox bringing you a mug). If that sounds a little abstract, definitely watch the video I linked above.

使用句子编码器模型,游戏确定最佳响应并执行(在这种情况下,最佳响应是Fox为您带来杯子 ,因此游戏为Fox为您带来杯子动画)。 如果听起来有点抽象,一定要看我上面链接的视频。

One of the neatest aspects of this project was that Anna prototyped it largely in a Google Sheet using a tool called Semantic Reactor.

该项目最艰难的方面之一是,安娜使用名为Semantic Reactor的工具在很大程度上将其原型制作在了Google表格中。

Semantic Reactor is a plugin for Google Sheets that allows you to use sentence encoder models right on your own data, in a sheet. Access it here. It’s a really great way to prototype Semantic ML apps fast, which you can then turn into code using TensorFlow.js models (but more on that in a minute).

Semantic Reactor是Google表格的插件,可让您直接在表格中的数据上使用句子编码器模型。 在这里访问。 这是快速制作语义ML应用程序原型的好方法,然后您可以使用TensorFlow.js模型将其转换为代码(稍后再介绍)。

Here’s a little gif of what the tool looks like:

这是该工具的外观的一些gif信息:

语义网络的构建_使用语义ML构建语言支持的应用_第2张图片
Gif of using Semantic Reactor, by Dale Markowitz. Dale Markowitz撰写的使用语义React堆的Gif图片。

To use Semantic Reactor, create a new Google sheet and write some sentences in the first column. Here, I’ll loosely recreate Anna’s fox demo (for all the nitty gritties, check out her original post). I put these sentences in the first column of my Google sheet:

要使用语义React器,请创建一个新的Google表格,并在第一栏中写一些句子。 在这里,我将轻松地重新创建Anna的fox演示(对于所有棘手的问题,请查看她的原始帖子 )。 我将这些句子放在Google工作表的第一栏中:

You’ll have to use your imagination here and think of these “actions” that a potential character (e.g. a chatbot or an actor in a video game) might take.

您必须在这里发挥您的想象力,并考虑潜在角色(例如,聊天机器人或视频游戏中的演员)可能采取的这些“行动”。

Once you’ve installed Semantic Reactor, you’ll be able to enable it by clicking on “Add-ons -> Semantic Reactor -> Start”.

一旦安装了语义React器,就可以通过单击“附加组件->语义React器->开始”来启用它。

语义网络的构建_使用语义ML构建语言支持的应用_第3张图片
Enable Semantic Reactor from the Add-ons drop down (Dale Markowitz) 从附加组件下拉列表中启用语义React堆(Dale Markowitz)

Clicking “Start” will open a panel that allows you to type in an input and hit “React”:

单击“开始”将打开一个面板,允许您键入输入并单击“React”:

语义网络的构建_使用语义ML构建语言支持的应用_第4张图片
Try out the Input/Response rankings (Dale Markowitz) 尝试输入/响应排名(Dale Markowitz)

When you hit “React”, Semantic Reactor uses a model to embed all of the responses you’ve written in that first column, calculate a score (how good a response is this sentence to the query?), and sorts the results. For example, when my input was “I want some coffee,” the top ranked responses from my spreadsheet were, “I go to the mug” and “I bring you the mug.”

当您点击“React”时,语义React器会使用一个模型将您编写的所有响应嵌入到第一栏中,计算分数(此句子对查询的响应有多好?),并对结果进行排序。 例如,当我输入“我要喝咖啡”时,电子表格中排名最高的响应是:“我去了杯子”和“我带了你杯子”。

You’ll also notice that there are two different ways to rank sentences using this tool: “Input/Response” and “Semantic Similarity.” As the name implies, the former ranks sentences by how good they are as responses to the given query, whereas “Semantic Similarity” simply rates how similar the sentences are to the query.

您还将注意到使用此工具对句子进行排名的两种不同方法:“输入/响应”和“语义相似性”。 顾名思义,前者按句子对给定查询的响应的程度对句子进行排名,而“语义相似性”只是对句子与查询的相似程度进行评分。

使用TensorFlow.js从电子表格到代码 (From Spreadsheet to Code with TensorFlow.js)

Underneath the hood, Semantic Reactor is powered by the open-source TensorFlow.js models found here.

引擎盖下的Semantic Reactor由此处提供的开源TensorFlow.js模型提供支持。

Let’s take a look at how to use those models in JavaScript, so that you can convert your spreadsheet prototype into a working app.

让我们看一下如何在JavaScript中使用这些模型,以便将电子表格原型转换为可正常使用的应用程序。

  1. Create a new Node project and install the module:

    创建一个新的Node项目并安装模块:
npm init
npm install @tensorflow/tfjs @tensorflow-models/universal-sentence-encoder

2. Create a new file (use_demo.js) and require the library:

2.创建一个新文件( use_demo.js )并需要该库:

require('@tensorflow/tfjs');const encoder = require('@tensorflow-models/universal-sentence-encoder');

3. Load the model:

3.加载模型:

const model = await encoder.loadQnA();

4. Encode your sentences and query:

4.对句子进行编码并查询:

const input = {
queries: ["I want some coffee"],
responses: [
"I grab a ball",
"I go to you",
"I play with a ball",
"I go to school.",
"I go to the mug.",
"I bring you the mug."]
};const embeddings = await model.embed(input);

5. Voila! You’ve transformed your responses and query into vectors. Unfortunately, vectors are just points in space. To rank the responses, you’ll want to compute the distance between those points (you can do this by computing the dot product, which gives you the squared Euclidean distance between points):

5.瞧! 您已经将响应和查询转换为向量。 不幸的是,向量只是空间中的点。 要对响应进行排名,您需要计算这些点之间的距离(您可以通过计算点积来实现 ,该乘积可以为您提供点之间的平方欧几里得距离 ):

// zipWith :: (a -> b -> c) -> \[a\] -> \[b\] -> \[c\]const zipWith =
(f, xs, ys) => {
const ny = ys.length;
return (xs.length <= ny ? xs : xs.slice(0, ny))
.map((x, i) => f(x, ys\[i\]));
} // Calculate the dot product of two vector arrays.
const dotProduct = (xs, ys) => {
const sum = xs => xs ? xs.reduce((a, b) => a + b, 0) : undefined; return xs.length === ys.length ?
sum(zipWith((a, b) => a * b, xs, ys))
: undefined;
}

If you run this code, you should see output like:

如果运行此代码,则应该看到如下输出:

[
{ response: 'I grab a ball', score: 10.788130270345432 },
{ response: 'I go to you', score: 11.597091717283469 },
{ response: 'I play with a ball', score: 9.346379028479209 },
{ response: 'I go to school.', score: 10.130473646521292 },
{ response: 'I go to the mug.', score: 12.475453722603106 },
{ response: 'I bring you the mug.', score: 13.229019199245684}
]

Check out the full code sample here.

在此处查看完整的代码示例。

And that’s it: that's how you go from a Semantic ML spreadsheet to code fast!

就是这样:这就是您从语义ML电子表格快速进行编码的方法!

Pretty cool, right? If you build something with these tools, make sure you let me know in the comments below or on Twitter.

很酷吧? 如果您使用这些工具进行构建,请确保在下面的评论中或在Twitter上让我知道。

Tweet @dalequark or follow @dale_on_ai on Instagram.

鸣叫@dalequark或在Instagram上关注@dale_on_ai 。

Originally published at https://daleonai.com on August 5, 2020.

最初于 2020年8月5日 发布在 https://daleonai.com 上。

翻译自: https://towardsdatascience.com/build-apps-powered-by-language-with-semantic-ml-c6f01fdf0e94

语义网络的构建

你可能感兴趣的:(网络,python,java,人工智能,机器学习)