GPT-3 Primer

GPT-3 is likely the most computationally expensive machine learning model created to date. The neural network’s 175 billion parameters make it about ten times larger than the previous largest language model (Turing NLG, 17 billion parameters, released by Microsoft in February 2020). The 430GB of text GPT-3 was trained on was drawn widely from the internet and supplemented with text from books. The model works by looking at some amount of preceding text (up to a maximum of about 2,000 words) and predicting the next word, repeating that step to generate novel text.

Users interact with the model by providing a prompt. An example prompt for a chatbot-style interaction from OpenAI (the organization that created GPT-3) is “The following is a conversation with an AI assistant. The assistant is helpful, creative, clever, and very friendly”. In addition to supplying a prompt, users are able to specify certain parameters for things like how long the output should be, how likely words are to be repeated, or the randomness of the output.
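As a rough sketch of what calling the API might look like from Python, the request below bundles the chatbot-style prompt with those parameters. The parameter names (`engine`, `max_tokens`, `temperature`, `frequency_penalty`) follow OpenAI’s beta documentation as I understand it, so treat the exact interface as an assumption rather than a guarantee:

```python
import openai  # beta API client; access currently requires joining the waitlist

openai.api_key = "YOUR_API_KEY"  # placeholder key

prompt = (
    "The following is a conversation with an AI assistant. "
    "The assistant is helpful, creative, clever, and very friendly.\n\n"
    "Human: Hello, who are you?\nAI:"
)

response = openai.Completion.create(
    engine="davinci",       # the largest available engine (believed to be the full GPT-3)
    prompt=prompt,
    max_tokens=60,          # how long the output should be
    temperature=0.9,        # randomness of the output
    frequency_penalty=0.5,  # how strongly to discourage repeated words
    stop=["\n"],            # stop at the end of the assistant's turn
)
print(response["choices"][0]["text"])
```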

What can it do?

GPT-3 demonstrates reasonable proficiency on almost all standard natural language processing benchmarks, including state-of-the-art performance on a few of them. The benchmarks include challenges such as using a paragraph of context to predict the last word of a related sentence and determining which noun a grammatically ambiguous but contextually unambiguous pronoun refers to. Other benchmarks involve translating between languages and answering general knowledge questions. This proficiency was achieved without the task-specific fine-tuning that most cutting-edge models use. GPT-3 is capable of being fine-tuned, and further fine-tuning would almost certainly improve the results of the model on each of the specific benchmarks (at the expense of worse performance outside of the task it was fine-tuned on).

OpenAI also tested GPT-3 on some non-standard tasks:

Generating News Articles

A sample of around 80 people was asked to distinguish between real articles and articles with the last 200 words generated by GPT-3. The participants were unable to reliably tell the two apart (they correctly categorized 52% of the articles they saw, and 50%, i.e. pure chance, fell within the 95% confidence interval). Their accuracy did not improve when the amount of text generated by the model was increased to 500 words (accuracy stayed at 52%).
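For a rough sense of why 52% is statistically indistinguishable from chance, the back-of-the-envelope check below uses a normal-approximation confidence interval. The sample size of 80 independent judgments is assumed purely for illustration and does not match the study’s exact design; even so, the interval comfortably contains 50%:

```python
import math

p_hat = 0.52  # observed human accuracy at spotting GPT-3-written articles
n = 80        # assumed number of independent judgments (illustrative only)

# 95% confidence interval using the normal approximation to the binomial
se = math.sqrt(p_hat * (1 - p_hat) / n)
low, high = p_hat - 1.96 * se, p_hat + 1.96 * se
print(f"95% CI: [{low:.2f}, {high:.2f}]")  # about [0.41, 0.63], which contains 0.50 (chance)
```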

SAT Analogies

When asked to complete SAT analogy problems, the model correctly answered 14% more problems than an average college applicant.

Arithmetic

The chart below shows the accuracy of the model when it is prompted with several example math problems and then asked to answer a new one. The results for the model I’ve been referring to as GPT-3 are on the far right (175B). OpenAI created several versions of the model to test how performance varies across different model sizes. Larger models show a marked improvement.

Overall, the model is able to successfully answer two-digit addition and subtraction problems reliably. For all other problem types, the model is not able to consistently give the correct answer but is significantly better than chance.
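To make the setup concrete, an arithmetic prompt of this kind is just a short run of solved problems followed by an unsolved one. The example below is illustrative rather than copied from the paper’s evaluation set:

```
Q: What is 48 plus 76?
A: 124

Q: What is 97 minus 39?
A: 58

Q: What is 65 plus 17?
A:
```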

Metrics are one thing, but the best way to get a feel for the capabilities of the model is to see the outputs. Many people are demonstrating potential use cases for GPT-3. Here are some highlights:

Creating layouts in JavaScript (video here)

Creating an API in Python (video here)

Creating functions in Python (video here)

Summarizing an NDA for a second grader (video here)

Writing like an attorney

“Search engine”… that doesn’t actually search (video here)

Writing poetry

More project links here

Of course, it’s hard to judge the model based solely on a few cherry-picked examples. It seems to be relatively easy to demonstrate impressive capabilities; generating results that are reliably good enough to use in some sort of production setting (e.g., as a customer service bot) is a very different story. The model will likely be most useful either in systems with a human in the loop (perhaps generating a suggested response for a human to approve or edit) or in applications that don’t require consistently good results (such as generating fun fictional stories like AI Dungeon).

How can I use it?

The model will be available through an API. OpenAI currently has a private beta release of the API with a waitlist you can sign up for here. Pricing information for the API hasn’t been announced yet, but we know that the electricity cost of generating 100 pages of content from the model is a few cents. A price in the range of $0.50 to $5 per 100 pages generated would seem reasonable if the API is to pay back the initial cost of creating the model, but it’s hard to say.

Alternatively, you can access the model through AI Dungeon. Note that the free tier of AI Dungeon uses text generated by GPT-2, not GPT-3. In order to use GPT-3, you will need to sign up for the paid version (though the first 7 days are free). After signing up, you will need to change the settings to use the “Dragon” model (aka GPT-3) as opposed to the “Griffin” model (aka GPT-2). The paid version also includes an option for custom prompts (“scenarios”) which means you don’t need to use the standard story prompts.

What’s new here?

First, the wide-ranging capabilities of the model exceed what is publicly available. It’s difficult to predict what people will be able to make with the model, but it’s likely the model will be used in new ways and improve results in areas where language models are already used.

In addition to the practical new uses of the model, there are some interesting takeaways from the research:

Bigger models are better

Perhaps the most important point is that larger models continue to perform better. Prior to GPT-3, researchers had observed a power-law relationship between model size and performance: there were diminishing returns to using additional computational power during training, but still significant performance gains for more expensive models. Despite the trend holding at lower levels of computation, there was some debate about how far it could be extrapolated. After GPT-3 it’s still not clear where the limits of that trend may be, but we haven’t reached them yet. Despite GPT-3 being ten times larger than the previous largest model, its performance is what would be expected from the previously observed trend.

Chart from the GPT-3 paper

The above graph shows model performance (lower is better) across a range of model sizes and computational expenditure. GPT-3 is the yellow line, and the power-law represented by the dotted line seems to be holding at all the model sizes OpenAI tested.
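The dotted line corresponds to a power law relating validation loss to training compute. Written out, with $C$ measured in petaflop/s-days and lower $L$ meaning a better model, the fit reported in the GPT-3 paper is approximately (constants as best I can reconstruct them, rounded):

$$
L(C) \;\approx\; 2.57 \cdot C^{-0.048}
$$

As long as this relationship holds, every additional order of magnitude of training compute buys a predictable further reduction in loss.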

There are multiple estimates of how much it cost OpenAI to train GPT-3. One estimate says $4.6 million. Another says $12 million. Neither includes researcher compensation. Regardless of the true number, the takeaway doesn’t change: GPT-3 was extraordinarily cheap to produce given its potential applications, and larger models will likely follow. Google spent much more on food in 2008 than OpenAI just spent to create a state-of-the-art language model with commercial applications. There’s plenty of money available to push towards larger models if that direction is deemed promising enough, and after GPT-3 it’s hard to argue against larger models being significantly more effective. Funding is not the only constraint on creating more powerful models. A significant amount of novel engineering needs to be done to train this kind of model, but OpenAI is not the only organization with the talent to accomplish that.

Meta-learning

The fact that GPT-3 has the ability to do arithmetic, when only very few of the specific problems it was tested on were likely to be in the training data, implies the model is somehow actually learning how to do the mathematical operations. That point is further supported by the authors of the paper stating, “inspection of incorrect answers reveals that the model often makes mistakes such as not carrying a ‘1’, suggesting it is actually attempting to perform the relevant computation rather than memorizing a table”. GPT-3 also correctly answers about 20% of single digit combined operations (for example, 9*(7+5)) — a rate much better than random chance. It is remarkable that a model trained simply to predict the next word in a text appears to be learning how to do math in order to better predict the next word. These results raise questions about what new capabilities models might acquire at a significantly larger scale. For example, could a sufficiently powerful language model read thousands of scientific papers and use that data to successfully predict the results of novel experiments?

Few-shot learning

Most large, publicly available machine learning systems take the approach of doing a large amount of training on some sort of generalized data and then fine-tuning the model on domain-specific data. GPT-3 demonstrates proficiency in many domains by replacing the fine-tuning step with what OpenAI has dubbed “few-shot learning”. Few-shot learning simply means showing the model, within the prompt it is given, a few successful examples of what you want it to do. For example, a prompt to get the model to answer general-knowledge questions might look like the one below, with the last question being the one you want GPT-3 to answer.
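The API screenshot from the original post isn’t reproduced here, so the prompt below is a hypothetical example of my own construction that shows the same pattern: a few answered questions, then the question you want completed.

```
Q: What is the capital of France?
A: Paris

Q: Who wrote Romeo and Juliet?
A: William Shakespeare

Q: What is the chemical symbol for gold?
A:
```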

It is also possible to use the model by providing a prompt with no examples (“zero-shot”) or one example (“one shot”), but the model generally performs better the more examples it sees.

The few-shot learning approach has several benefits:

  • First, few-shot learning may make machine-learning more accessible. The pool of people who would feel comfortable entering a prompt like the one above is MUCH larger than the pool of people who have the technical knowledge to fine-tune a model.

  • Second, prompting models in this way may enable machine-learning models to be used in domains where acquiring the large amounts of structured training data necessary for fine-tuning is infeasible.

  • Lastly, few-shot learning makes the model more flexible. With the typical fine-tuning approach, the underlying model weights are actually changed for a specific task, so fine-tuning sacrifices generalizable performance for better performance on a particular application of the model. By contrast, a model that uses the few-shot learning approach does not change the underlying model.

As the graph below shows, few-shot learning works better the larger the model is. Few-shot learning is not just a viable alternative to fine-tuning given the current state of machine learning; it will continue to get more effective with larger future models. The increasing effectiveness of few-shot learning, combined with the direct performance gains from increasing model size, will likely cause a trend towards larger models that use few-shot learning.

Chart from the GPT-3 paper

How powerful can language models get?

This paper by OpenAI investigates the scaling of language models. The researchers treat model performance as a function of the size of the model, the amount of training data, and the computational power used to train the model. They find a power-law relationship in which scaling up these inputs reliably leads to better performance. Although the paper was written prior to GPT-3, the new model is consistent with the relationship they found even though it is at a scale much greater than they were able to test. The researchers extrapolate the trend to find the point at which a model (using the optimal ratio of inputs) would reach the theoretical maximum performance of a similar language model — a point where all of the information has been extracted from the text. It’s entirely possible that this pattern will break for unforeseen reasons before reaching that point. If the trend holds, however, the researchers estimate that maximum performance would be reached by a model with about 1 trillion parameters, trained on 1 trillion tokens (about 1.4 terabytes), and using about 10,000 petaflop/s-days of compute (p. 17).

The paper cautions, “the numerical values are highly uncertain, varying by an order of magnitude in either direction depending on the precise values of the exponents from the power-law fits. The most obvious interpretation is that our scaling laws break down at or before we reach this point, which is still many orders of magnitude away in both compute and model size”. That was written before GPT-3, and GPT-3 is within an order of magnitude now. The equation from that paper predicts training loss to be 1.75 with 10,000 petaflop/s-days of compute, while the updated equation from the GPT-3 paper predicts a training loss of 1.65. After updating the trend line with the newest data from GPT-3, the theoretical best language model appears more achievable than the previous paper (and the numbers here) show.
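As a quick sanity check on those figures, evaluating the GPT-3 paper’s fitted power law (the same approximate constants used above) at 10,000 petaflop/s-days reproduces the ~1.65 prediction:

```python
def predicted_loss(compute_pf_days: float) -> float:
    # Power-law fit quoted above: L(C) ≈ 2.57 * C^(-0.048),
    # with C in petaflop/s-days (constants approximate).
    return 2.57 * compute_pf_days ** -0.048

print(round(predicted_loss(1e4), 2))  # ≈ 1.65, the updated prediction mentioned above
```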

It’s worth noting that, assuming the trend doesn’t break down, it likely underestimates future performance. The relationship does not account for future improvements in training techniques. OpenAI has used a consistent process for training various versions of their GPT model, but other researchers have continued to improve the training process of similar models. GPT-3 was not trained in a cutting-edge way.

Data from the GPT-3 paper, the GPT-2 paper, and the scaling paper

If a next-generation model scales as much as GPT-3 did, it will be well beyond the theoretical best model predicted by the power-law that has been observed so far. If the trend breaks, we’ll get important information about the limits of current approaches. If the trend doesn’t break, we’ll be living in a very different world.

Further exploration:

GPT-3 paper

OpenAI blogpost

Gwern’s post

Lambda Labs post

Lambda Labs aggregates and summarizes other content

Good overview

Slatestarcodex post

Analysis of potential constraints to scaling future models

Examples of uses, and details about API parameters

Computerphile video

Collection of more demos and articles

Translated from: https://towardsdatascience.com/gpt-3-primer-67bc2d821a00
