The GPT-3 Model: What Does It Mean for Chatbots and Customer Service?
What is GPT-3?
In February 2019, the artificial intelligence research lab OpenAI sent shockwaves through the world of computing by releasing the GPT-2 language model. Short for “Generative Pretrained Transformer 2,” GPT-2 is able to generate several paragraphs of natural language text — often impressively realistic and internally coherent — based on a short prompt.
Scarcely a year later, OpenAI has already outdone itself with GPT-3, a new generative language model that is bigger than GPT-2 by orders of magnitude. The largest version of the GPT-3 model has 175 billion parameters, more than 100 times the 1.5 billion parameters of GPT-2. (For reference, the number of neurons in the human brain is usually estimated as 85 billion to 120 billion, and the number of synapses is roughly 150 trillion.)
Just like its predecessor GPT-2, GPT-3 was trained on a simple task: given the previous words in a text, predict the next word. This required the model to consume very large datasets of Internet text, such as Common Crawl and Wikipedia, totalling 499 billion tokens (i.e. words and numbers).
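To make that objective concrete, here is a minimal sketch of the next-word prediction loss in PyTorch. The tiny stand-in model below is an assumption for illustration only; GPT-3 uses a vastly larger Transformer decoder, but it is trained on the same basic objective.

```python
import torch
import torch.nn as nn

# Toy stand-in for a Transformer decoder: embed tokens, project back to the
# vocabulary. GPT-3 is a 175-billion-parameter Transformer, but the objective
# is the same: predict token t+1 from tokens 1..t.
vocab_size, embed_dim = 1000, 64
model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),
    nn.Linear(embed_dim, vocab_size),
)

tokens = torch.randint(0, vocab_size, (1, 16))  # one sequence of 16 token ids
logits = model(tokens[:, :-1])                  # predictions for positions 1..15
targets = tokens[:, 1:]                         # the "next word" at each position

loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab_size), targets.reshape(-1)
)
loss.backward()  # one gradient step of "predict the next word"
```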
But how does GPT-3 work under the hood? Is it really a major step up from GPT-2? And what are the possible implications and applications of the GPT-3 model?
How Does GPT-3 Work?
Building GPT-3 required a monumental effort from OpenAI researchers. The details of the GPT-3 model are discussed in the May 2020 paper “Language Models are Few-Shot Learners,” which is 74 pages long and has more than 30 authors. Chuan Li, chief science officer at Lambda Labs, estimates that it would cost $4.6 million, and take 355 years, to run a single training cycle for the GPT-3 model on an NVIDIA Tesla V100 GPU.
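That estimate can be sanity-checked with a quick back-of-envelope calculation. The figures below are assumptions used only for this check: roughly 3.14 × 10²³ floating-point operations of total training compute (the figure reported in the GPT-3 paper) and a sustained throughput of about 28 TFLOPS on a single Tesla V100.

```python
# Back-of-envelope check of the "355 years on one V100" figure.
# Assumptions: ~3.14e23 FLOPs of total training compute and ~28 TFLOPS
# sustained on a single Tesla V100.
total_flops = 3.14e23
v100_flops_per_second = 28e12

seconds = total_flops / v100_flops_per_second
years = seconds / (3600 * 24 * 365)
print(f"{years:.0f} GPU-years")  # ~355
```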
Perhaps the most impressive feat of GPT-3 is its ability to generate highly realistic sentences and paragraphs of text. GPT-3’s text generation feature is beyond even the capabilities of GPT-2, which were already a major advancement in the field.
Based on a prompt containing a brief title and subtitle, the GPT-3 model is able to “write” a roughly 200-word news article that appears at least superficially plausible. Below is the GPT-3-generated article that human judges had the most difficulty distinguishing from reality:
Title: United Methodists Agree to Historic Split
Subtitle: Those who oppose gay marriage will form their own denomination
Article: After two days of intense debate, the United Methodist Church has agreed to a historic split — one that is expected to end in the creation of a new denomination, one that will be “theologically and socially conservative,” according to The Washington Post. The majority of delegates attending the church’s annual General Conference in May voted to strengthen a ban on the ordination of LGBTQ clergy and to write new rules that will “discipline” clergy who officiate at same-sex weddings. But those who opposed these measures have a new plan: They say they will form a separate denomination by 2020, calling their church the Christian Methodist denomination.
Note that this article is based on a kernel of truth: in January 2020, the United Methodist Church proposed a split as a result of disagreements over LGBT issues such as same-sex marriage. This seeming verisimilitude was likely key to how this passage convinced so many judges. However, GPT-3’s generated article gets a few notable facts wrong: the name of the new denomination has not been suggested, the proposal was not made at the church’s General Conference, and the Washington Post citation is not based on a real quote.
Perhaps even more impressive, though, is GPT-3’s performance on a number of common tasks in natural language processing. Even compared with GPT-2, GPT-3 represents a significant step forward for the NLP field. Remarkably, the GPT-3 model can demonstrate very high performance, even without any special training or fine-tuning for these tasks.
For one, GPT-3 achieves very strong performance on “cloze” tests, in which the model is tasked with filling in the blank words in a sentence. Given the sentence below, for example, most people would insert a word such as “bat” in the blank space:
George bought some baseball equipment: a ball, a glove, and a _____.
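In practice, a cloze query like this can be posed to GPT-3 as ordinary text completion. The sketch below assembles such a prompt; the `complete` function is a hypothetical placeholder standing in for whatever text-completion interface is available, not a real API call.

```python
def complete(prompt: str) -> str:
    """Hypothetical placeholder for a text-completion model such as GPT-3."""
    raise NotImplementedError

# A fill-in-the-blank task expressed as ordinary next-word prediction:
# the model answers simply by continuing the text.
prompt = (
    "Fill in the blank.\n"
    "Sentence: George bought some baseball equipment: a ball, a glove, "
    "and a ___.\n"
    "Answer:"
)
# Expected continuation from a strong language model: " bat"
# answer = complete(prompt)
```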
The GPT-3 model can also easily adapt to new words introduced to its vocabulary. The example below demonstrates how, given a prompt that defines the new word, GPT-3 can generate a plausible sentence that even uses the word in past tense:
Prompt: To “screeg” something is to swing a sword at it. An example of a sentence that uses the word screeg is:
Answer: We screeghed at each other for several minutes and then we went outside and ate ice cream.
Surprisingly, GPT-3 is also able to perform simple arithmetic with a high degree of accuracy, even without being trained for this task. With a simple question such as “What is 48 plus 76?” GPT-3 can supply the correct answer almost 100 per cent of the time with two-digit numbers, and roughly 80 per cent of the time with three-digit numbers.
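These arithmetic questions are likewise posed as plain text, optionally preceded by a few worked examples in the prompt (the "few-shot" setting described in the paper). The sketch below assembles one such prompt; as before, `complete` is a hypothetical placeholder rather than a real API.

```python
def complete(prompt: str) -> str:
    """Hypothetical placeholder for a text-completion model such as GPT-3."""
    raise NotImplementedError

# A few in-context examples followed by the actual question; the model is
# never fine-tuned on arithmetic, it just continues the pattern.
examples = [("17 plus 25", "42"), ("63 plus 31", "94")]
question = "48 plus 76"

prompt = "".join(f"Q: What is {q}?\nA: {a}\n" for q, a in examples)
prompt += f"Q: What is {question}?\nA:"
# Expected continuation: " 124"
# answer = complete(prompt)
```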
What Does GPT-3 Mean, in General?
In the weeks since the release of GPT-3, many experts have discussed the impact that the model might have on the state of deep learning, artificial intelligence, and NLP.
First, GPT-3 demonstrates that it’s not necessary to have a task-specific dataset, or to fine-tune the model’s architecture, in order to achieve very good performance on specific tasks. For example, you don’t need to train the model on millions of addition and subtraction problems in order to get the right answer to a math question. Essentially, GPT-3 achieved its strong results primarily through brute force, scaling up the model to an incredible size.
This approach has earned mixed reviews from analysts. According to Guy Van den Broeck, an assistant professor of computer science at UCLA, the GPT-3 model is analogous to “some oil-rich country being able to build a very tall skyscraper.” While acknowledging the knowledge, skill, and effort required to build GPT-3, Van den Broeck claims that “there is no scientific advancement per se,” and that the model will not “fundamentally change progress in AI.”
One issue is that the raw computing power required to train models like GPT-3 is simply out of reach for smaller companies and academia. Deep learning researcher Denny Britz compares GPT-3 to a particle collider in physics: a cutting-edge tool only accessible to a small group of people. However, Britz also suggests that the computing limitations of less well-endowed researchers will be a net positive for AI research, forcing them to think about why the model works and alternative techniques for achieving the same effects.
Despite the impressive results, it’s not entirely clear what’s going on with GPT-3 under the hood. Has the model actually “learned” anything, or is it simply doing very high-level pattern matching for certain problems? The authors note that GPT-3 still exhibits notable weaknesses with tasks such as text synthesis and reading comprehension.
What’s more, is there a natural limit to the performance of models like GPT-3, no matter how large we scale them? The authors also briefly discuss this concern, mentioning the possibility that the model “may eventually run into (or could already be running into) the limits of the pretraining objective.” In other words, brute force can only get you so far.
Unless you have a few hundred spare GPUs lying around, the answer to these questions will have to wait until the presumed release of GPT-4 sometime next year.
What Does GPT-3 Mean for Customer Service?
Although there’s still much more to learn about how GPT-3 works, the release of the model has wide-ranging implications for a number of industries — in particular, chatbots and customer service. The ability of GPT-3 to generate paragraphs of seemingly realistic text should appeal to anyone interested in creating more convincing, “human-like” AIs.
Tech companies have tried for years to build chatbots that can effectively simulate conversations with their human interlocutors. Yet despite their best efforts, chatbots still aren’t able to simulate the conversational fluency and knowledge of a real human being over a sustained period of time. According to a 2019 survey, 86 per cent of people prefer to speak with humans instead of chatbots, and 71 per cent say they would be less likely to use a brand if there were no human agents available.
Of course, GPT-3 was trained to generate articles and text, not to have a lifelike conversation. But there are indications that models like GPT-3 are approaching human-like language abilities — at least for shallow interactions, as would be involved in a chatbot conversation. The GPT-3 authors found that human judges could only identify the model’s fake articles 52 per cent of the time, which is little better than chance.
It’s not only the realism of GPT-3, but also the advanced tasks it’s able to perform, that differentiate it from the current field of chatbots. Many chatbots on companies’ websites are simply intended as a customer service quality filter, suggesting some common solutions for users before transferring them to a human agent if necessary.
Meanwhile, in terms of natural language processing, GPT-3 is much closer to an “artificial general intelligence” than any chatbot built thus far (although it’s still far from a true AGI). It’s conceivable that one day, highly advanced models like GPT-3 could parse users’ complex queries and solve their problems automatically, without a human agent ever needing to step in.
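Purely as an illustration of that idea, a customer-service assistant could wrap a text-completion model in a simple conversation loop, feeding the growing transcript back in as the prompt. Everything in the sketch below, including the `complete` placeholder and its canned reply, is hypothetical and assumes nothing about any real system.

```python
def complete(prompt: str) -> str:
    """Hypothetical placeholder: swap in a real text-completion model here."""
    return "Thanks for reaching out. Let me look into that for you."

# Illustrative-only conversation loop: the growing transcript becomes the
# prompt, and the model's continuation is treated as the agent's reply.
transcript = (
    "The following is a conversation between a helpful customer-service "
    "agent and a customer.\n"
)

while True:
    user_message = input("Customer: ")
    if not user_message:  # empty line ends the conversation
        break
    transcript += f"Customer: {user_message}\nAgent:"
    reply = complete(transcript)
    transcript += f" {reply}\n"
    print(f"Agent: {reply}")
```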
Furthermore, groundbreaking conversational AIs such as Google’s Meena and Facebook’s BlenderBot, both released in 2020, have also demonstrated that the “brute force” approach is effective when applied specifically to chatbots. Meena and BlenderBot have 2.6 billion and 9.4 billion parameters, respectively, which are only tiny fractions of GPT-3’s 175 billion. It may only be a matter of time before these models pass the Turing test by expanding to the scale of GPT-3, making them virtually indistinguishable from humans in short text conversations.
OpenAI hasn’t yet released the full model or source code for GPT-3, as they did gradually with GPT-2 last year. This puts GPT-3 out of reach for any companies interested in the model’s practical applications (at least for now). But this isn’t the last we’ll hear about GPT-3 by a long shot. We live in exciting times — and whatever research comes next down the pipeline, it will be sure to advance our understanding of the capabilities (and limits) of AI.
If I managed to retain your attention to this point, leave a comment describing how this story made a difference for you, or subscribe to my weekly newsletter to receive more of this content.
Dr Mark van Rijmenam is the founder of Datafloq and Mavin, and a globally recognised speaker on big data, blockchain, AI and the Future of Work. He is a strategist and the author of three management books: Think Bigger, Blockchain and The Organisation of Tomorrow. You can read a free preview of his latest book here. Connect with him on LinkedIn or say hi on Twitter mentioning this story.
If you would like to talk to Mark about any advisory work or virtual speaking engagements and webinars, you can contact him at https://vanrijmenam.nl
Translated from: https://medium.com/ai-in-plain-english/the-gpt-3-model-what-does-it-mean-for-chatbots-and-customer-service-be51eced4895