openai-gpt
Imagine that we sent a robot-controlled spaceship out to the far reaches of the galaxy to contact other life forms. On the ship, we placed a copy of all the text posted on the internet over the last three years so that intelligent alien races would be able to learn something about us. After traveling twelve light-years, the ship enters the solar system around the star Luyten, where it is boarded by aliens. The Luytenites retrieve the copy of the internet text and try to make sense of it.
They ask their top linguists to interpret these strange symbols but make little progress. The Luytenites are in the same position as the eighteenth-century archaeologists who kept discovering stones with ancient Egyptian hieroglyphs. Finally, in 1799, the Rosetta Stone was discovered, which carried both Egyptian hieroglyphs and ancient Greek text. Because scholars had what turned out to be the same decree in two languages, they were finally able to figure out the meanings of the hieroglyphs.
But no such luck for our Luytenites. The internet text contained English, French, Russian, and other languages, but, of course, no Luytenitian text.
The best they could do was to analyze the statistical patterns of the symbols in the text. From this analysis, they were able to generate new text with similar statistical patterns. For example, they generated this piece of text:
After two days of intense debate, the United Methodist Church has agreed to a historic split — one that is expected to end in the creation of a new denomination, one that will be “theologically and socially conservative,” according to The Washington Post. The majority of delegates attending the church’s annual General Conference in May voted to strengthen a ban on the ordination of LGBTQ clergy and to write new rules that will “discipline” clergy who officiate at same-sex weddings. But those who opposed these measures have a new plan: They say they will form a separate denomination by 2020, calling their church the Christian Methodist denomination. The Post notes that the denomination, which claims 12.5 million members, was in the early 20th century the “largest Protestant denomination in the U.S.,” but that it has been shrinking in recent decades. The new split will be the second in the church’s history. The first occurred in 1968, when roughly 10 percent of the denomination left to form the Evangelical United Brethren Church. The Post notes that the proposed split “comes at a critical time for the church, which has been losing members for years,” which has been “pushed toward the brink of a schism over the role of LGBTQ people in the church.” Gay marriage is not the only issue that has divided the church. In 2016, the denomination was split over ordination of transgender clergy, with the North Pacific regional conference voting to ban them from serving as clergy, and the South Pacific regional conference voting to allow them.
The Luytenites had no idea what this generated text meant and wondered whether it would be meaningful to the race that had created the original text.
This text was actually created by GPT-3, the largest language model developed as of mid-2020. GPT-3 was developed by OpenAI, which has received billions of dollars of funding to create artificial general intelligence (AGI) systems that can acquire commonsense world knowledge and commonsense reasoning rules. GPT-3 has 175 billion parameters and reportedly cost $12 million to train.
GPT-3
The OpenAI team used GPT-3 to generate eighty pieces of text like the one above and mixed them with news articles written by people. In a study, workers recruited through Amazon’s Mechanical Turk were asked to determine whether each article was written by a person or generated by a computer. The articles generated by GPT-3 were identified as machine-generated 52% of the time, only 2 percentage points better than chance. Essentially, these hired workers could not tell the difference between human-written text and text generated by GPT-3. In fact, the news article shown above was identified as human-written by 88% of the workers.
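To make the "better than chance" comparison concrete, here is a minimal arithmetic sketch. It assumes that guessing between two equally likely labels (human or machine) gives a 50% baseline; the variable names are illustrative and do not come from the OpenAI study.

```python
# Assumption: two equally likely labels, so random guessing is right half the time.
CHANCE_RATE = 0.50
# Rate reported above at which GPT-3 articles were identified as machine-generated.
DETECTION_RATE = 0.52

# Absolute gap, in percentage points.
margin_points = round((DETECTION_RATE - CHANCE_RATE) * 100, 1)
# The same gap expressed relative to the chance baseline, in percent.
relative_gain = round((DETECTION_RATE - CHANCE_RATE) / CHANCE_RATE * 100, 1)

print(f"{margin_points} percentage points above chance")
print(f"{relative_gain}% relative improvement over chance")
```

Either way of expressing the gap leads to the same reading: the workers were barely better than a coin flip.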
Statistical models of text like GPT-3 are termed language models. GPT-3 is the latest in a line of increasingly powerful language models. The first GPT model, released in 2018, had about 117 million parameters. GPT-2, released in 2019, had 1.5 billion parameters, which was an order of magnitude more than the original GPT but two orders of magnitude fewer than GPT-3.
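The size comparison can be checked with a few lines of arithmetic. This is only a sketch: the GPT-2 and GPT-3 counts are the ones quoted above, while the first GPT's count is taken here as roughly 117 million, which is an assumption of this example.

```python
import math

# Parameter counts; the first GPT figure is an approximate, assumed value.
PARAMS = {
    "GPT": 117e6,
    "GPT-2": 1.5e9,
    "GPT-3": 175e9,
}

def orders_of_magnitude(smaller: float, larger: float) -> float:
    """Return log10 of the ratio larger / smaller."""
    return math.log10(larger / smaller)

gpt_to_gpt2 = orders_of_magnitude(PARAMS["GPT"], PARAMS["GPT-2"])
gpt2_to_gpt3 = orders_of_magnitude(PARAMS["GPT-2"], PARAMS["GPT-3"])

print(f"GPT -> GPT-2:   {gpt_to_gpt2:.2f} orders of magnitude")
print(f"GPT-2 -> GPT-3: {gpt2_to_gpt3:.2f} orders of magnitude")
```

Under these figures the first jump is a little over one order of magnitude and the second a little over two.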
Some researchers have suggested that language models somehow magically learn commonsense knowledge about the world and learn to reason based on this commonsense knowledge. They argue that language models can use this commonsense knowledge and reasoning to generate texts. More importantly, this commonsense knowledge might serve as a foundation for the development of AGI capabilities.
Fact Check
However, while the article generated by GPT-3 sounds plausible, even a small attempt to validate its facts quickly shows that most of the important ones are wrong. The real story, reported in January 2020 by many news outlets including The Washington Post, was that officials of The United Methodist Church were proposing a split of the church, to be voted on at the May 2020 General Conference.
GPT-3 presumably obtained most of its word patterns from these news articles. However, it merged those word patterns into sentences that got most of the facts wrong:
- The General Conference takes place every four years, not annually.
- At the time of training, the vote at the General Conference was scheduled for May 2020. It had not happened yet.
- The new rules to discipline clergy had not been voted on. There was, however, a set of previously proposed rules that had triggered the split discussion.
- The 1968 split never happened. In fact, the 1968 event was a merger, not a split.
- The church does not divide the General Conference (or any other conference that I could find information about) into North Pacific and South Pacific conferences with separate voting.
I do not have access to GPT-3, but anyone can try its predecessor, GPT-2, at https://talktotransformer.com/. You type a starting text and GPT-2 creates follow-on text.
For example, when I entered “Traffic in Connecticut…”, GPT-2 produced this text:
Traffic in Connecticut and New York is running roughly at capacity, with many Long Island Expressway and Long Island Rail Road interchanges carrying tolls. That could impact the rest of the year as drivers try to figure out whether their trip will be all right. In New Jersey, drivers can expect to be paying more than $1,000 for the first time to use the Port Authority’s new toll-free bridge across the Hudson River.
The first sentence starts fine, but then it talks about tolls at Long Island Rail Road interchanges. This violates our commonsense knowledge because we know that trains do not stop for tolls. The second sentence is okay, though it is hard to ascertain its meaning. The third sentence is where it goes off the rails. Tolls in New York and New Jersey are high, but they are nowhere near $1,000.
Why do GPT-3 and other language models get their facts wrong? Because GPT-3, like the fictitious Luytenites, has no commonsense understanding of the meaning of its input texts or of the text it generates. It is just a statistical model.
NYU Professor Gary Marcus has written many papers and given many talks criticizing the interpretation that GPT-2 acquires commonsense knowledge and reasoning rules. As he puts it: “…upon careful inspection, it becomes apparent the system has no idea what it is talking about…”. See also this New Yorker Magazine article that describes stories generated by GPT-2 after being trained on the magazine’s vast archives.
Conclusion
GPT-3 is learning statistical properties of word co-occurrences. On the occasions when it gets its facts right, GPT-3 is probably just regurgitating memorized sentence fragments. When it gets its facts wrong, it is because it is just stringing words together based on the statistical likelihood that one word will follow another.
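The idea of stringing words together by likelihood can be illustrated with a toy bigram model. This is a deliberately minimal sketch and not how GPT-3 works internally: GPT-3 is a large neural network, whereas this example only counts which word follows which. The corpus, function names, and seed below are all hypothetical choices made for illustration.

```python
import random
from collections import Counter, defaultdict

def train_bigram_model(text):
    """Count how often each word is followed by each other word."""
    words = text.split()
    follows = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        follows[current_word][next_word] += 1
    return follows

def generate(follows, start, length, seed=0):
    """Emit up to `length` words, sampling each next word by its observed frequency."""
    rng = random.Random(seed)
    output = [start]
    for _ in range(length - 1):
        candidates = follows.get(output[-1])
        if not candidates:
            break  # the last word was never followed by anything in the corpus
        next_words = list(candidates.keys())
        weights = list(candidates.values())
        output.append(rng.choices(next_words, weights=weights, k=1)[0])
    return " ".join(output)

# Hypothetical toy corpus, chosen only to echo the example above.
CORPUS = (
    "the church voted to split . the church voted to merge . "
    "the conference voted to split the church ."
)

model = train_bigram_model(CORPUS)
sample = generate(model, "the", 8)
print(sample)
```

Every adjacent word pair in the output occurred somewhere in the corpus, so the result looks locally fluent, yet nothing in the model checks whether the sentence as a whole is true. A split and a merger are equally acceptable continuations as far as the counts are concerned.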
The lack of commonsense reasoning does not make language models useless. On the contrary, they can be quite useful. Google uses language models in Gmail’s Smart Compose feature. Smart Compose predicts the next words a user will type, and the user can accept them by hitting the TAB key.
However, GPT-3 does not appear to be learning commonsense knowledge or learning to reason based on that knowledge. As such, it cannot jumpstart the development of AGI systems that, like people, apply commonsense reasoning to their knowledge of the world.
Feel free to visit AI Perspectives where you can find a free online AI Handbook with 15 chapters, 400 pages, 3000 references, and no advanced mathematics.
Originally published at https://www.aiperspectives.com on July 6, 2020.
Translated from: https://towardsdatascience.com/gpt-3-has-no-idea-what-it-is-saying-95d4c1bad4a8