As Sam's tweets suggest, there is a lot of hype in the tech community around GPT-3, which OpenAI released in June 2020, but the model is still powerful and impressive when you interact with it. GPT-3 is the largest language model trained to date and achieves strong results on several NLP tasks such as language generation and translation, with huge potential for many other creative and functional tasks.
Here we will go over a few highlights to get a clearer view of what the model can and cannot do, and show with examples how to use it to power various applications.
Why is it disruptive?
Gigantic
GPT-3 uses the Transformer framework and attention architecture, the same model architecture as GPT-2. It comes in different sizes; the largest (referred to simply as "GPT-3") has 175B trainable parameters, 96 layers, and 96 attention heads per layer, each head with a dimension of 128. Even the batch size is huge at 3.2M tokens.
It is trained on a huge text corpus drawn from the following sources (dataset size in tokens × approximate epochs elapsed during training):
filtered Common Crawl (410B tokens × 0.44)
WebText2 (19B × 2.9)
two internet-based books corpora (12B × 1.9 + 55B × 0.43)
English Wikipedia (3B × 3.4)
According to the paper, training the largest GPT-3 (175B parameters) takes about 3,640 petaflop/s-days, where one petaflop/s-day is roughly what 8 V100 GPUs deliver at full efficiency for a day.
“The supercomputer developed for OpenAI is a single system with more than 285,000 CPU cores, 10,000 GPUs and 400 gigabits per second of network connectivity for each GPU server.” Check out how Microsoft built this GPU-accelerated supercomputer system for OpenAI.
Few-shot learner
This is the main breakthrough of GPT-3: the model can quickly learn a task by observing zero, one, or a few examples of it, which makes it far more capable on tasks it has never seen. The bigger the model, the better it learns a task from the contextual information in a few examples.
Compare this with the traditional pre-train-then-fine-tune approach (e.g., BERT), which requires tens of thousands of task-specific examples to fine-tune on a downstream task. GPT-3 lets us perform a new task (classification, question answering, translation, reasoning, and many others) from just a few examples or instructions, without fine-tuning. This is a huge advantage over other mainstream language models like BERT, ELMo, and XLM.
On top of that, OpenAI released the GPT-3 API. You send a text prompt, and the API returns a completion that attempts to match the pattern you gave it in the examples. The API is so simple that developers can plug and play to bring the intelligence of a remotely hosted model into their products, without worrying about the ML pipeline, infrastructure, or hosting. This is a huge breakthrough with many benefits:
- It makes a state-of-the-art model more accessible to developers for exploring potential use cases.
- It significantly lowers the barrier to adopting NLP/machine intelligence. Developers no longer need to spend extra effort acquiring task-specific datasets, mastering complicated ML and fine-tuning processes, or handling ML infrastructure.
The API is not yet ready for mass production use; OpenAI and developers are still learning and evaluating various applications and their potential social impact.
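To make the prompt-and-completion pattern concrete, here is a minimal sketch of a call using the openai Python client as it looked during the 2020 beta. The engine name, prompt text, and parameter values are illustrative assumptions, not the exact settings behind the demos below.

```python
import openai

openai.api_key = "YOUR_API_KEY"  # assumption: an API key from the beta program

# Send a text prompt; the API returns a completion that tries to
# continue the pattern established by the prompt.
response = openai.Completion.create(
    engine="davinci",            # assumption: the largest engine exposed by the beta
    prompt="Write a short thank-you note to a mentor:\n",
    max_tokens=100,              # response length
    temperature=0.7,             # higher values give more creative completions
    stop=["\n\n"],               # stop generating at a blank line
)

print(response["choices"][0]["text"])
```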
Use cases
Many creative prototypes built on GPT-3 have popped up over the past few weeks, and a lot of impressive demos can be found on Twitter. Now let's take a deeper dive into how GPT-3 helps with traditional NLP tasks, as well as some of the creative applications from the community.
** In the following examples, blue indicates the context or prompt (task descriptions, examples) used as a prefix, orange indicates the user's input, and the rest is the model's prediction.
Language generation
One of the most impressive use cases is language generation/text completion. Here is an example of having GPT-3 expand a single sentence into a short essay. It even provides concrete examples of how to express gratitude.
Question Answering
With just 5 QA examples, the model is able to answer all the following questions accurately while maintaining the context information.
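The five Q&A examples from the demo are not reproduced here; the sketch below only illustrates the general shape of such a few-shot QA prompt, with invented questions, where the model is expected to complete the final answer.

```python
# Hypothetical few-shot QA prompt: a handful of Q/A pairs establish the
# pattern, and the model is asked to complete the answer to the last question.
qa_prompt = """Q: What is the capital of France?
A: Paris.

Q: Who wrote the play Romeo and Juliet?
A: William Shakespeare.

Q: What is the chemical symbol for gold?
A: Au.

Q: Which planet is known as the Red Planet?
A: Mars.

Q: In which year did the first moon landing take place?
A: 1969.

Q: Who painted the Mona Lisa?
A:"""
# Pass qa_prompt to the same Completion.create call shown earlier, with a
# low temperature and a "\n" stop sequence for short factual answers.
```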
Chatbot
With a description of the task and one example, the model is able to carry on a helpful, creative, and relevant chat. But in this example the bot made up a response saying Sam Altman is a co-founder of Y Combinator, while in reality he was its president. This is probably because I used a higher temperature (one of the API parameters) for more creativity.
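A rough sketch of how a chat like this can be framed as a completion call, again assuming the 2020-era openai client. The persona text and example exchange are invented; the higher temperature mirrors the setting mentioned above.

```python
import openai

openai.api_key = "YOUR_API_KEY"

# Hypothetical chat framing: a short task description plus one example
# exchange, then the user's new message for the model to answer.
chat_prompt = """The following is a conversation with a helpful, creative AI assistant.

Human: Hello, who are you?
AI: I am an AI assistant. How can I help you today?
Human: Who is Sam Altman?
AI:"""

response = openai.Completion.create(
    engine="davinci",
    prompt=chat_prompt,
    max_tokens=60,
    temperature=0.9,   # higher temperature trades factual accuracy for creativity
    stop=["Human:"],   # stop before the model invents the next user turn
)
print(response["choices"][0]["text"].strip())
```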
Text Classification
For this task, we provided a description of the task and 3 examples of company name, business category, and latest market cap. Then we tested it with 4 new names: Unilever, McDonald's, Google, and Apple.
The model finished the business-category classification accurately and was also able to estimate the market-cap numbers. But because the model was trained on data with a cutoff in October 2019, the predicted market caps are likely outdated.
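The exact prompt from this demo is not shown here; a plausible structure is sketched below, where the example companies, categories, and market-cap figures are rough illustrative values rather than data from the article.

```python
# Hypothetical few-shot classification prompt: three labeled examples,
# then a new company name for the model to classify and estimate.
# The categories and market-cap figures are rough illustrative values.
classification_prompt = """Company: Tesla | Category: Automotive | Market cap: $280B
Company: Pfizer | Category: Pharmaceuticals | Market cap: $200B
Company: Netflix | Category: Entertainment | Market cap: $210B
Company: Unilever | Category:"""
# Note: any market-cap estimate will reflect the model's training data
# (cut off in October 2019), so the numbers can be outdated.
```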
Translation
With just 5 English-to-Chinese translation examples, the model did a pretty good job of finishing the rest of the translations.
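A few-shot translation prompt can be as simple as a list of sentence pairs followed by an incomplete pair; the pairs below are invented for illustration, not the five used in the demo.

```python
# Hypothetical few-shot English-to-Chinese translation prompt: sentence
# pairs establish the pattern, and the last pair is left for the model.
translation_prompt = """English: Good morning.
Chinese: 早上好。

English: Where is the nearest train station?
Chinese: 最近的火车站在哪里?

English: Thank you very much for your help.
Chinese: 非常感谢你的帮助。

English: The weather is nice today.
Chinese:"""
```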
Virtual Assistant
This example demonstrates the model's text-to-action capabilities. A comparable use case is Alexa or Siri, where the user gives a voice query and we need to extract its semantic meaning by parsing it into an intent and multiple slots for actions.
The following example presents a few-shot text-to-action NLU engine. It is amazing that with just a couple of examples, the model is able to help create a new intent, “open-app”, and extract all the slots correctly for “open Netflix and search house of cards”.
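Here is a hypothetical sketch of such a text-to-action prompt: a few parsed examples establish an intent/slot format, and the new query from the article is appended for the model to parse. The example utterances and the slot format are assumptions, not the author's actual prompt.

```python
# Hypothetical text-to-action prompt: each example maps an utterance to
# an intent plus slots; the last query is the one to be parsed.
nlu_prompt = """Query: play some jazz music in the living room
Intent: play-music
Slots: genre=jazz, room=living room

Query: set an alarm for 7 am tomorrow
Intent: set-alarm
Slots: time=7 am, date=tomorrow

Query: what's the weather in Seattle this weekend
Intent: get-weather
Slots: location=Seattle, date=this weekend

Query: open Netflix and search house of cards
Intent:"""
# Expected completion (roughly): a new "open-app" intent with slots such as
# app=Netflix and query=house of cards, matching the behavior described above.
```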
What else?
Beyond traditional NLP tasks, there are many other amazing things we can build with this API.
Like creating fiction, writing songs, generating movie scripts, continuing a story toward different endings, writing emails, helping you generate ideas, telling kids a story, or even generating images…
The following is a collection of creative prototypes from the community; you will find huge potential across different domains and modalities.
Build a photo app with natural language
Coding coaching
Automate your Google spreadsheet
Help you know your food better
Turn legal clauses into understandable human language
Some further thoughts
GPT-3 creates a paradigm shift in developing AI products
Traditionally, as an ML product manager or ML engineer, you spend most of your effort curating the training dataset, selecting models, fine-tuning, and running evaluations. ML infrastructure needs to orchestrate the hosting of multiple models for inference and scaling.
With the GPT-3 API, some of the above effort can be skipped. Once you have figured out what problem the GPT-3 API can help solve, how you design the prompt and how you choose the API parameters (temperature, response length, etc.) will determine the completion quality and the success of the product. Product owners and developers also need to dive deep to explore other possible sources and modalities for prompts, such as users' likes, clicks, shares, and comments, or even transcripts from audio or semantic segmentations from images/video.
Social impacts
GPT-3 is a good start, though there are some concerns about fairness, bias, and social impact. The more time users spend with these products, the more their usage will shape the quality of the underlying dataset. This can lead to potential bias across AI products from different providers, even though they are built on the same GPT-3 model.
Some ideas to tackle the issues above:
Create a trustable reputation system for ML models.
Provide tracking data back to the models to improve the quality of answers. Users can then control which provider to select, based on their needs and the quality they observe.
In the end
The effort of training models and evaluating AI products is still critical. But GPT-3 provides a new way of developing AI products, with potential benefits in time, cost, and quality.
GPT-3 is powerful and super-intelligent. Now it is time to turn on our imagination to find all the possibilities.
Feel free to suggest ideas or say hi.
At the beginning I used “Demystify GPT3 with real use cases” as the headline. Before publishing this article, I felt it was too plain, so I asked GPT-3 for help.
The headline and several paragraphs of this article were generated by GPT-3.
Translated from: https://towardsdatascience.com/gpt3-the-dream-machine-in-real-world-c99592d4842f