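As a senior developer, if you need to build an application on top of ChatGPT and deploy it to production, there is a lot to plan for in advance: how to control costs, how to monitor concurrency and performance, how to continuously evaluate and iterate on machine learning models, and how to handle data security and compliance.
Notably, OpenAI's official production best practices guide covers all of these topics. It should help us build a high-quality product from zero to one.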
This guide provides a comprehensive set of best practices to help you transition from prototype to production. Whether you are a seasoned machine learning engineer or a recent enthusiast, this guide should provide you with the tools you need to successfully put the platform to work in a production setting: from securing access to our API to designing a robust architecture that can handle high traffic volumes. Use this guide to help develop a plan for deploying your application as smoothly and effectively as possible.
Once you log in to your OpenAI account, you can find your organization name and ID in your organization settings. The organization name is the label for your organization, shown in user interfaces. The organization ID is the unique identifier for your organization which can be used in API requests.
Users who belong to multiple organizations can pass a header to specify which organization is used for an API request. Usage from these API requests will count against the specified organization’s quota. If no header is provided, the default organization will be billed. You can change your default organization in your user settings.
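A minimal sketch of what that header looks like in practice, assuming you call the REST API directly over HTTP. The helper name `build_headers` and the `sk-XXXX`/`org-XXXX` values are hypothetical placeholders; the sketch only builds the headers and does not send a request.

```python
from typing import Dict, Optional


def build_headers(api_key: str, organization: Optional[str] = None) -> Dict[str, str]:
    """Build HTTP headers for an OpenAI API request (hypothetical helper)."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if organization:
        # Bill this request to a specific organization instead of the default one.
        headers["OpenAI-Organization"] = organization
    return headers


# Requests sent with these headers count against org-XXXX's quota.
headers = build_headers("sk-XXXX", organization="org-XXXX")
```

Leaving `organization` out omits the header entirely, so usage falls back to your default organization.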
You can invite new members to your organization from the members settings page. Members can be readers or owners. Readers can make API requests and view basic organization information, while owners can modify billing information and manage members within an organization.
New free trial users receive an initial credit of $5 that expires after three months. Once the credit has been used or expires, you can choose to enter billing information to continue your use of the API. If no billing information is entered, you will still have login access but will be unable to make any further API requests.
Once you’ve entered your billing information, you will have an approved usage limit of $120 per month, which is set by OpenAI. To increase your quota beyond the $120 monthly billing limit, please submit a quota increase request.
If you’d like to be notified when your usage exceeds a certain amount, you can set a soft limit through the usage limits page. When the soft limit is reached, the owners of the organization will receive an email notification. You can also set a hard limit so that, once the hard limit is reached, any subsequent API requests will be rejected. Note that these limits are best effort, and there may be 5 to 10 minutes of delay between the usage and the limits being enforced.
The OpenAI API uses API keys for authentication. Visit your API keys page to retrieve the API key you’ll use in your requests.
This is a relatively straightforward way to control access, but you must be vigilant about securing these keys. Avoid exposing API keys in your code or in public repositories; instead, store them in a secure location. Expose your keys to your application through environment variables or a secret management service, so that you don't need to hard-code them in your codebase. Read more in our Best practices for API key safety.
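For example, the key can be read from an environment variable at startup instead of being written into the source. This is a sketch: `OPENAI_API_KEY` is the variable name conventionally read by OpenAI's client libraries, while `load_api_key` and `MissingAPIKeyError` are hypothetical names. A secret management service would typically populate the same variable at deploy time.

```python
import os


class MissingAPIKeyError(RuntimeError):
    """Raised when no API key has been configured (hypothetical error type)."""


def load_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    # Read the key from the environment so it never appears in the codebase.
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise MissingAPIKeyError(
            f"Set {var_name} in the environment or inject it from your secret manager."
        )
    return key
```

Failing loudly at startup when the variable is missing is usually preferable to sending unauthenticated requests later.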
As you scale, you may want to create separate organizations for your staging and production environments. You can do this by signing up with two separate email addresses, one for each environment, to create two organizations. This allows you to isolate your development and testing work so you don't accidentally disrupt your live application, and it also lets you limit access to your production organization.
If you haven’t gone through the quickstart guide, we recommend you start there before diving into the rest of this guide.
For those new to the OpenAI API, our playground can be a great resource for exploring its capabilities. Doing so will help you learn what’s possible and where you may want to focus your efforts. You can also explore our example prompts.
While the playground is a great place to prototype, it can also be used as an incubation area for larger projects. The playground also makes it easy to export code snippets for API requests and share prompts with collaborators, making it an integral part of your development process.
Even with careful planning, it’s important to be prepared for unexpected issues when using GPT-3 in your application. In some cases, the model may fail on a task, so it’s helpful to consider what you can do to improve the reliability of your application.
If your task involves logical reasoning or is otherwise complex, you may need to take additional steps to build more reliable prompts. For some helpful suggestions, consult our Techniques to improve reliability guide. Overall, the recommendations revolve around:
One of the most important aspects of developing a system for production is regular evaluation and iterative experimentation. This process allows you to measure performance, troubleshoot issues, and fine-tune your models to improve accuracy and efficiency. A key part of this process is creating an evaluation dataset for your functionality. Here are a few things to keep in mind:
By conducting regular evaluation and iterative experimentation, you can ensure that your GPT-powered application or prototype continues to improve over time.
Language models can be difficult to evaluate because evaluating the quality of generated language is often subjective, and there are many different ways to communicate the same message correctly in language. For example, when evaluating a model on the ability to summarize a long passage of text, there are many correct summaries. That being said, designing good evaluations is critical to making progress in machine learning.
An eval suite needs to be comprehensive, easy to run, and reasonably fast (depending on model size). It also needs to be easy to extend, since a suite that is comprehensive one month will likely be out of date the next. We should prioritize a diverse set of tasks, including tasks that expose weaknesses in the models or capabilities that are not improving with scale.
The simplest way to evaluate your system is to manually inspect its outputs. Is it doing what you want? Are the outputs high quality? Are they consistent?
The best way to test faster is to develop automated evaluations. However, this may not be possible in more subjective applications like summarization tasks.
Automated evaluations work best when it’s easy to grade a final output as correct or incorrect. For example, if you’re fine-tuning a classifier to classify text strings as class A or class B, it’s fairly simple: create a test set with example input and output pairs, run your system on the inputs, and then grade the system outputs versus the correct outputs (looking at metrics like accuracy, F1 score, cross-entropy, etc.).
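The classifier case can be sketched in a few lines: run the system over a labeled test set, then compare predictions with the gold labels. `toy_classifier` is a hypothetical stand-in for your own system (a keyword rule here), and accuracy and F1 are computed by hand to keep the example dependency-free; libraries such as scikit-learn provide the same metrics.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    if len(gold) != len(predicted) or not gold:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def f1(gold: Sequence[str], predicted: Sequence[str], positive: str) -> float:
    # F1 for one class, treating `positive` as the positive label.
    tp = sum(g == positive and p == positive for g, p in zip(gold, predicted))
    fp = sum(g != positive and p == positive for g, p in zip(gold, predicted))
    fn = sum(g == positive and p != positive for g, p in zip(gold, predicted))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def evaluate(classify: Callable[[str], str],
             test_set: List[Tuple[str, str]]) -> Dict[str, float]:
    gold = [label for _, label in test_set]
    predicted = [classify(text) for text, _ in test_set]
    return {"accuracy": accuracy(gold, predicted), "f1_A": f1(gold, predicted, "A")}


def toy_classifier(text: str) -> str:
    # Hypothetical system under test: class A if the text mentions a refund.
    return "A" if "refund" in text.lower() else "B"


test_set = [
    ("I want a refund", "A"),
    ("Please send my money back", "A"),
    ("How do I reset my password?", "B"),
    ("Where is the login page?", "B"),
]
scores = evaluate(toy_classifier, test_set)  # accuracy 0.75, f1_A about 0.667
```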
If your outputs are semi open-ended, as they might be for a meeting notes summarizer, it can be trickier to define success: for example, what makes one summary better than another? Here, possible techniques include:
For very open-ended tasks, such as a creative story writer, automated evaluation is more difficult. Although it might be possible to develop quality metrics that look at spelling errors, word diversity, and readability scores, these metrics don’t really capture the creative quality of a piece of writing. In cases where no good automated metric can be found, human evaluations remain the best method.
As an example, let’s consider the case of building a retrieval-based Q&A system.
A retrieval-based Q&A system has two steps. First, a user’s query is used to rank potentially relevant documents in a knowledge base. Second, GPT-3 is given the top-ranking documents and asked to generate an answer to the query.
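A minimal sketch of the two steps, assuming the query and the documents have already been turned into embedding vectors by some embedding model (the tiny hand-written vectors below are purely illustrative). Ranking by cosine similarity is one common choice, not the only one; `rank_documents`, `build_qa_prompt`, and the prompt wording are hypothetical, and the resulting prompt would then be sent to the model.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_documents(query_vec: Sequence[float],
                   doc_vecs: Dict[str, Sequence[float]]) -> List[str]:
    """Step 1: order document IDs from most to least similar to the query."""
    return sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True)


def build_qa_prompt(question: str, ranked_ids: List[str],
                    docs: Dict[str, str], top_k: int = 2) -> str:
    """Step 2: put the top-ranked documents in front of the question."""
    context = "\n\n".join(docs[d] for d in ranked_ids[:top_k])
    return ("Answer the question using only the documents below. "
            "If the answer is not in them, say \"I don't know\".\n\n"
            f"Documents:\n{context}\n\nQuestion: {question}\nAnswer:")


docs = {"a": "Our office is in Berlin.",
        "b": "Refunds are processed within 5 days.",
        "c": "Support is available by email."}
doc_vecs = {"a": [0.0, 1.0], "b": [1.0, 0.1], "c": [0.5, 0.5]}
ranked = rank_documents([1.0, 0.0], doc_vecs)
prompt = build_qa_prompt("How long do refunds take?", ranked, docs)
```

Keeping the two steps as separate functions makes it possible to evaluate each one on its own.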
Evaluations can be made to measure the performance of each step.
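For the search step, one possible way to score rankings against a hand-labeled answer key is sketched below. The two metrics (a hit rate over the top k results and mean reciprocal rank) and the function names are illustrative choices, not prescribed by this guide; `results` would come from your retrieval system and `answer_key` from people who marked which documents answer each question.

```python
from typing import Dict, List, Sequence, Set


def hit_at_k(ranked_ids: Sequence[str], relevant: Set[str], k: int) -> float:
    """1.0 if at least one relevant document appears in the top k, else 0.0."""
    return 1.0 if any(doc_id in relevant for doc_id in ranked_ids[:k]) else 0.0


def reciprocal_rank(ranked_ids: Sequence[str], relevant: Set[str]) -> float:
    """1/rank of the first relevant document (1.0 at rank 1, 0.5 at rank 2, ...), else 0."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate_search(results: Dict[str, List[str]],
                    answer_key: Dict[str, Set[str]], k: int = 3) -> Dict[str, float]:
    questions = list(answer_key)
    hits = sum(hit_at_k(results[q], answer_key[q], k) for q in questions)
    rr = sum(reciprocal_rank(results[q], answer_key[q]) for q in questions)
    return {f"hit@{k}": hits / len(questions), "mrr": rr / len(questions)}


# Illustrative data: q1's answer is ranked second; q2's answer was not retrieved.
results = {"q1": ["d3", "d1", "d7"], "q2": ["d2", "d9", "d4"]}
answer_key = {"q1": {"d1"}, "q2": {"d5"}}
search_scores = evaluate_search(results, answer_key)  # {"hit@3": 0.5, "mrr": 0.25}
```

The reciprocal rank distinguishes a relevant document found at rank 2 from one found at rank 3, which the binary hit rate does not.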
For the search step, one could:
For the question answering step, one could:
Second, use the test set to grade the system’s performance
A test set size of N = 100 is just an example: in early stages, you might start with a smaller set that's easier to generate, and in later stages, you might invest in a larger set that's more costly but more statistically reliable.

When designing your application or service for production that uses our API, it's important to consider how you will scale to meet traffic demands. There are a few key areas you will need to consider regardless of the cloud service provider of your choice:
When using our API, it’s important to understand and plan for rate limits.
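A common way to cope with rate-limit errors (HTTP 429) is to retry with exponential backoff and random jitter. The sketch below is client-agnostic: `RateLimitError` is a placeholder for whatever exception your HTTP client or SDK raises on a 429, `call_with_backoff` is a hypothetical name, and `sleep` is injectable so the logic can be tested without actually waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Placeholder for the exception your HTTP client or SDK raises on HTTP 429."""


def call_with_backoff(call: Callable[[], T], max_retries: int = 5,
                      base_delay: float = 1.0, max_delay: float = 60.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry `call` on rate-limit errors, waiting exponentially longer each time."""
    attempt = 0
    while True:
        try:
            return call()
        except RateLimitError:
            if attempt >= max_retries:
                raise  # give up and surface the error to the caller
            # Full jitter: wait a random time up to base_delay * 2**attempt, capped.
            sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
            attempt += 1
```

Failed requests may still count toward per-minute limits, so cap `max_retries` rather than retrying forever, and reduce request volume if 429s are frequent.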
Latency is the time it takes for a request to be processed and a response to be returned. In this section, we will discuss some factors that influence the latency of our text generation models and provide suggestions on how to reduce it.
The latency of a completion request is mostly influenced by two factors: the model and the number of tokens generated. The life cycle of a completion request looks like this:
The bulk of the latency typically arises from the token generation step.
Intuition: prompt tokens add very little latency to completion calls. Generating completion tokens takes much longer, because tokens are generated one at a time, so longer generations accumulate more latency.
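This intuition can be written as a rough back-of-the-envelope model. The symbols are ours, and treating each term as linear in the token count is a simplifying assumption rather than a measured property of the API; $N$ counts tokens and $t$ is the average time per token:

$$
T_{\text{request}} \approx T_{\text{network}} + N_{\text{prompt}}\,t_{\text{prompt}} + N_{\text{completion}}\,t_{\text{gen}}, \qquad t_{\text{prompt}} \ll t_{\text{gen}}
$$

Because $t_{\text{gen}}$ dominates, reducing $N_{\text{completion}}$ is usually the most effective lever, while trimming the prompt saves comparatively little.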
Now that we have looked at the basics of latency, let’s take a look at various factors that can affect latency, broadly ordered from most impactful to least impactful.
Our API offers different models with varying levels of complexity and generality. The most capable models, such as `gpt-4`, can generate more complex and diverse completions, but they also take longer to process your query. Models such as `gpt-3.5-turbo` can generate faster and cheaper chat completions, but they may generate results that are less accurate or less relevant to your query. You can choose the model that best suits your use case, trading off speed against quality.
Requesting a large number of generated tokens can lead to increased latency:

- Requests with a lower `max_tokens` parameter incur less latency.
- Use stop sequences to avoid generating unneeded tokens. For example, by using `11.` as a stop sequence, you can generate a list with only 10 items, since the completion will stop when `11.` is reached. Read our help article on stop sequences for more context on how you can do this.
- Lower the values of `n` and `best_of` when possible, where `n` is how many completions to generate for each prompt and `best_of` is how many candidate completions are generated server-side, of which the one with the highest log probability per token is returned.

If `n` and `best_of` both equal 1 (the default), the number of generated tokens will be at most equal to `max_tokens`.

If `n` (the number of completions returned) or `best_of` (the number of completions generated for consideration) is set to a value greater than 1, each request will create multiple outputs. In that case, you can treat the number of generated tokens as `max_tokens * max(n, best_of)`.
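To make these knobs concrete, the sketch below assembles a request body for the Completions endpoint using its `max_tokens`, `stop`, `n`, and `best_of` parameters, and computes the upper bound `max_tokens * max(n, best_of)` on generated tokens. The helper names and `your-model-name` are placeholders, and the sketch does not call the API.

```python
from typing import List, Optional


def build_completion_request(model: str, prompt: str, max_tokens: int = 256,
                             stop: Optional[List[str]] = None,
                             n: int = 1, best_of: int = 1) -> dict:
    """Assemble a Completions request body (hypothetical helper; nothing is sent)."""
    if best_of < n:
        # The API returns the top n of best_of candidates, so best_of cannot be smaller.
        raise ValueError("best_of must be >= n")
    body = {"model": model, "prompt": prompt, "max_tokens": max_tokens,
            "n": n, "best_of": best_of}
    if stop:
        body["stop"] = stop  # the returned text will not contain the stop sequence
    return body


def max_generated_tokens(body: dict) -> int:
    """Upper bound on tokens generated for one request: max_tokens * max(n, best_of)."""
    return body["max_tokens"] * max(body.get("n", 1), body.get("best_of", 1))


# A 10-item list: generation stops as soon as the model reaches "11.".
ten_items = build_completion_request("your-model-name", "List ten fruits:\n1.",
                                     max_tokens=200, stop=["11."])
# Five candidates, two returned: up to 200 * 5 = 1000 generated tokens.
five_candidates = build_completion_request("your-model-name", "Write a tagline:",
                                           max_tokens=200, n=2, best_of=5)
```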
Setting `stream: true` in a request makes the model start returning tokens as soon as they are available, instead of waiting for the full sequence of tokens to be generated. It does not change the time to get all the tokens, but it reduces the time to the first token for applications that want to show partial progress or may stop generation early. This can noticeably improve the user experience, so it's worth experimenting with streaming.
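When streaming is enabled, the API sends the response as server-sent events: `data:` lines that each carry a JSON chunk, ending with `data: [DONE]`. The sketch below assumes you already receive those lines as strings from your HTTP client and extracts text as it arrives; `iter_stream_text` is a hypothetical name, and the field lookups cover both completion chunks (`choices[].text`) and chat completion chunks (`choices[].delta.content`).

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from the server-sent-event lines of a streamed response."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            # Completions stream `text`; chat completions stream `delta.content`.
            text = choice.get("text")
            if text is None:
                text = choice.get("delta", {}).get("content")
            if text:
                yield text
```

In an application you would forward each fragment as soon as it is yielded, for example with `print(fragment, end="", flush=True)`; that is where the shorter time to first token becomes visible.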
Our servers are currently located in the US. While we hope to have global redundancy in the future, in the meantime you could consider locating the relevant parts of your infrastructure in the US to minimize the roundtrip time between your servers and the OpenAI servers.
Depending on your use case, batching may help. If you are sending multiple requests to the same endpoint, you can batch the prompts to be sent in the same request. This will reduce the number of requests you need to make. The prompt parameter can hold up to 20 unique prompts. We advise you to test out this method and see if it helps. In some cases, you may end up increasing the number of generated tokens which will slow the response time.
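A sketch of the batching idea, assuming the Completions endpoint, which accepts a list of strings as `prompt` and labels each returned choice with an `index`; with `n=1`, that index identifies the prompt it answers. The function names and the fake response data are illustrative only.

```python
from typing import Dict, List, Sequence

MAX_PROMPTS_PER_REQUEST = 20  # per-request limit on the prompt list stated in this guide


def chunk_prompts(prompts: Sequence[str],
                  size: int = MAX_PROMPTS_PER_REQUEST) -> List[List[str]]:
    """Split prompts into batches that each fit in one request's `prompt` list."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [list(prompts[i:i + size]) for i in range(0, len(prompts), size)]


def texts_in_prompt_order(choices: List[Dict], num_prompts: int) -> List[str]:
    """Map batched choices back to their prompts via each choice's `index` (assumes n=1)."""
    ordered = [""] * num_prompts
    for choice in choices:
        ordered[choice["index"]] = choice["text"]
    return ordered


batches = chunk_prompts([f"Summarize ticket #{i}" for i in range(45)])
# Choices are not guaranteed to arrive in prompt order, so place them by index.
fake_choices = [{"index": 1, "text": "second"}, {"index": 0, "text": "first"}]
answers = texts_in_prompt_order(fake_choices, num_prompts=2)
```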
To monitor your costs, you can set a soft limit in your account to receive an email alert once you pass a certain usage threshold. You can also set a hard limit. Please be mindful of the potential for a hard limit to cause disruptions to your application/users. Use the usage tracking dashboard to monitor your token usage during the current and past billing cycles.
One of the challenges of moving your prototype into production is budgeting for the costs associated with running your application. OpenAI offers a pay-as-you-go pricing model, with prices per 1,000 tokens (roughly equal to 750 words). To estimate your costs, you will need to project the token utilization. Consider factors such as traffic levels, the frequency with which users will interact with your application, and the amount of data you will be processing.
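A back-of-the-envelope projection might look like the following. Every input is an assumption you would replace with your own numbers: traffic, average tokens per request (prompt plus completion), and the per-1,000-token price of the model you use. The word-to-token helper simply applies the rough 1,000 tokens ≈ 750 words ratio; real token counts vary with the text.

```python
def words_to_tokens(words: float) -> float:
    """Rough conversion using 1,000 tokens ≈ 750 words."""
    return words * 1000 / 750


def estimate_monthly_cost(requests_per_day: float, tokens_per_request: float,
                          price_per_1k_tokens: float, days: int = 30) -> float:
    """Projected spend in dollars: total tokens / 1000 * price per 1K tokens."""
    tokens_per_month = requests_per_day * tokens_per_request * days
    return tokens_per_month / 1000 * price_per_1k_tokens


# Hypothetical inputs: 1,000 requests/day, 1,500 tokens each, $0.002 per 1K tokens.
monthly = estimate_monthly_cost(1_000, 1_500, 0.002)  # about 90.0 dollars
```

If your model prices prompt and completion tokens differently, run the estimate separately for each token type and add the results.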
One useful framework for thinking about reducing costs is to treat cost as a function of the number of tokens and the cost per token. This framework suggests two avenues for reducing costs. First, you could reduce the cost per token by switching to smaller models for some tasks. Alternatively, you could reduce the number of tokens required, for example by using shorter prompts, fine-tuning models, or caching common user queries so that they don't need to be processed repeatedly.
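The caching idea can be sketched as a thin wrapper around whatever function actually calls the API. `CachedCompleter` and `fake_complete` are hypothetical; the cache is an unbounded in-memory dict keyed on a lightly normalized query, which is only appropriate when returning the same answer for repeated queries is acceptable. A production version would likely add eviction or expiry and a shared store.

```python
from typing import Callable, Dict


class CachedCompleter:
    """Wrap a completion function and reuse answers for repeated queries."""

    def __init__(self, complete: Callable[[str], str]):
        self._complete = complete
        self._cache: Dict[str, str] = {}
        self.api_calls = 0  # how many requests actually reached the API

    @staticmethod
    def _key(query: str) -> str:
        # Normalize case and whitespace so trivially different inputs share an entry.
        return " ".join(query.lower().split())

    def __call__(self, query: str) -> str:
        key = self._key(query)
        if key not in self._cache:
            self.api_calls += 1
            self._cache[key] = self._complete(query)
        return self._cache[key]


def fake_complete(query: str) -> str:
    # Stand-in for a real API call.
    return f"answer to {query.strip()}"


cached = CachedCompleter(fake_complete)
first = cached("What is MLOps?")
second = cached("  what is   MLOps? ")  # served from the cache
```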
You can experiment with our interactive tokenizer tool to help you estimate costs. The API and playground also return token counts as part of the response. Once you've got things working with our most capable model, you can check whether other models can produce the same results with lower latency and cost. Learn more in our token usage help article.
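Because each response reports its token counts, you can attach a cost to every call. The sketch assumes the response's `usage` object with `prompt_tokens` and `completion_tokens` fields, as returned by the completions and chat completions endpoints; the prices are placeholders, and separate prompt and completion rates are used because some models price them differently. Running the same evaluation set through two models and comparing the totals is one way to check whether a cheaper model is good enough.

```python
from typing import Dict


def cost_from_usage(usage: Dict[str, int], prompt_price_per_1k: float,
                    completion_price_per_1k: float) -> float:
    """Dollar cost of one call, given the `usage` block of an API response."""
    return (usage["prompt_tokens"] / 1000 * prompt_price_per_1k
            + usage["completion_tokens"] / 1000 * completion_price_per_1k)


# Hypothetical usage block and hypothetical prices ($ per 1K tokens).
usage = {"prompt_tokens": 1200, "completion_tokens": 300, "total_tokens": 1500}
cost = cost_from_usage(usage, prompt_price_per_1k=0.03, completion_price_per_1k=0.06)
```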
As you move your prototype into production, you may want to consider developing an MLOps strategy. MLOps (machine learning operations) refers to the process of managing the end-to-end life cycle of your machine learning models, including any models you may be fine-tuning using our API. There are a number of areas to consider when designing your MLOps strategy. These include
Thinking through these aspects of your application will help ensure your model stays relevant and performs well over time.
As you move your prototype into production, you will need to assess and address any security and compliance requirements that may apply to your application. This will involve examining the data you are handling, understanding how our API processes data, and determining what regulations you must adhere to. For reference, here is our Privacy Policy and Terms of Use.
Some common areas you’ll need to consider include data storage, data transmission, and data retention. You might also need to implement data privacy protections, such as encryption or anonymization where possible. In addition, you should follow best practices for secure coding, such as input sanitization and proper error handling.
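As a deliberately simple illustration of input sanitization and anonymization, the sketch below strips control characters, caps input length, and masks email addresses and phone-number-like strings before text is sent to the API. The regular expressions, the 4,000-character limit, and the function names are assumptions. Regex-based redaction misses many kinds of personal data and can over-match (a date such as 2023-05-01 also matches the phone pattern), so treat it as a starting point, not a compliance control.

```python
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# Crude pattern for phone-like digit runs; it will also catch some dates and IDs.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text: str) -> str:
    """Mask email addresses and phone-like numbers before text leaves your system."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


def sanitize_user_input(text: str, max_chars: int = 4000) -> str:
    """Drop control characters (keeping newlines and tabs), trim, cap length, redact."""
    cleaned = "".join(ch for ch in text if ch.isprintable() or ch in "\n\t")
    return redact(cleaned.strip()[:max_chars])
```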
When creating your application with our API, consider our safety best practices to ensure your application is safe and successful. These recommendations highlight the importance of testing the product extensively, being proactive about addressing potential issues, and limiting opportunities for misuse.