By Derrick Harris, Matt Bornstein, and Guido Appenzeller
Research in artificial intelligence is increasing at an exponential rate. It’s difficult for AI experts to keep up with everything new being published, and even harder for beginners to know where to start.
So, in this post, we’re sharing a curated list of resources we’ve relied on to get smarter about modern AI. We call it the “AI Canon” because these papers, blog posts, courses, and guides have had an outsized impact on the field over the past several years.
We start with a gentle introduction to transformer and latent diffusion models, which are fueling the current AI wave. Next, we go deep on technical learning resources; practical guides to building with large language models (LLMs); and analysis of the AI market.
Finally, we include a reference list of landmark research results, starting with “Attention is All You Need” — the 2017 paper by Google that introduced the world to transformer models and ushered in the age of generative AI.
A gentle introduction…
These articles require no specialized background and can help you get up to speed quickly on the most important parts of the modern AI wave.
- Software 2.0: Andrej Karpathy was one of the first to clearly explain (in 2017!) why the new AI wave really matters. His argument is that AI is a new and powerful way to program computers.
As LLMs have improved rapidly, this thesis has proven prescient, and it gives a good mental model for how the AI market may progress.
- State of GPT: Also from Karpathy, this is a very approachable explanation of how ChatGPT / GPT models in general work, how to use them, and what directions R&D may take.
- What is ChatGPT doing … and why does it work?: Computer scientist and entrepreneur Stephen Wolfram gives a long but highly readable explanation, from first principles, of how modern AI models work. He follows the timeline from early neural nets to today’s LLMs and ChatGPT.
- Transformers, explained: This post by Dale Markowitz is a shorter, more direct answer to the question “what is an LLM, and how does it work?” This is a great way to ease into the topic and develop intuition for the technology. It was written about GPT-3 but still applies to newer models.
- How Stable Diffusion works: This is the computer vision analogue to the last post. Chris McCormick gives a layperson’s explanation of how Stable Diffusion works and develops intuition around text-to-image models generally. For an even gentler introduction, check out this comic from r/StableDiffusion.
Foundational learning: neural networks, backpropagation, and embeddings
These resources provide a base understanding of fundamental ideas in machine learning and AI, from the basics of deep learning to university-level courses from AI experts.
Explainers
- Deep learning in a nutshell: core concepts: This four-part series from Nvidia walks through the basics of deep learning as practiced in 2015, and is a good resource for anyone just learning about AI.
- Practical deep learning for coders: Comprehensive, free course on the fundamentals of AI, explained through practical examples and code.
- Word2vec explained: Easy introduction to embeddings and tokens, which are building blocks of LLMs (and all language models).
- Yes you should understand backprop: More in-depth post on back-propagation if you want to understand the details. If you want even more, try the Stanford CS231n lecture on YouTube.
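To make the backpropagation resources above more concrete, here is a minimal sketch of the chain rule applied to a single sigmoid neuron with a squared-error loss. The neuron, the loss, the starting values, and the learning rate are all illustrative assumptions, not taken from any of the linked posts.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def forward(w: float, b: float, x: float, y: float) -> float:
    """Squared-error loss of one sigmoid neuron on a single example."""
    a = sigmoid(w * x + b)
    return 0.5 * (a - y) ** 2

def backward(w: float, b: float, x: float, y: float):
    """Gradients of the loss with respect to w and b, via the chain rule."""
    a = sigmoid(w * x + b)
    dloss_da = a - y             # derivative of 0.5 * (a - y)^2 w.r.t. a
    da_dz = a * (1.0 - a)        # derivative of the sigmoid
    dloss_dz = dloss_da * da_dz
    return dloss_dz * x, dloss_dz  # dz/dw = x, dz/db = 1

def numerical_grad_w(w, b, x, y, eps=1e-6):
    """Central-difference estimate of dloss/dw, used only as a sanity check."""
    return (forward(w + eps, b, x, y) - forward(w - eps, b, x, y)) / (2 * eps)

def train(steps: int = 200, lr: float = 1.0):
    """Plain gradient descent on one hypothetical example (x=2, y=1)."""
    w, b, x, y = 0.5, 0.0, 2.0, 1.0
    for _ in range(steps):
        grad_w, grad_b = backward(w, b, x, y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```

Comparing `backward` against `numerical_grad_w` is the usual way to check a hand-written gradient before trusting it in a larger network.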
Courses
- Stanford CS229: Introduction to Machine Learning with Andrew Ng, covering the fundamentals of machine learning.
- Stanford CS224N: NLP with Deep Learning with Chris Manning, covering NLP basics through the first generation of LLMs.
Tech deep dive: understanding transformers and large models
There are countless resources — some better than others — attempting to explain how LLMs work. Here are some of our favorites, targeting a wide range of readers/viewers.
Explainers
- The illustrated transformer: A more technical overview of the transformer architecture by Jay Alammar.
- The annotated transformer: In-depth post if you want to understand transformers at a source code level. Requires some knowledge of PyTorch.
- Let’s build GPT: from scratch, in code, spelled out: For the engineers out there, Karpathy does a video walkthrough of how to build a GPT model.
- The illustrated Stable Diffusion: Introduction to latent diffusion models, the most common type of generative AI model for images.
- RLHF: Reinforcement Learning from Human Feedback: Chip Huyen explains RLHF, which can make LLMs behave in more predictable and human-friendly ways. This is one of the most important but least well-understood aspects of systems like ChatGPT.
- Reinforcement learning from human feedback: Computer scientist and OpenAI cofounder John Schulman goes deeper in this great talk on the current state, progress and limitations of LLMs with RLHF.
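As a companion to the transformer explainers above, the following is a minimal sketch of scaled dot-product attention, the core operation of the transformer: softmax(QKᵀ / √d_k)·V. It assumes a single head, no masking, and no learned projection matrices, and it uses plain Python lists rather than PyTorch, so it is far simpler than what the linked posts implement.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention for lists of row vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs
```

When all keys are identical, the weights are uniform and each output is simply the mean of the value vectors; when one key matches the query much more strongly than the others, the output approaches that key's value.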
Courses
- Stanford CS25: Transformers United, an online seminar on Transformers.
- Stanford CS324: Large Language Models with Percy Liang, Tatsu Hashimoto, and Chris Ré, covering a wide range of technical and non-technical aspects of LLMs.
Reference and commentary
- Predictive learning, NIPS 2016: In this early talk, Yann LeCun makes a strong case for unsupervised learning as a critical element of AI model architectures at scale. Skip to 19:20 for the famous cake analogy, which is still one of the best mental models for modern AI.
- AI for full-self driving at Tesla: Another classic Karpathy talk, this time covering the Tesla data collection engine. Starting at 8:35 is one of the great all-time AI rants, explaining why long-tailed problems (in this case stop sign detection) are so hard.
- The scaling hypothesis: One of the most surprising aspects of LLMs is that scaling — adding more data and compute — just keeps increasing accuracy. GPT-3 was the first model to demonstrate this clearly, and Gwern’s post does a great job explaining the intuition behind it.
- Chinchilla’s wild implications: Nominally an explainer of the important Chinchilla paper (see below), this post gets to the heart of the big question in LLM scaling: are we running out of data? This builds on the post above and gives a refreshed view on scaling laws.
- A survey of large language models: Comprehensive breakdown of current LLMs, including development timeline, size, training strategies, training data, hardware, and more.
- Sparks of artificial general intelligence: Early experiments with GPT-4: Early analysis from Microsoft Research on the capabilities of GPT-4, the current most advanced LLM, relative to human intelligence.
- The AI revolution: How Auto-GPT unleashes a new era of automation and creativity: An introduction to Auto-GPT and AI agents in general. This technology is very early but important to understand — it uses internet access and self-generated sub-tasks in order to solve specific, complex problems or goals.
- The Waluigi Effect: Nominally an explanation of the “Waluigi effect” (i.e., why “alter egos” emerge in LLM behavior), but interesting mostly for its deep dive on the theory of LLM prompting.
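The scaling discussion in the Chinchilla entry above can be illustrated with a rough back-of-the-envelope sketch. It relies on two widely quoted approximations that are assumptions here, not figures from this article: training compute C ≈ 6·N·D FLOPs for N parameters and D tokens, and a compute-optimal ratio of roughly 20 training tokens per parameter.

```python
import math

# Both constants are rule-of-thumb assumptions, not exact values.
TOKENS_PER_PARAM = 20.0       # approximate compute-optimal tokens per parameter
FLOPS_PER_PARAM_TOKEN = 6.0   # approximate training FLOPs per parameter per token

def compute_optimal(flops_budget: float):
    """Split a FLOPs budget into (parameters, tokens) under C ≈ 6·N·D and D ≈ 20·N."""
    n_params = math.sqrt(flops_budget / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    n_tokens = TOKENS_PER_PARAM * n_params
    return n_params, n_tokens

def tokens_shortfall(n_params: float, n_tokens: float) -> float:
    """How many more tokens a model of this size would want under the 20:1 rule."""
    return max(0.0, TOKENS_PER_PARAM * n_params - n_tokens)
```

Under these assumptions, a budget of about 5.88e23 FLOPs works out to roughly 70 billion parameters and 1.4 trillion tokens, which matches the configuration reported for Chinchilla itself. The `tokens_shortfall` helper shows why the "are we running out of data?" question arises: larger models quickly demand more tokens than many datasets contain.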
Practical guides to building with LLMs
A new application stack is emerging with LLMs at the core. While there isn’t a lot of formal education available on this topic yet, we pulled out some of the most useful resources we’ve found.
Reference
- Build a GitHub support bot with GPT3, LangChain, and Python: One of the earliest public explanations of the modern LLM app stack. Some of the advice in here is dated, but in many ways it kicked off widespread adoption and experimentation of new AI apps.
- Building LLM applications for production: Chip Huyen discusses many of the key challenges in building LLM apps, how to address them, and what types of use cases make the most sense.
- Prompt Engineering Guide: For anyone writing LLM prompts — including app devs — this is the most comprehensive guide, with specific examples for a handful of popular models. For a lighter, more conversational treatment, try Brex’s prompt engineering guide.
- Prompt injection: What’s the worst that can happen? Prompt injection is a potentially serious security vulnerability lurking for LLM apps, with no perfect solution yet. Simon Willison gives the definitive description of the problem in this post. Nearly everything Simon writes on AI is outstanding.
- OpenAI cookbook: For developers, this is the definitive collection of guides and code examples for working with the OpenAI API. It’s updated continually with new code examples.
- Pinecone learning center: Many LLM apps are based around a vector search paradigm. Pinecone’s learning center — despite being branded vendor content — offers some of the most useful instruction on how to build in this pattern.
- LangChain docs: As the default orchestration layer for LLM apps, LangChain connects to just about all other pieces of the stack. So their docs are a real reference for the full stack and how the pieces fit together.
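The vector search paradigm mentioned in the list above can be sketched in a few lines. This is a toy illustration only: the document IDs and three-dimensional "embeddings" below are made up by hand, whereas a real app would get vectors from an embedding model and store them in a vector database. It does not use the Pinecone or LangChain APIs.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, index, k=2):
    """Return the IDs of the k documents most similar to the query vector."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in index.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]

# Hypothetical hand-written embeddings, for illustration only.
INDEX = {
    "reset-password": [0.9, 0.1, 0.0],
    "billing-faq": [0.1, 0.9, 0.1],
    "api-rate-limits": [0.0, 0.2, 0.9],
}
```

In a typical LLM app, the text of the retrieved documents would then be placed into the prompt so the model can answer using information that was not in its training data.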
Courses
- LLM Bootcamp: A practical course for building LLM-based applications with Charles Frye, Sergey Karayev, and Josh Tobin.
- Hugging Face Transformers: Guide to using open-source LLMs in the Hugging Face transformers library.
LLM benchmarks
- Chatbot Arena: An Elo-style ranking system of popular LLMs, led by a team at UC Berkeley. Users can also participate by comparing models head to head.
- Open LLM Leaderboard: A ranking by Hugging Face, comparing open source LLMs across a collection of standard benchmarks and tasks.
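For readers unfamiliar with Elo-style rankings like the one mentioned above, here is a minimal sketch of the textbook Elo update after one head-to-head comparison. The K-factor of 32 and the 400-point scale are conventional chess defaults used as assumptions; this is not a description of how Chatbot Arena actually computes its rankings.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return new (rating_a, rating_b) after one match.

    score_a is 1.0 if A won, 0.0 if B won, and 0.5 for a tie.
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b
```

For example, when two models both rated 1000 are compared and the first one wins, `update(1000, 1000, 1.0)` moves them to 1016 and 984. An upset win against a higher-rated model moves the ratings by more than an expected win does.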
Market analysis
We’ve all marveled at what generative AI can produce, but there are still a lot of questions about what it all means. Which products and companies will survive and thrive? What happens to artists? How should companies use it? How will it affect jobs and society at large? Here are some attempts at answering these questions.
a16z thinking
- Who owns the generative AI platform?: Our flagship assessment of where value is accruing, and might accrue, at the infrastructure, model, and application layers of generative AI.
- Navigating the high cost of AI compute: A detailed breakdown of why generative AI models require so many computing resources, and how to think about acquiring those resources (i.e., the right GPUs in the right quantity, at the right cost) in a high-demand market.
- Art isn’t dead, it’s just machine-generated: A look at how AI models were able to reshape creative fields — often assumed to be the last holdout against automation — much faster than fields such as software development.
- The generative AI revolution in games: An in-depth analysis from our Games team of how the ability to easily create highly detailed graphics will change how game designers, studios, and the entire market function. This follow-up piece from our Games team looks specifically at the advent of AI-generated content vis-à-vis user-generated content.
- For B2B generative AI apps, is less more?: A prediction for how LLMs will evolve in the world of B2B enterprise applications, centered around the idea that summarizing information will ultimately be more valuable than producing text.
- Financial services will embrace generative AI faster than you think: An argument that the financial services industry is poised to use generative AI for personalized consumer experiences, cost-efficient operations, better compliance, improved risk management, and dynamic forecasting and reporting.
- Generative AI: The next consumer platform: A look at opportunities for generative AI to impact the consumer market across a range of sectors from therapy to ecommerce.
- To make a real difference in health care, AI will need to learn like we do: AI is poised to irrevocably change how we look to prevent and treat illness. However, to truly transform drug discovery to care delivery, we should invest in creating an ecosystem of “specialist” AIs — that learn like our best physicians and drug developers do today.
- The new industrial revolution: Bio x AI: The next industrial revolution in human history will be biology powered by artificial intelligence.
Other perspectives
- On the opportunities and risks of foundation models: Stanford overview paper on Foundation Models. Long and opinionated, but this shaped the term.
- State of AI Report: An annual roundup of everything going on in AI, including technology breakthroughs, industry development, politics/regulation, economic implications, safety, and predictions for the future.
- GPTs are GPTs: An early look at the labor market impact potential of large language models: This paper from researchers at OpenAI, OpenResearch, and the University of Pennsylvania predicts that “around 80% of the U.S. workforce could have at least 10% of their work tasks affected by the introduction of LLMs, while approximately 19% of workers may see at least 50% of their tasks impacted.”
- Deep medicine: How artificial intelligence can make healthcare human again: Dr. Eric Topol reveals how artificial intelligence has the potential to free physicians from the time-consuming tasks that interfere with human connection. The doctor-patient relationship is restored. (a16z podcast)
Landmark research results
Most of the amazing AI products we see today are the result of no-less-amazing research, carried out by experts inside large companies and leading universities.
Lately, we’ve also seen impressive work from individuals and the open source community taking popular projects into new directions, for example by creating automated agents or porting models onto smaller hardware footprints.
Here’s a collection of many of these papers and projects, for folks who really want to dive deep into generative AI.
(For research papers and projects, we’ve also included links to the accompanying blog posts or websites, where available, which tend to explain things at a higher level. And we’ve included original publication years so you can track foundational research over time.)
Large language models
New models
- Attention is all you need (2017): The original transformer work and research paper from Google Brain that started it all. (blog post)
- BERT: pre-training of deep bidirectional transformers for language understanding (2018): One of the first publicly available LLMs, with many variants still in use today. (blog post)
- Improving language understanding by generative pre-training (2018): The first paper from OpenAI covering the GPT architecture, which has become the dominant development path in LLMs. (blog post)
- Language models are few-shot learners (2020): The OpenAI paper that describes GPT-3 and the decoder-only architecture of modern LLMs.
- Training language models to follow instructions with human feedback (2022): OpenAI’s paper explaining InstructGPT, which utilizes humans in the loop to train models and, thus, better follow the instructions in prompts. This was one of the key unlocks that made LLMs accessible to consumers (e.g., via ChatGPT). (blog post)
- LaMDA: language models for dialog applications (2022): A model from Google specifically designed for free-flowing dialog between a human and chatbot across a wide variety of topics. (blog post)
- PaLM: Scaling language modeling with pathways (2022): PaLM, from Google, utilized a new system for training LLMs across thousands of chips and demonstrated larger-than-expected improvements for certain tasks as model size scaled up. (blog post). See also the PaLM-2 technical report.
- OPT: Open Pre-trained Transformer language models (2022): OPT is one of the top performing fully open source LLMs. This 175-billion-parameter model was trained on publicly available datasets, and its release comes with code. (blog post)
- Training compute-optimal large language models (2022): The Chinchilla paper. It makes the case that most models are data limited, not compute limited, and changed the consensus on LLM scaling. (blog post)
- GPT-4 technical report (2023): The latest and greatest paper from OpenAI, known mostly for how little it reveals! (blog post). The GPT-4 system card sheds some light on how OpenAI treats hallucinations, privacy, security, and other issues.
- LLaMA: Open and efficient foundation language models (2023): The model from Meta that (almost) started an open-source LLM revolution. Competitive with many of the best closed-source models but only opened up to researchers on a restricted license. (blog post)
- Alpaca: A strong, replicable instruction-following model (2023): Out of Stanford, this model demonstrates the power of instruction tuning, especially in smaller open-source models, compared to pure scale.
Model improvements (e.g. fine-tuning, retrieval, attention)
- Deep reinforcement learning from human preferences (2017): Research on reinforcement learning in gaming and robotics contexts, which turned out to be a fantastic tool for LLMs.
- Retrieval-augmented generation for knowledge-intensive NLP tasks (2020): Developed by Facebook, RAG is one of the two main research paths for improving LLM accuracy via information retrieval. (blog post)
- Improving language models by retrieving from trillions of tokens (2021): RETRO, for “Retrieval Enhanced TRansfOrmers,” is another approach — this one by DeepMind — to improve LLM accuracy by accessing information not included in their training data. (blog post)
- LoRA: Low-rank adaptation of large language models (2021): This research out of Microsoft introduced a more efficient alternative to fine-tuning for training LLMs on new data. It’s now become a standard for community fine-tuning, especially for image models.
- Constitutional AI (2022): The Anthropic team introduces the concept of Reinforcement Learning from AI Feedback (RLAIF). The main idea is that we can develop a harmless AI assistant with the supervision of other AIs.
- FlashAttention: Fast and memory-efficient exact attention with IO-awareness (2022): This research out of Stanford opened the door for state-of-the-art models to understand longer sequences of text (and higher-resolution images) without exorbitant training times and costs. (blog post)
- Hungry hungry hippos: Towards language modeling with state space models (2022): Again from Stanford, this paper describes one of the leading alternatives to attention in language modeling. This is a promising path to better scaling and training efficiency. (blog post)
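To give a feel for why the LoRA approach listed above is cheaper than full fine-tuning, here is a minimal sketch of the low-rank update: the frozen weight matrix W is adapted as W + (alpha / r)·B·A, where B and A are small trainable matrices of rank r. The matrix sizes, the rank, and the helper names are illustrative assumptions; real implementations use a tensor library rather than nested lists.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def lora_weight(w, b, a, alpha: float, rank: int):
    """Effective weight W + (alpha / rank) * B @ A; W itself stays frozen."""
    delta = matmul(b, a)
    scale = alpha / rank
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]

def trainable_params(d_out: int, d_in: int, rank: int):
    """Parameter counts for full fine-tuning vs. a rank-r LoRA adapter."""
    full = d_out * d_in
    lora = rank * (d_out + d_in)
    return full, lora
```

For a hypothetical 4096 × 4096 layer with rank 8, `trainable_params(4096, 4096, 8)` gives 16,777,216 weights for full fine-tuning versus 65,536 for the adapter. Initializing B to zeros means the adapted model starts out identical to the original one.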
Image generation models
- Learning transferable visual models from natural language supervision (2021): Paper that introduces a base model — CLIP — that links textual descriptions to images. One of the first effective, large-scale uses of foundation models in computer vision. (blog post)
- Zero-shot text-to-image generation (2021): This is the paper that introduced DALL-E, a model that combines the aforementioned CLIP and GPT-3 to automatically generate images based on text prompts. Its successor, DALL-E 2, would kick off the image-based generative AI boom in 2022. (blog post)
- High-resolution image synthesis with latent diffusion models (2021): The paper that described Stable Diffusion (after the launch and explosive open source growth).
- Photorealistic text-to-image diffusion models with deep language understanding (2022): Imagen was Google’s foray into AI image generation. More than a year after its announcement, the model has yet to be released publicly as of the publish date of this piece. (website)
- DreamBooth: Fine tuning text-to-image diffusion models for subject-driven generation (2022): DreamBooth is a system, developed at Google, for training models to recognize user-submitted subjects and apply them to the context of a prompt (e.g. [USER] smiling at the Eiffel Tower). (website)
- Adding conditional control to text-to-image diffusion models (2023): This paper from Stanford introduces ControlNet, a now very popular tool for exercising fine-grained control over image generation with latent diffusion models.
Agents
- A path towards autonomous machine intelligence (2022): A proposal from Meta AI lead and NYU professor Yann LeCun on how to build autonomous and intelligent agents that truly understand the world around them.
- ReAct: Synergizing reasoning and acting in language models (2022): A project out of Princeton and Google to test and improve the reasoning and planning abilities of LLMs. (blog post)
- Generative agents: Interactive simulacra of human behavior (2023): Researchers at Stanford and Google used LLMs to power agents, in a setting akin to “The Sims,” whose interactions are emergent rather than programmed.
- Reflexion: an autonomous agent with dynamic memory and self-reflection (2023): Work from researchers at Northeastern University and MIT on teaching LLMs to solve problems more reliably by learning from their mistakes and past experiences.
- Toolformer: Language models can teach themselves to use tools (2023): This project from Meta trained LLMs to use external tools (APIs, in this case, pointing to things like search engines and calculators) in order to improve accuracy without increasing model size.
- Auto-GPT: An autonomous GPT-4 experiment: An open source experiment to expand on the capabilities of GPT-4 by giving it a collection of tools (internet access, file storage, etc.) and letting it choose which ones to use in order to solve a specific task.
- BabyAGI: This Python script utilizes GPT-4 and vector databases (to store context) in order to plan and execute a series of tasks that solve a broader objective.
Other data modalities
Code generation
- Evaluating large language models trained on code (2021): This is OpenAI’s research paper for Codex, the code-generation model behind the GitHub Copilot product. (blog post)
- Competition-level code generation with AlphaCode (2021): This research from DeepMind demonstrates a model capable of writing better code than human programmers. (blog post)
- CodeGen: An open large language model for code with multi-turn program synthesis (2022): CodeGen comes out of the AI research arm at Salesforce, and currently underpins the Replit Ghostwriter product for code generation. (blog post)
Video generation
- Make-A-Video: Text-to-video generation without text-video data (2022): A model from Meta that creates short videos from text prompts, but also adds motion to static photo inputs or creates variations of existing videos. (blog post)
- Imagen Video: High definition video generation with diffusion models (2022): Just what it sounds like: a version of Google’s image-based Imagen model optimized for producing short videos from text prompts. (website)
Human biology and medical data
- Strategies for pre-training graph neural networks (2020): This publication laid the groundwork for effective pre-training methods useful for applications across drug discovery, such as molecular property prediction and protein function prediction. (blog post)
- Improved protein structure prediction using potentials from deep learning (2020): DeepMind’s protein-centric transformer model, AlphaFold, made it possible to predict protein structure from sequence — a true breakthrough which has already had far-reaching implications for understanding biological processes and developing new treatments for diseases. (blog post) (explainer)
- Large language models encode clinical knowledge (2022): Med-PaLM is an LLM capable of correctly answering US Medical License Exam style questions. The team has since published results on the performance of Med-PaLM 2, which achieved a score on par with “expert” test takers. Other teams have performed similar experiments with ChatGPT and GPT-4. (video)
Audio generation
- Jukebox: A generative model for music (2020): OpenAI’s foray into music generation using transformers, capable of producing music, vocals, and lyrics with minimal training. (blog post)
- AudioLM: a language modeling approach to audio generation (2022): AudioLM is a Google project for generating multiple types of audio, including speech and instrumentation. (blog post)
- MusicLM: Generating music from text (2023): Current state of the art in AI-based music generation, showing higher quality and coherence than previous attempts. (blog post)
Multi-dimensional image generation
- NeRF: Representing scenes as neural radiance fields for view synthesis (2020): Research from a UC-Berkeley-led team on “synthesizing novel views of complex scenes” using 5D coordinates. (website)
- DreamFusion: Text-to-3D using 2D diffusion (2022): Work from researchers at Google and UC-Berkeley that builds on NeRF to generate 3D images from 2D inputs. (website)
Original article: https://a16z.com/2023/05/25/ai-