Understanding Large Language Models - Danqi Chen

 

Schedule

Each entry lists the date, topic, main papers, recommended reading, pre-lecture questions, presenters, and feedback providers.
Sep 7 (Wed) Introduction
Recommended reading:
1. Human Language Understanding & Reasoning (Chinese explanation)
2. Attention Is All You Need (Transformers) (Chinese explanation)
3. Blog Post: The Illustrated Transformer (Chinese explanation)
4. HuggingFace's course on Transformers
Presenter: Danqi Chen [slides]

What are LLMs?
Sep 12 (Mon) BERT (encoder-only models)
Main papers:
1. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Recommended reading:
1. Deep contextualized word representations (ELMo) (Chinese explanation)
2. Improving Language Understanding by Generative Pre-Training (OpenAI GPT) (Chinese explanation)
3. RoBERTa: A Robustly Optimized BERT Pretraining Approach (Chinese explanation)
4. ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators (Chinese explanation)
Pre-lecture questions: lec2 questions
Presenter: Danqi Chen [slides]

Sep 14 (Wed) T5 (encoder-decoder models)
Main papers:
1. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (T5)
Recommended reading:
1. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension (Chinese explanation)
2. mT5: A massively multilingual pre-trained text-to-text transformer (Chinese explanation)
3. AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model (Chinese explanation)
Pre-lecture questions: lec3 questions
Presenters: Abhishek Panigrahi, Victoria Graf [slides]
Feedback providers: Edward Tian, Zihan Ding, Jiatong Yu, Anirudh Ajith

Sep 19 (Mon) GPT-3 (decoder-only models)
Main papers:
1. Language Models are Few-Shot Learners (GPT-3)
Recommended reading:
1. Language Models are Unsupervised Multitask Learners (GPT-2) (Chinese explanation)
2. PaLM: Scaling Language Modeling with Pathways (Chinese explanation)
3. OPT: Open Pre-trained Transformer Language Models (Chinese explanation)
Pre-lecture questions: lec 4 questions
Presenters: Sabhya Chhabria, Michael Tang [slides]
Feedback providers: Anika Maskara, Tianle Cai, Richard Zhu, Andrea Wynn

How to Use and Adapt LLMs?
Sep 21 (Wed) Prompting for few-shot learning
Main papers:
1. Making Pre-trained Language Models Better Few-shot Learners (blog post)
2. How Many Data Points is a Prompt Worth?
Recommended reading:
1. Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference (Chinese explanation)
2. True Few-Shot Learning with Language Models (Chinese explanation)
3. Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models (Chinese explanation)
4. Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing (Chinese explanation)
Pre-lecture questions: lec 5 questions
Presenters: Kaixuan Huang, Edward Tian [slides]
Feedback providers: Sam Liang, Mengzhou Xia, Victoria Graf, Tianle Cai

Sep 26 (Mon) Prompting as parameter-efficient fine-tuning
Main papers:
1. Prefix-Tuning: Optimizing Continuous Prompts for Generation
2. The Power of Scale for Parameter-Efficient Prompt Tuning
Recommended reading:
1. Factual Probing Is [MASK]: Learning vs. Learning to Recall (Chinese explanation)
2. P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks (Chinese explanation)
3. LoRA: Low-Rank Adaptation of Large Language Models (Chinese explanation)
4. Towards a Unified View of Parameter-Efficient Transfer Learning (Chinese explanation)
Pre-lecture questions: lec 6 questions
Presenters: Chris Pan, Hongjie Wang [slides]
Feedback providers: Sabhya Chhabria, Andrea Wynn, Sam Liang, Wenhan Xia

Sep 28 (Wed) In-context learning
Main papers:
1. Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?
2. An Explanation of In-context Learning as Implicit Bayesian Inference (we don't expect you to read this paper in depth; you can check out this blog post instead)
Recommended reading:
1. What Makes Good In-Context Examples for GPT-3? (Chinese explanation)
2. Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity (Chinese explanation)
3. Data Distributional Properties Drive Emergent In-Context Learning in Transformers (Chinese explanation)
4. What Can Transformers Learn In-Context? A Case Study of Simple Function Classes (Chinese explanation)
Pre-lecture questions: lec 7 questions
Presenters: Sam Liang, Kexin Jin [slides]
Feedback providers: Anika Maskara, Zixu Zhang, Tong Wu, Victoria Graf

Oct 3 (Mon) Calibration of prompting LLMs
Main papers:
1. Calibrate Before Use: Improving Few-Shot Performance of Language Models
2. Surface Form Competition: Why the Highest Probability Answer Isn't Always Right
Recommended reading:
1. Noisy Channel Language Model Prompting for Few-Shot Text Classification (Chinese explanation)
2. How Can We Know When Language Models Know? On the Calibration of Language Models for Question Answering (Chinese explanation)
3. Language Models (Mostly) Know What They Know (Chinese explanation)
Pre-lecture questions: lec 8 questions
Presenters: Vishvak Murahari, Howard Yen [slides]
Feedback providers: Jiatong Yu, Howard Chen, Chris Pan, Andre Niyongabo Rubungo, Devon Wood-Thomas

Oct 5 (Wed) Reasoning
Main papers:
1. Chain of Thought Prompting Elicits Reasoning in Large Language Models
2. Large Language Models are Zero-Shot Reasoners
Recommended reading:
1. Explaining Answers with Entailment Trees (Chinese explanation)
2. Self-Consistency Improves Chain of Thought Reasoning in Language Models (Chinese explanation)
3. Faithful Reasoning Using Large Language Models (Chinese explanation)
Pre-lecture questions: lec 9 questions
Presenters: Zihan Ding, Zixu Zhang [slides]
Feedback providers: Vishvak Murahari, Beiqi Zou, Chris Pan, Xiangyu Qi

Oct 10 (Mon) Knowledge
Main papers:
1. Language Models as Knowledge Bases?
2. How Much Knowledge Can You Pack Into the Parameters of a Language Model?
Recommended reading:
1. Knowledge Neurons in Pretrained Transformers (Chinese explanation)
2. Fast Model Editing at Scale (Chinese explanation)
3. Question and Answer Test-Train Overlap in Open-Domain Question Answering Datasets (Chinese explanation)
Pre-lecture questions: lec 10 questions
Presenters: Jane Pan, Mengzhou Xia [slides]
Feedback providers: Andre Niyongabo Rubungo, Devon Wood-Thomas, Xiangyu Qi, Howard Chen

Dissecting LLMs: Data, Model Scaling and Risks
Oct 12 (Wed) Data
Main papers:
1. Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus
Recommended reading:
1. The Pile: An 800GB Dataset of Diverse Text for Language Modeling (Chinese explanation)
2. Deduplicating Training Data Makes Language Models Better (Chinese explanation)
Pre-lecture questions: lec 11 questions
Presenters: Andre Niyongabo Rubungo, Tanushree Banerjee [slides]
Feedback providers: Arseniy Andreyev, Wenhan Xia, Xindi Wu, Richard Zhu

Oct 14 (Fri) Final project proposal due at 11:59pm
Submit here.
Oct 17 (Mon) Fall recess (no class)
Oct 19 (Wed) Fall recess (no class)
Oct 24 (Mon) Scaling
Main papers:
1. Training Compute-Optimal Large Language Models
Recommended reading:
1. Scaling Laws for Neural Language Models (Chinese explanation)
2. Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers (Chinese explanation)
3. Scaling Laws for Autoregressive Generative Modeling (Chinese explanation)
Pre-lecture questions: lec 12 questions
Presenters: Anika Maskara, Simon Park [slides]
Feedback providers: Hongjie Wang, Sabhya Chhabria, Edward Tian, Kaixuan Huang

Oct 26 (Wed) Privacy
Main papers:
1. Extracting Training Data from Large Language Models
Recommended reading:
1. Quantifying Memorization Across Neural Language Models (Chinese explanation)
2. Deduplicating Training Data Mitigates Privacy Risks in Language Models (Chinese explanation)
3. Large Language Models Can Be Strong Differentially Private Learners (Chinese explanation)
4. Recovering Private Text in Federated Learning of Language Models (Chinese explanation)
Pre-lecture questions: lec 13 questions
Presenters: Xiangyu Qi, Tong Wu [slides]
Feedback providers: Anirudh Ajith, Austin Wang, Tanushree Banerjee, Arseniy Andreyev

Oct 31 (Mon) Bias & Toxicity I: evaluation
Main papers:
1. RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models
2. OPT paper, Section 4
Recommended reading:
1. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? (Chinese explanation)
2. Red Teaming Language Models with Language Models (Chinese explanation)
3. Whose Language Counts as High Quality? Measuring Language Ideologies in Text Data Selection (Chinese explanation)
Pre-lecture questions: lec 14 questions
Presenters: Maxine Perroni-Scharf, Richard Zhu [slides]
Feedback providers: Tong Wu, Hongjie Wang, Howard Yen, Mengzhou Xia

Nov 2 (Wed) Bias & Toxicity II: mitigation
Main papers:
1. Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP
Recommended reading:
1. Challenges in Detoxifying Language Models (Chinese explanation)
2. Detoxifying Language Models Risks Marginalizing Minority Voices (Chinese explanation)
3. Plug and Play Language Models: A Simple Approach to Controlled Text Generation (Chinese explanation)
4. GeDi: Generative Discriminator Guided Sequence Generation (Chinese explanation)
Pre-lecture questions: lec 15 questions
Presenters: Anirudh Ajith, Arnab Bhattacharjee [slides]
Feedback providers: Maxine Perroni-Scharf, Xindi Wu, Jane Pan, Howard Chen

Beyond Current LLMs: Models and Applications
Nov 7 (Mon) Sparse models
Main papers:
1. Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
Recommended reading:
1. Efficient Large Scale Language Modeling with Mixtures of Experts (Chinese explanation)
2. Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models (Chinese explanation)
3. A Review of Sparse Expert Models in Deep Learning (Chinese explanation)
Pre-lecture questions: lec 16 questions
Presenters: Zhou Lu, Wenhan Xia [slides]
Feedback providers: Michael Tang, Arnab Bhattacharjee, Kexin Jin, Beiqi Zou

Nov 9 (Wed) Retrieval-based LMs
Main papers:
1. Improving language models by retrieving from trillions of tokens
Recommended reading:
1. Generalization through Memorization: Nearest Neighbor Language Models (Chinese explanation)
2. Training Language Models with Memory Augmentation (Chinese explanation)
3. Few-shot Learning with Retrieval Augmented Language Models (Chinese explanation)
Pre-lecture questions: lec 17 questions
Presenters: Tianle Cai, Beiqi Zou [slides]
Feedback providers: Simon Park, Jane Pan, Maxine Perroni-Scharf, Abhishek Panigrahi

Nov 14 (Mon) Training LMs with human feedback
Main papers:
1. Training language models to follow instructions with human feedback
Recommended reading:
1. Learning to summarize from human feedback (Chinese explanation)
2. Fine-Tuning Language Models from Human Preferences (Chinese explanation)
3. MemPrompt: Memory-assisted Prompt Editing with User Feedback (Chinese explanation)
4. LaMDA: Language Models for Dialog Applications (Chinese explanation)
Pre-lecture questions: lec 18 questions
Presenters: Howard Chen, Austin Wang [slides]
Feedback providers: Abhishek Panigrahi, Simon Park, Kaixuan Huang, Arseniy Andreyev

Nov 16 (Wed) Code LMs
Main papers:
1. Evaluating Large Language Models Trained on Code
Recommended reading:
1. A Conversational Paradigm for Program Synthesis (Chinese explanation)
2. InCoder: A Generative Model for Code Infilling and Synthesis (Chinese explanation)
3. A Systematic Evaluation of Large Language Models of Code (Chinese explanation)
4. Language Models of Code are Few-Shot Commonsense Learners (Chinese explanation)
5. Competition-Level Code Generation with AlphaCode (Chinese explanation)
Pre-lecture questions: lec 19 questions
Presenters: Arseniy Andreyev, Jiatong Yu [slides]
Feedback providers: Howard Yen, Michael Tang, Tanushree Banerjee, Kexin Jin

Nov 21 (Mon) Multimodal LMs
Main papers:
1. Flamingo: a Visual Language Model for Few-Shot Learning
Recommended reading:
1. Blog post: Generalized Visual Language Models (Chinese explanation)
2. Learning Transferable Visual Models From Natural Language Supervision (CLIP) (Chinese explanation)
3. Multimodal Few-Shot Learning with Frozen Language Models (Chinese explanation)
4. CM3: A Causal Masked Multimodal Model of the Internet (Chinese explanation)
Pre-lecture questions: lec 20 questions
Presenters: Andrea Wynn, Xindi Wu [slides]
Feedback providers: Arnab Bhattacharjee, Vishvak Murahari, Austin Wang, Zihan Ding

Nov 23 (Wed) Thanksgiving recess (no class)
Nov 28 (Mon) Guest lecture: Alexander Rush (Cornell/Hugging Face)
Topic: Multitask Prompted Training for Zero-Shot Models
Recommended reading:
1. Multitask Prompted Training Enables Zero-Shot Task Generalization (Chinese explanation)
2. PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts (Chinese explanation)
3. Scaling Instruction-Finetuned Language Models (Chinese explanation)
4. Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks (Chinese explanation)

Nov 30 (Wed) AI Alignment + open discussion
Recommended reading:
1. A General Language Assistant as a Laboratory for Alignment (Chinese explanation)
2. Alignment of Language Agents (Chinese explanation)
3. Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback (Chinese explanation)
Presenter: Devon Wood-Thomas (half of the lecture) [slides]
Feedback providers: Richard Zhu, Sabhya Chhabria, Andrea Wynn, Anirudh Ajith

Dec 5 (Mon) In-class presentation (extended class)
Dec 7 (Wed) No class
Dec 16 (Fri) Final project due at 11:59pm (dean's date)
