Date | Topic/papers | Recommended reading | Pre-lecture questions | Presenters | Feedback providers |
---|---|---|---|---|---|
Sep 7 (Wed) | Introduction | 1. Human Language Understanding & Reasoning | - | Danqi Chen [slides] | - |
What are LLMs? | |||||
Sep 12 (Mon) | BERT (encoder-only models) 1. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | 1. Deep contextualized word representations (ELMo) 2. ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators | lec 2 questions | Danqi Chen [slides] | |
Sep 14 (Wed) | T5 (encoder-decoder models) 1. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (T5) | 1. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension 2. mT5: A massively multilingual pre-trained text-to-text transformer 3. AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model | lec 3 questions | Abhishek Panigrahi, Victoria Graf [slides] | Edward Tian, Zihan Ding, Jiatong Yu, Anirudh Ajith |
Sep 19 (Mon) | GPT-3 (decoder-only models) 1. Language Models are Few-Shot Learners (GPT-3) | 1. Language Models are Unsupervised Multitask Learners (GPT-2) 2. OPT: Open Pre-trained Transformer Language Models | lec 4 questions | Sabhya Chhabria, Michael Tang [slides] | Anika Maskara, Tianle Cai, Richard Zhu, Andrea Wynn |
How to Use and Adapt LLMs? | |||||
Sep 21 (Wed) | Prompting for few-shot learning 1. Making Pre-trained Language Models Better Few-shot Learners (blog post) 2. How Many Data Points is a Prompt Worth? | 1. Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference 2. True Few-Shot Learning with Language Models 3. Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models 4. Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing | lec 5 questions | Kaixuan Huang, Edward Tian [slides] | Sam Liang, Mengzhou Xia, Victoria Graf, Tianle Cai |
Sep 26 (Mon) | Prompting as parameter-efficient fine-tuning 1. Prefix-Tuning: Optimizing Continuous Prompts for Generation 2. The Power of Scale for Parameter-Efficient Prompt Tuning | 1. Factual Probing Is [MASK]: Learning vs. Learning to Recall 2. P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks 3. LoRA: Low-Rank Adaptation of Large Language Models 4. Towards a Unified View of Parameter-Efficient Transfer Learning | lec 6 questions | Chris Pan, Hongjie Wang [slides] | Sabhya Chhabria, Andrea Wynn, Sam Liang, Wenhan Xia |
Sep 28 (Wed) | In-context learning 1. Rethinking the Role of Demonstrations: What Makes In-Context Learning Work? 2. An Explanation of In-context Learning as Implicit Bayesian Inference (we don't expect you to read this paper in depth; you can check out this blog post instead) | 1. What Makes Good In-Context Examples for GPT-3? 2. Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity 3. Data Distributional Properties Drive Emergent In-Context Learning in Transformers 4. What Can Transformers Learn In-Context? A Case Study of Simple Function Classes | lec 7 questions | Sam Liang, Kexin Jin [slides] | Anika Maskara, Zixu Zhang, Tong Wu, Victoria Graf |
Oct 3 (Mon) | Calibration of prompting LLMs 1. Calibrate Before Use: Improving Few-Shot Performance of Language Models 2. Surface Form Competition: Why the Highest Probability Answer Isn't Always Right | 1. Noisy Channel Language Model Prompting for Few-Shot Text Classification 2. How Can We Know When Language Models Know? On the Calibration of Language Models for Question Answering 3. Language Models (Mostly) Know What They Know | lec 8 questions | Vishvak Murahari, Howard Yen [slides] | Jiatong Yu, Howard Chen, Chris Pan, Andre Niyongabo Rubungo, Devon Wood-Thomas |
Oct 5 (Wed) | Reasoning 1. Chain of Thought Prompting Elicits Reasoning in Large Language Models 2. Large Language Models are Zero-Shot Reasoners | 1. Explaining Answers with Entailment Trees 2. Self-Consistency Improves Chain of Thought Reasoning in Language Models 3. Faithful Reasoning Using Large Language Models | lec 9 questions | Zihan Ding, Zixu Zhang [slides] | Vishvak Murahari, Beiqi Zou, Chris Pan, Xiangyu Qi |
Oct 10 (Mon) | Knowledge 1. Language Models as Knowledge Bases? 2. How Much Knowledge Can You Pack Into the Parameters of a Language Model? | 1. Knowledge Neurons in Pretrained Transformers 2. Fast Model Editing at Scale 3. Question and Answer Test-Train Overlap in Open-Domain Question Answering Datasets | lec 10 questions | Jane Pan, Mengzhou Xia [slides] | Andre Niyongabo Rubungo, Devon Wood-Thomas, Xiangyu Qi, Howard Chen |
Dissecting LLMs: Data, Model Scaling and Risks | |||||
Oct 12 (Wed) | Data 1. Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus | 1. The Pile: An 800GB Dataset of Diverse Text for Language Modeling 2. Deduplicating Training Data Makes Language Models Better | lec 11 questions | Andre Niyongabo Rubungo, Tanushree Banerjee [slides] | Arseniy Andreyev, Wenhan Xia, Xindi Wu, Richard Zhu |
Oct 14 (Fri) | Final project proposal due at 11:59pm. Submit here. | ||||
Oct 17 (Mon) | Fall recess (no class) | ||||
Oct 19 (Wed) | Fall recess (no class) | ||||
Oct 24 (Mon) | Scaling 1. Training Compute-Optimal Large Language Models | 1. Scaling Laws for Neural Language Models 2. Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers 3. Scaling Laws for Autoregressive Generative Modeling | lec 12 questions | Anika Maskara, Simon Park [slides] | Hongjie Wang, Sabhya Chhabria, Edward Tian, Kaixuan Huang |
Oct 26 (Wed) | Privacy 1. Extracting Training Data from Large Language Models | 1. Quantifying Memorization Across Neural Language Models 2. Deduplicating Training Data Mitigates Privacy Risks in Language Models 3. Large Language Models Can Be Strong Differentially Private Learners 4. Recovering Private Text in Federated Learning of Language Models | lec 13 questions | Xiangyu Qi, Tong Wu [slides] | Anirudh Ajith, Austin Wang, Tanushree Banerjee, Arseniy Andreyev |
Oct 31 (Mon) | Bias & Toxicity I: evaluation 1. RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models 2. OPT paper, Section 4 | 1. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 2. Red Teaming Language Models with Language Models 3. Whose Language Counts as High Quality? Measuring Language Ideologies in Text Data Selection | lec 14 questions | Maxine Perroni-Scharf, Richard Zhu [slides] | Tong Wu, Hongjie Wang, Howard Yen, Mengzhou Xia |
Nov 2 (Wed) | Bias & Toxicity II: mitigation 1. Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP | 1. Challenges in Detoxifying Language Models 2. Detoxifying Language Models Risks Marginalizing Minority Voices 3. Plug and Play Language Models: A Simple Approach to Controlled Text Generation 4. GeDi: Generative Discriminator Guided Sequence Generation | lec 15 questions | Anirudh Ajith, Arnab Bhattacharjee [slides] | Maxine Perroni-Scharf, Xindi Wu, Jane Pan, Howard Chen |
Beyond Current LLMs: Models and Applications | |||||
Nov 7 (Mon) | Sparse models 1. Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity | 1. Efficient Large Scale Language Modeling with Mixtures of Experts 2. Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models 3. A Review of Sparse Expert Models in Deep Learning | lec 16 questions | Zhou Lu, Wenhan Xia [slides] | Michael Tang, Arnab Bhattacharjee, Kexin Jin, Beiqi Zou |
Nov 9 (Wed) | Retrieval-based LMs 1. Improving language models by retrieving from trillions of tokens | 1. Generalization through Memorization: Nearest Neighbor Language Models 2. Training Language Models with Memory Augmentation 3. Few-shot Learning with Retrieval Augmented Language Models | lec 17 questions | Tianle Cai, Beiqi Zou [slides] | Simon Park, Jane Pan, Maxine Perroni-Scharf, Abhishek Panigrahi |
Nov 14 (Mon) | Training LMs with human feedback 1. Training language models to follow instructions with human feedback | 1. Learning to summarize from human feedback 2. Fine-Tuning Language Models from Human Preferences 3. MemPrompt: Memory-assisted Prompt Editing with User Feedback 4. LaMDA: Language Models for Dialog Applications | lec 18 questions | Howard Chen, Austin Wang [slides] | Abhishek Panigrahi, Simon Park, Kaixuan Huang, Arseniy Andreyev |
Nov 16 (Wed) | Code LMs 1. Evaluating Large Language Models Trained on Code | 1. A Conversational Paradigm for Program Synthesis 2. InCoder: A Generative Model for Code Infilling and Synthesis 3. A Systematic Evaluation of Large Language Models of Code 4. Language Models of Code are Few-Shot Commonsense Learners 5. Competition-Level Code Generation with AlphaCode | lec 19 questions | Arseniy Andreyev, Jiatong Yu [slides] | Howard Yen, Michael Tang, Tanushree Banerjee, Kexin Jin |
Nov 21 (Mon) | Multimodal LMs 1. Flamingo: a Visual Language Model for Few-Shot Learning | 1. Blog post: Generalized Visual Language Models 2. Learning Transferable Visual Models From Natural Language Supervision (CLIP) 3. Multimodal Few-Shot Learning with Frozen Language Models 4. CM3: A Causal Masked Multimodal Model of the Internet | lec 20 questions | Andrea Wynn, Xindi Wu [slides] | Arnab Bhattacharjee, Vishvak Murahari, Austin Wang, Zihan Ding |
Nov 23 (Wed) | Thanksgiving recess (no class) | ||||
Nov 28 (Mon) | Guest lecture: Alexander Rush (Cornell/Hugging Face) Multitask Prompted Training for Zero-Shot Models | 1. Multitask Prompted Training Enables Zero-Shot Task Generalization 2. PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts 3. Scaling Instruction-Finetuned Language Models 4. Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks | |||
Nov 30 (Wed) | AI Alignment + open discussion | 1. A General Language Assistant as a Laboratory for Alignment 2. Alignment of Language Agents 3. Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback | | Devon Wood-Thomas (half of the lecture) [slides] | Richard Zhu, Sabhya Chhabria, Andrea Wynn, Anirudh Ajith |
Dec 5 (Mon) | In-class presentations (extended class) | ||||
Dec 7 (Wed) | No class | ||||
Dec 16 (Fri) | Final project due at 11:59pm (dean's date) | ||||