作者:孙思琦、成宇、甘哲、刘晶晶
本文为你介绍“耐心的知识蒸馏”模型。
数据派THU后台回复“191010”,获取论文地址。
图表1
图表2
Radford, Alec, et al. "Language models are unsupervised multitask learners." OpenAI Blog 1.8 (2019).
Devlin, Jacob, et al. "Bert: Pre-training of deep bidirectional transformers for language understanding." arXiv preprint arXiv:1810.04805 (2018).
Yang, Zhilin, et al. "XLNet: Generalized Autoregressive Pretraining for Language Understanding." arXiv preprint arXiv:1906.08237 (2019).
Liu, Yinhan, et al. "Roberta: A robustly optimized BERT pretraining approach." arXiv preprint arXiv:1907.11692 (2019).
Hinton, Geoffrey, Oriol Vinyals, and Jeff Dean. "Distilling the knowledge in a neural network." arXiv preprint arXiv:1503.02531 (2015).
Siqi Sun: is a Research SDE in Microsoft. He is currently working on commonsense reasoning and knowledge graph related projects. Prior joining Microsoft, he was a PhD student in computer science at TTI Chicago, and before that he was an undergraduate student from school of mathematics at Fudan University.
Yu Cheng: is a senior researcher at Microsoft. His research is about deep learning in general, with specific interests in model compression, deep generative model and adversarial learning. He is also interested in solving real-world problems in computer vision and natural language processing. Yu received his Ph.D.from Northwestern University in 2015 and his bachelor from Tsinghua University in 2010. Before join Microsoft, he spent three years as a Research Staff Member at IBM Research/MIT-IBM Watson AI Lab.
Zhe Gan: is a senior researcher at Microsoft, primarily working on generative models, visual QA/dialog, machine reading comprehension (MRC), and natural language generation (NLG). He also has broad interests on various machine learning and NLP topics. Zhe received his PhD degree from Duke University in Spring 2018. Before that, he received his Master's and Bachelor's degree from Peking University in 2013 and 2010, respectively.
Jingjing (JJ) Liu: is a Principal Research Manager at Microsoft, leading a research team in NLP and Computer Vision. Her current research interests include Machine Reading Comprehension, Commonsense Reasoning, Visual QA/Dialog and Text-to-Image Generation. She received her PhD degree in Computer Science from MIT EECS in 2011. She also holds an MBA degree from Judge Business School at University of Cambridge.Before joining MSR, Dr.Liu was the Director of Product at Mobvoi Inc and Research Scientist at MIT CSAIL.
数据派THU后台回复“191010”,获取论文地址。
点击“阅读原文”拥抱组织