DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning论文解读
文章目录前言一、摘要二、引言三、贡献1.贡献后训练:基础模型的大规模强化学习蒸馏:较小的模型也可以很强大2.评估结果概览reasoningtasksknowledgeohters四、方法1.Overview2.DeepSeek-R1-Zero:ReinforcementLearningontheBaseModelReinforcementLearningAlgorithm(GRPO重点)Rewar