[deepseek] Paper Notes -- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

DeepSeek-R1 Paper Analysis

1. Basic Paper Information

Title: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Authors: DeepSeek-AI team (contact email: [email protected])
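Publication date and venue: January 2025, arXiv preprint (arXiv:2501.12948). AIME 2024 is not a conference: it is the American Invitational Mathematics Examination, one of the benchmarks the paper uses for evaluation.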

Keywords

  • Reinforcement Learning
  • Reasoning Models
  • Chain-of-Thought
  • Model Distillation
  • Mixed Precision Training
  • Self-Evolution
  • Cold Start
  • GRPO Algorithm (Group Relative Policy Optimization)
  • SWE-Bench (software engineering benchmark)

The paper focuses on reinforcement learning for LLM reasoning capability…
