RLHF Code

  • https://github.com/CarperAI/trlx/blob/main/examples/summarize_rlhf/reward_model/reward_model.py
  • https://github.com/CarperAI/trlx/blob/main/trlx/models/modeling_ppo.py
