LLM Notes, 2023-04 to 2023-06

RLHF

RAFT

RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment
code

RRHF

RRHF: Rank Responses to Align Language Models with Human Feedback without tears
code
$$p_i=\frac{\sum_{t}\log P_{\pi}(y_{i,t}\mid y_{i,<t})}{\|y_i\|}$$

$$L_{rank}=\sum_{r_i<r_j}\max(0,\,p_i-p_j)$$
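The two RRHF formulas above can be checked on toy numbers. This is a minimal sketch in plain Python, not the code from the RRHF repository. The function names `length_normalized_logprob` and `rank_loss`, and the sample log-probabilities and rewards, are hypothetical and chosen only for illustration. It assumes the per-token log-probabilities have already been computed by the policy.

```python
from typing import Sequence


def length_normalized_logprob(token_logprobs: Sequence[float]) -> float:
    """p_i: sum of token log-probabilities divided by the response length."""
    if not token_logprobs:
        raise ValueError("a response needs at least one token")
    return sum(token_logprobs) / len(token_logprobs)


def rank_loss(scores: Sequence[float], rewards: Sequence[float]) -> float:
    """L_rank: sum over all pairs with r_i < r_j of max(0, p_i - p_j)."""
    if len(scores) != len(rewards):
        raise ValueError("scores and rewards must have the same length")
    loss = 0.0
    for i in range(len(scores)):
        for j in range(len(scores)):
            # Penalize only when the lower-reward response i
            # is scored higher than the higher-reward response j.
            if rewards[i] < rewards[j]:
                loss += max(0.0, scores[i] - scores[j])
    return loss


# Hypothetical toy data: two candidate responses to the same prompt.
token_logprobs = [
    [-0.5, -1.5],        # response 0, two tokens
    [-1.0, -1.0, -4.0],  # response 1, three tokens
]
rewards = [0.25, 0.75]   # response 1 is preferred by the reward signal

scores = [length_normalized_logprob(lp) for lp in token_logprobs]
loss = rank_loss(scores, rewards)
print(scores, loss)
```

Here the policy scores the lower-reward response higher, so the pair contributes a positive loss. Swapping the two rewards makes the ordering agree with the scores and the loss drops to zero.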
