Softmax Strategy

 1. epsilon-greedy strategy

11111

2. UCB strategy

222

3. Softmax  strategy

333

4. Gradient strategy

444

References

[1] 科学网—【RL系列】Multi-Armed Bandit笔记——Softmax选择策略 - 管金昱的博文

[2] The Epsilon-Greedy Algorithm | James D. McCaffrey

你可能感兴趣的:(Reinforcement,Learning,强化学习)