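Blackjack (21) is a very popular card game with simple rules: add up the cards in your hand; if the total exceeds 21 you lose, otherwise the higher total wins. J, Q, and K each count as 10, and an ace can count as either 1 or 11.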
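To make the ace rule concrete, here is a minimal sketch of scoring a hand. `hand_value` is an illustrative helper name and is not part of the code later in this post. It counts every ace as 1 first, then promotes one ace to 11 if that does not push the total past 21.

```python
def hand_value(cards):
    """Return the best blackjack total for a list of card values.

    Cards are given as 1 (ace) through 10 (10, J, Q, K all count as 10).
    """
    total = sum(cards)              # count every ace as 1 first
    if 1 in cards and total <= 11:  # promoting one ace to 11 adds 10
        total += 10
    return total

print(hand_value([1, 10]))    # ace counted as 11
print(hand_value([1, 6, 9]))  # ace counted as 1
```

At most one ace can ever count as 11, because two aces at 11 already total 22.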
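In Macau casinos the rules are as follows. The dealer first deals one card to each player and one to themselves, then deals each player a second card. At this point each player holds two cards and the dealer holds one. Based on their own cards and the dealer's card, the player decides whether to hit or stand; if the two cards have the same value, the player may also split. The dealer carries out each player's decision and totals the points, and a player whose total exceeds 21 loses immediately.
Once all players have finished, the dealer draws for their own hand. If the dealer's hand contains no ace and totals 16 or less (a hard 16), the dealer must draw; if it contains an ace and totals 17 or less (a soft 17), the dealer must also draw. Finally the dealer's and the players' hands are compared to decide the outcome.
There are also some special rules. A player may double down on the initial two cards, in which case they receive exactly one more card and win twice the stake if the hand wins. An initial two-card hand of an ace plus any of 10, J, Q, or K is called a blackjack. If the dealer's card is not an A, 10, J, Q, or K, the player is paid 1.5 times the stake immediately. If it is, the player can either take a win of 1 times the stake and leave the round, or wait for the dealer to play out the hand: if the dealer does not have a blackjack the player wins 1.5 times the stake, otherwise the hand is a push. In addition, when the dealer's first card is anything other than an ace, the player may surrender and get half the stake back.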
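Many introductions to basic blackjack strategy are available online. These strategies are presumably all based on the work of an American university professor who analyzed the game with probability theory, an approach that requires a deep understanding of probability. In artificial intelligence, simulating blackjack with reinforcement learning is also a very basic topic. Here I take a different route: I use the Monte Carlo method to simulate a large number of blackjack rounds and derive a strategy from the payoff of each option. By the law of large numbers, when enough rounds are simulated, the results come very close to the true probabilities.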
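As a small illustration of the law of large numbers, the sketch below estimates how often two cards form an ace plus a ten-valued card and compares the estimate with the exact value. It assumes cards are drawn with replacement, uniformly from 1 to 13, which is the same card model the simulation code in this post uses. The function names are illustrative only.

```python
import random

def draw(rng):
    """Draw one card value with replacement: 1 is an ace, face cards count as 10."""
    return min(rng.randint(1, 13), 10)

def estimate_blackjack_rate(n_rounds, seed=0):
    """Estimate the chance that two cards are an ace plus a ten-valued card."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_rounds):
        a, b = draw(rng), draw(rng)
        if {a, b} == {1, 10}:
            hits += 1
    return hits / n_rounds

# Exact value under this card model: ace then ten-value, or ten-value then ace
exact = 2 * (1 / 13) * (4 / 13)
for n in (1_000, 100_000):
    print(n, estimate_blackjack_rate(n), exact)
```

The estimate with more rounds should typically sit closer to the exact value, which is the property the strategy simulation relies on.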
First we create a dealer object that simulates the dealer's play:
from random import randint
import numpy as np
class Dealer:
    flag_ace = False
    firstcard = 0
    cards = []
    def __init__(self, card):
        # Face cards (11-13) count as 10
        if card>10:
            card=10
        self.firstcard = card
    def getcard(self):
        # Cards are drawn with replacement, uniformly from 1 to 13
        card = randint(1,13)
        if card>10:
            card=10
        if card==1:
            self.flag_ace = True
        self.cards.append(card)
    def calculatescore(self):
        score = np.sum(np.array(self.cards))
        if self.flag_ace:
            if score<=11:
                score += 10 # One ace can be treated as 11 instead of 1
        return score
    def play(self):
        self.cards = [self.firstcard]
        if self.firstcard==1:
            self.flag_ace = True
        else:
            self.flag_ace = False
        while True:
            self.getcard()
            score = self.calculatescore()
            # With an ace in hand the dealer draws on 17 or less,
            # without an ace on 16 or less
            if self.flag_ace and score>17:
                break
            if self.flag_ace==False and score>16:
                break
        return score
Next we create a player object:
class Player:
    cards = []
    flag_ace = False
    def __init__(self, card1, card2):
        self.cards = [card1, card2]
        if card1==1 or card2==1:
            self.flag_ace = True
    def hit(self):
        card = randint(1,13)
        if card>10:
            card=10
        if card==1:
            self.flag_ace = True
        self.cards.append(card)
    def calculatescore(self):
        score = np.sum(np.array(self.cards))
        if self.flag_ace:
            if score<=11:
                score += 10 # One ace can be treated as 11 instead of 1
        return score
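Now we can study the strategy by simulating play. We start with the case where the player's two cards differ (so splitting is not possible) and ask which action is best: hit, double down, or stand. We find the optimal strategy iteratively. First, when the player's two cards total 21, we stand regardless of the dealer's card (this is obvious). Then, for a player total of 20, we simulate hitting, doubling, and standing and see which pays best; because hitting or doubling on 20 always leads to 21 or more, the player's next action is already known. Once the strategy for 20 is fixed for every dealer card, we move on to 19, and so on, until we have the optimal action for every player total from 4 to 19. The code is as follows: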
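from random import randint
import numpy as np
# policy maps a player hand ('4'..'21' or 'A,2'..'A,10') to a list of actions
# indexed by dealer card: index 0 is the ace, indexes 1-9 are the cards 2-10
policy = {}
policy['21'] = []
policy['A,10'] = []
for i in range(10):
    policy['21'].append('S')
    policy['A,10'].append('S')
for card1 in range(10, 1, -1):
    for card2 in range(10, 0, -1):
        if card2==1:
            key = 'A,'+str(card1)
        else:
            key = str(card1+card2)
        if key in policy:
            continue
        else:
            policy[key] = []
        for dealercard in range(1, 11):
            player1_money=10000
            player2_money=10000
            player3_money=10000
            bet = 100
            games = 100000
            player1_policy = 'S'
            player2_policy = 'D'
            player3_policy = 'H'
            for i in range(games):
                p = Player(card1, card2)
                d = Dealer(dealercard)
                score1 = p.calculatescore() #For player1, stand
                p.hit()
                score2 = p.calculatescore()
                score3 = p.calculatescore() #For player2 double, player3 hit
                # player3 keeps following the actions already fixed for higher totals
                while True:
                    if score3>=20:
                        break
                    if p.flag_ace:
                        cardsum = np.sum(np.array(p.cards))
                        if cardsum-1<11:
                            k = 'A,'+str(cardsum-1)
                        else:
                            k = str(score3)
                    else:
                        k = str(score3)
                    if policy[k][dealercard-1]=='S' or policy[k][dealercard-1]=='D/S':
                        break
                    else:
                        p.hit()
                        score3 = p.calculatescore()
                dealerscore = d.play()
                if score2>21:
                    # The first extra card busts both the doubling and the hitting player
                    if dealerscore>21:
                        player1_money += bet
                    else:
                        if score1>dealerscore:
                            player1_money += bet
                        if score1<dealerscore:
                            player1_money -= bet
                    player2_money -= 2*bet
                    player3_money -= bet
                else:
                    if score3>21:
                        # Only the hitting player busts, on a later card
                        player3_money -= bet
                        if dealerscore>21:
                            player1_money += bet
                            player2_money += 2*bet
                        else:
                            if score1>dealerscore:
                                player1_money += bet
                            if score1<dealerscore:
                                player1_money -= bet
                            if score2>dealerscore:
                                player2_money += 2*bet
                            if score2<dealerscore:
                                player2_money -= 2*bet
                    else:
                        if dealerscore>21:
                            player1_money += bet
                            player2_money += 2*bet
                            player3_money += bet
                        else:
                            if score1>dealerscore:
                                player1_money += bet
                            if score1<dealerscore:
                                player1_money -= bet
                            if score2>dealerscore:
                                player2_money += 2*bet
                            if score2<dealerscore:
                                player2_money -= 2*bet
                            if score3>dealerscore:
                                player3_money += bet
                            if score3<dealerscore:
                                player3_money -= bet
            # Record the action that ended with the most money for this dealer card
            max_money = max(player1_money, player2_money, player3_money)
            if max_money==player1_money:
                policy[key].append('S')
            elif max_money==player2_money:
                if player1_money>player3_money:
                    policy[key].append('D/S')
                else:
                    policy[key].append('D')
            elif max_money==player3_money:
                policy[key].append('H')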
Save the strategy to a CSV file, then display it with a pandas DataFrame:
import pandas as pd
# Reorder the policy keys: hard totals 4-21 first, then A,2 through A,10
keys = sorted(policy.keys())
sorted_keys = keys[12:18]
sorted_keys.extend(keys[:12])
sorted_keys.extend(keys[19:])
sorted_keys.append(keys[18])
with open('policy.csv', 'w') as f:
    policy_result = 'Player;2;3;4;5;6;7;8;9;10;A\n'
    for key in sorted_keys:
        # The ace column is stored first in policy[key] but written last
        policy_result += key+';'+';'.join(policy[key][1:])+';'+policy[key][0]+'\n'
    f.write(policy_result)
df_policy = pd.read_csv('policy.csv', header=0, index_col=0, sep=';')
df_policy.head(100)
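The strategy is shown below, where D means double down, H means hit, S means stand, and D/S means double down if allowed, otherwise stand. The first row is the dealer's first card and the first column is the player's two-card total: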
Player | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | A |
---|---|---|---|---|---|---|---|---|---|---|
4 | H | H | H | H | H | H | H | H | H | H |
5 | H | H | H | H | H | H | H | H | H | H |
6 | H | H | H | H | H | H | H | H | H | H |
7 | H | H | H | H | H | H | H | H | H | H |
8 | H | H | H | H | D | H | H | H | H | H |
9 | H | D | D | D | D | H | H | H | H | H |
10 | D | D | D | D | D | D | D | D | H | H |
11 | D | D | D | D | D | D | D | D | H | H |
12 | H | S | S | S | S | H | H | H | H | H |
13 | S | S | S | S | S | H | H | H | H | H |
14 | S | S | S | S | S | H | H | H | H | H |
15 | S | S | S | S | S | H | H | H | H | S |
16 | S | S | S | S | S | H | H | H | S | S |
17 | S | S | S | S | S | S | S | S | S | S |
18 | S | S | S | S | S | S | S | S | S | S |
19 | S | S | S | S | S | S | S | S | S | S |
20 | S | S | S | S | S | S | S | S | S | S |
21 | S | S | S | S | S | S | S | S | S | S |
A,2 | H | H | H | D | D | H | H | H | H | H |
A,3 | H | H | H | D | D | H | H | H | H | H |
A,4 | H | H | D | D | D | H | H | H | H | H |
A,5 | H | D | D | D | D | H | H | H | H | H |
A,6 | D | D | D | D | D | H | H | H | H | H |
A,7 | D/S | D/S | D/S | D/S | D | S | S | H | H | H |
A,8 | S | S | S | S | D/S | S | S | S | S | S |
A,9 | S | S | S | S | S | S | S | S | S | S |
A,10 | S | S | S | S | S | S | S | S | S | S |
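Reading the table requires mapping a hand to its row label. The sketch below mirrors the key construction used in the simulation code above; `policy_key` is a hypothetical helper name introduced only for illustration.

```python
def policy_key(cards):
    """Map a hand (values 1..10, ace = 1) to its row label in the table.

    A hand with an ace whose other cards total 10 or less is soft and is
    labelled 'A,<rest>'; every other hand is labelled by its total.
    """
    total = sum(cards)  # aces counted as 1
    if 1 in cards and total - 1 < 11:
        return 'A,' + str(total - 1)
    return str(total)

print(policy_key([1, 6]))     # soft hand
print(policy_key([10, 6]))    # hard hand
print(policy_key([1, 6, 9]))  # ace forced to count as 1
```

A three-card hand such as A,3,3 maps to the same row as A,6, so soft hands keep using the soft rows until the ace can no longer count as 11.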
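Next we add the strategy for when the player's first two cards are a pair: should the player split, stand, hit, and so on? This can be simulated on top of the strategy just obtained, as in the following code: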
from tqdm import trange
steps = games  # rounds per cell; steps is not defined in the listing, so the earlier games value is assumed
split_result = {}
for i in trange(2,10):
    split_result[str(i)+","+str(i)]=[]
    for c in trange(1, 11): #Dealer card 2-10, A
        lose_counts = 0
        win_counts = 0
        draw_counts = 0
        split_gain = 0
        nonsplit_gain = 0
        for j in range(steps):
            for k in range(3): #k=0,1: the two split hands, k=2: not split
                d = Dealer(c)
                # A split hand gets a second card different from the pair card
                while True:
                    secondcard = randint(1,13)
                    if secondcard>10:
                        secondcard = 10
                    if secondcard!=i:
                        break
                if k==2:
                    secondcard = i
                p = Player(i, secondcard)
                hit_count = 0
                player_lose = False
                doublescaler = 1
                while True:
                    score = p.calculatescore()
                    if score>18 and score<=21:
                        break
                    if score>21:
                        player_lose = True
                        break
                    if p.flag_ace:
                        cardsum = np.sum(np.array(p.cards))
                        if cardsum-1<11:
                            key = 'A,'+str(cardsum-1)
                        else:
                            key = str(score)
                    else:
                        key = str(score)
                    action = policy[key][c-1]
                    if hit_count==0:
                        if action=='D' or action=='D/S':
                            doublescaler = 2
                    # Note: after doubling, this loop keeps following the policy
                    # instead of stopping after exactly one card
                    if (hit_count==0 and action=='D/S') or action=='D' or action=='H':
                        p.hit()
                        hit_count += 1
                    else:
                        score = p.calculatescore()
                        if score>21:
                            lose_counts += 1*doublescaler
                            if k<2:
                                split_gain -= 1*doublescaler
                            else:
                                nonsplit_gain -= 1*doublescaler
                            player_lose = True
                            break
                        else:
                            player_lose = False
                            break
                if player_lose==False:
                    dealerscore = d.play()
                    if dealerscore>21:
                        win_counts += 1*doublescaler
                        if k<2:
                            split_gain += 1*doublescaler
                        else:
                            nonsplit_gain += 1*doublescaler
                    else:
                        if score<dealerscore:
                            lose_counts += 1*doublescaler
                            if k<2:
                                split_gain -= 1*doublescaler
                            else:
                                nonsplit_gain -= 1*doublescaler
                        elif score>dealerscore:
                            win_counts += 1*doublescaler
                            if k<2:
                                split_gain += 1*doublescaler
                            else:
                                nonsplit_gain += 1*doublescaler
                        else:
                            draw_counts += 1
                else:
                    if k<2:
                        split_gain -= 1*doublescaler
                    else:
                        nonsplit_gain -= 1*doublescaler
                del d, p
        split_result[str(i)+","+str(i)].append([split_gain,nonsplit_gain])
split_policy = {}
for key in split_result:
    split_policy[key] = []
    for i in range(10):
        if split_result[key][i][0]<split_result[key][i][1]:
            # Not splitting pays more: use the action for the hard total of the pair
            total = 2*int(key.split(',')[0])
            split_policy[key].append(policy[str(total)][i])
        else:
            split_policy[key].append('P')
# split_policy is then saved to split_policy.csv, which is read back further below
The split strategy is as follows (P means split):
Player | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | A |
---|---|---|---|---|---|---|---|---|---|---|
2,2 | P | P | P | P | P | P | P | H | H | H |
3,3 | P | P | P | P | P | P | P | H | H | H |
4,4 | H | H | P | P | P | H | H | H | H | H |
5,5 | D | D | D | D | D | D | D | D | H | H |
6,6 | P | P | P | P | P | P | H | H | H | H |
7,7 | P | P | P | P | P | P | P | H | H | H |
8,8 | P | P | P | P | P | P | P | P | P | S |
9,9 | P | P | P | P | P | S | P | P | S | S |
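With the strategy in hand, we can simulate many rounds of blackjack and see how it performs in practice.
First, merge the two strategies obtained above into one complete strategy, as in the following code: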
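with open('policy.csv', 'r') as f:
    lines = f.readlines()
with open('split_policy.csv', 'r') as f:
    lines.extend(f.readlines()[1:])
complete_policy = {}
for l in lines[1:]:
    a = l.strip().split(';')
    # The file stores the ace column last; move it to index 0 so that
    # complete_policy[key][dealercard-1] works for dealercard 1 (ace) to 10
    complete_policy[a[0]] = [a[-1]]
    complete_policy[a[0]].extend(a[1:-1])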
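Second, we add a play method to the existing Player class so that it can hit or stand according to the strategy just obtained. The dealercard parameter is the dealer's card, policy is a dictionary holding the strategy the player follows, and simulate switches on simulation mode, in which cards are taken from a pre-generated pool (this parameter is used later when comparing strategies):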
def play(self, dealercard, policy, simulate=False):
    card1 = self.cards[0]
    card2 = self.cards[1]
    hitcount = 0
    scale = 1.0
    count = 0
    while True:
        if self.flag_ace and len(self.cards)==2:
            if max(card1, card2)==10:
                scale=1.5
                score=21
                break
            else:
                key = "A,"+str(np.sum(np.array(self.cards))-1)
                action = policy[key][dealercard-1]
        else:
            score = self.calculatescore()
            if score>=18:
                break
            else:
                if self.flag_ace:
                    cardsum = np.sum(np.array(self.cards))
                    if cardsum<12:
                        key = 'A,'+str(cardsum-1)
                    else:
                        key = str(score)
                else:
                    key = str(score)
                action = policy[key][dealercard-1]
        if (action=='D' or action=='D/S') and hitcount==0:
            if simulate:
                self.hit(0)
            else:
                self.hit()
            score = self.calculatescore()
            scale = 2.0
            break
        elif action=='S' or (action=='D/S' and hitcount>0):
            score = self.calculatescore()
            break
        else:
            if simulate:
                self.hit(count)
            else:
                self.hit()
            score = self.calculatescore()
            if score>=18:
                break
            hitcount+=1
            count += 1
    return score, scale
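Now we can build the following code to simulate playing blackjack. The bet parameter is the stake per round, games is the number of rounds per session, and loops is the number of sessions, each consisting of games rounds. totalmoney is the player's bankroll, and with debug enabled the cards drawn by the player and the dealer are printed for every round: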
from tqdm import trange
bet = 100
games = 10
loops = 1
debug = True
money = []
for j in trange(loops):
    totalmoney = 1000
    for i in range(games):
        if debug:
            print('Game {}:'.format(i))
        dealercard = randint(1,13)
        if dealercard>10:
            dealercard = 10
        d = Dealer(dealercard)
        card1 = randint(1,13)
        scores = []
        if card1>10:
            card1 = 10
        card2 = randint(1,13)
        if card2>10:
            card2 = 10
        action = ''
        if card1==card2 and card1!=10:
            if card1==1:
                action = 'P'
            else:
                key = str(card1)+','+str(card2)
                action = complete_policy[key][dealercard-1]
        if action=='P':
            split_times = 2
            while split_times>0:
                card = randint(1,13)
                if card>10:
                    card=10
                if card!=card1:
                    p_temp = Player(card1, card)
                    score, scale = p_temp.play(dealercard, complete_policy, False)
                    playercards = [str(a) for a in p_temp.cards]
                    if debug:
                        print('Player:'+','.join(playercards))
                    del p_temp
                    scores.append((score, scale))
                    split_times -= 1
                else:
                    # Another card of the same value means one more hand to play
                    split_times += 1
        else:
            p = Player(card1, card2)
            score, scale = p.play(dealercard, complete_policy, False)
            scores.append((score, scale))
            playercards = [str(a) for a in p.cards]
            if debug:
                print('Player:'+','.join(playercards))
            del p
        dealerscore = d.play()
        dealercards = [str(a) for a in d.cards]
        if debug:
            print('Dealer:'+','.join(dealercards))
        for item in scores:
            score, scale = item
            if score>21:
                # Note: a bust costs a single bet here, even after doubling
                totalmoney -= bet
            else:
                if score==21 and scale==1.5:
                    if dealerscore==21 and len(d.cards)==2:
                        continue
                    else:
                        totalmoney += bet*1.5
                else:
                    if dealerscore>21:
                        totalmoney += bet*scale
                    elif dealerscore==21 and len(d.cards)==2:
                        if score==21 and scale==1.5:
                            continue
                        else:
                            totalmoney -= bet*scale
                    else:
                        if score>dealerscore:
                            totalmoney += bet*scale
                        elif score<dealerscore:
                            totalmoney -= bet*scale
        if debug:
            print('TotalMoney:{}'.format(totalmoney))
    money.append(totalmoney)
print(money)
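The output is shown below. The player starts with 1000, bets 100 per round, and plays 10 rounds; the cards of each round and the resulting bankroll are: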
Game 0:
Player:10,5,10
Dealer:10,4,9
TotalMoney:900
Game 1:
Player:4,8
Dealer:4,10,10
TotalMoney:1000.0
Game 2:
Player:1,10
Dealer:2,4,9,8
TotalMoney:1150.0
Game 3:
Player:10,10
Dealer:4,9,6
TotalMoney:1250.0
Game 4:
Player:10,8
Dealer:5,10,10
TotalMoney:1350.0
Game 5:
Player:3,10
Dealer:2,1,8
TotalMoney:1250.0
Game 6:
Player:2,3,2,10
Dealer:7,9,4
TotalMoney:1150.0
Game 7:
Player:10,1
Dealer:8,10
TotalMoney:1300.0
Game 8:
Player:9,5
Dealer:3,6,6,5
TotalMoney:1200.0
Game 9:
Player:3,10
Dealer:4,10,10
TotalMoney:1300.0
[1300.0]
A fairly mainstream strategy is shown below. Where it differs from my strategy, my action is given in parentheses:
Player | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | A |
---|---|---|---|---|---|---|---|---|---|---|
8 | H | H | H | H | H(D) | H | H | H | H | H |
9 | H | D | D | D | D | H | H | H | H | H |
10 | D | D | D | D | D | D | D | D | H | H |
11 | D | D | D | D | D | D | D | D | D(H) | H |
12 | H | H(S) | S | S | S | H | H | H | H | H |
13 | S | S | S | S | S | H | H | H | H | H |
14 | S | S | S | S | S | H | H | H | H | H |
15 | S | S | S | S | S | H | H | H | H | H(S) |
16 | S | S | S | S | S | H | H | H | H(S) | H(S) |
17 | S | S | S | S | S | S | S | S | S | S |
18 | S | S | S | S | S | S | S | S | S | S |
19 | S | S | S | S | S | S | S | S | S | S |
A,2 | H | H | H | D | D | H | H | H | H | H |
A,3 | H | H | H | D | D | H | H | H | H | H |
A,4 | H | H | D | D | D | H | H | H | H | H |
A,5 | H | H(D) | D | D | D | H | H | H | H | H |
A,6 | H(D) | D | D | D | D | H | H | H | H | H |
A,7 | S(D/S) | D/S | D/S | D/S | D/S(D) | S | S | H | H | H |
A,8 | S | S | S | S | S(D/S) | S | S | S | S | S |
A,9 | S | S | S | S | S | S | S | S | S | S |
2,2 | P | P | P | P | P | P | H(P) | H | H | H |
3,3 | P | P | P | P | P | P | H(P) | H | H | H |
4,4 | H | H | H(P) | P | P | H | H | H | H | H |
5,5 | D | D | D | D | D | D | D | D | H | H |
6,6 | P | P | P | P | P | H(P) | H | H | H | H |
7,7 | P | P | P | P | P | P | H(P) | H | H | H |
8,8 | P | P | P | P | P | P | P | P | P | P(S) |
9,9 | P | P | P | P | P | S | P | P | S | S |
10,10 | S | S | S | S | S | S | S | S | S | S |
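We can compare the standard strategy with ours in simulated play. To keep the comparison fair, the cards the player receives in each round are the same under both strategies: at the start of every round we pre-generate some random cards into a cardpool, and each strategy draws its cards from that cardpool. This requires modifying the hit method of the Player class to accept a cardnum parameter.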
def hit(self, cardnum=-1):
    # cardnum>=0 takes the card at that position of the pre-generated pool
    if cardnum>=0:
        card = self.cardpool[cardnum]
    else:
        card = randint(1,13)
    if card>10:
        card=10
    if card==1:
        self.flag_ace = True
    self.cards.append(card)
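The shared-pool idea can be seen in isolation in the sketch below. Two hypothetical threshold strategies (stand at 15 and stand at 17) play the same starting hand against one pre-generated pool, so any difference in the result comes from the strategy and not from luck in the draw. The `play_hand` function and the thresholds are assumptions made for illustration, and aces simply count as 1 to keep the example short.

```python
import random

def play_hand(start_cards, stand_at, pool):
    """Hit from a fixed pool of cards until the total reaches stand_at."""
    cards = list(start_cards)
    next_card = 0
    while sum(cards) < stand_at:
        cards.append(pool[next_card])  # same pool index gives the same card to every strategy
        next_card += 1
    return sum(cards)

rng = random.Random(42)
pool = [min(rng.randint(1, 13), 10) for _ in range(20)]  # generated once per round

total_a = play_hand([10, 4], stand_at=15, pool=pool)
total_b = play_hand([10, 4], stand_at=17, pool=pool)
print(pool[:3], total_a, total_b)
```

Because both strategies read the pool by position, the second strategy's hand is always a continuation of the first one's, which is what makes the per-round comparison paired.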
Then we write code to run the simulation 1000 times with 100 rounds each. After each set of 100 rounds, the strategy with more money wins:
bet = 100
games = 100
loops = 1000
debug = False
money = []
mypolicywins = 0
stdpolicywins = 0
# standard_policy is a dict in the same format as complete_policy,
# holding the mainstream strategy from the table above
policys = [standard_policy, complete_policy]
for j in trange(loops):
    totalmoney = [100000,100000]
    for i in range(games):
        if debug:
            print('Game {}:'.format(i))
        dealercard = randint(1,13)
        if dealercard>10:
            dealercard = 10
        d = Dealer(dealercard)
        card1 = randint(1,13)
        scores = [[],[]]
        if card1>10:
            card1 = 10
        card2 = randint(1,13)
        if card2>10:
            card2 = 10
        # Pre-generate the cards for this round so both strategies see the same ones
        cardpool = []
        splitpool = []
        for cardpoolnum in range(200):
            cardpool.append(randint(1,13))
        for splitpoolnum in range(10):
            splitpool.append([])
            for a in range(20):
                splitpool[-1].append(randint(1,13))
        action = ''
        for policyid in range(2):
            if debug:
                if policyid==0:
                    print("Standardpolicy")
                else:
                    print("completepolicy")
            if card1==card2 and card1!=10:
                if card1==1:
                    action = 'P'
                else:
                    key = str(card1)+','+str(card2)
                    action = policys[policyid][key][dealercard-1]
            if action=='P':
                split_times = 2
                split_count = 0
                card_count = 0
                while split_times>0:
                    card = cardpool[card_count]
                    card_count += 1
                    if card>10:
                        card=10
                    if card!=card1:
                        p_temp = Player(card1, card)
                        p_temp.cardpool = splitpool[split_count]
                        split_count += 1
                        score, scale = p_temp.play(dealercard, policys[policyid], True)
                        playercards = [str(a) for a in p_temp.cards]
                        if debug:
                            print('Player:'+','.join(playercards))
                        del p_temp
                        scores[policyid].append((score, scale))
                        split_times -= 1
                    else:
                        split_times += 1
            else:
                p = Player(card1, card2)
                p.cardpool = cardpool
                score, scale = p.play(dealercard, policys[policyid], True)
                scores[policyid].append((score, scale))
                playercards = [str(a) for a in p.cards]
                if debug:
                    print('Player:'+','.join(playercards))
                del p
        dealerscore = d.play()
        dealercards = [str(a) for a in d.cards]
        if debug:
            print('Dealer:'+','.join(dealercards))
        for policyid in range(2):
            for item in scores[policyid]:
                score, scale = item
                if score>21:
                    totalmoney[policyid] -= bet
                else:
                    if score==21 and scale==1.5:
                        if dealerscore==21 and len(d.cards)==2:
                            continue
                        else:
                            totalmoney[policyid] += bet*1.5
                    else:
                        if dealerscore>21:
                            totalmoney[policyid] += bet*scale
                        elif dealerscore==21 and len(d.cards)==2:
                            if score==21 and scale==1.5:
                                continue
                            else:
                                totalmoney[policyid] -= bet*scale
                        else:
                            if score>dealerscore:
                                totalmoney[policyid] += bet*scale
                            elif score<dealerscore:
                                totalmoney[policyid] -= bet*scale
    # totalmoney[0] belongs to the standard strategy, totalmoney[1] to mine
    if totalmoney[0]<totalmoney[1]:
        mypolicywins += 1
    elif totalmoney[0]>totalmoney[1]:
        stdpolicywins += 1
        #print('Standard policy win')
print(mypolicywins)
print(stdpolicywins)
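I ran this five times in total:
Run 1: my strategy won 464 times, the standard strategy won 391 times, 145 ties
Run 2: my strategy won 457 times, the standard strategy won 417 times, 126 ties
Run 3: my strategy won 448 times, the standard strategy won 420 times, 132 ties
Run 4: my strategy won 435 times, the standard strategy won 437 times, 128 ties
Run 5: my strategy won 450 times, the standard strategy won 402 times, 148 ties
Overall, my strategy won more sessions than the standard strategy in four of the five runs, which suggests it is a modest improvement in win rate.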
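One rough way to judge whether this gap could be chance is a sign test over the five runs pooled together. The sketch below uses only the counts listed above. It relies on a normal approximation and assumes the sessions are independent, and it measures how often a strategy finishes a session ahead, not how much money it wins on average.

```python
import math

# (my wins, standard wins, ties) for the five runs reported above
runs = [(464, 391, 145), (457, 417, 126), (448, 420, 132),
        (435, 437, 128), (450, 402, 148)]

mine = sum(r[0] for r in runs)
standard = sum(r[1] for r in runs)
decided = mine + standard  # ties carry no information for a sign test

# Under the null hypothesis that both strategies are equally good,
# each decided session is a fair coin flip.
expected = decided / 2
std_dev = math.sqrt(decided * 0.25)
z = (mine - expected) / std_dev

print('my wins:', mine, 'standard wins:', standard)
print('z-score: {:.2f}'.format(z))
```

A z-score well above 2 would indicate that the difference in sessions won is unlikely under the equal-strategies assumption.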
Finally, here is the optimal blackjack strategy I compiled.