A Study of Playing Strategy for Blackjack (21)

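Blackjack (21) is a very popular card game with simple rules: add up the cards in your hand; if the total exceeds 21 you lose, and otherwise the higher total wins. J, Q and K each count as 10, and an A can count as either 1 or 11.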
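
As a quick illustration of the point counting just described, here is a minimal sketch. The function name `hand_value` and the encoding of cards as ranks 1 to 13 (1 = A, 11 to 13 = J, Q, K) are assumptions made for this example.

```python
def hand_value(ranks):
    """Return the best blackjack total for a list of ranks (1 = A, 11-13 = J/Q/K)."""
    # Face cards count as 10; every ace is first counted as 1
    total = sum(min(rank, 10) for rank in ranks)
    # At most one ace can be promoted to 11 without exceeding 21
    if 1 in ranks and total + 10 <= 21:
        total += 10
    return total


print(hand_value([1, 13]))      # A + K
print(hand_value([1, 1, 9]))    # A + A + 9
print(hand_value([10, 5, 12]))  # 10 + 5 + Q, a bust
```

Counting every ace as 1 first and then promoting at most one of them keeps the logic short, because two aces counted as 11 would already exceed 21.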

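In Macau casinos the rules are as follows. The dealer first deals one card to each player and one to himself, then deals each player a second card. At this point each player holds two cards and the dealer only one. Based on their own cards and the dealer's card, players decide whether to hit (take another card) or stand; if the two cards are the same, they may also split. The dealer acts on each player's decision and counts the points; a player whose total exceeds 21 loses immediately. Once all players have finished, the dealer draws cards for himself: if his hand contains no A and totals 16 or less (a hard 16), he must draw; if it contains an A and totals 17 or less (a soft 17), he must also draw. Finally the dealer's and the players' hands are compared to decide the outcome.

There are also some special rules. A player may double down on the first two cards; in that case the player receives exactly one more card and wins double the stake on a win. A two-card hand of an A plus any of 10, J, Q, K is called a Black Jack. If the dealer's card is not an A, 10, J, Q or K, the player is paid 1.5 times the stake immediately. If the dealer's card is an A, 10, J, Q or K, the player may either take 1 times the stake at once and leave the round, or wait for the dealer to draw: the player then wins 1.5 times the stake if the dealer does not have a Black Jack, and the hand is a push if the dealer does. In addition, when the dealer's first card is anything other than an A, the player may surrender and take back half the stake.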
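
The choice between taking 1 times the stake at once and waiting for the dealer can be checked with a little arithmetic. The sketch below assumes every rank is equally likely for the dealer's next card (an infinite deck, the same assumption the simulations in this article make); the function name `ev_wait` is hypothetical.

```python
from fractions import Fraction

def ev_wait(dealer_card):
    """Expected win, in units of the stake, for a player Black Jack that waits for the dealer."""
    if dealer_card == 1:
        p_dealer_bj = Fraction(4, 13)   # dealer needs a 10, J, Q or K
    elif dealer_card == 10:
        p_dealer_bj = Fraction(1, 13)   # dealer needs an A
    else:
        p_dealer_bj = Fraction(0)       # no dealer Black Jack is possible
    # Push (0) against a dealer Black Jack, 1.5 times the stake otherwise
    return Fraction(3, 2) * (1 - p_dealer_bj)

for card in (1, 10):
    print(card, ev_wait(card), float(ev_wait(card)))
```

Under these assumptions waiting is worth 27/26 of the stake against an A and 18/13 against a ten-valued card, both above the 1 unit paid for leaving early.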

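Many introductions to basic blackjack strategy are already available online. These strategies appear to derive from the work of a US university professor who used probability theory, an approach that requires a deep understanding of probability. In artificial intelligence, simulating blackjack with reinforcement learning is also a standard introductory topic. Here I want to study the strategy in a different way: use the Monte Carlo method to simulate a large number of blackjack rounds and derive a strategy from the payoff of each choice. By the law of large numbers, once enough rounds are simulated, the results come very close to the true probabilities.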
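
To see the law of large numbers at work on a case where the exact answer is known, the following sketch estimates how often two random cards form a Black Jack and compares the estimate with the exact value 2 × (1/13) × (4/13). It is an illustration only: the function name, the fixed seed and the infinite-deck assumption (each rank 1 to 13 equally likely on every draw) are choices made for this example.

```python
import random

def estimate_blackjack_rate(rounds, seed=0):
    """Estimate by simulation how often two random cards are an A plus a ten-valued card."""
    rng = random.Random(seed)           # fixed seed so the run is repeatable
    hits = 0
    for _ in range(rounds):
        card1 = min(rng.randint(1, 13), 10)
        card2 = min(rng.randint(1, 13), 10)
        if sorted((card1, card2)) == [1, 10]:
            hits += 1
    return hits / rounds

exact = 2 * (1 / 13) * (4 / 13)         # A then ten-value, or ten-value then A
for rounds in (1000, 100000):
    print(rounds, estimate_blackjack_rate(rounds), exact)
```

With more rounds the estimate settles closer to the exact value, which is the behaviour the strategy simulations below rely on.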

First we build a dealer object that simulates the dealer's play:

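from random import randint
import numpy as np

class Dealer:
    """Simulates the dealer, who starts with one known card."""

    def __init__(self, card):
        if card>10:          # J, Q and K count as 10
            card=10
        self.firstcard = card
        self.flag_ace = False
        self.cards = []      # per-instance list, not shared between dealers

    def getcard(self):
        # Draw a rank uniformly from 1-13, i.e. from an infinite deck
        card = randint(1,13)
        if card>10:
            card=10
        if card==1:
            self.flag_ace = True
        self.cards.append(card)

    def calculatescore(self):
        score = np.sum(np.array(self.cards))
        if self.flag_ace:
            if score<=11:
                score += 10    # one ace counts as 11 instead of 1
        return score

    def play(self):
        # Reset the hand to the first card, then draw until the dealer must stand
        self.cards = [self.firstcard]
        self.flag_ace = (self.firstcard==1)
        while True:
            self.getcard()
            score = self.calculatescore()
            if self.flag_ace and score>17:          # hand with an ace: stand on 18 or more
                break
            if self.flag_ace==False and score>16:   # hand without an ace: stand on 17 or more
                break
        return score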

Then we build a player object:

class Player:
    """Holds the player's hand, starting from the first two cards."""

    def __init__(self, card1, card2):
        self.cards = [card1, card2]
        self.flag_ace = (card1==1 or card2==1)

    def hit(self):
        # Take one more card, drawn the same way as for the dealer
        card = randint(1,13)
        if card>10:
            card=10
        if card==1:
            self.flag_ace = True
        self.cards.append(card)

    def calculatescore(self):
        score = np.sum(np.array(self.cards))
        if self.flag_ace:
            if score<=11:
                score += 10    # one ace counts as 11 instead of 1
        return score

Studying the hit/stand strategy

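Now we can study the strategy by simulating play. We first look at hands where the player's two cards are not a pair (so splitting is not possible) and ask which action is best: hit, double or stand. To find the optimal strategy we simulate iteratively. We start by fixing that a two-card total of 21 always stands, whatever the dealer's card (this is obvious). For a total of 20 we then simulate each of the three actions, hit, double and stand, and see which yields the highest return; since a hit or double from 20 always lands on 21 or more, the player's next action is already known. Once the strategy for 20 is fully determined we move on to 19, and so on, until we have the best action for every total from 4 to 19. The code is as follows: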

from random import randint
import numpy as np

def settle(score, dealerscore, stake):
    # Net result of one hand: a bust always loses; a dealer bust or a higher total wins
    if score>21:
        return -stake
    if dealerscore>21 or score>dealerscore:
        return stake
    if score<dealerscore:
        return -stake
    return 0

policy = {}
policy['21'] = []
policy['A,10'] = []
for i in range(10):
    policy['21'].append('S')
    policy['A,10'].append('S')
for card1 in range(10, 1, -1):
    for card2 in range(10, 0, -1):
        if card2==1:
            key = 'A,'+str(card1)
        else:
            key = str(card1+card2)
        if key in policy:
            continue
        else:
            policy[key] = []
        for dealercard in range(1, 11):   # list index 0 is a dealer A, indexes 1-9 are 2-10
            player1_money=10000     # player1 always stands
            player2_money=10000     # player2 always doubles
            player3_money=10000     # player3 hits, then follows the policy found so far
            bet = 100
            games = 100000

            for i in range(games):
                p = Player(card1, card2)
                d = Dealer(dealercard)

                score1 = p.calculatescore()   # player1 stands on the two cards
                p.hit()
                score2 = p.calculatescore()   # player2 doubles: exactly one more card
                score3 = score2               # player3 starts from the same card and may keep drawing

                while True:
                    if score3>=20:
                        break
                    if p.flag_ace:
                        cardsum = np.sum(np.array(p.cards))
                        if cardsum-1<11:
                            k = 'A,'+str(cardsum-1)
                        else:
                            k = str(score3)
                    else:
                        k = str(score3)
                    if policy[k][dealercard-1]=='S' or policy[k][dealercard-1]=='D/S':
                        break
                    else:
                        p.hit()
                        score3 = p.calculatescore()

                dealerscore = d.play()

                # The settlement and the final choice below are rebuilt from the
                # fragments that survive in the garbled original listing.
                player1_money += settle(score1, dealerscore, bet)
                player2_money += settle(score2, dealerscore, 2*bet)
                player3_money += settle(score3, dealerscore, bet)

            max_money = max(player1_money, player2_money, player3_money)
            if max_money==player1_money:
                policy[key].append('S')
            elif max_money==player2_money:
                # Doubling is best; record what to do when doubling is not allowed
                if player1_money>player3_money:
                    policy[key].append('D/S')
                else:
                    policy[key].append('D')
            else:
                policy[key].append('H')
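
Because each hand is simulated using the actions already stored for other hands, the order in which the loops visit the hands matters. The sketch below replays just the two outer loops from the code above, without any simulation, to list that order; `evaluation_order` is a name introduced for this illustration.

```python
def evaluation_order():
    """List the policy keys in the order the nested card1/card2 loops first reach them."""
    seen = {'21', 'A,10'}               # fixed to stand before the loops start
    order = []
    for card1 in range(10, 1, -1):
        for card2 in range(10, 0, -1):
            key = 'A,' + str(card1) if card2 == 1 else str(card1 + card2)
            if key not in seen:
                seen.add(key)
                order.append(key)
    return order

print(evaluation_order())
```

Hard totals 20 down to 12 come first; after that each remaining hard total is followed by the soft hand with the same card1, so every hand reachable by one more card has already been solved.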

Save the strategy to a CSV file and display it with a pandas DataFrame:

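import pandas as pd

# Reorder the policy keys. sorted() compares them as strings, giving
# '10'..'21', '4'..'9', 'A,10', 'A,2'..'A,9'; the slices below put them
# into the order 4..21, A,2..A,9, A,10.
keys = sorted(policy.keys())
sorted_keys = keys[12:18]
sorted_keys.extend(keys[:12])
sorted_keys.extend(keys[19:])
sorted_keys.append(keys[18])
with open('policy.csv', 'w') as f:
    policy_result = 'Player;2;3;4;5;6;7;8;9;10;A\n'
    for key in sorted_keys:
        # In memory index 0 is the dealer's A; the file lists 2-10 first and A last
        policy_result += key+';'+';'.join(policy[key][1:])+';'+policy[key][0]+'\n'
    f.write(policy_result)
df_policy = pd.read_csv('policy.csv', header=0, index_col=0, sep=';')
df_policy.head(100)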

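The strategy is shown below. D means double, H means hit, S means stand, and D/S means double if allowed, otherwise stand. The first row is the dealer's first card and the first column is the player's two-card hand: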

Player 2 3 4 5 6 7 8 9 10 A
4 H H H H H H H H H H
5 H H H H H H H H H H
6 H H H H H H H H H H
7 H H H H H H H H H H
8 H H H H D H H H H H
9 H D D D D H H H H H
10 D D D D D D D D H H
11 D D D D D D D D H H
12 H S S S S H H H H H
13 S S S S S H H H H H
14 S S S S S H H H H H
15 S S S S S H H H H S
16 S S S S S H H H S S
17 S S S S S S S S S S
18 S S S S S S S S S S
19 S S S S S S S S S S
20 S S S S S S S S S S
21 S S S S S S S S S S
A,2 H H H D D H H H H H
A,3 H H H D D H H H H H
A,4 H H D D D H H H H H
A,5 H D D D D H H H H H
A,6 D D D D D H H H H H
A,7 D/S D/S D/S D/S D S S H H H
A,8 S S S S D/S S S S S S
A,9 S S S S S S S S S S
A,10 S S S S S S S S S S
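
Reading the table in code needs one small care: the table columns run 2 to 10 and then A, while the policy dictionary stores the dealer's A at index 0. A minimal sketch of a lookup, using two rows copied from the table above (the helper names and `sample_policy` are assumptions made for this example):

```python
def row_to_list(table_row):
    """Convert a table row ordered 2..10, A into the in-memory order A, 2..10."""
    cells = table_row.split()
    return [cells[-1]] + cells[:-1]

def action(policy, hand, dealercard):
    """Look up the action for a hand key such as '16' or 'A,7'; dealercard is 1 (A) to 10."""
    return policy[hand][dealercard - 1]

sample_policy = {
    '16': row_to_list('S S S S S H H H S S'),
    'A,7': row_to_list('D/S D/S D/S D/S D S S H H H'),
}
print(action(sample_policy, '16', 7))     # hard 16 against a dealer 7
print(action(sample_policy, 'A,7', 1))    # soft 18 against a dealer A
```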

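Studying the split strategy

Next we add the strategy for when the player's first two cards are a pair: whether to split, stand, hit and so on. This can be simulated on top of the strategy obtained above, as in the following code: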

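from tqdm import trange

steps = 100000   # rounds simulated per pair and dealer card; assumed, the source does not give this value
split_result = {}
for i in trange(2,10):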
    split_result[str(i)+","+str(i)]=[]
    for c in trange(1, 11):   #Dealer card 2-10, A
        lose_counts = 0
        win_counts = 0
        draw_counts = 0
        split_gain = 0
        nonsplit_gain = 0
        for j in range(steps):
            for k in range(3):     #k-0,1, Split card, k-2, not split
                d = Dealer(c)
                while True:
                    secondcard = randint(1,13)
                    if secondcard>10:
                        secondcard = 10
                    if secondcard!=i:
                        break
                if k==2:
                    secondcard = i
                p = Player(i, secondcard)
                hit_count = 0
                player_lose = False
                doublescaler = 1
                while True:
                    score = p.calculatescore()
                    if score>18 and score<=21:
                        break
                    if score>21:
                        player_lose = True
                        break
                    if p.flag_ace:
                        cardsum = np.sum(np.array(p.cards))
                        if cardsum-1<11:
                            key = 'A,'+str(cardsum-1)
                        else:
                            key = str(score)
                    else:
                        key = str(score)
                    action = policy[key][c-1]
                    if hit_count==0: 
                        if action=='D' or action=='D/S':
                            doublescaler = 2
                    if (hit_count==0 and action=='D/S') or action=='D' or action=='H':
                        p.hit()
                        hit_count += 1
                    else:
                        score = p.calculatescore()
                        if score>21:
                            lose_counts += 1*doublescaler
                            if k<2:
                                split_gain -= 1*doublescaler
                            else:
                                nonsplit_gain -= 1*doublescaler
                            player_lose = True
                            break
                        else:
                            player_lose = False
                            break
                if player_lose==False:
                    dealerscore = d.play()
                    if dealerscore>21:
                        win_counts += 1*doublescaler
                        if k<2:
                            split_gain += 1*doublescaler
                        else:
                            nonsplit_gain += 1*doublescaler
                    else:
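                        if score<dealerscore:
                            lose_counts += 1*doublescaler
                            if k<2:
                                split_gain -= 1*doublescaler
                            else:
                                nonsplit_gain -= 1*doublescaler
                        elif score>dealerscore: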
                            win_counts += 1*doublescaler
                            if k<2:
                                split_gain += 1*doublescaler
                            else:
                                nonsplit_gain += 1*doublescaler
                        else:
                            draw_counts += 1
                else:
                    if k<2:
                        split_gain -= 1*doublescaler
                    else:
                        nonsplit_gain -= 1*doublescaler
                del d, p
        split_result[str(i)+","+str(i)].append([split_gain,nonsplit_gain])   

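split_policy = {}
for key in split_result:
    split_policy[key] = []
    total = str(2*int(key.split(',')[0]))   # hard total of the unsplit pair
    for i in range(10):
        # The source listing is cut off after the first operand. Rebuilt on the
        # assumption that the non-split action is kept when splitting earns less.
        if split_result[key][i][0]<split_result[key][i][1]:
            split_policy[key].append(policy[total][i])
        else:
            split_policy[key].append('P')

# Assumed step: save in the same layout as policy.csv, because split_policy.csv is read back later
with open('split_policy.csv', 'w') as f:
    f.write('Player;2;3;4;5;6;7;8;9;10;A\n')
    for key in split_policy:
        f.write(key+';'+';'.join(split_policy[key][1:])+';'+split_policy[key][0]+'\n')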

The split strategy is as follows (P means split):

Player 2 3 4 5 6 7 8 9 10 A
2,2 P P P P P P P H H H
3,3 P P P P P P P H H H
4,4 H H P P P H H H H H
5,5 D D D D D D D D H H
6,6 P P P P P P H H H H
7,7 P P P P P P P H H H
8,8 P P P P P P P P P S
9,9 P P P P P S P P S S

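Evaluating the strategy

With the strategy in hand, we can simulate many rounds of blackjack to see how it performs in practice.

First merge the two strategies above into one complete strategy, as in the following code: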

with open('policy.csv', 'r') as f:
    lines = f.readlines()
with open('split_policy.csv', 'r') as f:
    lines.extend(f.readlines()[1:])      # skip the header of the second file
complete_policy = {}
for l in lines[1:]:
    a = l.strip().split(';')
    # File columns are 2-10 then A; in memory index 0 is A, followed by 2-10
    complete_policy[a[0]] = [a[-1]]
    complete_policy[a[0]].extend(a[1:-1])

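Next, add a play method to the Player object so that it hits or stands according to the strategy just obtained. The code is below: dealercard is the dealer's card, policy is a dictionary holding the player's strategy, and simulate says whether cards are taken from a pre-generated card pool (this parameter is used later when comparing strategies).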

def play(self, dealercard, policy, simulate=False):
    card1 = self.cards[0]
    card2 = self.cards[1]
    hitcount = 0
    scale = 1.0
    count = 0
    while True:
        if self.flag_ace and len(self.cards)==2:
            if max(card1, card2)==10:
                scale=1.5
                score=21
                break
            else:
                key = "A,"+str(np.sum(np.array(self.cards))-1)
                action = policy[key][dealercard-1]
        else:
            score = self.calculatescore()
            if score>=18:
                break
            else:
                if self.flag_ace:
                    cardsum = np.sum(np.array(self.cards))
                    if cardsum<12:
                        key = 'A,'+str(cardsum-1)
                    else:
                        key = str(score)
                else:
                    key = str(score)
                action = policy[key][dealercard-1]
        if (action=='D' or action=='D/S') and hitcount==0:
            if simulate:
                self.hit(0)
            else:
                self.hit()
            score = self.calculatescore()
            scale = 2.0
            break
        elif action=='S' or (action=='D/S' and hitcount>0):
            score = self.calculatescore()
            break
        else:
            if simulate:
                self.hit(count)
            else:
                self.hit()
            score = self.calculatescore()
            if score>=18:
                break
            hitcount+=1
            count += 1
    return score, scale

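We can then write the following code to simulate playing blackjack. bet is the stake per round, games is the number of rounds per session, and loops is the number of sessions. totalmoney is the player's bankroll, and with debug turned on the cards drawn by the player and the dealer are printed for every round: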

from tqdm import trange

bet = 100
games = 10
loops = 1
debug = True
money = []
for j in trange(loops):
    totalmoney = 1000
    for i in range(games):
        if debug:
            print('Game {}:'.format(i))
        dealercard = randint(1,13)
        if dealercard>10:
            dealercard = 10
        d = Dealer(dealercard)

        card1 = randint(1,13)
        scores = []
        if card1>10:
            card1 = 10
        card2 = randint(1,13)
        if card2>10:
            card2 = 10
        action = ''

        if card1==card2 and card1!=10:
            if card1==1:
                action = 'P'
            else:
                key = str(card1)+','+str(card2)
                action = complete_policy[key][dealercard-1]
        if action=='P':
            split_times = 2
            while split_times>0:
                card = randint(1,13)
                if card>10:
                    card=10
                if card!=card1:
                    p_temp = Player(card1, card)
                    score, scale = p_temp.play(dealercard, complete_policy, False)
                    playercards = [str(a) for a in p_temp.cards]
                    if debug:
                        print('Player:'+','.join(playercards))
                    del p_temp
                    scores.append((score, scale))
                    split_times -= 1
                else:
                    split_times += 1
        else:
            p = Player(card1, card2)
            score, scale = p.play(dealercard, complete_policy, False)
            scores.append((score, scale))
            playercards = [str(a) for a in p.cards]
            if debug:
                print('Player:'+','.join(playercards))
            del p

        dealerscore = d.play()
        dealercards = [str(a) for a in d.cards]
        if debug:
            print('Dealer:'+','.join(dealercards))

        for item in scores:
            score, scale = item
            if score>21:
                totalmoney -= bet
            else:
                if score==21 and scale==1.5:
                    if dealerscore==21 and len(d.cards)==2:
                        continue
                    else:
                        totalmoney += bet*1.5
                else:
                    if dealerscore>21:
                        totalmoney += bet*scale
                    elif dealerscore==21 and len(d.cards)==2:
                        if score==21 and scale==1.5:
                            continue
                        else:
                            totalmoney -= bet*scale
                    else:
                        if score>dealerscore:
                            totalmoney += bet*scale
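                        elif score<dealerscore:
                            totalmoney -= bet*scale
        if debug:
            print('TotalMoney:'+str(totalmoney))
    money.append(totalmoney)
print(money)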

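A sample run follows. The player starts with 1000 yuan, bets 100 yuan per round and plays 10 rounds; each round and the resulting bankroll are shown: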

Game 0:
Player:10,5,10
Dealer:10,4,9
TotalMoney:900
Game 1:
Player:4,8
Dealer:4,10,10
TotalMoney:1000.0
Game 2:
Player:1,10
Dealer:2,4,9,8
TotalMoney:1150.0
Game 3:
Player:10,10
Dealer:4,9,6
TotalMoney:1250.0
Game 4:
Player:10,8
Dealer:5,10,10
TotalMoney:1350.0
Game 5:
Player:3,10
Dealer:2,1,8
TotalMoney:1250.0
Game 6:
Player:2,3,2,10
Dealer:7,9,4
TotalMoney:1150.0
Game 7:
Player:10,1
Dealer:8,10
TotalMoney:1300.0
Game 8:
Player:9,5
Dealer:3,6,6,5
TotalMoney:1200.0
Game 9:
Player:3,10
Dealer:4,10,10
TotalMoney:1300.0
[1300.0]
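
The bankroll figures in this log can be reproduced by hand. The sketch below replays the ten games with a small payout function; it assumes that none of these hands was doubled (the log does not say, but the cards are consistent with that), and the function names are introduced only for this example.

```python
def hand_value(cards):
    """Best total of a hand; cards use 1 for A and 10 for any ten-valued card."""
    total = sum(cards)
    if 1 in cards and total + 10 <= 21:
        total += 10
    return total

def is_blackjack(cards):
    return len(cards) == 2 and hand_value(cards) == 21

def payout(player, dealer, bet):
    """Net win for one undoubled hand under the rules used in the simulation."""
    p, d = hand_value(player), hand_value(dealer)
    if p > 21:
        return -bet
    if is_blackjack(player):
        return 0 if is_blackjack(dealer) else 1.5 * bet
    if d > 21 or p > d:
        return bet
    if is_blackjack(dealer) or p < d:
        return -bet
    return 0

log = [  # (player cards, dealer cards) copied from the ten games above
    ([10, 5, 10], [10, 4, 9]), ([4, 8], [4, 10, 10]), ([1, 10], [2, 4, 9, 8]),
    ([10, 10], [4, 9, 6]), ([10, 8], [5, 10, 10]), ([3, 10], [2, 1, 8]),
    ([2, 3, 2, 10], [7, 9, 4]), ([10, 1], [8, 10]), ([9, 5], [3, 6, 6, 5]),
    ([3, 10], [4, 10, 10]),
]
money = 1000
for player, dealer in log:
    money += payout(player, dealer, 100)
    print(money)
```

The two Black Jacks (games 2 and 7) pay 150 each, which is why the bankroll moves in steps of 150 there and 100 elsewhere.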

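Comparison with the standard strategy

A widely used strategy is shown below. Cells that differ from my strategy (marked in red in the original) give my choice in parentheses: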

Player 2 3 4 5 6 7 8 9 10 A
8 H H H H H(D) H H H H H
9 H D D D D H H H H H
10 D D D D D D D D H H
11 D D D D D D D D D(H) H
12 H H(S) S S S H H H H H
13 S S S S S H H H H H
14 S S S S S H H H H H
15 S S S S S H H H H H(S)
16 S S S S S H H H H(S) H(S)
17 S S S S S S S S S S
18 S S S S S S S S S S
19 S S S S S S S S S S
A,2 H H H D D H H H H H
A,3 H H H D D H H H H H
A,4 H H D D D H H H H H
A,5 H H(D) D D D H H H H H
A,6 H(D) D D D D H H H H H
A,7 S(D/S) D/S D/S D/S D/S(D) S S H H H
A,8 S S S S S(D/S) S S S S S
A,9 S S S S S S S S S S
2,2 P P P P P P H(P) H H H
3,3 P P P P P P H(P) H H H
4,4 H H H(P) P P H H H H H
5,5 D D D D D D D D H H
6,6 P P P P P H(P) H H H H
7,7 P P P P P P H(P) H H H
8,8 P P P P P P P P P P(S)
9,9 P P P P P S P P S S
10,10 S S S S S S S S S S

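We can put this to the test by comparing the standard strategy with ours in simulated play. To keep the comparison fair, the cards a player receives in each round are the same under both strategies. Concretely, at the start of each round some random cards are generated in advance and placed in a cardpool, and each strategy then draws from this cardpool. This requires changing the hit method of the Player object by adding a parameter cardnum.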

def hit(self, cardnum=-1):
    if cardnum>=0:
        card = self.cardpool[cardnum]
    else:
        card = randint(1,13)
    if card>10:
        card=10
    if card==1:
        self.flag_ace = True
    self.cards.append(card)
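
The point of sharing one card pool is that both strategies face exactly the same run of cards, so any difference in outcome comes from the decisions and not from luck. A minimal sketch of the idea, using two hypothetical threshold rules (stand at 15, stand at 17) in place of the real policies and counting every ace as 1 for simplicity:

```python
import random

def play_from_pool(cards, pool, stand_at):
    """Draw from the shared pool, in order, until the hard total reaches stand_at."""
    hand = list(cards)
    next_card = 0
    while sum(hand) < stand_at:
        hand.append(min(pool[next_card], 10))   # same clamp to 10 as in hit()
        next_card += 1
    return hand

rng = random.Random(1)
pool = [rng.randint(1, 13) for _ in range(20)]  # generated once per round

cautious = play_from_pool([10, 2], pool, stand_at=15)
bold = play_from_pool([10, 2], pool, stand_at=17)
print(cautious)
print(bold)
```

The bolder rule always starts with the same cards as the cautious one and only differs by what it draws afterwards, which is the property the comparison relies on.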

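Then we write code that runs 1000 sessions of 100 rounds each; after each session of 100 rounds, the two strategies' bankrolls are compared to decide the winner: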

bet = 100
games = 100
loops = 1000
debug = False
money = []
mypolicywins = 0
stdpolicywins = 0
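# standard_policy is assumed to be a dict loaded in the same layout as complete_policy
policys = [standard_policy, complete_policy]   # index 0: standard strategy, index 1: my strategy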
for j in trange(loops):
    totalmoney = [100000,100000]
    for i in range(games):
        if debug:
            print('Game {}:'.format(i))
        dealercard = randint(1,13)
        if dealercard>10:
            dealercard = 10
        d = Dealer(dealercard)

        card1 = randint(1,13)
        scores = [[],[]]
        if card1>10:
            card1 = 10
        card2 = randint(1,13)
        if card2>10:
            card2 = 10
        cardpool = []
        splitpool = []
        for cardpoolnum in range(200):
            cardpool.append(randint(1,13))
        for splitpoolnum in range(10):
            splitpool.append([])
            for a in range(20):
                splitpool[-1].append(randint(1,13))
        action = ''
        for policyid in range(2):
            if debug:
                if policyid==0:
                    print("Standardpolicy")
                else:
                    print("completepolicy")
            if card1==card2 and card1!=10:
                if card1==1:
                    action = 'P'
                else:
                    key = str(card1)+','+str(card2)
                    action = policys[policyid][key][dealercard-1]
            if action=='P':
                split_times = 2
                split_count = 0
                card_count = 0
                while split_times>0:
                    card = cardpool[card_count]
                    card_count += 1
                    if card>10:
                        card=10
                    if card!=card1:
                        p_temp = Player(card1, card)
                        p_temp.cardpool = splitpool[split_count]
                        split_count += 1
                        score, scale = p_temp.play(dealercard, policys[policyid], True)
                        playercards = [str(a) for a in p_temp.cards]
                        if debug:
                            print('Player:'+','.join(playercards))
                        del p_temp
                        scores[policyid].append((score, scale))
                        split_times -= 1
                    else:
                        split_times += 1
            else:
                p = Player(card1, card2)
                p.cardpool = cardpool
                score, scale = p.play(dealercard, policys[policyid], True)
                scores[policyid].append((score, scale))
                playercards = [str(a) for a in p.cards]
                if debug:
                    print('Player:'+','.join(playercards))
                del p

        dealerscore = d.play()
        dealercards = [str(a) for a in d.cards]
        if debug:
            print('Dealer:'+','.join(dealercards))

        for policyid in range(2):
            for item in scores[policyid]:
                score, scale = item
                if score>21:
                    totalmoney[policyid] -= bet
                else:
                    if score==21 and scale==1.5:
                        if dealerscore==21 and len(d.cards)==2:
                            continue
                        else:
                            totalmoney[policyid] += bet*1.5
                    else:
                        if dealerscore>21:
                            totalmoney[policyid] += bet*scale
                        elif dealerscore==21 and len(d.cards)==2:
                            if score==21 and scale==1.5:
                                continue
                            else:
                                totalmoney[policyid] -= bet*scale
                        else:
                            if score>dealerscore:
                                totalmoney[policyid] += bet*scale
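                            elif score<dealerscore:
                                totalmoney[policyid] -= bet*scale
    if totalmoney[1]>totalmoney[0]:
        mypolicywins += 1
    if totalmoney[0]>totalmoney[1]: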
        stdpolicywins += 1
        #print('Standard policy win')
    
print(mypolicywins)
print(stdpolicywins)

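The comparison was run five times:

Run 1: my strategy won 464 times, the standard strategy won 391 times, and 145 were ties

Run 2: my strategy won 457 times, the standard strategy won 417 times, and 126 were ties

Run 3: my strategy won 448 times, the standard strategy won 420 times, and 132 were ties

Run 4: my strategy won 435 times, the standard strategy won 437 times, and 128 were ties

Run 5: my strategy won 450 times, the standard strategy won 402 times, and 148 were ties

Overall my strategy comes out ahead of the standard strategy in this simulation: it won more sessions in four of the five runs.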
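
The five runs can be added up to see the overall picture. The short sketch below only sums the figures reported above; the variable names are arbitrary.

```python
runs = [  # (my strategy wins, standard strategy wins, ties) for the five runs above
    (464, 391, 145),
    (457, 417, 126),
    (448, 420, 132),
    (435, 437, 128),
    (450, 402, 148),
]
mine = sum(r[0] for r in runs)
standard = sum(r[1] for r in runs)
ties = sum(r[2] for r in runs)
decided = mine + standard
print(mine, standard, ties)
print(round(mine / decided, 4))   # share of decided sessions won by my strategy
```

Across all 5000 sessions that is 2254 wins against 2067, with 679 ties, or roughly 52% of the sessions that were not tied.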

Finally, here is the best blackjack strategy chart that I put together:

[Image: blackjack strategy chart]
