《D o C P》学习笔记(5 - 1)Dealing with Uncertainty Through Probability - Lesson 5

备注1:每个视频的英文字幕,都翻译成中文,太消耗时间了,为了加快学习进度,我将暂停这个工作,仅对英文字幕做少量注释。
备注2:将.flv视频文件与Subtitles文件夹中的.srt字幕文件放到同1个文件夹中,然后在迅雷看看中打开播放,即可自动加载字幕。

Dealing with Uncertainty Through Probability

你可以学到什么:

  • 概率:小猪游戏。
  • 最大化期望效用以优化策略

Lesson 5

视频链接:
Lesson 5 - Udacity

Course Syllabus

Lesson 5: Dealing with Uncertainty Through Probability

  • Lesson 5 Course Notes(主要是课程视频对应的英文字幕的网页。)
  • Lesson 5 Code
  • Homework 5 Notes

1. Welcome Back

Hey, welcome back. Now, as we’ve said, this class is all about managing complexity.

Now many types of software manage complexity by trying to artificially(人为地;人工地) rule out(排除…可能性) any type of uncertainty. That is, say(比方说;宣称;假设) you have a checkbook-balancing(checkbook 支票簿) program, and it says you’ve got to enter(输入) the exact(准确的;精确的) amount(数量). You’ve got to say $39.27. You can’t say, oh I don’t know about $40.

It’s easier to write programs that deal that way, but it constrains(constrain 强迫;限制;约束) what you can do. So, in this unit we’re going to learn about how the laws of probability(the laws of probability 概率论) can allow you to deal with uncertainty in your programs. Now, the truly amazing thing is that you can allow uncertainty and what you know about the world, or what’s true right now and uncertainty in your actions, if the program does something, what happens next? Even though both of those are uncertain you can still use the laws of probability to calculate what it means to do the right thing. That is, we can have clarity(清楚;清晰度) of action. We can know exactly what the best thing to do is even though we’re uncertain about what’s going to happen. So follow with this unit, and we’ll learn how to do that.

2. Porcine Probability

porcine 猪的,似猪的

This unit is about probability, which is a tool for dealing with uncertainty. Once you understand probability, you’ll be able to tackle(着手处理) a much broader range of problems than you could with programs that don’t understand probability.

Often when we have problems with uncertainty, we’re dealing with search problems. Recall(回想;召回), in a search problem, we are in a current state. There are other states that we can transition(转换,转变) into, and we’re trying to achieve some goal, but we can’t do it all in one step. We have to paste together a sequence(序列;顺序) of steps. In doing that, we’re building up a search frontier(边界,边缘) that we’re continuing to explore from.

Now, uncertainty can come into play in two ways.

(1) We can be uncertain about the current state. Rather than knowing exactly where we are, it may be that we start off in one of four possible states and all we know is that we’re somewhere in there, but we’re not sure exactly where we are.

(2) The other place uncertainty can come in is when we apply an action, say this action here–action A–it may be that we don’t get to one specific(明确的;具体的) state but, rather, we’re uncertain as to what the action will do, and we might end up in this state or this state or this state instead of the one that we were aiming at.

And so we’ll see techniques for dealing with both of these types of uncertainty.

Now, one place where people are used to dealing with uncertainty is in playing games that employ(使用,利用) dice(骰子). And that’s what we’re going to deal with. In particular, we’re going to play a dice game which is called Pig. I don’t know why the game is called Pig. I can guarantee no porcine creatures(creature 生物,动物) were harmed in the creation(产物;创造物,产物) of this unit.

Here’s how the game works:

There are two players, although you could play with more. The players take turns, and on his turn a player has the option to roll the dice–a single die(die 骰子)–as often as he wants or to hold–to stop rolling. And the object(目标) of the game is to score a certain number of points. We’re going to say 50 points; 100 is more common, but 50 will be easier on the Udacity servers in terms of the amount of computation required.

And so it’s my turn, and we have a score. So here’s a scoreboard; we’ll have players with the imaginative(虚假的;富于想象力的) names of player 0 and player 1. And the score starts off 0 to 0. Now there’s another part of the scoreboard that is not part of the player’s score. We’ll call that the pending(待定的;未定的;未决的;即将发生的) score.

Let’s say it’s my turn. I pick up the die, I roll it, and let’s say I get a 5. Then 5 goes into the pending score, but I don’t score any points yet.

Now it’s my turn again. Do I roll or do I hold–stop rolling? Let’s say I want to roll again. This time I get a 2, so I add 2 to the pending score; I get 7. Let’s say I roll again. I’m lucky. I get a 6. I add 6 to the pending; I get 13. And I’m going great(go great 做得很好), so I roll again, and this time I get a 1. And a 1 is special. A 1 is called a pig out, and when you roll a pig out it means you lose all the pending points, and for your hand you score not this total in pending, but just the 1. So my score would be just the 1.

Now the other player, player number 1, goes. Let’s say player number 1 says, “I’m going to roll,” gets a 3. “I’m going to roll again,” gets a 4. “I’m going to roll again,” gets a 5. So now we have 12 in the pending, and now player number 1 says, “I think I’ve had enough; I’m going to hold,” and that means we take these points from the pending, the 12 points, put them up on the board for player 1’s score. And now player 1’s turn ends, and it’s player 0’s turn.

So your turn continues until you either hold or pig out, and your score for the turn is the sum of your rolls, if you didn’t pig out, if you decided to hold, and the score is just 1 if you pigged out. And you keep on taking turns until somebody reaches the target–here, 50. So that’s how the game of Pig works. Now let’s go to try to describe the game in a form that we can program.

3. The State of Pig

So as usual, we’re going to make an inventory of concepts in the game.

This time I’m going to try to break things out a little bit, and I’m going to talk about low-level concepts, high-level concepts, and mid-level concepts.

As we saw in the discussion forums there’s always a question of where do you want to start. Do you want to describe the low level first and build up from there? Do you want to describe the high level first and build down? I think for this case we’ll take more of a middle out approach.

So, at the mid level there’s the concept of current state of the game. We’re sort of(可以说,可说是) inching(缓动;微动) towards a search problem, and we know that we have to represent states for a search problem. So, we want to know the current state of the game.

If we’re thinking of search problems then we also have to know about actions we can take. We know that there are two actions: Roll and hold. So, here’s some candidates(报考者;申请求职者) for what’s in the current state.

First, the things that were on the scoreboard. The scoreboard, remember, had three things.

Then the player whose turn it is, we might want that to be part of the state. The previous roll of the dice, whether I just rolled a five or something else, that might be part of the state. The previous turn score, how much did the other player just make on their turn?

So, all of these are possibilities. You might be able to think of other possibilities. I want you to tell me which one of these are necessary to describe the state of the game. I guess I should say here that we’re assuming that the goal of the game, the number of points you need win, we’re assuming that’s constant(常量,常数) and doesn’t need to be represented in each individual state. We just represent it once for the whole game.

Which of these are necessary for the current state?

the things on the scoreboard(计分板上的东西):
 · score 0(0号玩家的分数)
 · score 1(1号玩家的分数)
 · pending(待定的分数)

the player whose turn it is(玩家的顺序):
 · player

the previous roll of dice(掷骰子游戏中,之前掷骰子的结果)
 · previous roll

the previous turn score(其他玩家在他们的顺序上,分数多少)
 · previous turn score

(我的答案:
我认为,在表示当前的状态中,前4个概念是必须的。
previous roll、previous turn score这2个概念,或者说分数,一定在pending上,或者已经计入了计分板上的玩家的分数当中。)

3. The State of Pig Solution

Well, we certainly have to know the score(score 0score 1). We have to know how much is pending, because that’s going to affect the score. We have to know what player is playing.

Now these things, what happened before, they might be interesting, but they don’t really help us to find the current state. So those are unnecessary.

STATE       -- (p, me, you, pending)
ACTIONS     -- roll/hold  

So, the state’s going to end up being something like a four tuple. I’ve written it as p, me, you, pending, the player to move, that player’s score, the other player’s score, and the pending score that hasn’t been reaped(reap 获得;得到) yet.

4. Concept Inventory

LOW LEVEL

 · DIE        the roll of a die:
 · SCORES     the implementation of scores:
 · PLAYERS    the implementation of the players:
 · TO MOVE    the player to move:
 · goal       the goal:

At the low level–I count as(当作) low-level things like the roll of a die, the implementation(实现) of scores, the implementation of the players and of the player to move, the goal–so these are all things that we’re going to have to represent.

HIGH LEVEL

 · play-pig    a function play-pig that plays a game between two players:
 · strategy    a strategy that a player is taking in order to play the game:

And then at the high level, I’m going to have a function play-pig, that plays a game between two players, and I have the notion(概念) of a strategy(策略)–a strategy that a player is taking in order to play the game.

Now let’s think about how to implement these things, and when I’m doing the implementation, I’m going to move top-down. So I started sort of(可以说,可说是) middle-out saying(say 假设;比方说) these are the kinds of things I think I’m going to need; now I have a good enough feel for them that I feel confident in moving top-down. I don’t see any difficulties in implementing any of these pieces.

If I start at the top, then I’ll be able to make choices later on without feeling constrained. If I thought there was something down here that was difficult to deal with, I might spend more time now, at the low level, trying to resolve(分解) what the right representation is for one of these difficult pieces, and that would inform(通知;预示) my high-level decisions. But since I don’t see any difficulty, I’m going to jump to the high level.

HIGH LEVEL

 · play-pig    fun(A, B) --> A
 · strategy    fn(state) --> action

Now, what’s play-pig? Well, I think that’s going to be a function, and let’s just say that its input is two players, A and B, and we haven’t decided yet how we’re going to represent those. And its output is–let’s say it’s going to be the winner of the game.

Maybe A is the winner. And we’ll have to make a choice of how we represent these players.

Now what’s a strategy? Well, a strategy–people sometimes use the word “policy”(policy 策略) for that. We can also represent that as a function. And it takes as input a state, and it returns an action or a move in the game.

MIDDLE LEVEL

 · STATE        (p, me, you, pending)
 · ACTIONS      'roll'/'hold'
     roll(state) --> state
     hold(state) --> state
         roll(state) --> {states}
             roll(state, d) --> state

In this game we said that the actions are roll and hold. We’re starting to move down. Let’s just say now how are we going to represent these actions? Well, we can call the actions just by strings, so we use the strings “roll” and “hold” and that could be what the strategy returns. But then we’ll also need something that implements these actions, so we’ll have to have something that’s a function that says–let’s say– the function “roll” takes a state and returns a new state; function “hold” takes a state and returns a new state.

But that doesn’t seem quite right; there’s a problem here. What about the die? That seems to take and effect that roll by itself is not a function from state to state.

Rather, roll–if we wanted to specify it–would be a function from a state to a set of states, and that represents the fundamental(基础的,基本的,根本的,重要的,原始的,主要的) uncertainty.

That’s why we need probability to deal with it. That’s why we have an uncertain or a nondeterministic(非确定的) domain is because an action doesn’t have a single result; rather, it can have a set of possible results.

And, in some cases it makes sense to go ahead and implement these actions as functions that look like that, that return sets of states. And I considered that as a possibility, but I ended up with an implementation where I talk about the different possibilities for the dice.

So the dice can come up as D, one of the numbers 1 to 6, and now roll, from a particular state with the particular die roll, that is going to return a single state rather than a set of states. And I just think it’s easier to deal this way, although in other applications you might want to deal that way.

LOW LEVEL
 · DIE -- int
 · SCORES -- int
 · goal -- int
 · TO MOVE -- 0, 1        the player to move
 · PLAYERS -- fn          strategy function

(这里我存在一点疑问:使用策略代表players?不是很理解)
Now the rest seems to be pretty easy. The die can be represented as an integer. Scores can be represented as integers. Likewise(likewise 同样地) the goal. The player to move–we can represent that as an index, 0 or 1, into an array of players. And the players themselves? Well, the simplest way to do it is just to represent the player by their strategy. The strategy is a function, and that could represent the player. We could have something more complex, but it seems like we don’t need anything more than that. So players will be strategy functions.

5. Hold and Roll

Now you’re probably itching(渴望的) to write some code by now–so let’s get started.

What I want you to do first is write these two action functions, hold and roll, which take a state and return a state.

Here the state that results from holding. Here the state that results from rolling and getting a d. A state is represented by this four tuple of p, the player. It’s either zero or one. The subsequent(后来的;随后的) state would remain the same if the player continues and would swap(交换) between one and the other otherwise. Me and you, two integers indicating the score, the score of the player to play and the score of the other player, and then pending, which is score accumulated(累积的) so far but not yet put onto the scoreboard.

Go ahead and write those functions.

# -----------------
# User Instructions
# instruction 指令;课程
# 
# Write the two action functions, hold and roll. Each should take a
# state as input, apply the appropriate action, and return a new
# appropriate 适当的;合适的
# state. 
#
# States are represented as a tuple of (p, me, you, pending) where
# p:       an int, 0 or 1, indicating which player's turn it is.
# me:      an int, the player-to-move's current score
# you:     an int, the other player's current score.
# pending: an int, the number of points accumulated on current turn,
# not yet scored
# accumulated 累积的

def hold(state):
    """Apply the hold action to a state to yield a new state:
    Reap the 'pending' points and it becomes the other player's turn."""
    # reap 获得;得到
    # your code here

def roll(state, d):
    """Apply the roll action to a state (and a die roll d) to yield a new state:
    If d is 1, get 1 point (losing any accumulated 'pending' points),
    accumulated 累积的
    and it is the other player's turn. If d > 1, add d to 'pending' points."""
    # your code here

def test():
    assert hold((1, 10, 20, 7))    == (0, 20, 17, 0)
    assert hold((0, 5, 15, 10))    == (1, 15, 15, 0)
    assert roll((1, 10, 20, 7), 1) == (0, 20, 11, 0)
    assert roll((0, 5, 15, 10), 5) == (0, 5, 15, 15)
    return 'tests pass'

print test()

(谈下我自己的思考过程吧:
我先回忆了之前的视频【Porcine Probability】中,对游戏规则的讲解,没有提到两个玩家会交换分数。
但是,test()中的用例,对于me和you两个玩家,应该是交换了分数。
再看本段视频中,又提到了swap。
再看看用例,再次理解游戏规则:
如果一方玩家选择了hold,即不掷骰子,那么turn顺序就必须轮到另一方玩家;
如果一方玩家选择了roll, 即 掷骰子,那么turn顺序就可能还是自己;
补充说明1:如果一方玩家roll的情况,掷骰子掷出了1,那么必须轮到另一方玩家;
补充说明2:对于补充说明1的情况,还有补充,pending清零,1加到掷骰子的玩家。

我准备先写hold(state)函数,因为这个函数稍微简单些。
先观察测试用例:

    assert hold((1, 10, 20, 7))    == (0, 20, 17, 0)
    assert hold((0, 5, 15, 10))    == (1, 15, 15, 0)

捋一捋它的逻辑:
如果state[0] == 1,那么,将state[1]与pending相加,并与state[2]交换,返回
(0, state[2], state[1]+pending, 0);
如果state[0] == 0,那么,将state[1]与pending相加,并与state[2]交换,返回
(1, state[2], state[1]+pending, 0);

也就是说,玩家选择hold不掷骰子时,必定是要交换turn的。

如下:

def hold(state):
    if state[0] == 1:
        return (0, state[2], state[1]+pending, 0)
    elif state[0] == 0:
        return (1, state[2], state[1]+pending, 0)

简化:

def hold(state):
    return (0, state[2], state[1]+state[3], 0) if state[0]==1 else (1, state[2], state[1]+state[3], 0)

再来写roll函数。
先观察测试用例:

    #      roll(state,          d)
    assert roll((1, 10, 20, 7), 1) == (0, 20, 11, 0)
    assert roll((0, 5, 15, 10), 5) == (0, 5, 15, 15)

捋一捋它的逻辑:
如果d等于1,则必须交换turn;
这种情况下,则pending清零,将state[1]与1相加,将相加之和与state[2]交换,即返回
(另一个玩家, state[2], state[1]+1, 0)
如果d不等于1,则不交换turn,还是自己掷骰子;
这种情况下,则pending与d相加,其他不变。

如下:

def roll(state, d):
    if d == 1:
        return (int(not p), state[2], state[1]+1, 0)
    elif d == 0:
        return (state[0], state[1], state[2], pending+d)

简化:

def roll(state, d):
    return (int(not state[0]), state[2], state[1]+1, 0) if d == 1 else (state[0], state[1], state[2], state[3]+d)

)

(先上我的完整的代码,测试通过了:

def hold(state):
    """Apply the hold action to a state to yield a new state:
    Reap the 'pending' points and it becomes the other player's turn."""
    # your code here
    return (0, state[2], state[1]+state[3], 0) if state[0]==1 else (1, state[2], state[1]+state[3], 0)

def roll(state, d):
    """Apply the roll action to a state (and a die roll d) to yield a new state:
    If d is 1, get 1 point (losing any accumulated 'pending' points),
    and it is the other player's turn. If d > 1, add d to 'pending' points."""
    # your code here
    return (int(not state[0]), state[2], state[1]+1, 0) if d == 1 else (state[0], state[1], state[2], state[3]+d)

def test():    
    assert hold((1, 10, 20, 7))    == (0, 20, 17, 0)
    assert hold((0, 5, 15, 10))    == (1, 15, 15, 0)
    assert roll((1, 10, 20, 7), 1) == (0, 20, 11, 0)
    assert roll((0, 5, 15, 10), 5) == (0, 5, 15, 15)
    return 'tests pass'

print test()

)

5. Hold and Roll

Peter的代码:

def hold(state):
    (p, me, you, pending) = state
    return [other[p], you, me+pending, 0]

def roll(state, d):
    (p, me, you, pending) = state
    if d == 1:
        return (other[p], you, me+1, 0)
    else:
        return (p, me, you, pending+d)

other = {1:0, 0:1}

Here’s my solution:

So, I have my state–I just broke it up into pieces so that I know what I’m talking about. Then if I hold it becomes the other player’s turn. The other player’s score is the same as it was before.

So now remember the second place is the score of the player whose turn it is. So, that was you previously, and then the score that I got–I just add in the pending. I reap(获得;得到) all of those, and the pending gets reset to zero. When I roll, again let’s figure out what’s in the state, if the roll is one that’s a pig out, it becomes the other player’s turn, and I only got one lousy point. Pending gets reset to zero.

If the roll is not a one then it’s still my turn. I don’t change my score so far, but I add d onto the pending.

Here’s just a way to map from one player to the other. If the other player, if it was one it becomes zero. If it was zero it becomes one. It’s always a great idea to write some test cases.

Now, one comment on style. Right here I’m taking this state, which is a tuple that has four components, and I’m breaking it up like this into it’s four components. When you have four components that’s probably okay, but it’s getting to worry me a little bit that maybe I won’t be able to remember which part of the state is which. If I had more than four, if I had five or six components, I really start to worry about that.

So there are other possibilities where we can be more explicit(明确的,清楚的) about what the state is rather than just have it be a set of undifferentiated(无差别的,一致的) elements of a tuple that we then define like this. We can define it ahead of time.

6. l Named Tuples

视频下方的补充材料——开始

Interested in learning more about namedtuples? Check out the python documentation.

视频下方的补充材料——结束

Now here’s an alternative.

Instead of just defining a state by just creating a tuple and then getting at the fields of a state by doing an assignment, we can use something called a namedtuple that gives a name to the tuple itself as well as to the individual elements.

We can define a new data type called state and use capitalized letters for data types. Say state is equal to a namedtuple, and the name of the data type is state, and the fields of the data type are p, me, you, and pending.

state = (1, 2, 3, 4)
(p, me, you, pending) = state

from collection import namedtuple
State = namedtuple('state', 'p me you pending')

s = State(1, 2, 3, 4)
s.p  --> 1
s.me --> 2

So I can just go ahead and make that assertion.

Namedtuples is in a module. So, from collections import namedtuple gives me access to it. Now I can say s = state (1,2,3,4), and I can ask for the components of s by name. How would I choose between this representation for states and the normal tuple representation?

Well the namedtuple had a couple of advantages. It’s explicit(明确的,清楚的) about the types. It helps you catch errors. So if you ask for the p field of something that’s not a state that would give you an error. Whereas if you just broke up something that was four elements into these components that would work even if it didn’t happen to be a proper state.

There are a few negatives as well. It’s a little bit more verbose(冗长的,累赘的), although not so much, and it may be unfamiliar to some programmers. It may take them a while to understand what namedtuples mean.

I should say we could also do the same type of thing by defining a class. That has all the same positives, and it’s certainly familiar to most Python programmers, but it would be even more verbose.

def hold(state):
    return State(other[state.p], state.you, state.me+state.pending, 0)

def roll(roll, d):
    if d == 1:
        return State(other[state.p], state.you, state.me+1, 0)
    else:
        return State(state.p, state.me, state.you, state.pending+d)

Here’s what hold and roll look like in this new notation. So, hold–where we’re explicitly creating a new state. We look at the state.p, the state.you, the state.me, and the state.pending and so on, similarly for roll. They look fairly similar.

You notice the lines are a little bit longer in terms of we’re being more explicit. So, it takes a little bit more to say that. I’m sort of up in the air(悬而未决) whether this representation is better than the previous representation with tuples. I could go either way.

7. Clueless

clueless 无能的,笨的

import random

possible_moves = ['roll', 'hold']

def clueless(state):
    ## your code here

Now I’m going to talk about strategies for a minute. Remember a strategy is a function, which takes a state as input, and it’s output is one of the action names, roll or hold.

I want you to write a strategy function, which we’re calling clueless. So its a function that takes a state as input, and it’s going to return one of the possible moves, roll or hold. It does that by ignoring the state and just choosing one of the possible moves at random.

So go ahead and write that.

# -----------------
# User Instructions
# 
# Write a strategy function, clueless, that ignores the state and
# chooses at random from the possible moves (it should either 
# return 'roll' or 'hold'). Take a look at the random library for 
# helpful functions.

import random

possible_moves = ['roll', 'hold']

def clueless(state):
    "A strategy that ignores the state and chooses at random from possible moves."
    # your code here

(我的答案。
先谈下我的思路吧,要干的事情是,在possible_moves = ['roll', 'hold']中随机选择1个,由于有2个moves,考虑把问题分为两步:
1. 生成(1, 2)的随机数;
2. 构造一个字典类型的映射,1、2分别映射到'roll'hold
代码:

import random

possible_moves = ['roll', 'hold']

def clueless(state):
        int_to_move = {1: 'roll', 2: 'hold'}
        return int_to_move[random.randint(1, 2)]

print clueless(1)

)

7. Clueless Solution

Peter的代码:

import random

possible_moves = ['roll', 'hold']

def clueless(state):
    return random.choice(possible_moves)

Here’s my solution: I gave you the hint of importing the random module. I just call it the random choice function, which takes a set of possible moves and picks one at random.

8. Hold At Strategy

视频下方的补充材料——开始

Oops – it doesn’t really make sense to have “goal = 50” in the test() function; that introduces a local variable, when what we really want is a global variable that hold_at can refer to. Place “goal = 50” at the top level of the program.

视频下方的补充材料——结束

Now I want to describe a family of strategies that I’m calling `hold_at(n)`, where n is an integer. For example, hold at 20 is a strategy that keeps on rolling(roll 滚翻。这里意指“掷骰子”) until the pending score is greater than or equal to 20, and then it holds. The point of this strategy is you get points by rolling, but you risk points by rolling as well. The higher the pending score is, the more you’re risking. So there should be some point(some point 某个时候) at which you’re saying that’s too much of a risk. I’ve accumulated(accumulate 堆积,积累) so much pending that I don’t want to risk any more, and then I’m going to hold. So hold at 10, hold at 20, hold at 30 describes that family of strategies. I should say there’s one subtlety(细微的差别;精妙) here that we’ll build in to hold at, which is let’s say that the goal is 50, and my score when I start my round(一回合) is say 40(say 假设;比方说). Then let’s say I roll a 6 and a 4. According to hold at 20 I should keep on rolling because my pending score is only 10. I haven’t gotten up to 20 yet, but it would be silly(蠢的;糊涂的) for me to keep rolling at that point. I would risk pigging out and only scoring one point and getting to 41. Whereas(然而) I know if I hold now I have 40 + 6 + 4 is 50. I’ve already won the game. So, hold at 20 will hold when pending is greater than or equal to 20, or when you win by holding.
def hold_at(x):
    def strategy(state):
        ## your code here
    strategy.__name__ = 'hold_at(%d)' %x
    return strategy

## state = (p, me, you, pending)
So, I want you to go ahead and implement that. Since hold at x is a whole family of strategy of functions, hold at x is not going to be a strategy function. Rather(相反地), it’s going to return a strategy function. So I’ve given you this outline(要点;轮廓) of saying we’re going to define a strategy function, then we’re going to fix up its name a little bit to describe it better. Then we’re going to return it. You have to write the code within the strategy function. I should say, we’re going to stick with(继续做;跟着…) the representations of states, where state is a four tuple of the player to move, zero one, me and you score, and the pending score. iff 当且仅当 Peter留的题目:
# -----------------
# User Instructions
# 
# In this problem, you will complete the code for the hold_at(x) 
# function. This function returns a strategy function (note that 
# hold_at is NOT the strategy function itself). The returned 
# strategy should hold if and only if pending >= x or if the 
# player has reached the goal.

def hold_at(x):
    """Return a strategy that holds if and only if 
    pending >= x or player reaches goal."""
    def strategy(state):
        # your code here
    strategy.__name__ = 'hold_at(%d)' % x
    return strategy

goal = 50
def test():
    assert hold_at(30)((1, 29, 15, 20)) == 'roll'
    assert hold_at(30)((1, 29, 15, 21)) == 'hold'
    assert hold_at(15)((0, 2, 30, 10))  == 'roll'
    assert hold_at(15)((0, 2, 30, 15))  == 'hold'
    return 'tests pass'

print test()
(我的答案:
def hold_at(x):
    def strategy(state):
        p, me, you, pending = state
        if pending >= x:
            return 'hold'
        elif me + pending >= goal:
            return 'hold'
        else:
            return 'roll'
    strategy.__name__ = 'hold_at(%d)' % x
    return strategy
这里的逻辑其实很简单,把视频多看几遍,测试用例分析几次,即可加强理解。 简化的代码:
def hold_at(x):
    def strategy(state):
        p, me, you, pending = state
        if (pending >= x) or (me + pending >= goal):
            return 'hold'
        else:
            return 'roll'
    strategy.__name__ = 'hold_at(%d)' % x
    return strategy
)

8. Hold At Strategy Solution

Here’s my solution: I break up this data to its components. Then I hold if the pending score is greater than or equal to the x, or if I’ve already won if my current score plus the pending score is already greater than or equal to the goal. Otherwise, I return roll as my move.
def hold_at(x):
    """Return a strategy that holds if and only if
    pending >= x or player reaches goal."""
    def strategy(state):
        p, me ,you, pending = state

        return 'hold' if ((pending >= x) or (pending+me >= goal )) else 'roll'

    strategy.__name__ = 'hold_at(%d)' % x
    return strategy

9. Play Pig

视频下方的补充材料——开始

('always_hold':即使当pending点数是0时,也不roll掷骰子,所以总是0分。)
Notice that the functions 'always_hold' and 'always_roll' might not be doing exactly what you think they do. 'always_hold' holds even before the first roll. That is, when it has zero pending points, it refuses to roll, and therefore(因此;所以) it always scores zero points. Not very clever, but that is the way it goes.

('always_roll'函数是非常蠢的。即使在幸运的情况下,pending score比goal大,还是会继续roll掷骰子,直到pig out的情况。)
'always_roll' on the other hand, is almost as foolish. Even if it is lucky and has a pending score that is greater than the goal, it continues to roll until it pigs out.

视频下方的补充材料——结束

Now let’s talk about the design of the function, Play Pig, which plays a single game of Pig.

We decided that this is a two player game, player “A” and “B,” and we decided that we’re going to represent this as a function. At some point in the future we might want to allow multiplayer games with more than two players, but we’re not going to want to worry about that for now.

So let’s make a list of what the function has to do.

  • It has to keep(保持;保留) score
  • It needs the score for player “A” and for player “B” and for pending.
  • It has to take turns. It has to figure out whose turn it is, and then that turn keeps going until they hold or pig out.

Another way to say that is, the score for “A,” the score for “B,” the pending, and whose turn it is– all of that is managing the current state.

It has to call the strategy functions, so “A” and “B” are going to be strategy functions that we pass in. It has to keep track of the current state, pass that state to the strategy function for the appropriate player whose turn it is, and then that will give back an action, either roll or hold. Then it has to do the action, the roll or hold, and that will generate a new state and we have to keep track of the state we’re in.

But there’s one more trick here–when we were doing a normal search, that was it. We had to figure out what the actions were. Apply the action. When you get to the next state, there’s a single successor for each action. But here there’s multiple successors for an action. And so we have to do one more thing, which is roll the die, and that disambiguates(disambiguate 消除…的歧义) the action of rolling and makes it not generate a set of possible states, but the action plus the die–that generates a single state.

I want you to write the function “Play Pig,” which takes two strategies as input, plays the game, keeping track of what’s going on– rolling the dice as necessary, updating the state, and then when one player wins, we turn that player either “A” or “B.” One thing I note is–I don’t have any tests here. The reason is it’s hard to test this. It’s hard to write a deterministic(确定性的) test because part of playing the game is rolling the die, and that won’t be the same every time. We’ll talk in a bit about how to test programs like this.

Peter的题目:

# -----------------
# User Instructions
# 
# Write a function, play_pig, that takes two strategy functions as input,
# plays a game of pig between the two strategies, and returns the winning
# strategy. Enter your code at line 41.
#
# You may want to borrow from the random module to help generate die rolls.

import random

possible_moves = ['roll', 'hold']
other = {1:0, 0:1}
goal = 50

def clueless(state):
    "A strategy that ignores the state and chooses at random from possible moves."
    return random.choice(possible_moves)

def hold(state):
    """Apply the hold action to a state to yield a new state:
    Reap the 'pending' points and it becomes the other player's turn."""
    (p, me, you, pending) = state
    return (other[p], you, me+pending, 0)

def roll(state, d):
    """Apply the roll action to a state (and a die roll d) to yield a new state:
    If d is 1, get 1 point (losing any accumulated 'pending' points),
    and it is the other player's turn. If d > 1, add d to 'pending' points."""
    (p, me, you, pending) = state
    if d == 1:
        return (other[p], you, me+1, 0) # pig out; other player's turn
    else:
        return (p, me, you, pending+d)  # accumulate die roll in pending

def play_pig(A, B):
    """Play a game of pig between two players, represented by their strategies.
    Each time through the main loop we ask the current player for one decision,
    which must be 'hold' or 'roll', and we update the state accordingly.
    When one player's score exceeds(超过) the goal, return that player."""
    # your code here


def always_roll(state):
    return 'roll'

def always_hold(state):
    return 'hold'

def test():
    for _ in range(10):
        winner = play_pig(always_hold, always_roll)
        assert winner.__name__ == 'always_roll'
    return 'tests pass'

print test()

(
我的分析:
视频中提到,play_pig函数应该保留玩家A、B、pending的分数,这也就是说,该函数的返回值中,应当有玩家A、B、pending的分数。
play_pig函数需要take turns,需要弄清楚是轮到了谁,并且,直到hold或者pig out,turn才不keeps going。
上两句话,玩家A、B的分数、pending的分数,轮到了谁的顺序turn,就是管理当前的状态current state。

需要调用strategy函数,所以A、B必须是strategy函数。它必须跟踪当前的状态,把那个状态传递给适当的轮次turn的玩家的strategy函数,然后,会give back 1个action,roll或hold。

do action。执行action,roll或者hold,生成一个新的state,并且,我们需要跟踪这个state。(未完)
当你到达下一个state时,对于每一个action,会有单个的successor。但是这里,对于1个action,会有多个successors。

roll die。所以,需要多做一件事情,roll the die,消除the action of rolling的歧义,使它不会生成可能的states的1个集合,do action与roll die相加,生成1个单个的state。

把视频和代码中的注释看了几遍了,只写出了两版代码草稿,而且显然是不能测试通过的:

def play_pig(A, B):

    while max(A, B) < goal:
        if A(state) == 'roll':
            roll(state, d)
        elif B(state) == 'roll':
            pass
        else:
            hold(something)

    return (player A or player B)中的较大者

我大概说下,关于上面的类伪代码,我的思路。
两个玩家A、B显然是要不断地roll掷骰子、hold不掷骰子的。如果要A、B中的任意一个要掷骰子,那么就掷骰子呗roll(state, d);如果不掷骰子,那就是hold(something)。如果两个玩家中的分数较大者超过了goal(max(A, B) < goal不为真),则停止游戏,即跳出循环,返回(player A or player B)中的较大者

显然是不够的,状态state没有保持,于是有了下面的一版代码:

p, me, you, pending = state
'''
?, 0,  0,   0       = 
'''

while max(me, you) < goal:
    if A(state) == 'roll':
        state = roll(state, d)
    elif B(state) == 'roll':
        state = roll(state, d)
    else:
        state = hold(state)

return me

利用state = roll(state, d)之类的语句保留状态。
还有,me、you、pending的初始值为0,p的初始值是多少?(后面Peter的代码中,p的初始值为0。)

在这里花了太久的时间,仍然没有写出完整正确的代码,还是看看Peter的答案吧。
)

9. Play Pig Solution

Here’s my solution.

I put the strategies into a list because we’re going to be indexing into that.

I start out in the start state. Nothing has happened. No points awarded. Player number 0, that is “Player A,” is the player to move.

Then I repeat this loop.

Tell me everything I know about the current state.

If the score of the player whose turn it is, is greater than the goal, then that player wins. “Player 0” or “Player 1”–“A” or “B.”

Same if the other player is greater than the goal, then that player wins and, otherwise, I pick out the strategy function for the player to play.

If “P” is “0,” then strategy “0” is “A.”

If “P” is “1,” then strategy “1” is “B.”

Apply that strategy function to the state and if it’s hold, I apply the hold action to the state to get a new state.

Otherwise, I assume that the strategy function is legal, and if it doesn’t return hold, then it does return roll.

I’ll give it the benefit of the doubt there, and perform the roll action on the state and on a random integer from 1 to 6, inclusive(包括的).

That will give me the new state, and I continue until we find a winner.

def play_pig(A, B):
    strategies = [A,B]
    state = (0,0,0,0)
    while True:
        (p, me, you, pending) = state
        if me >= goal:
            return strategies[p]
        elif you >= goal:
            return strategies[other[p]]
        elif stategies[p](state) == 'hold':
            state = hold(state)
        else:
            state = roll(state, random.randint(1,6)))

(继续谈下我的理解:
Peter的代码中,d来自于random.randint(1,6)随机生成。
)

10. Dependency Injection(依赖注入)

dependency 从属,从属物

injection 注射

视频下方的补充材料——开始

Read more about dependency injection.

视频下方的补充材料——结束

(上面的代码中的最后一行state = roll(state, random.randint(1,6))),有不确定的组件,如果要测试的话,如何测试?
我谈下我的想法,在参数列表中,加入一个参数,代替这个随机变量。

看看Peter的具体解决办法吧。
)
Now, the question is, how can I test a function like this, that includes this nondeterministic(非确定的) component(成分;组分)?

One thing we want to be able to do is inject(注射;添加;投入) into here some deterministic numbers to say this is the sequence of “die rolls” I want to give you and then, from that, then I can check if it’s doing the right thing.

This is an example of a concept called Dependency Injection, which has a rather scary(可怕的,吓人的) and intimidating-sounding(intimidating 恐吓,威胁) name, but it’s actually a pretty simple idea. The idea is we’ve got a function like this, it’s a big complicated function and way down somewhere inside, there’s something that we want to affect, something we want to monitor or track or change. Dependency Injection says this function depends on this random number generator, so let’s be able to inject that.

How do we inject something into a function? Well, we just add it as an argument. So let’s add in the argument here, and let’s call it “die rolls” and say, that’s going to be a sequence or an iterable that will generate possible “die rolls. In the normal case, that will just be random numbers exactly like it was before. We don’t care what they are, but when we want to test the function we can inject the “die rolls” that we want.

We can just pass in a list saying what happens if the “die rolls” are a 6 and a 1 and then a 3 and a 5, and so on. Tell me what happens. So here’s my implementation of the Dependency Injected Play Pig. I still have the regular arguments “A” and “B.”

def dierolls():
    "Generate die rolls."
    while True:
        yield random.ranint(1, 6)

def play_pig(A, B, dierolls=dierolls()):
    strategies = [A, B]
    state = (0, 0, 0, 0)
    while True:
        (p, me, you, pending) = state
        if me >= goal:
            return strategies[p]
        elif you >= goal:
            return strategies[other[p]]
        elif strategies[p](state) == 'hold':
            state = hold(state)
        else:
            state = roll(state, next(dierolls))

There’s an optional argument. If I leave that out it should behave exactly like it did before. But if I specify it, then I have control over it. So “die rolls” should be an iterable that generates rolls. Here we go down and we ask for the next one out of those rolls and get it back. By default, “die rolls” just says we’re going to generate an infinite sequence of random integers from 1 to 6.

Oops, I think I misspoke(讲错) there. I think I said that “die rolls” have to be an iterable. Actually, what it has to be is an iterator such as a generator expression or something else, in order for it to have the next apply to it.

11. Loading the Dice

So now, with this play pig, with the dependancy injection, with the goal being 50, here’s a test that I can actually run. So “A” and “B” are going to be my two contestants(contestant 竞争者). “A” is hold at 50, which is equivalent to saying never hold until you win. “B” is the clueless function, the one that acts at random, and rolls is going to be an iterator of some list of numbers, maybe 1, 2, 3 or whatever you want, but I want you to write in there the list which is the shortest possible list, or one of the shortest possible lists that allows “A” to win and then you can check Play a Game of Pig between “A” and “B” with these rolls and make sure that “A” wins.

# -----------------
# User Instructions
# 
# Modify the rolls variable in the test() function so that it 
# contains the fewest number of valid rolls that will cause
# the hold_at(50) strategy to win. Enter your rolls at line 63

import random

goal = 50
possible_moves = ['roll', 'hold']
other = {1:0, 0:1}

def hold(state):
    """Apply the hold action to a state to yield a new state:
    Reap the 'pending' points and it becomes the other player's turn."""
    (p, me, you, pending) = state
    return (other[p], you, me+pending, 0)

def roll(state, d):
    """Apply the roll action to a state (and a die roll d) to yield a new state:
    If d is 1, get 1 point (losing any accumulated 'pending' points),
    and it is the other player's turn. If d > 1, add d to 'pending' points."""
    (p, me, you, pending) = state
    if d == 1:
        return (other[p], you, me+1, 0) # pig out; other player's turn
    else:
        return (p, me, you, pending+d)  # accumulate die roll in pending

def clueless(state):
    "A strategy that ignores the state and chooses at random from possible moves."
    return random.choice(possible_moves)

def hold_at(x):
    """Return a strategy that holds if and only if 
    pending >= x or player reaches goal."""
    def strategy(state):
        (p, me, you, pending) = state
        return 'hold' if (pending >= x or me + pending >= goal) else 'roll'
    strategy.__name__ = 'hold_at(%d)' % x
    return strategy

def dierolls():
    "Generate die rolls."
    while True:
        yield random.randint(1, 6)

def play_pig(A, B, dierolls=dierolls()):
    """Play a game of pig between two players, represented by their strategies.
    Each time through the main loop we ask the current player for one decision,
    which must be 'hold' or 'roll', and we update the state accordingly.
    When one player's score exceeds the goal, return that player."""
    strategies = [A, B]
    state = (0, 0, 0, 0)
    while True:
        (p, me, you, pending) = state
        if me >= goal:
            return strategies[p]
        elif you >= goal:
            return strategies[other[p]]
        elif strategies[p](state) == 'hold':
            state = hold(state)
        else:
            state = roll(state, next(dierolls))

def test():
    A, B = hold_at(50), clueless
    rolls = iter([]) # <-- Your rolls here
    assert play_pig(A, B, rolls) == A
    return 'test passes'

print test()

(谈下我的理解吧,如果要确保总是”A”赢,那么让只需要A总是摇到6即可,直到连续9次,9*6等于54,大于50,A赢得比赛、并且比赛结束。rolls = iter([])填入9个6。即

rolls = iter([6, 6, 6, 6, 6, 6, 6, 6, 6,])

)

11. Loading the Dice Solution

Here’s my answer. I’ve rolled eight 6s. That gives me 48 points, and then a 2–that gives me 50–and that allows “A” to win. There are other sequences that are of the same length, but none that are shorter.

Peter的答案:

def test():
    A, B = hold_at(50), clueless
    rolls = iter([6,6,6,6,6,6,6,6,2]) # <-- Your rolls here
    assert play_pig(A, B, rolls) == A
    return 'test passes'

12. Optimizing Strategy

So we’ve seen several different strategies and we’ve compared them and tried to find one that was better, and we could keep doing that, trying to improve and make a strategy better and better, but what if we could make a leap?(leap 跳跃,飞跃)

Instead of incrementally(逐渐地) coming up with a slightly better strategy, would it be possible to leap(跳过,跃过) to the best strategy? To make it sound more mathematical(数学的;精确的), we can call it the optimal(最佳的,最优的) strategy. Can we do that and what would it even mean?

On the surface it’s not exactly clear. When we did search, we didn’t know what our first action was. We started out in some state and we knew there were several different states we could go to, and from there, there were other states we could go to. All we knew is that we were trying to arrive at some goal location. But we knew if we just specified(specify 指定;详述) how the domain works, how you get from one state to the next, and if we specified an algorithm that found the best path, the shortest path, or the least cost path, that eventually, by applying that algorithm to that description of the world, we would arrive at the best possible solution.

So maybe we can do the same thing here. Even though we’re dealing with uncertainty, maybe we can still define what the world looks like and discover the optimal solution. So when we were doing search, it was always sort of(可以说,可说是) one agent(代理人;特工) doing the searching, so let’s call that “me,” and what am I looking for? Am I looking for the best path or the worst path? Well, obviously, we’re looking for the best path and we can describe that and once we’ve got that description we’ve got to search it outward(向外;在外).

Now we’ve gone beyond search in two ways. The most obvious is we’re dealing with probability, so we’ve got dice or whatever other random element there is, and then in addition to(in addition to 除…之外) that, for the big game, we introduced another complication(错杂,纠纷;混乱), which is our opponent(对手;反对者;敌手).

And now this question of what each of these three are trying to do, and I want you to tell me, is our opponent trying to get the best, and that means best score for “me,” or is the opponent trying to get the worst score for “me,” assuming we’re diametrically(直接地) opposed(oppose 反对,抗争).

So the worst score for “me” would be the best score for the opponent, or is the opponent trying to come up with the outcome(结果;成果;出路) that is average? And tell me the same for the dice. Is the dice with “me” in trying to get the best result for “me?” Is the dice plotting(plot 把…分成小块;制图) against “me” in trying to get the worst result for “me?” Or is the dice going to average out?(average out 求出…的平均数;达到…的平均数)

Go ahead and click the appropriate boxes there.

《D o C P》学习笔记(5 - 1)Dealing with Uncertainty Through Probability - Lesson 5_第1张图片

(这里,谈下我的思考:
把视频看了几遍了,终于开窍了,这个题目其实不难。
me的best,显然,对手就是worst。
而dice掷骰子,是随机的,显然就是平均的average。
)

12. Optimizing Strategy Solution

And the answer is, in the game of pig the opponent(对手) is trying to defeat(击败,战胜) “me,” so they want the worst for “me.”

The dice has no intentionality(意向性). Everything is equal in terms of outcome for the dice, so that works out to average.

So now we have a way of describing the world. When we start out, it’s “me” and I have options I can take– roll or hold–and I go in one direction and I get to a point where it’s the dice’s turn to roll, and that can have six outcomes.

Rather than trying to choose the best, we’ll just average over all of them. Let’s say there’s one here, and now it’s my opponents turn to move and my opponent makes a choice, and let’s say ends up here. And I look at all these paths through that keep on going until they get to the end of the game. And then if I say, if I always choose the best, and if my opponent always chooses the worst for “me,” and if the dice average out, then I can describe all the paths to the end, and I can describe the value of those paths.

13. Utility

utility 功用,效用;有用的物体或器械;工具;实用程序

UTILITY(state)  -->  number

Now in economics(经济学;经济) and in game theory, the value of a state is called its utility. It’s just a number. So we’re at the end of the game, and if there’s 1 state where we win, we’ll give that a utility of 1. If there’s another state where we lose, we’ll give that a utility of 0.

Now if I have a choice here–it’s my turn to move–I have a choice to go either way. I’m going to maximize my choice, and I’m going to move there. That means the utility of this state is going to be 1 because I know I can get 1 by taking the optimal strategy. We keep backing up(back up 往后退;支持;复制;堵塞) the tree that way. That’s if it was my choice here. If it was my opponent’s choice, and they could go in either way, then my opponent is going to minimize my score or maximize their score and go in this direction, forcing(促使;强迫) me to lose and allowing them to win. So we’ll say the utility of this state is 0 for me.

QUALITY(state, action)  -->  number

And I want to also introduce here another idea called the quality, which is the function on a state and an action and gives us a number–a utility number. So that’s saying, what’s the quality of this action in this particular state?

utility(state)number The value of a state. At the end of the game a winning state has a utility of 1 and a losing state has a utility of zero. If the game has neither been won nor lost the utility of the state will be between 1 and zero.

quality(state, action) The value of a state and action combined.

Example: state A has two actions ‘hold’ and ‘roll’ which lead to the states ‘B’ and ‘C’ with consequent utilities of 1 and zero.

A -> 'hold' -> B (utility 1). So here the quality (A, 'hold') has a utility of 1

A -> 'roll' -> C (utility 0). And here the quality (A, 'roll') has a utility of 0

So if these were the actions, hold and roll, then we’d say for my opponent the quality of rolling from this state would give us this utility, and the quality of holding would give us this utility.

Finally, if it was the dice’s turn–and let’s say there are 6 outcomes(outcome 结果;成果;出路), but 3 of them lead to this state and 3 of them lead to this state– then from the dice we’re going to average(计算…的平均值) over all the possibilities, so it’s half of one and half of 0, so the utility of this state is 1/2.

So now we have a way–if we know the value of the end states, which is defined by the game, defined by when we win and lose and how many points each player gets for that– now we have a way of essentially(本质上,根本上;本来) searching backwards(向后;往后) to say, from the end state, I can go backwards and say, what’s the value of one of my moves?

Oh! I know that.

It’s the maximum.

What’s the value of my opponents?

I know that. It’s the minimum.

What’s the value of a random dice roll?

I know that. It’s the average.

Now we could go all the way back, backing up(back up 往后退;复制;支持) the tree to say, what’s the value of this start state? I can collect those values, and I can have a utility for this start state, and we’ll see–in the game of pig–the start state for the first player has a little bit better than 50% chance just because they go first.

For the game we defined, I think it works out to about 54%– .54 utility for the player who goes first.

We can also work out(解决;作出) the quality for each of these moves. We can work out, what’s the quality of rolling from this state, and what’s the quality of holding from this state, and then the optimal strategy is just the one that says choose the move which has the highest quality. If roll has the highest quality from this state, then that’s the move that we should do.

So just as we did in search, we’re finding our way from the start to the goal. We can do that with a random problems, but we have to find a way not to just 1 individual outcome, but to all the outcomes that were covered by all the possibility of the dice. So the complications(complication 混乱,纠纷;错杂) going to be more complex, but the idea is the same.

(对于本段视频,我反复看了好几遍,才看的稍微有点明白,UTILITY的含义,效用,姑且理解为概率吧。。)

14. Game Theory

Now when you have a decision under uncertainty(不确定;不确定的事物) and there’s an opponent(对手;敌手), it’s called game theory. If there’s no opponent, it’s usually called decision under uncertainty(decision under uncertainty 不确定型决策). There is other names.

So let’s look at an example of that first before we get back to the game of pig. Here’s the decision I’m going to give you. You’re on a game show, and you won, and you get a prize of 1 million dollars or euros or whatever currency(货币) you want to use. Now you’re given a decision. You can keep the $1 million, or the host will flip(轻弹,轻击) a coin(flip a coin 抛硬币(决定)) and you believe it to be a fair coin, and if you call it correctly you get $3 million, but if you get it wrong, you get nothing. So you analyze the outcome(结果) of this and you believe that this is a choice by the coin that has a 50% probability of each outcome, and you want to say, what should I do? Should I keep the million or should I go for the 3 million?

What I’m going to do is code up a model for this, and then let the decision theory decide.

million = 1000000

def Q(state, action, U):
    "The expected value of taking action in state, according to utility U."
    if action == 'hold':
        return U(state + 1*million)
    if action == 'gamble':
        return U(state + 3*million) * .5 + U(state) * .5

First, I just define a variable million because it’s hard to see the number of 0’s and count correctly. Now traditionally(传统上), utility is used with the abbreviation(缩写) U and quality with abbreviation Q. So I’m going to define here a quality function that says, given a state and an action, what’s my–and given utility, what state is worth to me that’s going to tell me the value of that state action pair? And the actions available to me are holding and gambling(gamble 赌博).

def actions(state): return ['hold', 'gamble']

Let’s go ahead and make that explicit(明确的,清楚的). So in any state, the actions available are holding and gambling, where we’re only going to deal with 1 state, but we make this perfectly general. And the state that I start with is, however many dollars I have in my pocket– could be anything. And given that state, if I hold, my state is going to be increased by $1 million, and then there’s some utility on that–how much do I value having what I have now plus 1 million. And if I gamble, there’s a 50% chance that I get 3 million more than I have now. There’s some utility for that. And a 50% chance that I get nothing more than I have now, and some utility for that. So that describes the quality of the state, but only describes it if I have a utility function. I have to know how much do I like money?

def identity(x): return x

U = identity

Well, the simplest choice for utility function is the identity function(identity 同一性;恒等;身份). Say(假设;比方说) the identity function just takes any input x and returns x. It’s the input itself, and so we could say, if I start with nothing, the value of the state of having nothing is 0, and the value of the state of having a million is a million.

def best_action(state):
    "Return the optimal action for a state, given U."
    def EU(action): return Q(state, action, U)
    return max(actions(state), key=EU)

Now here’s–the amazing thing is, I can just write out what the optimal strategy is, what the best action is for this state, and what it’s going to be is the maximum over all the possible actions from the state, that was just hold and gamble, maximized by EU, which stands for Expected Utility(期望效用,期望效益). Expected meaning average(期望的意思是平均). So what’s the average utility of each of the actions, and I’ve defined the average utility as the quality of that state, given that state action pair under the utility function? And that means that the Q had to deal with the averaging, and it did that. It said, 50% this, 50% that. That’s the value of gambling.

def best_action(state, actions, Q, U):
    def EU(action): return Q(state, action, U)
    return max(actions(state), key=EU)

Now this best_action function solves this particular problem. But the amazing thing is is that we can completely generalize this, so if we just add in parameters, now we’re saying what’s the best_action in a particular state if you tell me what the available actions are, what the quality of each state action pair is, and what the utility is over states, then I can tell you what the best_action is. That works for any possible domain that you can define. It’s an amazing thing that we solved all the problems at once. Similarly to the way in search where we had 1 best_search algorithm that could solve all of the search problems. Now it doesn’t mean that we’re done, and we never have to program anything again because programming can be difficult. There’s some problems that don’t fit into this type of formulation(构想,规划;配方), and there are many, many problems which you can describe, but which you can’t solve in a feasible(可行的;可用的) amount(数量;总额) of time. So we haven’t solved everything, but it is amazing how much we can do with just this 1 short function.

best_action(100, actions, Q, identity)
'gamble'

Let’s go ahead and solve it. Let’s solve this problem, and let’s say I start off with $100, what’s my best_action? Then when I run that, it tells me the best_action is gamble. Now that doesn’t sound quite right to me.

If you are faced with that problem, assuming you had $100 to your name. Would you take the gamble–try to go for the 3 million, or would you hold with 1 million? And there’s no right or wrong answer to this despite(不管) what the interface has to do. It has to tell you one answer is right or wrong, but you can ignore that. I just want to collect some data on how many people think that they would gamble in that situation and how many people think they would hold.

《D o C P》学习笔记(5 - 1)Dealing with Uncertainty Through Probability - Lesson 5_第2张图片

14. Game Theory Solution

So I predict(预测) that most people say they would hold, and why is that? Well, under the identity function, sort of(可以说,可说是) the arithmetic(算术,计算;算法) function, $3 million is 3 times better than $1 million, and so half of $3 million is 1.5 times better than $1 million. So the gamble is more. But that’s only true if $3 million really is 3 times better than $1 million. For most people, that’s not true. That going from $100 to a $1 million is a big, big jump. Going from $1 million to $3 million is a smaller jump than that, even though arithmetically(算术上) it’s more, in terms of what you can do, it seems like less, and that doesn’t mean that people are irrational(不合理的;无理性的) in any way. Instead what it means is that for people, the value of money is not a linear(线性的) function. Rather it’s something more like a logarithmic(对数的) function, meaning if you have a certain amount of money, if you double that money, you don’t get twice as much value out of having that money, rather you just get 1 increment more of having that money.

import math
best_action(100, actions, Q, math.log)
'hold'

So let’s try again. I’m going to input the math module, and now I’m going to ask, what’s the best_action starting from $100 in my pocket, but valuing money with logarithmic function rather than with the identity or linear function. Now my best_action tells me that what I should be doing is holding. That corresponds to my intuition(直觉;直觉力). That that’s the right thing.

best_action(10*million, actions, Q, math.log)
'gamble'

I can also ask, well, what if I had $10 million already, then would I take the bet(打赌,赌博,赌注), assuming my value of money is still logarithmic, and best_action tells me that yes, I should. If I have $10 million, now I’m starting to look at money as more closely linear again. I’m at this stage where logarithmic function is approximately(近似地,大约) linear locally(在附近). If I’ve got $10 million, I could say, yeah, I’m risking my $1 million, but that’s no big deal. I’ve already got $10. It’s a good bet because if I win, I get 3 or 0–that’s 1.5 on average, and that’s more than 1, so I’m willing to take that bet, and I don’t mind not gaining the additional $1 million.

15. Break Even Point

《D o C P》学习笔记(5 - 1)Dealing with Uncertainty Through Probability - Lesson 5_第3张图片

So now I want to ask you a question. So given this quality function Q, and assuming that our utility function is the log function, since we saw that for some values of this state the best action– the action with the highest quality is ‘hold’ and other values is ‘gamble,’ there must be a point at which there is a crossover(杂交,交换)– where the two values are approximately(大约,近似地) equal.

So what I want you to tell me is: to the nearest million, what is that crossover point C? That value of state– that value of the amount of money that I currently have to say that I’m indifferent(中立的;漠不关心的;无关紧要的) between the 2– that if I have C dollars, then my quality for gambling is the same or approximately the same as my value for holding. What value of C to the nearest million is that true of? And just to make it easier for those who can’t divide(除以) by e in your head, we’ll use the log10 logarithms rather than the natural logarithms so that the log10 logarithm of a million is 6. Tell me what the crossover point is to the nearest million.

15. Break Even Point Solution

And the answer is crossover C is 1 million, and if I ask for the quality of C gambling versus hold with the log10 utility function– and that’s a two-value tuple– and we see, in fact, they’re the same.

这里写图片描述

16. Whats your Crossover

Now I want to gather(收集;采集) a little bit more data, and here, again, there is no right or wrong answer even though the interface may tell you that your answer was right or wrong, I just want to do this sort of as a sociological(社会学的) experiment to see where people are. So just tell me what your crossover point is. Again, there is no right or wrong answer, but what’s the number of dollars– or euros, if you prefer that–at which you’d be indifferent(漠不关心的;无关紧要的;中立的) between accepting this gamble and holding? And put that in as an integer number, not the number of millions. So, if your crossover point is 1 million, write 1 million here, don’t write 1.

17. Optimal Pig

optimal 最佳的,最优的

Now let’s get to work on defining an optimal pig strategy. So we need a Q and a U function, and an actions function. Let’s get started on that.

## Optimal Pig

def Q_pig(state, action, Pwin):
    "The expected value of choosing action in state."
    if action == 'hold':
        return 1 - Pwin(hold(state))
    if action == 'roll':
        return (1 - Pwin(roll(state, 1))
                + sum(Pwin(roll(state, d)) for d in (2,3,4,5,6))) / 6.
    raise ValueError

The Q function we’ll call Q_pig. It takes a state and an action and evaluates the quality of that against the utility function. And what should we use for the utility function? Well, I think the best thing to use is the probability of winning(Pwin,赢的概率) because we get 1 point for winning and no points for losing. That’s a good way of thinking about the game. And so our expected outcome is going to be somewhere between 0 and 1, and that’s like a probability. And so the probability of winning is our score. If we win all of the time, the probability of scoring– of winning is 1, then that should be worth 1 point. If we win none of the time, that should be worth 0 points. So our utility is just the probability of winning. And what is that? Well, if we hold, then we’re turning it over to our opponent, and we still have our hold and roll functions that tell us what state we get to when we hold. And then it’s our opponent’s, and he’s going to do his best, so our utility would be 1 minus the opponent’s utility– what he can do best, his probability of winning because either one player or the other has to win So our probability of winning is 1 minus our opponent’s probability of winning. If we roll, it’s more complicated. If we roll a 1, then we pig out, and it’s 1 minus our opponent’s probability of winning because it’s his turn. And otherwise(否则;另外;不然), we just take all the possibilities for the other die rolls, and it’s our probability of winning from the result of rolling in that state, and that’s six probabilities all together. So we have to average them. So we add them all up, and divide by 6. And if the action wasn’t hold or roll, I’m going to raise the ValueError.

def pig_actions(state):
    "The legal actions from a state."
    _, _, _, pending = state
    return ['roll', 'hold'] if pending else ['roll']

What are the actions in this state? Well, if there’s some pending numbers, I can roll or hold, and if they’re not, I’m just going to roll. That’s the only thing that makes sense to do.

18. Pwin

Now what’s the probability of winning from a state? It seems complicated. It seems like we’ve got a lot of work to do, but actually, we’ve almost solved the whole thing. All we have to do is say, “What’s the end point?”

So remember, we start out in the start position, and then we have some end positions where the game is over, and we have to assign utilities, which is the same as probability of winning, which is either 0 or 1. So this is a losing state, so it gets a Pwin of 0. This is a winning state. It gets Pwin of 1. We assign all of those, and then all the other states that depend on these– we’ve already figured that out in terms of the Q function.

@memo
def Pwin(state):
    (p, me, you, pending) = state
    if me + pending >= goal:
        return 1
    elif you >= goal:
        return 0
    else:
        return max(Q_pig(state, action, Pwin)
                   for action in pig_actions(state))

Let’s see how that works. So the probability of winning is 1 if my current score plus the pending is greater than or equal to goal. Then I win automatically just by reaping(reap 获得;得到) those pending. My probability of winning is 0 if your score is greater than the goal and I haven’t won. And otherwise(否则;另外;不然), my probability of winning is the probability that I get by taking the best action. So for all the actions– among all the actions I can do, look for the Q value of that action– from the current state according to the utility function– try to maximize that, and that’s going to be my probability of winning. So that’s saying I can make the best choice that I can. So we said that we had 3 choice points. Here, I’m making the best choice by maximizing.

def Q_pig(state, action, Pwin):
    "The expeted value of choosing action in state."
    if action == 'hold':
        return 1 - Pwin(hold(state))
    if action == 'roll':
        return (1 - Pwin(roll(state, 1))
                + sum(Pwin(roll(state, d)) for d in (2,3,4,5,6))) / 6.
    raise ValueError

Here, the die gets to roll, and we’re averaging– we’re summing them all up and dividing by 6, so that takes care of the averaging– and what about the worst choice that the opponent makes? Well, that’s just folded in(fold in 包起来;环绕) because rather than explicitly worrying about me and my opponent, I just said, “Well, I can use That’s the probability of the opponent winning.

19. Maxwins

So now we’re almost there. We’ve defined the problem. We’ve defined how the game works, and we’re ready to write the optimal strategy– the best possible strategy for Pig, and I’ll let you finish it off. We’ll call this function "max_wins"– the strategy function that maximizes the number of wins– –at least the number of expected wins– and go ahead and write your code there, and you can write it in terms of the functions we’ve defined above and in terms of a call to best action.

Peter留的题目:

# -----------------
# User Instructions
# 
# Write the max_wins function. You can make your life easier by writing
# it in terms of one or more of the functions that we've defined! Go
# to line 88 to enter your code.

from functools import update_wrapper

def decorator(d):
    "Make function d a decorator: d wraps a function fn."
    def _d(fn):
        return update_wrapper(d(fn), fn)
    update_wrapper(_d, d)
    return _d

@decorator
def memo(f):
    """Decorator that caches the return value for each call to f(args).
    Then when called again with same args, we can just look it up."""
    cache = {}
    def _f(*args):
        try:
            return cache[args]
        except KeyError:
            cache[args] = result = f(*args)
            return result
        except TypeError:
            # some element of args can't be a dict key
            return f(args)
    return _f

other = {1:0, 0:1}

def roll(state, d):
    """Apply the roll action to a state (and a die roll d) to yield a new state:
    If d is 1, get 1 point (losing any accumulated 'pending' points),
    and it is the other player's turn. If d > 1, add d to 'pending' points."""
    (p, me, you, pending) = state
    if d == 1:
        return (other[p], you, me+1, 0) # pig out; other player's turn
    else:
        return (p, me, you, pending+d)  # accumulate die roll in pending

def hold(state):
    """Apply the hold action to a state to yield a new state:
    Reap the 'pending' points and it becomes the other player's turn."""
    (p, me, you, pending) = state
    return (other[p], you, me+pending, 0)

def Q_pig(state, action, Pwin):  
    "The expected value of choosing action in state."
    if action == 'hold':
        return 1 - Pwin(hold(state))
    if action == 'roll':
        return (1 - Pwin(roll(state, 1))
                + sum(Pwin(roll(state, d)) for d in (2,3,4,5,6))) / 6.
    raise ValueError

def best_action(state, actions, Q, U):
    "Return the optimal action for a state, given U."
    def EU(action): return Q(state, action, U)
    return max(actions(state), key=EU)

def pig_actions(state):
    "The legal actions from a state."
    _, _, _, pending = state
    return ['roll', 'hold'] if pending else ['roll']

goal = 40

@memo
def Pwin(state):
    """The utility of a state; here just the probability that an optimal player
    whose turn it is to move can win from the current state."""
    # Assumes opponent also plays with optimal strategy.
    (p, me, you, pending) = state
    if me + pending >= goal:
        return 1
    elif you >= goal:
        return 0
    else:
        return max(Q_pig(state, action, Pwin)
                   for action in pig_actions(state))

def max_wins(state):
    "The optimal pig strategy chooses an action with the highest win probability."
    return # your code here


def test():
    assert(max_wins((1, 5, 34, 4)))   == "roll"
    assert(max_wins((1, 18, 27, 8)))  == "roll"
    assert(max_wins((0, 23, 8, 8)))   == "roll"
    assert(max_wins((0, 31, 22, 9)))  == "hold"
    assert(max_wins((1, 11, 13, 21))) == "roll"
    assert(max_wins((1, 33, 16, 6)))  == "roll"
    assert(max_wins((1, 12, 17, 27))) == "roll"
    assert(max_wins((1, 9, 32, 5)))   == "roll"
    assert(max_wins((0, 28, 27, 5)))  == "roll"
    assert(max_wins((1, 7, 26, 34)))  == "hold"
    assert(max_wins((1, 20, 29, 17))) == "roll"
    assert(max_wins((0, 34, 23, 7)))  == "hold"
    assert(max_wins((0, 30, 23, 11))) == "hold"
    assert(max_wins((0, 22, 36, 6)))  == "roll"
    assert(max_wins((0, 21, 38, 12))) == "roll"
    assert(max_wins((0, 1, 13, 21)))  == "roll"
    assert(max_wins((0, 11, 25, 14))) == "roll"
    assert(max_wins((0, 22, 4, 7)))   == "roll"
    assert(max_wins((1, 28, 3, 2)))   == "roll"
    assert(max_wins((0, 11, 0, 24)))  == "roll"
    return 'tests pass'

print test()

(
注释里写了,最优的pig策略,选择最高胜率的1个action。

暂时不知道怎么写这个函数max_wins(state),谈谈我的最原始的思路。
注释也已经提到了,可以使用这里的已经定义的函数,那么就去找涉及概率的函数。

接下来就没什么头绪了,还是看看Peter的答案吧。
)

19. Maxwins Solution

And this is all it is. We just call best action from the current state using the Pig actions, using the quality function for Pig and trying to maximize the probability of winning.

def max_wins(state):
    "The optimal pig strategy chooses an action with the highest win probability."
    return best_action(state, pig_actions, Q_pig, Pwin)

20. Impressing Pig Scouts

Now let’s see how we did.

strategies = [clueless, hold_at(goal/4), hold_at(1+goal/3), hold_at(goal/2),
              hold_at(goal), max_wins]

So I’m going to define a set of strategies– the clueless strategy we expect to do the worst; strategies that try to solve the problem in 4 chunks(chunk (某物)相当大的数量或部分;厚厚的一块), in 3 chunks in 2 chunks, and to solve it all in one win; and then the max win strategy.

play_tournament(strategies)

Now, we play a tournament(联赛) with these strategies, and here’s the results we get back.

So you can see that the clueless strategy does very poorly– only wins 23 games out of 500. The max win strategy does the best of all– wins 325, but there’s some competitors that are pretty close. So hold at 20 wins 314– not that much worse off than the optimal strategy. And that holds up if we play a tournament with more games just to get a little bit more accuracy.

You wouldn’t be able to hit the run button and do this because it would time out, but if you bring it in to your own development environment, you can do that. And here we see max wins gets So only a couple percent better for max wins over hold at 20, but still it’s nice to know that no strategy can do better. And it turns out that if we increase the goal and made a longer game than just playing to 40 points– that the advantage for max wins over any of these other strategies would only increase.

In the betting(打赌) game, we had different utility functions. We tried out the linear utility, and we tried out the logarithmic utility. What about here? Well, we defined our utility as a probability of winning, and the way the game is defined, that’s really the only sensible one. If you’re trying to win the game, you should maximize the probability of winning.

But maybe your only goal isn’t just to maximize the probability of winning. Maybe you’re in a big Pig tournament, and your seated at the Pig table, rolling the dice, and in the stands are lots of spectators(观众), watching the game with excitement. And you know that somewhere in the stands, there’s a scout(侦查员) from the NPA–the National Pig Association(联合;协会). And what you want to do is not just win the game– because lots of people are going to win the games– but you really want to get the attention of that NPA scout so that you can move on and have a professional career.

So maybe what your utility function would be would not just be to win the game, maybe your utility would be to maximize the differential(差别), to say, “If I just won the game by a couple points, nobody is going to notice, but if I won by a lot– if I really clobbered(狠揍,猛打) my opponent– then maybe this guy would take notice, and that would be worth more to me.” So you’d give up on the goal of just winning, and try to go for the maximizing your differential.

21. Maximizing Differential

@memo
def win_diff(state):
    "The utility of a state: here the winning differential (pos or neg)."
    (p, me, you, pending) = state
    if me + pending >= goal or you >= goal:
        return (me + pending - you)
    else:
        return max(Q_pig(state, action, win_diff)
                   for action in pig_actions(state))

So here I’ve written the utility function. I called this utility function “The Winning Differential.” And given a state, it tells me what that differential(差别) is– expected(期望的) differential for that state. And what it says is if we’re at the end of the game, if somebody’s won, then before, remember that our utility function was 0 or 1, but here utility function is my score– which is me, and I’m going to reap(获得;得到) the pending– minus your score. Otherwise, we just do the same thing with a Q function that we did before. And note that we’re always careful to memoize these functions, because they’re recursive, they’re recalling themselves over and over again. We don’t want to repeat those computations, so we memoize them so we only have to each date computation once.

def max_diffs(state):
    """A strategy that maximizes the expected difference between myfinal score
    and my opponent's."""
    ## your code here

Now, what I want you to do is write the strategy function. This was the utility function over states, now I want you to write the strategy function. You can do it in terms of what we’ve defined before

Peter留的题目:

# -----------------
# User Instructions
# 
# Write a function, max_diffs, that maximizes the point differential
# of a player. This function will often return the same action as 
# max_wins, but sometimes the strategies will differ.
#
# Enter your code at line 101.

from functools import update_wrapper

def decorator(d):
    "Make function d a decorator: d wraps a function fn."
    def _d(fn):
        return update_wrapper(d(fn), fn)
    update_wrapper(_d, d)
    return _d

@decorator
def memo(f):
    """Decorator that caches the return value for each call to f(args).
    Then when called again with same args, we can just look it up."""
    cache = {}
    def _f(*args):
        try:
            return cache[args]
        except KeyError:
            cache[args] = result = f(*args)
            return result
        except TypeError:
            # some element of args can't be a dict key
            return f(args)
    return _f

other = {1:0, 0:1}

def roll(state, d):
    """Apply the roll action to a state (and a die roll d) to yield a new state:
    If d is 1, get 1 point (losing any accumulated 'pending' points),
    and it is the other player's turn. If d > 1, add d to 'pending' points."""
    (p, me, you, pending) = state
    if d == 1:
        return (other[p], you, me+1, 0) # pig out; other player's turn
    else:
        return (p, me, you, pending+d)  # accumulate die roll in pending

def hold(state):
    """Apply the hold action to a state to yield a new state:
    Reap the 'pending' points and it becomes the other player's turn."""
    (p, me, you, pending) = state
    return (other[p], you, me+pending, 0)

def Q_pig(state, action, Pwin):  
    "The expected value of choosing action in state."
    if action == 'hold':
        return 1 - Pwin(hold(state))
    if action == 'roll':
        return (1 - Pwin(roll(state, 1))
                + sum(Pwin(roll(state, d)) for d in (2,3,4,5,6))) / 6.
    raise ValueError

def best_action(state, actions, Q, U):
    "Return the optimal action for a state, given U."
    def EU(action): return Q(state, action, U)
    return max(actions(state), key=EU)

def pig_actions(state):
    "The legal actions from a state."
    _, _, _, pending = state
    return ['roll', 'hold'] if pending else ['roll']

goal = 40

@memo        
def Pwin(state):
    """The utility of a state; here just the probability that an optimal player
    whose turn it is to move can win from the current state."""
    # Assumes opponent also plays with optimal strategy.
    (p, me, you, pending) = state
    if me + pending >= goal:
        return 1
    elif you >= goal:
        return 0
    else:
        return max(Q_pig(state, action, Pwin)
                   for action in pig_actions(state))

@memo
def win_diff(state):
    "The utility of a state: here the winning differential (pos or neg)."
    (p, me, you, pending) = state
    if me + pending >= goal or you >= goal:
        return (me + pending - you)
    else:
        return max(Q_pig(state, action, win_diff)
                   for action in pig_actions(state))

def max_diffs(state):
    """A strategy that maximizes the expected difference between my final score
    and my opponent's."""
    # your code here

def max_wins(state):
    "The optimal pig strategy chooses an action with the highest win probability."
    return best_action(state, pig_actions, Q_pig, Pwin)


def test():
    # The first three test cases are examples where max_wins and
    # max_diffs return the same action.
    assert(max_diffs((1, 26, 21, 15))) == "hold"
    assert(max_diffs((1, 23, 36, 7)))  == "roll"
    assert(max_diffs((0, 29, 4, 3)))   == "roll"
    # The remaining test cases are examples where max_wins and
    # max_diffs return different actions.
    assert(max_diffs((0, 36, 32, 5)))  == "roll"
    assert(max_diffs((1, 37, 16, 3)))  == "roll"
    assert(max_diffs((1, 33, 39, 7)))  == "roll"
    assert(max_diffs((0, 7, 9, 18)))   == "hold"
    assert(max_diffs((1, 0, 35, 35)))  == "hold"
    assert(max_diffs((0, 36, 7, 4)))   == "roll"
    assert(max_diffs((1, 5, 12, 21)))  == "hold"
    assert(max_diffs((0, 3, 13, 27)))  == "hold"
    assert(max_diffs((0, 0, 39, 37)))  == "hold"
    return 'tests pass'

print test()

(这个题我没有思考,直接看了答案。。)

21. Maximizing Differential Solution

And the answer is you call best action from the state. The actions and the Q function are just like before, but the utility function we’re trying to maximize is the differential.

def max_diffs(state):
    """A strategy that maximizes the expected difference between my final score
    and my opponent's."""
    return best_action(state, pig_actions, Q_pig, win_diff)

22. Being Careful

def max_diffs(state):
    return best_action(state, pig_actions, Q_pig, win_diff)

Now, I want to say right here that I made a mistake, and I haven’t talked about this very much over the course of these lectures(lecture 教训;演讲), but I’m making mistakes all the time. I know you guys are. You type something in, it doesn’t work. I have the same problem over and over again. I keep making mistakes, and that’s fine as long as I know how to correct them. Here the mistake I made is–the mechanical(呆板的;机械的) mistake was I messed up(mess up 弄乱;搞乱), and I typed the wrong function at one point. I think these function names are too similar, and I made a mistake in naming them max_diffs and win_diff. They sound too much alike, and when I was playing with this I put in the utility function where I meant to put in the strategy function, and that’s an easy mistake to make because they sound the same.

Probably I should have come up with better names. I should have called this win_diff_utility or something like that to make it clear that this is the utility function and this is the strategy function, but the annoying(讨厌的;恼人的) thing was that mistake went unnoticed(被忽视的) for a while, and I’ll show you part of the problem.

def play_pig(A, B, dierolls=dierolls()):
    strategies = [A, B]
    state = (0, 0, 0, 0)
    while True:
        (p, me, you, pending) = state
        if me >= goal:
            return  strategies[p]
        elif you >= goal:
            return strategies[other[p]]
        elif strategies[p](state) == 'hold':
            state = hold(state)
        else:
            state = roll(state, next(dierolls)) 

Here’s the play_pig function. It takes two strategies: A and B. These have to be strategy functions, and then it comes down here, and it applies the appropriate strategy function to a state, and it says if that strategy function decides to hold, then we do the hold action, otherwise– well, there’s only 2 actions you could do, so I’ll do the roll action. That makes sense, right? Now here’s the problem. If instead of passing in a strategy function you accidentally(偶然地,意外地) pass in a utility function, then that’s all good down here. You say strategies of P apply to state, and that’s really applying a utility function to state, what’s the utility function going to return? Well, it’s going to return a number, and play_pig just says, “Well, does that number equal to hold?” No, it’s not. No numbers are equal to hold. So then I’m just going to assume that you meant roll. And so the fact that I passed in a completely wrong function that’s doing nothing related to strategy– it’s returning a number rather than an action– went completely unnoticed(被忽视的), and instead what my strategy was– the utility function that returns a number acted as if it was a strategy function that always said roll.

Now, in general, that’s one of the complaints(抱怨) that people have about Python is that it’s too easy to make that mistake because you don’t have to declare(声明) for each function what are its inputs and what are its outputs. In other languages, you would do that, and the program where you accidentally(偶然地,意外地) used a utility function where you expected a strategy function– that program wouldn’t even compile. You’d get an error message before you ran it. In Python, you don’t have that protection, so you’ve got to build in the protection yourself.

So what I’d like you to do is update the play_pig function so that it looks at the result that comes back from the strategy function, and make sure that it’s either hold or roll and if it’s not one of those, then let’s decide that what we do is that that strategy automatically(自动地) loses a game. So if you make an illegal(不合法的) move, you lose the game right there.

Peter留的题目:

# -----------------
# User Instructions
# 
# Update the play_pig function so that it looks at the result
# that comes from the strategy function and makes sure that 
# it is either 'hold' or 'roll' and if it's not one of those
# then that strategy should immediately lose and the other 
# strategy should be declared the winner.

import random

def roll(state, d):
    """Apply the roll action to a state (and a die roll d) to yield a new state:
    If d is 1, get 1 point (losing any accumulated 'pending' points),
    and it is the other player's turn. If d > 1, add d to 'pending' points."""
    (p, me, you, pending) = state
    if d == 1:
        return (other[p], you, me+1, 0) # pig out; other player's turn
    else:
        return (p, me, you, pending+d)  # accumulate die roll in pending

def hold(state):
    """Apply the hold action to a state to yield a new state:
    Reap the 'pending' points and it becomes the other player's turn."""
    (p, me, you, pending) = state
    return (other[p], you, me+pending, 0)

def dierolls():
    "Generate die rolls."
    while True:
        yield random.randint(1, 6)

other = {1:0, 0:1}
goal  = 40

def play_pig(A, B, dierolls=dierolls()):
    """Play a game of pig between two players, represented by their strategies.
    Each time through the main loop we ask the current player for one decision,
    which must be 'hold' or 'roll', and we update the state accordingly.
    When one player's score exceeds the goal, return that player."""
    strategies = [A, B]
    state = (0, 0, 0, 0)
    while True:
        (p, me, you, pending) = state
        if me >= goal:
            return strategies[p]
        elif you >= goal:
            return strategies[other[p]]
        elif strategies[p](state) == 'hold':
            state = hold(state)
        else:
            state = roll(state, next(dierolls))

def bad_strategy(state):
    "A strategy that could never win, unless a player makes an illegal move"
    return 'hold'

def illegal_strategy(state):
    return 'I want to win pig please.'

print play_pig(bad_strategy, illegal_strategy).__name__ 

def test():
    winner = play_pig(bad_strategy, illegal_strategy) 
    assert winner.__name__ == 'bad_strategy'
    return 'tests pass'

print test()
def play_pig(A, B, dierolls=dierolls()):
    strategies = [A, B]
    state = (0, 0, 0, 0)
    while True:
        (p, me, you, pending) = state
        if me >= goal:
            return  strategies[p]
        elif you >= goal:
            return strategies[other[p]]
        else:
            action = strategies[p](state)
            if action == 'hold':
                state = hold(state)
            elif action == 'roll':
                state = roll(state, next(dierolls))
            else: # Illegal action? You lose!
                return strategies[other[p]]

Here’s how I did it–makes the function just a little more complicated. I added a variable called “action” to hold the result of the strategies. If it’s hold, we hold. If it’s roll, we roll. Otherwise, you lose, which means that the other strategy wins.

24. Using Tools

So now let’s go back and analyze this maximize differential strategy versus the maximizing probability of winning strategy.

The question is, how do these 2 compare? When are they different, and when are they the same? If you’re trying to impress the scouts(scout 侦查员), you’re not going to be making some crazy moves, so probably most of the time, you’d expect the 2 strategies to agree, but some of the time, maybe 1 of them is going to be more aggressive or taking more chances than the other. Let’s see if we can analyze that.

states = [(0, me, you, pending)
          for me in range(41) for you in range(41) for pending in range(41)
          if me + pending <= goal]
len(states)
35301

So I start off by defining a bunch of states, and I’m just going to look from 1 player’s point of view. It doesn’t really matter to have both since it’s symmetric(相称性的,均衡的), so for all these values of me, you, and pending, collect all those states. It turns out that there’s 35,000 of them.

from collections import defaultdict

r = defaultdict(int)
for s in states: r[max_wins(s), max_diffs(s)] += 1

Then I define a variable r to be a default dictionary, which counts up integers, so it starts at 0, and then I go through all the states, and I increment the count for a result for the tuple of the action that’s taken by max_wins and the action that’s taken by max_diffs. I want to count up. This is going to be hold, hold, roll, roll. etc.

dict(r)
{('hold', 'hold'):  1204,
 ('hold', 'roll'):   381,
 ('roll', 'roll'): 29741,
 ('roll', 'hold'):  3975}

I want to see how many of each do we have, and let’s convert r back to standard dict, and there we have it. So most of the time, 29,700 out of the 35,000–both strategies agree that roll is the right thing to do. Then another 1200 times, both strategies agree that hold is the right thing to do. But in 2 cases, they differ. So sometimes, max_wins says hold and max_diffs says roll. That happened 381 times, but 10 times more often, it’s max_wins that says roll and max_diff that says hold. That actually surprised me. So it’s the max_wins strategy that’s really more aggressive. It’s rolling more often.

I thought it was going to be the max_diffs strategy. I thought that was going to be more aggressive, right? So that’s the one that’s trying to impress the scouts. I thought it was going to be rolling trying to rack up a really big score.

But no! So the data tells a different story. It’s not trying to rack up(在比赛中获(胜),得(分)) a really big score. So what’s going on? Well, first it might be nice just to quantify(确定…的数量) how different they are since I kind of asked that question.

(3975 + 381) / 35301.
0.12339593779213054

So there’s 35, 301 states all together and they differ on 3975 + 381, and that’s 12% of the states that they differ on. So what’s the story? Where do those 12% of the states come from? We still don’t know, and we don’t even quite know what questions to ask, but it’s here that some of our design choices start to pay off.

So remember we always start our design with an inventory of concepts, and we have things like the dice and the score, and then we got into things like the utility function and the quality function, so we built all these up, and yes, we’re building from the ground up, and yes, at the top, we have a play_pig function, and we can still call that function, but at the bottom, we have all these useful tools. So now when we’re not just about playing pig, now we’re trying to analyze the situation to understand this story of why are these 2 different? Well, play_pig by itself–the top level function we define–that’s not going to help us, but all these little tools that we built down here, they will be helpful. We can start to put them together and explore. So we built this tower, and the tower built up to define the play pig function, and in some languages, it’s all about building the tower. When you’re done, that’s all you have. But in Python, it’s common and in many languages, it’s a good design and strategy to say let’s just build up components along the way so that we–yes, we have the tower, but we can also go out in other directions. If we’re interested–not just in playing pig–but we’re interested in figuring out this story, then we can quickly assemble(集合,收集;装配,组合) pieces from down here and build something that can address that. So I’ve got all the pieces available. It makes it easy to explore. But I still need an idea, and here’s my idea. I expected maximize differential(差别) to be aggressive(积极行动的,有进取心的), to try to rack up the big points, and I found out that it was actually maximizing the probability of winning that was more aggressive that rolled more often. Why could that be? I think I might know the answer. I think it might be that the maximized differential is more willing to lose rather than more excited about winning by a lot. What do I mean by that? Well, if you’re maximizing the probability of winning, you don’t care if you lose by 1 or if you lose by 40, it’s all a loss. The maximized differential–if he’s losing by a fair amount(相当数量), he might say, well–say he’s behind 39-0 in a game to 40, and say he’s accumulated(accumulate 积累;(数量)逐渐增加) 30 points, If he’s trying to maximize the probability of winning, he would keep on rolling. He says, well, I don’t have that good of a chance of winning, but all that counts is winning. If I stop now, the opponent’s going to win on the next move, so I’ve got to keep rolling. Probably I’ll pig out and only get 1 point, but it’s worth it for that small chance of winning. That’s what the maximize win probability strategy would do. The maximize differential strategy would say, hey, if I can get 30 points rather than 1, that cuts the differential way down, so that’s worth doing. I’ll sacrifice(牺牲;奉献) winning in order to maximize the differential. Now that’s a suggestion of a story, but I don’t know yet. Is that the right story? Let’s find out.

25. Telling A Story

def story():
    r = defaultdict(lambda: [0, 0])
    for s in states:
        w, d = max_wins(s), max_diffs(s)
        if w != d:
            _, _, _, pending = s
            i = 0 if (w == 'roll') else 1
            r[pending][i] += 1
    for (delta, (wrolls, drolls)) in sorted(r.items()):
        print '%4d: %3d %3d' % (delta, wrolls, drolls)

So here’s what I did. I wrote this little function to tell a story. The story I want to tell is, over all(遍及) the states, let’s group the states in terms of the number of pending points in that state, and then for each of those number of pending points, say all the states for which there are 10 points pending, how many times did max_wins decide to roll versus how many times did max_diffs decide to roll? And just to consider the ones in which the 2 differ. So throw away all of the states in which they take the same action. Let’s see what that looks like. Let me just describe briefly what I do. So I start off–I have a default dictionary and the default is that I have 2 values– Then I go through–get the 2 strategy functions to apply to the state. If they’re different, figure out what pending is and increment the pending count for the person who decided to roll, and then I just go and print them out.

《D o C P》学习笔记(5 - 1)Dealing with Uncertainty Through Probability - Lesson 5_第4张图片

Here’s the result. It does tell a story. Look what’s happening. When I have a small number of points pending, most of the time, the 2 strategies agree, but when they differ, it’s the max_diff strategy that’s deciding to roll.

《D o C P》学习笔记(5 - 1)Dealing with Uncertainty Through Probability - Lesson 5_第5张图片

But as the number of points increase–the number of pending points increase, we see it’s max_wins who’s willing to roll, and max_diff not at all, so that’s a perfect segregation between the 2 in this crossover point(crossover point 交叉点,跨越点,相交点) between 13 and 16. We’re here. Max_wins is rolling all the time with very high pending amounts. But max_diff is never willing to do that. So what must be happening here is 300 times max_wins has 24 pending points, and he’s willing to roll. It must be because the opponent is just about to win and he says, “Even though I’m risking 24, I still want to win the game, so I’ve got to roll.” Max_diff says, “Are you crazy? I got 24 points on the board. I’m going to reap(获得;得到) them right now. And yes, I may lose, but I’m really going to cut down that differential.”

So look what the story told us. I thought that this man in the arena(竞技场) who was playing the max_diff strategy was going to impress(给…以深刻印象) the NPA scouts by playing aggressively(激烈地;攻击地), and it turned out that the story was completely different. So what does that tell us. Well, first it tells us that there is an interesting distinction about how we wrote our function to maximize differential. Note that the way we wrote it is we completely separated the what, which is the rules of the game for how pig is played, from the how of how does it make decisions, and that was the perfectly general best actions. So we didn’t go in and say, we’re going to write specific rules for pig to say, if you’re in such and such a position, do such and such. Rather we just said, here’s how pig works. Here’s what I’m trying to do. Maximize differential. You do the rest. So the results that came back were surprising to me because I didn’t really understand how the what and the how interacted(interact 相互作用;互相影响;互动), but I was still able to write a program that maximized the differential(差别), even though I didn’t understand what that program is actually doing, and that’s the power of breaking up these aspects(aspect 方面;方位) into what’s happening versus maximizing. The second part that’s interesting here is that I was able to do exploration. So it wasn’t just that I built a monolithic(整体的;庞大的) program that could do 1 thing. It’s that I build a set of tools that could explore the area. When I wanted to understand something that was different from what I originally did, I was able to do that because I had the tools around.

26. Simulation vs Enumeration

Simulation 模仿,模拟
enumeration 列举,计数;详表

So we’ve talked about some probability problems that we can handle with simulation–that is, choosing using a random number generator, some samples, and hoping that they’re representative of the problem. And if you choose enough, you get a close enough representation. An alternative strategy is enumeration(列举,计数;详表), where we actually go over all the possibilities and we can compute an exact probability. Now, some problems are so complex that it would take forever to do that, but computers are much better at it than people are, and so it’s a powerful strategy. We’ll show you some simple examples, and they can scale up(按比例增加[提高]) to more complex ones.

So let’s imagine all the possible families living in houses, and these houses have different poperties. This one is colored red and has an Englishman. This one has a zebra, and so on. But for now we’re only interested in the children that live in those houses, and, in fact, we’re only interested in the houses that have exactly 2 children. So we’re going to not consider some of them, and consider the other ones that have exactly 2 children, and then we want to be able to ask questions of them. And we can ask probability questions. We’re going to constrain ourselves to ask conditional probability questions(conditional probability question 条件概率问题).

P(A | B)

So what is the probability of Event A given Event B? And an event is just a state of affairs(affair 事件). So Event B might be the event of the family having exactly 2 children.

《D o C P》学习笔记(5 - 1)Dealing with Uncertainty Through Probability - Lesson 5_第6张图片

And so we’ve crossed off(cross off 划掉) these houses, and we’re only taking these other ones, and then Event A might be the probability of having a boy or a girl. Now, in real life, we could do simulation by going out and polling(poll 对…调查;作民意调查) and asking people. In mathematics, we can do enumeration if we make certain assumptions(assumption 假定,假设), and one assumption we can make is that it’s exactly 50% probable(可能的;大概的) that you get a boy and 50% probable that you get a girl and that one birth is independent(自主的;不相关联的) from another.

P(2 boys | at least 1 boy)
           and 2 children total

So let’s address(称呼) the question. What’s the probability of having 2 boys given that there is at least 1 boy in the family? And the universe(经验领域;宇宙) of possibilities is only the families that have exactly 2 children. So we could put that here in the condition as well– at least 1 boy and 2 children total. What do you think this probability is equal to? Let’s put your answer here and enter it in the form of a fraction(分数). So if you think it’s one-half, put 1 and then 2. If you think it’s 11-17ths, put 11 and 17.

《D o C P》学习笔记(5 - 1)Dealing with Uncertainty Through Probability - Lesson 5_第7张图片

26. Simulation vs Enumeration Solution

Here’s the way to look at it is we count up(加起来) the number of equally probable events on this side, 1, 2, 3; and then we count up out of those how many appear on this side, just one; so the answer is one-third.

27. Conditional Probability

Now if you could do all of these just by writing with a pen, you wouldn’t be in a computer class. So let’s start modeling this. And so we’re going to do our concept inventory. So what do we have? Well, these individual results here come from random variables. So a random variable is like the first child born, which can be a boy or a girl. The whole universe is called the sample space, and then these individual sets of circles are called events–like the event of having (2 boys), or the event of having at least 1 boy. And an event consists, then, of a collection of sample points. So a sample point is BG, GB or BB. In terms of representation, we’re going to just represent sample points as strings. So we’ll have like the string, ‘GG’, and we can represent events two ways: as a collection of strings or as a predicate(断言,断定;宣布)– a function, which is true of certain strings and not of others.

import itertools
from fractions import Fraction

sex = 'BG'

def product(*variables):
    "The cartesian product (as a str) of the possibilities for each variable."
    return map(''.join, itertools.product(*variables))

two_kids = product(sex, sex)

So here’s what I did: I imported itertools because we’re going to need that. I searched for–and found–a new class, called fractions, within the fractions module, and there’s a function or a constructor(构造器), called fraction, which produces an exact fraction. And the reason I wanted that is because when the answer is 1/3, I wanted to see that it was exactly 1/3. I didn’t want to see that it’s .33333. So all a fraction is is a numerator((分数的)分子) and a denominator(分母), paired together, and they know how to do arithmetic(计算,算术). So here’s my random variable. I represent that as a collection of possibilities in here– I just strung(string的过去式;串起) the possibilities together. I could have said set of ‘BG’ or the list of ‘B,G’ but I just put them together, as a string. Then I said we can combine random variables with a cartesian product(笛卡尔乘积)–and I used itertools.product– and then I just said: get all results, which itertools.product produces tuples, and I want to look at them as strings. So now two_kids is the product of two children and we’re looking at their sex.

two_kids
['BB', 'BG', 'GB', 'GG']

one_boy = [s for s in two_kids if 'B' in s]

def two_boys(s): return s.count('B') == 2

So if I evaluate that, I get this collection. And one_boy is just the points in two_kids that have at least one boy in the string. So it would be this one, this one, and this one. And then I can define two_boys as a predicate. That’s saying that’s True when the count of the number of boys is equal to (2). That would be True here.

def condP(predicate, event):
    """Conditional probability: P(predicate(s) | s in event).
    The proportion of states in event for which predicate is true."""
    pred = [s for s in event if predicate(s)]
    return Fraction(len(pred), len(event))

And finally, I can define my conditional probability. That says what’s the probability of (p), given (e), where (e) is an event specified as a list of sample points, all equal, probable, and predicate(断言,断定); is a predicate that returns True or False of elements of that event, and so the True elements are just the ones for which it is True, and then I returned a fraction– how many out of the event satisfy the predicate. Then I can ask, what’s the conditional probability of two_boys, given one_boy–and the answer is 1/3. So that’s what we expected.

print condP(two_boys, one_boy)
1/3

28. Tuesday

Now let’s move on to a slightly more complicated question: out of all the families with two kids– with at least 1 boy, born on a Tuesday– what’s the probability of two boys? Now, you might think that the answer should be the same– it should still be 1/3 because why does Tuesday matter? After all, the kid’s gotta be born sometime and if it happens to be Tuesday, why would that be any different than any other day? So is it 1/3? Well, as Gottfried Leibniz said, “Let us calculate.”

So we have the technology to model that.

day = 'SMTWtFs'  #  Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday

two_kids_bday = product(sex, day, sex, day)

First, a random variable for day of the week-_ and I had to fool around with the capitalization there, to make sure that we have 7 distinct letters: Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday– plus ample(足够的;充足的) space of two_kids_bday, one kid with their day of birth; the second kid, with their day of birth. What does that look like? Well, it’s this huge thing of (2 X 7 X 2 X 7) entries. The first one: Boy born on Sunday; boy born on Sunday, all the way through to the last one: girl born on Saturday, girl born on Saturday.

boy_tuesday = [s for s in two_kids_bday if 'BT' in s]

print condP(two_boys, boy_tuesday)

Then a boy born on Tuesday is all the elements of this, where “BT” appears is in the string. So either “BT” will be the first 2 characters or the last 2 characters. And now we’re finally at the point where we can say: given at least 1 boy_tuesday, what’s the probability of two_boys? And before I show the results, I’m going to ask you what you think it is. You could follow along, either with pencil and paper or do the computation or just think it out in your head. So Enter as a fraction. If you think it’s 1/3, put a 1 here and a 3 here–or whatever.

《D o C P》学习笔记(5 - 1)Dealing with Uncertainty Through Probability - Lesson 5_第8张图片

28. Tuesday Solution

print condP(two_boys, boy_tuesday)
13/27

If I go ahead and execute this and print the result, it comes out: 13/27. Wow! Where did that come from? So that’ surprising–first of all, it’s not 1/3, which you might have thought should be the answer if you believe the argument that Tuesday doesn’t matter. And secondly, not only is it not 1/3, but it’s much closer to 1/2 than it is to 1/3. So just having the birthday there really changed things a lot. How did that happen?

def report(verbose=False, predicate=two_boys, prename='2 boys',
           cases = [('2 kids', two_kids), ('2 kids born any day', two_kids_body),
                    ('at least 1 boy', one_boy),
                    ('at least 1 boy born any day', boy_anyday),
                    ('at least 1 boy born on Tuesday', boy_tuesday),
                    ('at least 1 boy born in December', boy_december)]):
    import textwrap
    for (name, event) in cases:
        print 'Reason:\n"%s" has %d elements:\n%s' % (
            name, len(event), textwrap.fill(' '.join(event), 85))
        good = [s for s inevent if predicate(s)]
        print 'of those, %d are "%s":\n%s\n\n' % (
            len(good), predname, textwrap.fill(' '.join(good), 85))

report()

Well, I wrote up a little function here to report my findings, and here’s its arguments. You can give it a bunch(大量;束,串,捆) of cases that you care about– the predicate that you care about and whether you want the results to be verbose or not.

month = 'JFMAmjLaSOND'

two_kids_bmonth = product(sex, month, sex, month)

boy_december = [s for s in two_kids_bmonth if 'BD' in s]

And it just prints out some information– and, by the way, as part of this, I also looked at the question of what’s the probability of two_boys, given that there’s one boy born in December so I threw that in as well.

《D o C P》学习笔记(5 - 1)Dealing with Uncertainty Through Probability - Lesson 5_第9张图片

And here’s the output I get: 2 boys, given 1 boy is 1/3; and born on Tuesday is 13/27, and born in December is 23/47.

《D o C P》学习笔记(5 - 1)Dealing with Uncertainty Through Probability - Lesson 5_第10张图片

Now, I can turn on the verbose option to report In that case, here’s what I see: The probability of 2 boys, given at least 1 boy– born on Tuesday–is 13/27. And here’s the reason–at least 1 boy, born on Tuesday, has 27 elements–and there they are– and of these, 13 are 2 boys–and there they are. And so, you can’t really argue with that. You can go through and you can make sure that that’s correct, and you can look at the other elements of the sample space and say no, we didn’t miss any– so that’s got to be the right answer. It’s not quite intuitive(直觉的;直观的) yet, and I’d like to define my report function so that it gives me that intuition(直觉;直觉力) but right now, I don’t have the right visualization. So I’ve got to do some of the work myself.

And here’s what I came up with: We still have the four possibilities that we showed before but now we’re interested, not just in boys– we’re interested in boys born on Tuesday. So there’s going to be some others over here where there’s, say, boy born on Wednesday, along with some other partner– maybe a boy born on Saturday. But we’re not even considering them; we’re throwing all those out. We’re just considering the ones that match here. And like before, we draw 2 circles: one of the right-hand side of the event– of the conditional probability. And so how many of those are there? Well, there’s 7 possibilities here because the boy has to be born on Tuesday– there’s only 1 way to do that–but there’s 7 ways for the girls to be born. So there’s 7 elements of the sample state there; likewise(同样地;也,而且), 7 elements over here. Now how many elements over here? Well here, either one of the 2 can be a boy born on Tuesday. So really, we should draw(会话;招致;吸引) this state as either a boy born on Tuesday, followed by another boy or a boy, followed by a boy born on Tuesday. And how many of those are there? Well, there’s 7 of these by the same argument we used in the other case, and of these, there’s also 7 but now I’ve double-counted because in one of these 14 cases is a boy born on Tuesday, followed by a boy born on Tuesday. So I’ll just count 6 here. And so now it should be clear: 7, 14, 21, 6, 27. There’s 27 on the right-hand side, and then what’s the probability of 2 boys, given this event of at least 1 boy born on Tuesday? Well, 2 boys–that’s here–so it’s 13 out of the 27. So that’s the result. Seems hard to argue with. Both the drawing it out with a pen and the computing worked out to the same answer. Now why is it that we have a strong intuition that, knowing the boy born on Tuesday shouldn’t make any difference? I think the answer is because we’re associating that fact with an individual boy. We’re like taking that fact and nailing(钉住;使固定;抓住;揭露) it on to him–and it’s true. If we did that, that wouldn’t make any difference. But, in this situation, that’s not what we’re doing. We’re not saying anything about any individual boy. If we did that, the computation wouldn’t change. Rather, we’re making this assertion that at least one was born on Tuesday– not about boys, but about pairs. And we just don’t have very good intuitions about what it means to say something about a pair of people, rather than about an individual person, and that’s what we did here– and that’s why the answer comes out to 13/27.

29. Summary

So let’s summarize what we did in this Unit.

We learned that probability is a powerful tool for tackling(tackle 着手处理) problems with uncertainty.

We learned that we can do Search with uncertainty, like we did Search in the previous Unit, over Exact(准确的;精确的) Certain(必然的;已确定的) domains. Here, we can handle uncertainty in our Search.

We learned that the notion(概念,观念) of Utility gives us a powerful and beautiful general approach to solving the Search problems. It gives us the best-action function with which we can solve any problem that can be specified in the form that best-action expects, and that’s a wide variety of problems. Now, some of them are so complex that they can’t be computed in a feasible(可行的;可用的) amount of time. And there are more advanced techniques for dealing with approximations(approximation 近似值;粗略估计) to that. But it’s incredibly(很,极为,难以置信地) powerful because it separates out the How versus the What. You only have to tell the computer what the situation is. You don’t have to tell it how to find the best answer, and it automatically finds the best answer.

And we learned you can deal with probability through simulation(模仿,模拟), making repeated random choices, and just counting up in how many one answer occurs, versus another.

And we learned that if the total number of possibilities is small, you can just enumerate(列举,枚举) them. You can count them all, and you can get an exact answer, as an exact fraction rather than an approximation.

And we learned some general strategies that don’t have to do with probability.

When we were trying to figure out how to add printing to our game, we looked at the notion of a wrapper function. That is, how we inject(注入;注射;添加) functionality into an existing function, by sneaking(潜行;偷偷溜走) it in on top of one of the arguments. And this is an example of aspect-oriented programming, where we take the aspect of printing out what’s happening and keep that separate from the main logic of the program.

We learned that you can do exploratory(探索的;试探性) data analysis. When I was looking at the two strategies for playing PIG and where they differed, that was a completely different question than what I’d designed the PIG program for. Because I had put together the right pieces, it was easy to do the exploration and come to an understanding.

And we learned–or, at least, I learned because I was the one who made the mistake– that errors can pop up, particularly in the types of arguments and results that functions expect in return– and that you have to be careful, Python, to deal with that because Python doesn’t give you the seatbelts(安全带) that other languages have, to protect yourself from those type of errors. So you have to be vigilant, on your own.

And finally, that was a lot to cram into(cram into 强行塞入,填满) one Unit. So if you followed along all of that– congratulations, for the work you’ve done. You’ve learned a lot.

Have fun with the homework; we’ll see you in the next Unit.

参考文献:

  1. Design of Computer Programs - 英文介绍 - Udacity;
  2. Design of Computer Programs - 中文介绍 - Udacity;
  3. Design of Computer Programs - 中文介绍 - 果壳;
  4. Design of Computer Programs - 视频列表 - Udacity;
  5. Design of Computer Programs - 视频列表 - YouTube;
  6. Design of Computer Programs - class wiki - Udacity。

你可能感兴趣的:(软件工程,SICP,Design,of,Computer,Programs,学习)