《D o C P》学习笔记(6 - 1)Word Games

备注1:每个视频的英文字幕,都翻译成中文,太消耗时间了,为了加快学习进度,我将暂停这个工作,仅对英文字幕做少量注释。
备注2:将.flv视频文件与Subtitles文件夹中的.srt字幕文件放到同1个文件夹中,然后在迅雷看看中打开播放,即可自动加载字幕。

Word Games

你可以学到什么:

  • Managing complexity.
  • Large sets of words.
  • Appropriate data structures.

Lesson 6

视频链接:
Lesson 6 - Udacity

Lesson 6 - YouTube

Course Syllabus

Lesson 6: Word Games

  • Lesson 6 Course Notes(主要是课程视频对应的英文字幕的网页。)
  • Lesson 6 Code
  • Lesson 6 words4k.txt file

01 Welcome Back

Hi, welcome back. So far in this class we covered a lot of programming techniques, but we mostly done it with small examples of code. In this unit, we’re going to look at a larger example than anything we seen before. We’re going to write an algorithm for finding the highest scoring play in a crossword纵横字谜) tile(牌;麻将牌) game. Now, versions of this game go by names like Scrabble(乱摸;扒寻) and Words with Friends. So we’re going to have to represent everything about the words, the tiles, the board, the scoring and the algorithm for finding the highest scoring word. That’s going to be more code I’m going to be writing a lot of it and you’re going to get practice reading, that’s an important skill but I’m also going to stop and leave you plenty of places where you can write some other code and at any point, if you want a bigger challenge, you can stop the video and go ahead yourself and try to solve as much of it as you can on your own and I would encourage you to do that. This is a big step-up(stepup 加速的;增强的). I think you are all ready for it. So let’s get started.

02 Word Games

We’ve got a lot to cover and not much time to do it so let’s dig right in. Here’s a game I’m playing online with my friend, Ken.

《D o C P》学习笔记(6 - 1)Word Games_第1张图片

I’m winning by a little bit mostly because I got good letters like the Z and so on but Ken is catching up.

Let’s dig right in(dig in 全力以赴地做起来) and come up with our concept inventory. What have we got?

Well, the most obvious thing, there’s a board.
there’s letters–both letters on the board and letters in the hand, and the letters on the board have to form words and in the hand they’re not.
There’s the notion(概念,观念;) of a legal(合法的) play on the board, so RITZY is a word, and it’s a word independent of where it appears, but it’s legal to have placed it here where it hooks up(hook up 连接) with another letter, and it wouldn’t have been legal to place it where it bumps into(撞上;偶然遇见) the H or where it’s not attached to anything else.
There’s the notion of score and the score for individual letters. Z is worth 10. An I is worth 1.
And there are scores for a play where you add up the letters. Part of that is that there are bonuses(bonus 奖金,额外津贴;红利) on the board. DL means double letter score. A letter that’s placed there gets doubled. DW means double word score. If any letter of the word is on that square, then the whole word score is doubled, and we also have triples as well.
Somewhere behind the scenes, there’s a dictionary and all these words are in the dictionary and other combinations of letters are not.
Then not shown here is the notion of a blank tile. Part of the hand might be a blank that isn’t indicating any particular letter, but you’re free to use for any one, similar to the way we had jokers in the game of poker.

(不完整的笔记:
Z : 10 分
I : 1 分
DL : Double Letter score
DW : Double Word score
TL : Triple Letter
TW : Triple Word

blank tile : 手里的牌的一部分是 blank 空的,不代表任何特定的字母,但你可以自由使用任何 1 个字母。
)

03 Concept Inventory

Now let’s talk about how to implement any of these, see if there’s any difficulties, any areas that we think might be hard to implement.

The board can be some kind of two-dimensional array, maybe a list of lists is one possibility. One thing I’m not quite clear on now is do I need one board or two? It’s clear I need one board to hold all the letters, but then there’s also the bonus squares. Should that be part of the same board or should that be a separate board and the letters are layered on top of this background of bonus squares? I’m not quite sure yet, but I’m not too worried about it, because I can make either approach work.

A letter can be one character string.

A word can be a string.

A hand can also be a string. It could also be a list of letters. Either one would be fine. Any collection of letters would be okay. Note that a set would not work for the hand. The hand can’t be a set of letters, because we might have duplicates, and sets don’t allow duplicates.

Now, for the notion of a legal play, we’ll have some function that generates legal plays, given a board position and a hand, and then the plays themselves will need some representation. Maybe they can be something like a tuple of say starting position– for example, “RITZY” starts in this location, the direction in which they’re going– are they going across or down, the two allow about directions–and the word itself. In this case, RITZY. That seems like a good representation for a legal play.

I’m not quite sure yet what the representation of a position or a direction should be, but that’s easy enough.

A score–we’ll have some function to compute the score.

For letters, we can have a dictionary that says the value of Z is 10.

For plays we’ll need some function to compute that.

For the bonus squares, we’ll need some mapping from a position on the board to double word or triple letter or whatever.

A dictionary is a set of words.

The blank letter–well, we said letters were strings, so that’s probably okay. We could use the string space or the string underscore, to represent the blank. Then it’s dealing with it that will be an issue later on. Now, I’m a little bit worried about blanks, because in poker Jokers(joker 纸牌百搭;纸牌中可当任何点数用的一张) were easy. We just said, replace them by any card and just deal with all the possibilities. Our routines are fast enough that we could probably deal with them all. Here I’m pretty confident we can make it fast enough that that approach will work, but it doesn’t quite work because not only do we have to try all possibilities for the letter, but the scoring rules are actually different. When you use a blank instead of a letter, you don’t get the letter scores for that blank. We’ll have to have scoring know about blanks and not just know about filling things in. That’ll be a complication. But overall I went through all the concepts. I’ve got an implementation for both.

《D o C P》学习笔记(6 - 1)Word Games_第2张图片

Some of them are functions that I don’t quite know how to do, but I don’t see anything that looks like a show stopper. I think I can go ahead. The difficulty then is not that I have to invent something new in order to solve one of the problems.

The difficulty is just that there’s so much.

When faced with a problem of this size or problems can be much larger, the notion(概念;观念) of pacing(领先于) is an important one.

What do I mean by that? It means I want to attack this, and I know I’m not going to solve it all at once. I’m not just going to sit down for 20 minutes and knock out(淘汰;击败;出局) the whole problem. It’s going to be a lot longer than that.

I want to have pacing in that I have intermediate(中间的) goals along the way where I can say, okay, now I’m going focus on one part of the problem, and I’m going to get that done. Then when I’m done with that part, then I can move on to the next part.

If you don’t have that pacing, you can lose your focus. You can get discouraged that there’s so much left to do. But if you break it up into bite-sized(很小的) pieces, then you can say, okay, I’m almost there. I just have to finish a little bit more, and now this piece will be done, and then I can move on to the next piece.

The first piece I’m going to look at is finding words from a hand. In other words, I’m going ignore the whole board. I’m going to say pretend the board isn’t there and pretend all we have is the hand, and we have the dictionary, a set of legal words. I want to know out of(由于;用…(材料);得自(来源)) that hand, what words in the dictionary can I make?

04 Finding Words

Let’s get started. The first thing I need is to come up with a dictionary of all the words.

Now, we’ve created a small file with about 4,000 words in it, called “word4k.txt.”

Let’s take that file, read it, convert it to uppercase, because Scrabble(乱摸;扒寻) with Words with Friends use only uppercase letters, split it into a list of words, assign that to a global variable– we’ll call it WORDS and put it in all uppercase, just make sure that it stands out. Let’s make this a set so that access to it is easy. We can figure out very quickly whether a word is in the dictionary. Okay, so now we’re done.

(补充内容: file() 方法简介
file() 方法的别名是 open() ,是内置函数,用于创建 1 个 file 对象,我猜测 file() 方法只在 Python 2.x 中出现,Python 3.x 中应该没有,时间有限,不去深究了。
)

WORDS = set(file('words4k.txt').read().upper().split())

We have our words. Then I want to find all the words within a hand. So the hand will be seven letters, and I want to find all the words of seven letters or less that can be made out of those letters. I’m going start with a very straightforward approach, and then we’re going to refine(提炼;改善) it over time. Here is what I’ve done:
(关于为什么最多是 7 个字母?
去看下最前面的第 1 张图片,图片最下面的 1 手牌中,最多容纳 7 个字母。
)

def find_words(hand):
    "Find all words that can be made from the letters in hand."
    results = set()
    for a in hand:
        if a in WORDS: results.add(a)
        for b in removed(hand, a):
            w = a+b
            if w in WORDS: results.add(w)
            for c in removed(hand, w):
                w = a+b+c
                if w in WORDS: results.add(w)
                for d in removed(hand, w):
                    w = a+b+c+d
                    if w in WORDS: results.add(w)
                    for e in removed(hand, w):
                        w = a+b+c+d+e
                        if w in WORDS: results.add(w)
                        for f in removed(hand, w):
                            w = a+b+c+d+e+f
                            if w in WORDS: results.add(w)
                            for g in removed(hand, w):
                                w = a+b+c+d+e+f+g
                                if w in WORDS: results.add(w)

    return results

(把 removed 函数的几次调用结果拿出来,应该会有助于理解此函数的作用:

>>> removed('letter', 'l')
'etter'
>>> removed('letter', 't')
'leter'
>>> removed('letter', 'set')
'lter'
>>> removed('letter', 'setter')
'l'

)

I haven’t worried about repeating myself and about making the code long. I just wanted to make it straightforward. Then I said, the first letter a can be any letter in the hand. If that’s a word, then go ahead and add that to my set of results. I start off with an empty set of results, and I’m going to add as I go. Otherwise, b can be any letter in the result of removing a from the hand. Now the word that I’m building up is a + b–two-letter word. If that’s a word, add it. Otherwise, c can be any letter in the hand without w in it– the remaining letters in the hand. A new word can is a + b + c. If that’s in WORDS, then add it, and we just keep on going through, adding a letter each time, checking to see if that’s in the WORDS, adding them up.

(补充
replace() 函数的用法简介:

>>> s
'cheese'
>>> s.replace('e', 'f')
'chffsf'
>>> s.replace('e', 'f', 1)
'chfese'

s.replace(a, b) 的含义是将字符串 s 中的所有的字符 a 全部替换为字符 b
s.replace(a, b, 1) 的含义是将字符串 s 中的第 1 次出现的字符 a 替换为字符 b

)

Here’s my definition of removed:
(注释中,有我对该函数的理解。)

# It takes a hand or a sequence of letters and then the letter or letters to remove.
def removed(letters, remove):
    "Return a str of letters, but with each letter in remove removed once."
    # 遍历 remove 字符串中的每 1 个字符 L
    for L in remove:
        # 将 letters 中第 1 次出现的 L 字符替换为空字符 '' ,即删除掉
        letters = letters.replace(L, '', 1)
    return letters

It takes a hand or a sequence of letters and then the letter or letters to remove. For each of those letters just replace the letter in the collection of letters with the empty string and do that exactly once, so don’t remove all of them. Then return the remaining letters.

Does it work? Well, if I find words with this sequence of letters in my hand, it comes back with this list.

>>> find_words('LETTERS')
set(['ERS', 'RES', 'RET', 'ERE', 'STREET', 'ELS', 'REE', 'SET', 'LETTERS', 'SER', 'TEE', 'RE', 'SEE', 'SEL', 'TET', 'EL', 'REST', 'ELSE', 'LETTER', 'ET', 'ES', 'ER', 'LEE', 'EEL', 'TREE', 'TREES', 'LET', 'TEL', 'TEST'])
>>>

That looks pretty good. It’s hard for me to verify(核实;证明) right now that I found everything that’s in my dictionary, but it looks good, and I did a little bit of poking around(poke around 闲逛) in the dictionary for likely things, and all the words I could think of that weren’t in this set were not in the dictionary. That’s why they weren’t included. That’s looks pretty good. I’m going to be doing a lot of work here, and I’m going to be modifying this function and changing it. I’d like to have a better set of tests than just one test.

05 Regression Tests

(regression tests 回归测试)

I made up a bigger test. I made up a dictionary of hands that map from a hand to a set of words that I found.

hands = {  ## Regression test
    'ABECEDR': set(['BE', 'CARE', 'BAR', 'BA', 'ACE', 'READ', 'CAR', 'DE', 'BED', 'BEE',
         'ERE', 'BAD', 'ERA', 'REC', 'DEAR', 'CAB', 'DEB', 'DEE', 'RED', 'CAD',
         'CEE', 'DAB', 'REE', 'RE', 'RACE', 'EAR', 'AB', 'AE', 'AD', 'ED', 'RAD',
         'BEAR', 'AR', 'REB', 'ER', 'ARB', 'ARC', 'ARE', 'BRA']),
    'AEINRST': set(['SIR', 'NAE', 'TIS', 'TIN', 'ANTSIER', 'TIE', 'SIN', 'TAR', 'TAS',
         'RAN', 'SIT', 'SAE', 'RIN', 'TAE', 'RAT', 'RAS', 'TAN', 'RIA', 'RISE',
         'ANESTRI', 'RATINES', 'NEAR', 'REI', 'NIT', 'NASTIER', 'SEAT', 'RATE',
         'RETAINS', 'STAINER', 'TRAIN', 'STIR', 'EN', 'STAIR', 'ENS', 'RAIN', 'ET',
         'STAIN', 'ES', 'ER', 'ANE', 'ANI', 'INS', 'ANT', 'SENT', 'TEA', 'ATE',
         'RAISE', 'RES', 'RET', 'ETA', 'NET', 'ARTS', 'SET', 'SER', 'TEN', 'RE',
         'NA', 'NE', 'SEA', 'SEN', 'EAST', 'SEI', 'SRI', 'RETSINA', 'EARN', 'SI',
         'SAT', 'ITS', 'ERS', 'AIT', 'AIS', 'AIR', 'AIN', 'ERA', 'ERN', 'STEARIN',
         'TEAR', 'RETINAS', 'TI', 'EAR', 'EAT', 'TA', 'AE', 'AI', 'IS', 'IT',
         'REST', 'AN', 'AS', 'AR', 'AT', 'IN', 'IRE', 'ARS', 'ART', 'ARE']),
    'DRAMITC': set(['DIM', 'AIT', 'MID', 'AIR', 'AIM', 'CAM', 'ACT', 'DIT', 'AID', 'MIR',
         'TIC', 'AMI', 'RAD', 'TAR', 'DAM', 'RAM', 'TAD', 'RAT', 'RIM', 'TI',
         'TAM', 'RID', 'CAD', 'RIA', 'AD', 'AI', 'AM', 'IT', 'AR', 'AT', 'ART',
         'CAT', 'ID', 'MAR', 'MA', 'MAT', 'MI', 'CAR', 'MAC', 'ARC', 'MAD', 'TA',
         'ARM']),
    'ADEINRST': set(['SIR', 'NAE', 'TIS', 'TIN', 'ANTSIER', 'DEAR', 'TIE', 'SIN', 'RAD', 
         'TAR', 'TAS', 'RAN', 'SIT', 'SAE', 'SAD', 'TAD', 'RE', 'RAT', 'RAS', 'RID',
         'RIA', 'ENDS', 'RISE', 'IDEA', 'ANESTRI', 'IRE', 'RATINES', 'SEND',
         'NEAR', 'REI', 'DETRAIN', 'DINE', 'ASIDE', 'SEAT', 'RATE', 'STAND',
         'DEN', 'TRIED', 'RETAINS', 'RIDE', 'STAINER', 'TRAIN', 'STIR', 'EN',
         'END', 'STAIR', 'ED', 'ENS', 'RAIN', 'ET', 'STAIN', 'ES', 'ER', 'AND',
         'ANE', 'SAID', 'ANI', 'INS', 'ANT', 'IDEAS', 'NIT', 'TEA', 'ATE', 'RAISE',
         'READ', 'RES', 'IDS', 'RET', 'ETA', 'INSTEAD', 'NET', 'RED', 'RIN',
         'ARTS', 'SET', 'SER', 'TEN', 'TAE', 'NA', 'TED', 'NE', 'TRADE', 'SEA',
         'AIT', 'SEN', 'EAST', 'SEI', 'RAISED', 'SENT', 'ADS', 'SRI', 'NASTIER',
         'RETSINA', 'TAN', 'EARN', 'SI', 'SAT', 'ITS', 'DIN', 'ERS', 'DIE', 'DE',
         'AIS', 'AIR', 'DATE', 'AIN', 'ERA', 'SIDE', 'DIT', 'AID', 'ERN',
         'STEARIN', 'DIS', 'TEAR', 'RETINAS', 'TI', 'EAR', 'EAT', 'TA', 'AE',
         'AD', 'AI', 'IS', 'IT', 'REST', 'AN', 'AS', 'AR', 'AT', 'IN', 'ID', 'ARS',
         'ART', 'ANTIRED', 'ARE', 'TRAINED', 'RANDIEST', 'STRAINED', 'DETRAINS']),
    'ETAOIN': set(['ATE', 'NAE', 'AIT', 'EON', 'TIN', 'OAT', 'TON', 'TIE', 'NET', 'TOE',
         'ANT', 'TEN', 'TAE', 'TEA', 'AIN', 'NE', 'ONE', 'TO', 'TI', 'TAN',
         'TAO', 'EAT', 'TA', 'EN', 'AE', 'ANE', 'AI', 'INTO', 'IT', 'AN', 'AT',
         'IN', 'ET', 'ON', 'OE', 'NO', 'ANI', 'NOTE', 'ETA', 'ION', 'NA', 'NOT',
         'NIT']),
    'SHRDLU': set(['URD', 'SH', 'UH', 'US']),
    'SHROUDT': set(['DO', 'SHORT', 'TOR', 'HO', 'DOR', 'DOS', 'SOUTH', 'HOURS', 'SOD',
         'HOUR', 'SORT', 'ODS', 'ROD', 'OUD', 'HUT', 'TO', 'SOU', 'SOT', 'OUR',
         'ROT', 'OHS', 'URD', 'HOD', 'SHOT', 'DUO', 'THUS', 'THO', 'UTS', 'HOT',
         'TOD', 'DUST', 'DOT', 'OH', 'UT', 'ORT', 'OD', 'ORS', 'US', 'OR',
         'SHOUT', 'SH', 'SO', 'UH', 'RHO', 'OUT', 'OS', 'UDO', 'RUT']),
    'TOXENSI': set(['TO', 'STONE', 'ONES', 'SIT', 'SIX', 'EON', 'TIS', 'TIN', 'XI', 'TON',
         'ONE', 'TIE', 'NET', 'NEXT', 'SIN', 'TOE', 'SOX', 'SET', 'TEN', 'NO',
         'NE', 'SEX', 'ION', 'NOSE', 'TI', 'ONS', 'OSE', 'INTO', 'SEI', 'SOT',
         'EN', 'NIT', 'NIX', 'IS', 'IT', 'ENS', 'EX', 'IN', 'ET', 'ES', 'ON',
         'OES', 'OS', 'OE', 'INS', 'NOTE', 'EXIST', 'SI', 'XIS', 'SO', 'SON',
         'OX', 'NOT', 'SEN', 'ITS', 'SENT', 'NOS'])}

The idea here is that this test is not so much proving that I’ve got the right answer, because I don’t know for sure that this is the right answers. Rather, this is what we call a regression test, meaning as we change our program we want to make sure that we haven’t broken any of these–that we haven’t made changes to our functions.

Even if I don’t know this is exactly the right set, I want to know when I made a change, have I changed the result here. I’ll be able to rerun this and say, have we done exactly the same thing. I’ll also be able to time(测定…的时间) the results of running these various hands and see if we can make our function faster. Here is my list of hands. I’ve got eight hands.

Then I did some further tests here.

def test_words():
    assert removed('LETTERS', 'L') == 'ETTERS'
    assert removed('LETTERS', 'T') == 'LETERS'
    assert removed('LETTERS', 'SET') == 'LTER'
    assert removed('LETTERS', 'SETTER') == 'L'
    t, results = timedcall(map, find_words, hands)
    for ((hand, expected), got) in zip(hands.items(), results):
        assert got == expected, "For %r: got %s, expected %s (diff %s)" % (
            hand, got, expected, expected ^ got)
    return t

timedcall(map, find_words, hands)
0.5527249999

I’m testing removing letters–got all those right. Then I’m going through the hands, and I’m using my timedcall() function that we build last time. That returnsin lapsed(流失的;堕落的) time and a set of results. I make sure all the results are what I expected. Then I return the time elapsed for finding all the words in those eight hands.

It turns out it takes half a second. That kind of worries me. That doesn’t sound very good. Sure, if I was playing Scrabble with a friend and they reply in a half second, that’d be pretty good. Much better than me, for example. In this game here it says that I haven’t replied to my friend Ken in 22 hours. This is a lot better, but still, if we’re going to be doing a lot of work and trying to find the best possible play, half a second to evaluate eight hands– that doesn’t seem fast enough.

Why is find_words() so slow? One thing is that it’s got a lot of nested loops, and it always does all of them. A lot of that is going to be wasteful. For example, let’s say the first two letters in the hand were z and q. At the very start here w is z + q, and now I loop through all the other combinations of all the other letters in the hand trying to find words that start with z + q, but there aren’t any words in the dictionary that start with zq. As soon as I got here, I should be able to figure that out and not do all of the rest of these nested loops.
(
find_words() 函数为什么这么慢?
比方说,手里有 z q 开头的两个字母,但是字典里面没有 zq 开头的单词,但是上述函数的内部仍然会不断地进入内层的循环,这样的情况下,效率非常低下。
)

06 Readwordlist

What I’m going to do is introduce a new concept that we didn’t see before in our initial listing of the concepts, but which is an important one–the notion of a prefix of a word. It’s important only for efficiency and not for correctness–that’s why it didn’t show up the first time. The idea is that given a word there are substrings, which are prefixes of the word.

The empty string is such a prefix. Just W is a prefix. W-O is a prefix. W-O-R is a prefix.

Now, we always have to decide what we want to do with the endpoints. I think for the way I want to use it I do want to include the empty string as a valid prefix, but I think I don’t want to include the entire string W-O-R-D. I’m not going to count that as a prefix of the word. That is the word. I’m going to define this function prefixes(word). It’s pretty straightforward. Just iterate through the range, and the prefixes of W-O-R-D are the empty string and these three longer strings. Now here’s the first bit that I want you to do for me. Reading in our list of words from the dictionary is a little bit complicated in that we want to compute two things–a set of words and a set of prefixes for all the words in the dictionary. The set together of prefixes for each word–union all of those together. I’m going to put that together into a function readwordlist(), which takes the file name and returns these two sets. I want you to write the code for that function here.

06 Readwordlist (answer)

(我的答案:

def readwordlist(filename):
    """Read the words from a file and return a set of the words 
    and a set of the prefixes."""
    file = open(filename) # opens file
    text = file.read()    # gets file into string
    # your code here

    wordset = set(text.split())

    prefixset = []
    for word in wordset:
        prefixset += prefixes(word)
    prefixset = set(prefixset)

    return wordset, prefixset

)

Here’s my answer. The wordset is just like before.

def readwordlist(filename):
    wordset = set(file(filename).read().upper().split())
    # 下面的 1 行代码,实际上是遍历了 wordset 中的每 1 个单词 word ;
    # 然后对于每 1 个单词 word ,遍历其中的所有前缀;
    # 总的来说,也就是将 wordset 中的所有单词 word 的所有前缀,全部收集起来,放进 1 个集合。
    prefixset = set(p for word in wordset for p in prefixes(word))  
    return wordset, prefixset

Read the file, uppercase it, and split it. In the prefixset, we go through each word in the wordset and then each prefix of the word, and collect that set p of prefixes and then return them. Now let’s see what these prefixes can do for us.
( PREFIXES 即上面的 prefixset 。
下面的代码的每 1 个循环中,都加入了 1 行 if a/w not in PREFIXES: continue ,意思是,手里的牌取出某1张a,若不在 PREFIXES 中,则不再继续往内层循环判断,跳出本次循环, continue 到下一轮循环,取出另一张牌a;若这张牌a在 PREFIXES 中,则取出第2张牌b,若这2张牌a+b的组合在PREFIXES中,则继续往内层循环判断,若这2张牌a+b的组合不在 PREFIXES 中,则不再继续往内层循环判断,跳出本次循环, continue 到下一轮循环,取出另一张牌b。。。
)

def find_words(letters):
    results = set()
    for a in letters:
        if a in WORDS: results.add(a)
        if a not in PREFIXES: continue
        for b in removed(letters, a):
            w = a+b
            if w in WORDS: results.add(w)
            if w not in PREFIXES: continue
            for d in removed(letters, w)
                w = a+b+c+d
                if w in WORDS: results.add(w)
                if w not in PREFIXES: continue
                for e in removed(letters, w):
                    w = a+b+c+d+e
                    if w in WORDS: results.add(w)
                    if w not in PREFIXES: continue
                    for f in removed(letters, w):
                        w = a+b+c+d+e+f
                        if w in WORDS: result.add(w)
                        if w not in PREFIXES: continue
    return results

I can define a new version of find_words(), and what this one is it looks exactly like the one before except what we do at each level of the loop is we add one statement that says, if the word that we built up so far is not one of the prefixes of a word in the dictionary, then there’s no sense doing any of these nested loops. We can continue onto the next iteration of the current loop, and that’s what the continue statement says is don’t do anything below, rather go back to the for loop that we’re nested in and go through the next iteration of that for loop. Normally, I don’t like the continue statement and normally, instead of saying if w not in prefixes continue, I would’ve said if w in prefixes then do this, but that would’ve introduced another level of indentation(凹进) for each of these seven levels and I’d be running off the edge of the page, so here I grudgingly(不情愿的;勉强的) accepted the continue statement. The code looks just like before. I’ve just added seven lines. The exact same line indented into different levels goes all the way through a, b, c, d, e, f, and g. Now if I run the test_words function again, I get not half a second but 0.003 seconds. That’s nice and fast. That’s 150 times faster than before, 2000 hands per second. The function is long and ugly, but it’s fast enough. But still I’d like to clean it up. I don’t like repeating myself with code like this. I don’t like that this only works exactly for seven letters. I know that I may need more than that because there’s only seven letters in a hand, but sometimes you combine letters in a hand with letters on the board. This function won’t be able to deal with it.

07 Extend Prefix

In order to improve this function I have to ask myself, “What do each of the loops do?” “And can I implement that in another way other than with nested loops?” The answer seems to be that each of the nested loops is incrementing the word by one letter, from abcd to abcde, and then it’s checking to see if we have a new word, and it’s checking to see if we should stop if what we have so far is not a prefix of any word in the dictionary. If I don’t want to have nested loops what I want instead is a recursive procedure. I’m going to have the same structure as before. I’m going to start off by initializing the results to be the empty set, and then I’m going to have some loops that add elements to that set, and then I’m going to return the results that I have built up. Then I’m going to start the loops in motion(in motion 在开动中;在运转中) by making a call to this recursive routine. What I want you to do is fill in the code here.

题目:

# -----------------
# User Instructions
# 
# Write a function, extend_prefix, nested in find_words,
# that checks to see if the prefix is in WORDS and 
# adds that to results if it is.
#
# If not, your function should check to see if the prefix
# is in PREFIXES, and if it is should recursively add letters
# until the prefix is no longer valid.

def prefixes(word):
    "A list of the initial sequences of a word, not including the complete word."
    return [word[:i] for i in range(len(word))]

def removed(letters, remove):
    "Return a str of letters, but with each letter in remove removed once."
    for L in remove:
        letters = letters.replace(L, '', 1)
    return letters

def readwordlist(filename):
    file = open(filename)
    text = file.read().upper()
    wordset = set(word for word in text.splitlines())
    prefixset = set(p for word in wordset for p in prefixes(word))
    return wordset, prefixset

WORDS, PREFIXES = readwordlist('words4k.txt')

def find_words(letters):
    results = set()

    def extend_prefix(w, letters):
        if w in WORDS: ###Your code here.
        if w not in PREFIXES: return
        for L in letters:
            ###Your code here.

    extend_prefix('', letters)
    return results

07 Extend Prefix (answer)

(我花了一点时间,没有想出来,还是来看看Peter的答案吧。)

def find_words(letters):
    results = set()

    def extend_prefix(w, letters):
        if w in WORDS: # your code here
        if w not in PREFIXES: return
        for L in letters:
            extend_prefix(w+L, removed(letters, L))

    extend_prefix('', letters)
    return results

( extend_prefix(w, letters) 函数的第 1 个 if 语句后面不完整,但是,视频中,Peter仍然可以正常运行,我不知道为什么。)
The answer is here we’re doing a nested loop, and the way we do a nested loop is the way we did the first loop– by calling extend prefix. What is the word so far that we’ve built up? It’s the w we had before, and now we’re looping through the letters, so we want to add the letter L to the end of that. Now what are the remaining letters we have in order to add into that word? That’s the letters we had before with L removed. That’s all there is to it. Now if we test words again, the speed is almost the same–0.003 something, but the function is more concise(简明的;简洁的), more readable, and more general in that it will take any number of letters. Now, there are a lot of variations(variation 变化;变动;变量) on this. If you type “import this” into a Python interpreter you get out a little set of aphorisms(aphorism 格言;警句), almost like a poem, called “The Zen of Python” by Tim Peters. One of them says “Flat is better than nest.” We can take out this nested function. Instead of having it in here, we can make it flat like that.

def find_words(letters):
    return extend_prefix('', letters, set())

def extend_prefix(pre, letters, results):
    if pre in WORDS: results.add(pre)
    if pre in PREFIXES:
        for L in letters:
            extend_prefix(pre+L, letters.replace(L, '', 1), results)
    return results

I’ve also made a small change here in that removed works when you’re removing any number of letters. Here if I only want to remove one letter, I can just call the built-in method letters.replace directly. When we call test_words() on this just to make sure we haven’t broken anything, it verifies okay, and the speed is about the same. You can keep it like this. This is a good approach. I’m pretty happy with this one. But notice what we’re doing here–find_words() is sort of a wrapper to extend_prefix(), which takes letters and adds in two more extra arguments– the prefix that we found so far and the results that we want to accumulate(积累;逐渐增加) the results into. Instead of having one function call a second, we could do this all in one function if we made these two extra things be optional arguments.

def find_words(letters, pre='', results=None)::
    if results is None: results = set()
    if pre in WORDS: results.add(pre)
    if pre in PREFIXES:
        for L in letters:
            find_words(letters.replace(L, '', 1), pre+L, results)
    return results

We could do it like that–where we just have one function find_words(), which takes letters, and then the optional prefix of our end results to accumulate into. Now in terms of pacing, let’s stop here. Let’s congratulate ourselves and say we’ve done our job. We’ve come up with find_words(), and we said for any set of letters I can find all the words in the dictionary that correspond to that hand of letters. Furthermore(此外;而且), I can do that at a speed of 2000 hands per second, which seems pretty good. We’ve achieved our first milestone. Now we should think–first I guess we should relax, congratulate ourselves, have a drink or whatever it is you need to do, and then when you’re ready to come back then we can start the next leg of the journey.

08 Add Suffixes

(suffix 后缀)
Let’s go back to our list of concepts and say what have we done so far and what’s next? We think we did a good job with the dictionary, and we did a good job with our hands here. In terms of legal play, well, we’ve got words, so we’re sort of part way there, but we haven’t hooked up(hook up 连接) the words to the board. Maybe that’s the next thing to do–is say, let’s do a better job of hooking up the letters and the hand with the words in the dictionary and placing them in the right place on the board. I don’t want to have to deal with the whole board. Let’s just deal with one letter at a time. Let’s say there is one letter on the board, and I have my hand and I want to say I can play W-O-R and make up a word. Just to make it a little bit more satisfying than one letter, let’s say that there is a set of possible letters that are already on the board, and we can place our words anywhere, but we’re not going to worry about placing some letters here and having them run off(偷走;流失) the board. We’re not going to worry about placing letters here and having them run into(快速进入…;加起来) another letter. We’re just going to say what words can I make out of my hand that connect with either a D or an X or an L. So I need a strategy for that. Let’s just consider one letter at a time. What I need to find is all the plays that take letters in my hand– [HANDSIE] let’s say those are the seven letters in my hand. Take those letters and combine them with a D and find all the words. What can those words consist of? They can have some prefix here, which can be any prefix in our set of prefixes that come solely(唯一地;仅仅) from the letters in my hand. Then the letter D that’s already there doesn’t have to come from my hand. Then some more letters. I’ll think of this as a prefix plus a suffix where I make sure that I know that D is already there. Here is word-plays–takes a hand and a set of letters that are on the board, and it’s going t o find all possible words that can be made from that hand, connecting to exactly one of the letters on the board. We’re going break it up into a prefix that comes only from the hand, then the letter from the board, and then the remainder of the suffix that comes from the hand. The same structure as we had before–we start off with an empty set of result words. In the end we’re going to return that set of result words. Then we’re going to go through all the possible prefixes that come exclusively(唯一地) from the hand, then the possible letters on the board, and add a suffix to the prefix plus the letter on the board from the letters in the hand except for we can no longer use the letters in the prefix. Find_prefixes is just like find_words except we’re collecting things that are in the prefixes rather than things that are in the list of words. Now I want you to write add_suffixes. Given a hand, a prefix that we found before, results set that you want to put things into, find me all the words that can be made by adding on letters from the hand into the prefix to create words.
(实现这个函数时,不考虑单词超出板子边界的问题,不考虑单词会盖过其他字母的问题。)
题目:

# -----------------
# User Instructions
# 
# Write a function, add_suffixes, that takes as input a hand, a prefix we
# have already found, and a result set we'd like to add to, and returns 
# the result set we have added to. For testing, you can assume that you 
# have access to a file called 'words4k.txt'

import time

def prefixes(word):
    "A list of the initial sequences of a word, not including the complete word."
    return [word[:i] for i in range(len(word))]

def readwordlist(filename):
    file = open(filename)
    text = file.read().upper()
    wordset = set(word for word in text.splitlines())
    prefixset = set(p for word in wordset for p in prefixes(word))
    return wordset, prefixset

WORDS, PREFIXES = readwordlist('words4k.txt')

def find_words(letters, pre='', results=None):
    if results is None: results = set()
    if pre in WORDS: results.add(pre)
    if pre in PREFIXES:
        for L in letters:
            find_words(letters.replace(L, '', 1), pre+L, results)
    return results

def word_plays(hand, board_letters):
    "Find all word plays from hand that can be made to abut with a letter on board."
    # Find prefix + L + suffix; L from board_letters, rest from hand
    results = set()
    for pre in find_prefixes(hand, '', set()):
        for L in board_letters:
            add_suffixes(removed(hand, pre), pre+L, results)
    return results

def find_prefixes(hand, pre='', results=None):
    "Find all prefixes (of words) that can be made from letters in hand."
    if results is None: results = set()
    if pre in PREFIXES:
        results.add(pre)
        for L in hand:
            find_prefixes(hand.replace(L, '', 1), pre+L, results)
    return results

def add_suffixes(hand, pre, results):
    """Return the set of words that can be formed by extending pre with letters in hand."""
    ###Your code here.


def removed(letters, remove):
    "Return a str of letters, but with each letter in remove removed once."
    for L in remove:
        letters = letters.replace(L, '', 1)
    return letters

def timedcall(fn, *args):
    "Call function with args; return the time in seconds and result."
    t0 = time.clock()
    result = fn(*args)
    t1 = time.clock()
    return t1-t0, result

hands = {  ## Regression test
    'ABECEDR': set(['BE', 'CARE', 'BAR', 'BA', 'ACE', 'READ', 'CAR', 'DE', 'BED', 'BEE',
         'ERE', 'BAD', 'ERA', 'REC', 'DEAR', 'CAB', 'DEB', 'DEE', 'RED', 'CAD',
         'CEE', 'DAB', 'REE', 'RE', 'RACE', 'EAR', 'AB', 'AE', 'AD', 'ED', 'RAD',
         'BEAR', 'AR', 'REB', 'ER', 'ARB', 'ARC', 'ARE', 'BRA']),
    'AEINRST': set(['SIR', 'NAE', 'TIS', 'TIN', 'ANTSIER', 'TIE', 'SIN', 'TAR', 'TAS',
         'RAN', 'SIT', 'SAE', 'RIN', 'TAE', 'RAT', 'RAS', 'TAN', 'RIA', 'RISE',
         'ANESTRI', 'RATINES', 'NEAR', 'REI', 'NIT', 'NASTIER', 'SEAT', 'RATE',
         'RETAINS', 'STAINER', 'TRAIN', 'STIR', 'EN', 'STAIR', 'ENS', 'RAIN', 'ET',
         'STAIN', 'ES', 'ER', 'ANE', 'ANI', 'INS', 'ANT', 'SENT', 'TEA', 'ATE',
         'RAISE', 'RES', 'RET', 'ETA', 'NET', 'ARTS', 'SET', 'SER', 'TEN', 'RE',
         'NA', 'NE', 'SEA', 'SEN', 'EAST', 'SEI', 'SRI', 'RETSINA', 'EARN', 'SI',
         'SAT', 'ITS', 'ERS', 'AIT', 'AIS', 'AIR', 'AIN', 'ERA', 'ERN', 'STEARIN',
         'TEAR', 'RETINAS', 'TI', 'EAR', 'EAT', 'TA', 'AE', 'AI', 'IS', 'IT',
         'REST', 'AN', 'AS', 'AR', 'AT', 'IN', 'IRE', 'ARS', 'ART', 'ARE']),
    'DRAMITC': set(['DIM', 'AIT', 'MID', 'AIR', 'AIM', 'CAM', 'ACT', 'DIT', 'AID', 'MIR',
         'TIC', 'AMI', 'RAD', 'TAR', 'DAM', 'RAM', 'TAD', 'RAT', 'RIM', 'TI',
         'TAM', 'RID', 'CAD', 'RIA', 'AD', 'AI', 'AM', 'IT', 'AR', 'AT', 'ART',
         'CAT', 'ID', 'MAR', 'MA', 'MAT', 'MI', 'CAR', 'MAC', 'ARC', 'MAD', 'TA',
         'ARM']),
    'ADEINRST': set(['SIR', 'NAE', 'TIS', 'TIN', 'ANTSIER', 'DEAR', 'TIE', 'SIN', 'RAD', 
         'TAR', 'TAS', 'RAN', 'SIT', 'SAE', 'SAD', 'TAD', 'RE', 'RAT', 'RAS', 'RID',
         'RIA', 'ENDS', 'RISE', 'IDEA', 'ANESTRI', 'IRE', 'RATINES', 'SEND',
         'NEAR', 'REI', 'DETRAIN', 'DINE', 'ASIDE', 'SEAT', 'RATE', 'STAND',
         'DEN', 'TRIED', 'RETAINS', 'RIDE', 'STAINER', 'TRAIN', 'STIR', 'EN',
         'END', 'STAIR', 'ED', 'ENS', 'RAIN', 'ET', 'STAIN', 'ES', 'ER', 'AND',
         'ANE', 'SAID', 'ANI', 'INS', 'ANT', 'IDEAS', 'NIT', 'TEA', 'ATE', 'RAISE',
         'READ', 'RES', 'IDS', 'RET', 'ETA', 'INSTEAD', 'NET', 'RED', 'RIN',
         'ARTS', 'SET', 'SER', 'TEN', 'TAE', 'NA', 'TED', 'NE', 'TRADE', 'SEA',
         'AIT', 'SEN', 'EAST', 'SEI', 'RAISED', 'SENT', 'ADS', 'SRI', 'NASTIER',
         'RETSINA', 'TAN', 'EARN', 'SI', 'SAT', 'ITS', 'DIN', 'ERS', 'DIE', 'DE',
         'AIS', 'AIR', 'DATE', 'AIN', 'ERA', 'SIDE', 'DIT', 'AID', 'ERN',
         'STEARIN', 'DIS', 'TEAR', 'RETINAS', 'TI', 'EAR', 'EAT', 'TA', 'AE',
         'AD', 'AI', 'IS', 'IT', 'REST', 'AN', 'AS', 'AR', 'AT', 'IN', 'ID', 'ARS',
         'ART', 'ANTIRED', 'ARE', 'TRAINED', 'RANDIEST', 'STRAINED', 'DETRAINS']),
    'ETAOIN': set(['ATE', 'NAE', 'AIT', 'EON', 'TIN', 'OAT', 'TON', 'TIE', 'NET', 'TOE',
         'ANT', 'TEN', 'TAE', 'TEA', 'AIN', 'NE', 'ONE', 'TO', 'TI', 'TAN',
         'TAO', 'EAT', 'TA', 'EN', 'AE', 'ANE', 'AI', 'INTO', 'IT', 'AN', 'AT',
         'IN', 'ET', 'ON', 'OE', 'NO', 'ANI', 'NOTE', 'ETA', 'ION', 'NA', 'NOT',
         'NIT']),
    'SHRDLU': set(['URD', 'SH', 'UH', 'US']),
    'SHROUDT': set(['DO', 'SHORT', 'TOR', 'HO', 'DOR', 'DOS', 'SOUTH', 'HOURS', 'SOD',
         'HOUR', 'SORT', 'ODS', 'ROD', 'OUD', 'HUT', 'TO', 'SOU', 'SOT', 'OUR',
         'ROT', 'OHS', 'URD', 'HOD', 'SHOT', 'DUO', 'THUS', 'THO', 'UTS', 'HOT',
         'TOD', 'DUST', 'DOT', 'OH', 'UT', 'ORT', 'OD', 'ORS', 'US', 'OR',
         'SHOUT', 'SH', 'SO', 'UH', 'RHO', 'OUT', 'OS', 'UDO', 'RUT']),
    'TOXENSI': set(['TO', 'STONE', 'ONES', 'SIT', 'SIX', 'EON', 'TIS', 'TIN', 'XI', 'TON',
         'ONE', 'TIE', 'NET', 'NEXT', 'SIN', 'TOE', 'SOX', 'SET', 'TEN', 'NO',
         'NE', 'SEX', 'ION', 'NOSE', 'TI', 'ONS', 'OSE', 'INTO', 'SEI', 'SOT',
         'EN', 'NIT', 'NIX', 'IS', 'IT', 'ENS', 'EX', 'IN', 'ET', 'ES', 'ON',
         'OES', 'OS', 'OE', 'INS', 'NOTE', 'EXIST', 'SI', 'XIS', 'SO', 'SON',
         'OX', 'NOT', 'SEN', 'ITS', 'SENT', 'NOS'])}

def test_words():
    assert removed('LETTERS', 'L') == 'ETTERS'
    assert removed('LETTERS', 'T') == 'LETERS'
    assert removed('LETTERS', 'SET') == 'LTER'
    assert removed('LETTERS', 'SETTER') == 'L'
    t, results = timedcall(map, find_words, hands)
    for ((hand, expected), got) in zip(hands.items(), results):
        assert got == expected, "For %r: got %s, expected %s (diff %s)" % (
            hand, got, expected, expected ^ got)
    return t

print test_words()

08 Add Suffixes (answer)

( 没想出来,看看 Peter 的答案吧 )

def add_suffixes(hand, pre, results):
    return extend_prefix(pre, letters, result)

(另一个答案)

def add_suffixes(hand, pre, results):
    "Return the set of words that can be formed by extending pre with letters in hand."
    if pre in WORDS: results.add(pre)
    if pre in PREFIXES:
        for L in add_suffixes(hand.replace(L, '', 1), pre+L, results)
    return results

The answer is here we’re doing a nested loop, and the way we do a nested loop is the way we did the first loop– by calling extend prefix. What is the word so far that we’ve built up? It’s the w we had before, and now we’re looping through the letters, so we want to add the letter L to the end of that. Now what are the remaining letters we have in order to add into that word? That’s the letters we had before with L removed. That’s all there is to it. Now if we test words again, the speed is almost the same–0.003 something, but the function is more concise, more readable, and more general in that it will take any number of letters. Now, there are a lot of variations on this. If you type “import this” into a Python interpreter you get out a little set of aphorisms, almost like a poem, called “The Zen of Python” by Tim Peters. One of them says “Flat is better than nest.” We can take out this nested function. Instead of having it in here, we can make it flat like that. I’ve also made a small change here in that removed works when you’re removing any number of letters. Here if I only want to remove one letter, I can just call the built-in method letters.replace directly. When we call test_words() on this just to make sure we haven’t broken anything, it verifies okay, and the speed is about the same. You can keep it like this. This is a good approach. I’m pretty happy with this one. But notice what we’re doing here–find_words() is sort of a wrapper to extend_prefix(), which takes letters and adds in two more extra arguments– the prefix that we found so far and the results that we want to accumulate the results into. Instead of having one function call a second, we could do this all in one function if we made these two extra things be optional arguments. We could do it like that–where we just have one function find_words(), which takes letters, and then the optional prefix of our end results to accumulate into. Now in terms of pacing, let’s stop here. Let’s congratulate ourselves and say we’ve done our job. We’ve come up with find_words(), and we said for any set of letters I can find all the words in the dictionary that correspond to that hand of letters. Furthermore, I can do that at a speed of 2000 hands per second, which seems pretty good. We’ve achieved our first milestone. Now we should think–first I guess we should relax, congratulate ourselves, have a drink or whatever it is you need to do, and then when you’re ready to come back then we can start the next leg of the journey.
(
explicitly 明白地;明确地
intersect 横断;相交
complicate 使复杂化;复杂的
stick with 继续做;跟着…
customize 定制;按规格改制
)

09 Longest Words

We can write some assertions here. Here we have some letters in my hand, seven letters, and some possible letters on the board, and here’s a long list of possibilities for plays I could make. We can already see that this would be useful for cheating–I mean, augmenting or studying your word game play. And to make it even more useful, let’s write a function that tells us what the longest possible words are. Given the definition of word play, write a definition of longest words.

题目:

# -----------------
# User Instructions
# 
# Write a function, longest_words, that takes as input a hand, and a set 
# of letters on the board, and returns all word plays, the longest first.
# For testing, you can assume that you have access to a file called 
# 'words4k.txt'

import time

def prefixes(word):
    "A list of the initial sequences of a word, not including the complete word."
    return [word[:i] for i in range(len(word))]

def readwordlist(filename):
    file = open(filename)
    text = file.read().upper()
    wordset = set(word for word in text.splitlines())
    prefixset = set(p for word in wordset for p in prefixes(word))
    return wordset, prefixset

WORDS, PREFIXES = readwordlist('words4k.txt')

def find_words(letters, pre='', results=None):
    if results is None: results = set()
    if pre in WORDS: results.add(pre)
    if pre in PREFIXES:
        for L in letters:
            find_words(letters.replace(L, '', 1), pre+L, results)
    return results

def removed(letters, remove):
    "Return a str of letters, but with each letter in remove removed once."
    for L in remove:
        letters = letters.replace(L, '', 1)
    return letters

def word_plays(hand, board_letters):
    "Find all word plays from hand that can be made to abut with a letter on board."
    # Find prefix + L + suffix; L from board_letters, rest from hand
    results = set()
    for pre in find_prefixes(hand, '', set()):
        for L in board_letters:
            add_suffixes(removed(hand, pre), pre+L, results)
    return results

def find_prefixes(hand, pre='', results=None):
    "Find all prefixes (of words) that can be made from letters in hand."
    if results is None: results = set()
    if pre in PREFIXES:
        results.add(pre)
        for L in hand:
            find_prefixes(hand.replace(L, '', 1), pre+L, results)
    return results

def add_suffixes(hand, pre, results):
    """Return the set of words that can be formed by extending pre with letters in hand."""
    if pre in WORDS: results.add(pre)
    if pre in PREFIXES:
        for L in hand:
            add_suffixes(hand.replace(L, '', 1), pre+L, results)
    return results

assert (word_plays('ADEQUAT', set('IRE')) ==
        set(['DIE', 'ATE', 'READ', 'AIT', 'DE', 'IDEA', 'RET', 'QUID', 'DATE', 'RATE',
             'ETA', 'QUIET', 'ERA', 'TIE', 'DEAR', 'AID', 'TRADE', 'TRUE', 'DEE',
             'RED', 'RAD', 'TAR', 'TAE', 'TEAR', 'TEA', 'TED', 'TEE', 'QUITE', 'RE',
             'RAT', 'QUADRATE', 'EAR', 'EAU', 'EAT', 'QAID', 'URD', 'DUI', 'DIT', 'AE',
             'AI', 'ED', 'TI', 'IT', 'DUE', 'AQUAE', 'AR', 'ET', 'ID', 'ER', 'QUIT',
             'ART', 'AREA', 'EQUID', 'RUE', 'TUI', 'ARE', 'QI', 'ADEQUATE', 'RUT']))

def longest_words(hand, board_letters):
    "Return all word plays, longest first."
    ###Your code here.


def timedcall(fn, *args):
    "Call function with args; return the time in seconds and result."
    t0 = time.clock()
    result = fn(*args)
    t1 = time.clock()
    return t1-t0, result

hands = {  ## Regression test
    'ABECEDR': set(['BE', 'CARE', 'BAR', 'BA', 'ACE', 'READ', 'CAR', 'DE', 'BED', 'BEE',
         'ERE', 'BAD', 'ERA', 'REC', 'DEAR', 'CAB', 'DEB', 'DEE', 'RED', 'CAD',
         'CEE', 'DAB', 'REE', 'RE', 'RACE', 'EAR', 'AB', 'AE', 'AD', 'ED', 'RAD',
         'BEAR', 'AR', 'REB', 'ER', 'ARB', 'ARC', 'ARE', 'BRA']),
    'AEINRST': set(['SIR', 'NAE', 'TIS', 'TIN', 'ANTSIER', 'TIE', 'SIN', 'TAR', 'TAS',
         'RAN', 'SIT', 'SAE', 'RIN', 'TAE', 'RAT', 'RAS', 'TAN', 'RIA', 'RISE',
         'ANESTRI', 'RATINES', 'NEAR', 'REI', 'NIT', 'NASTIER', 'SEAT', 'RATE',
         'RETAINS', 'STAINER', 'TRAIN', 'STIR', 'EN', 'STAIR', 'ENS', 'RAIN', 'ET',
         'STAIN', 'ES', 'ER', 'ANE', 'ANI', 'INS', 'ANT', 'SENT', 'TEA', 'ATE',
         'RAISE', 'RES', 'RET', 'ETA', 'NET', 'ARTS', 'SET', 'SER', 'TEN', 'RE',
         'NA', 'NE', 'SEA', 'SEN', 'EAST', 'SEI', 'SRI', 'RETSINA', 'EARN', 'SI',
         'SAT', 'ITS', 'ERS', 'AIT', 'AIS', 'AIR', 'AIN', 'ERA', 'ERN', 'STEARIN',
         'TEAR', 'RETINAS', 'TI', 'EAR', 'EAT', 'TA', 'AE', 'AI', 'IS', 'IT',
         'REST', 'AN', 'AS', 'AR', 'AT', 'IN', 'IRE', 'ARS', 'ART', 'ARE']),
    'DRAMITC': set(['DIM', 'AIT', 'MID', 'AIR', 'AIM', 'CAM', 'ACT', 'DIT', 'AID', 'MIR',
         'TIC', 'AMI', 'RAD', 'TAR', 'DAM', 'RAM', 'TAD', 'RAT', 'RIM', 'TI',
         'TAM', 'RID', 'CAD', 'RIA', 'AD', 'AI', 'AM', 'IT', 'AR', 'AT', 'ART',
         'CAT', 'ID', 'MAR', 'MA', 'MAT', 'MI', 'CAR', 'MAC', 'ARC', 'MAD', 'TA',
         'ARM']),
    'ADEINRST': set(['SIR', 'NAE', 'TIS', 'TIN', 'ANTSIER', 'DEAR', 'TIE', 'SIN', 'RAD', 
         'TAR', 'TAS', 'RAN', 'SIT', 'SAE', 'SAD', 'TAD', 'RE', 'RAT', 'RAS', 'RID',
         'RIA', 'ENDS', 'RISE', 'IDEA', 'ANESTRI', 'IRE', 'RATINES', 'SEND',
         'NEAR', 'REI', 'DETRAIN', 'DINE', 'ASIDE', 'SEAT', 'RATE', 'STAND',
         'DEN', 'TRIED', 'RETAINS', 'RIDE', 'STAINER', 'TRAIN', 'STIR', 'EN',
         'END', 'STAIR', 'ED', 'ENS', 'RAIN', 'ET', 'STAIN', 'ES', 'ER', 'AND',
         'ANE', 'SAID', 'ANI', 'INS', 'ANT', 'IDEAS', 'NIT', 'TEA', 'ATE', 'RAISE',
         'READ', 'RES', 'IDS', 'RET', 'ETA', 'INSTEAD', 'NET', 'RED', 'RIN',
         'ARTS', 'SET', 'SER', 'TEN', 'TAE', 'NA', 'TED', 'NE', 'TRADE', 'SEA',
         'AIT', 'SEN', 'EAST', 'SEI', 'RAISED', 'SENT', 'ADS', 'SRI', 'NASTIER',
         'RETSINA', 'TAN', 'EARN', 'SI', 'SAT', 'ITS', 'DIN', 'ERS', 'DIE', 'DE',
         'AIS', 'AIR', 'DATE', 'AIN', 'ERA', 'SIDE', 'DIT', 'AID', 'ERN',
         'STEARIN', 'DIS', 'TEAR', 'RETINAS', 'TI', 'EAR', 'EAT', 'TA', 'AE',
         'AD', 'AI', 'IS', 'IT', 'REST', 'AN', 'AS', 'AR', 'AT', 'IN', 'ID', 'ARS',
         'ART', 'ANTIRED', 'ARE', 'TRAINED', 'RANDIEST', 'STRAINED', 'DETRAINS']),
    'ETAOIN': set(['ATE', 'NAE', 'AIT', 'EON', 'TIN', 'OAT', 'TON', 'TIE', 'NET', 'TOE',
         'ANT', 'TEN', 'TAE', 'TEA', 'AIN', 'NE', 'ONE', 'TO', 'TI', 'TAN',
         'TAO', 'EAT', 'TA', 'EN', 'AE', 'ANE', 'AI', 'INTO', 'IT', 'AN', 'AT',
         'IN', 'ET', 'ON', 'OE', 'NO', 'ANI', 'NOTE', 'ETA', 'ION', 'NA', 'NOT',
         'NIT']),
    'SHRDLU': set(['URD', 'SH', 'UH', 'US']),
    'SHROUDT': set(['DO', 'SHORT', 'TOR', 'HO', 'DOR', 'DOS', 'SOUTH', 'HOURS', 'SOD',
         'HOUR', 'SORT', 'ODS', 'ROD', 'OUD', 'HUT', 'TO', 'SOU', 'SOT', 'OUR',
         'ROT', 'OHS', 'URD', 'HOD', 'SHOT', 'DUO', 'THUS', 'THO', 'UTS', 'HOT',
         'TOD', 'DUST', 'DOT', 'OH', 'UT', 'ORT', 'OD', 'ORS', 'US', 'OR',
         'SHOUT', 'SH', 'SO', 'UH', 'RHO', 'OUT', 'OS', 'UDO', 'RUT']),
    'TOXENSI': set(['TO', 'STONE', 'ONES', 'SIT', 'SIX', 'EON', 'TIS', 'TIN', 'XI', 'TON',
         'ONE', 'TIE', 'NET', 'NEXT', 'SIN', 'TOE', 'SOX', 'SET', 'TEN', 'NO',
         'NE', 'SEX', 'ION', 'NOSE', 'TI', 'ONS', 'OSE', 'INTO', 'SEI', 'SOT',
         'EN', 'NIT', 'NIX', 'IS', 'IT', 'ENS', 'EX', 'IN', 'ET', 'ES', 'ON',
         'OES', 'OS', 'OE', 'INS', 'NOTE', 'EXIST', 'SI', 'XIS', 'SO', 'SON',
         'OX', 'NOT', 'SEN', 'ITS', 'SENT', 'NOS'])}

def test_words():
    assert removed('LETTERS', 'L') == 'ETTERS'
    assert removed('LETTERS', 'T') == 'LETERS'
    assert removed('LETTERS', 'SET') == 'LTER'
    assert removed('LETTERS', 'SETTER') == 'L'
    t, results = timedcall(map, find_words, hands)
    for ((hand, expected), got) in zip(hands.items(), results):
        assert got == expected, "For %r: got %s, expected %s (diff %s)" % (
            hand, got, expected, expected ^ got)
    return t

print test_words()

09 Longest Words (answer)

(我的答案:

def longest_words(hand, board_letters):
    "Return all word plays, longest first."
    ###Your code here.
    return sorted(word_plays(hand, board_letters), key=len, reverse=True)

谈下我的思考过程吧:
首先是需要返回所有的 plays 的,显然就需要调用 word_plays(hand, board_letters) 。最长的优先,也就是说,需要按照长度排序,一时半会儿我没有想出怎么搞出这个排序方法,于是想到利用标准库的方法,sort() 或 sorted()方法,可以加入按照长度排序的参数、以及反转。
)
There we go–we just generate the words from word plays, and then we sort them by length in reverse order so that longest are first.

Peter 的答案:

def longest_words(hand, board_letters):
    words = word_plays(hand, board_letters)
    return sorted(words, key=len, reverse=True)

10 Word Score

题目:

# -----------------
# User Instructions
# 
# Write a function, word_score, that takes as input a word, and
# returns the sum of the individual letter scores of that word.
# For testing, you can assume that you have access to a file called 
# 'words4k.txt'


POINTS = dict(A=1, B=3, C=3, D=2, E=1, F=4, G=2, H=4, I=1, J=8, K=5, L=1, M=3, N=1, O=1, P=3, Q=10, R=1, S=1, T=1, U=1, V=4, W=4, X=8, Y=4, Z=10, _=0)

def word_score(word):
    "The sum of the individual letter point scores for this word."
    ###Your code here.

10 Word Score (answer)

(我的答案:

def word_score(word):
    total = 0
    for letter in word:
        total += POINTS[letter]
    return total

)
Here’s my solution. It’s pretty straightforward. We just sum the points of each letter for every letter in the word.
Peter 的答案:

def word_score(word):
    return sum(POINTS[L] for L in word)

(Peter的代码,简洁优雅,值得学习!!!)

11 Top N Hands

Now, I want you to write me a function called topn. Again, takes a hand and set the board letters and the number, which defaults to 10, and give me the n best words and highest scoring words, according to the word score function.
题目:

# -----------------
# User Instructions
# 
# Write a function, topn, that takes as input a hand, the set of
# current letters on the board, and a number n, and returns the 
# n best words we can play, sorted by word score.
# For testing, you can assume that you have access to a file called 
# 'words4k.txt'
#
# Enter your code at line 83.

import time

def prefixes(word):
    "A list of the initial sequences of a word, not including the complete word."
    return [word[:i] for i in range(len(word))]

def readwordlist(filename):
    file = open(filename)
    text = file.read().upper()
    wordset = set(word for word in text.splitlines())
    prefixset = set(p for word in wordset for p in prefixes(word))
    return wordset, prefixset

WORDS, PREFIXES = readwordlist('words4k.txt')

def removed(letters, remove):
    "Return a str of letters, but with each letter in remove removed once."
    for L in remove:
        letters = letters.replace(L, '', 1)
    return letters

def find_words(letters, pre='', results=None):
    if results is None: results = set()
    if pre in WORDS: results.add(pre)
    if pre in PREFIXES:
        for L in letters:
            find_words(letters.replace(L, '', 1), pre+L, results)
    return results

def word_plays(hand, board_letters):
    "Find all word plays from hand that can be made to abut with a letter on board."
    # Find prefix + L + suffix; L from board_letters, rest from hand
    results = set()
    for pre in find_prefixes(hand, '', set()):
        for L in board_letters:
            add_suffixes(removed(hand, pre), pre+L, results)
    return results

def find_prefixes(hand, pre='', results=None):
    "Find all prefixes (of words) that can be made from letters in hand."
    if results is None: results = set()
    if pre in PREFIXES:
        results.add(pre)
        for L in hand:
            find_prefixes(hand.replace(L, '', 1), pre+L, results)
    return results

def add_suffixes(hand, pre, results):
    """Return the set of words that can be formed by extending pre with letters in hand."""
    if pre in WORDS: results.add(pre)
    if pre in PREFIXES:
        for L in hand:
            add_suffixes(hand.replace(L, '', 1), pre+L, results)
    return results

assert (word_plays('ADEQUAT', set('IRE')) ==
        set(['DIE', 'ATE', 'READ', 'AIT', 'DE', 'IDEA', 'RET', 'QUID', 'DATE', 'RATE',
             'ETA', 'QUIET', 'ERA', 'TIE', 'DEAR', 'AID', 'TRADE', 'TRUE', 'DEE',
             'RED', 'RAD', 'TAR', 'TAE', 'TEAR', 'TEA', 'TED', 'TEE', 'QUITE', 'RE',
             'RAT', 'QUADRATE', 'EAR', 'EAU', 'EAT', 'QAID', 'URD', 'DUI', 'DIT', 'AE',
             'AI', 'ED', 'TI', 'IT', 'DUE', 'AQUAE', 'AR', 'ET', 'ID', 'ER', 'QUIT',
             'ART', 'AREA', 'EQUID', 'RUE', 'TUI', 'ARE', 'QI', 'ADEQUATE', 'RUT']))

def longest_words(hand, board_letters):
    "Return all word plays, longest first."
    words = word_plays(hand, board_letters)
    return sorted(words, reverse=True, key=len)

POINTS = dict(A=1, B=3, C=3, D=2, E=1, F=4, G=2, H=4, I=1, J=8, K=5, L=1, M=3, N=1, O=1, P=3, Q=10, R=1, S=1, T=1, U=1, V=4, W=4, X=8, Y=4, Z=10, _=0)

def word_score(word):
    "The sum of the individual letter point scores for this word."
    return sum(POINTS[L] for L in word)

def topn(hand, board_letters, n=10):
    "Return a list of the top n words that hand can play, sorted by word score."
    ###Your code here.


def timedcall(fn, *args):
    "Call function with args; return the time in seconds and result."
    t0 = time.clock()
    result = fn(*args)
    t1 = time.clock()
    return t1-t0, result

hands = {  ## Regression test
    'ABECEDR': set(['BE', 'CARE', 'BAR', 'BA', 'ACE', 'READ', 'CAR', 'DE', 'BED', 'BEE',
         'ERE', 'BAD', 'ERA', 'REC', 'DEAR', 'CAB', 'DEB', 'DEE', 'RED', 'CAD',
         'CEE', 'DAB', 'REE', 'RE', 'RACE', 'EAR', 'AB', 'AE', 'AD', 'ED', 'RAD',
         'BEAR', 'AR', 'REB', 'ER', 'ARB', 'ARC', 'ARE', 'BRA']),
    'AEINRST': set(['SIR', 'NAE', 'TIS', 'TIN', 'ANTSIER', 'TIE', 'SIN', 'TAR', 'TAS',
         'RAN', 'SIT', 'SAE', 'RIN', 'TAE', 'RAT', 'RAS', 'TAN', 'RIA', 'RISE',
         'ANESTRI', 'RATINES', 'NEAR', 'REI', 'NIT', 'NASTIER', 'SEAT', 'RATE',
         'RETAINS', 'STAINER', 'TRAIN', 'STIR', 'EN', 'STAIR', 'ENS', 'RAIN', 'ET',
         'STAIN', 'ES', 'ER', 'ANE', 'ANI', 'INS', 'ANT', 'SENT', 'TEA', 'ATE',
         'RAISE', 'RES', 'RET', 'ETA', 'NET', 'ARTS', 'SET', 'SER', 'TEN', 'RE',
         'NA', 'NE', 'SEA', 'SEN', 'EAST', 'SEI', 'SRI', 'RETSINA', 'EARN', 'SI',
         'SAT', 'ITS', 'ERS', 'AIT', 'AIS', 'AIR', 'AIN', 'ERA', 'ERN', 'STEARIN',
         'TEAR', 'RETINAS', 'TI', 'EAR', 'EAT', 'TA', 'AE', 'AI', 'IS', 'IT',
         'REST', 'AN', 'AS', 'AR', 'AT', 'IN', 'IRE', 'ARS', 'ART', 'ARE']),
    'DRAMITC': set(['DIM', 'AIT', 'MID', 'AIR', 'AIM', 'CAM', 'ACT', 'DIT', 'AID', 'MIR',
         'TIC', 'AMI', 'RAD', 'TAR', 'DAM', 'RAM', 'TAD', 'RAT', 'RIM', 'TI',
         'TAM', 'RID', 'CAD', 'RIA', 'AD', 'AI', 'AM', 'IT', 'AR', 'AT', 'ART',
         'CAT', 'ID', 'MAR', 'MA', 'MAT', 'MI', 'CAR', 'MAC', 'ARC', 'MAD', 'TA',
         'ARM']),
    'ADEINRST': set(['SIR', 'NAE', 'TIS', 'TIN', 'ANTSIER', 'DEAR', 'TIE', 'SIN', 'RAD', 
         'TAR', 'TAS', 'RAN', 'SIT', 'SAE', 'SAD', 'TAD', 'RE', 'RAT', 'RAS', 'RID',
         'RIA', 'ENDS', 'RISE', 'IDEA', 'ANESTRI', 'IRE', 'RATINES', 'SEND',
         'NEAR', 'REI', 'DETRAIN', 'DINE', 'ASIDE', 'SEAT', 'RATE', 'STAND',
         'DEN', 'TRIED', 'RETAINS', 'RIDE', 'STAINER', 'TRAIN', 'STIR', 'EN',
         'END', 'STAIR', 'ED', 'ENS', 'RAIN', 'ET', 'STAIN', 'ES', 'ER', 'AND',
         'ANE', 'SAID', 'ANI', 'INS', 'ANT', 'IDEAS', 'NIT', 'TEA', 'ATE', 'RAISE',
         'READ', 'RES', 'IDS', 'RET', 'ETA', 'INSTEAD', 'NET', 'RED', 'RIN',
         'ARTS', 'SET', 'SER', 'TEN', 'TAE', 'NA', 'TED', 'NE', 'TRADE', 'SEA',
         'AIT', 'SEN', 'EAST', 'SEI', 'RAISED', 'SENT', 'ADS', 'SRI', 'NASTIER',
         'RETSINA', 'TAN', 'EARN', 'SI', 'SAT', 'ITS', 'DIN', 'ERS', 'DIE', 'DE',
         'AIS', 'AIR', 'DATE', 'AIN', 'ERA', 'SIDE', 'DIT', 'AID', 'ERN',
         'STEARIN', 'DIS', 'TEAR', 'RETINAS', 'TI', 'EAR', 'EAT', 'TA', 'AE',
         'AD', 'AI', 'IS', 'IT', 'REST', 'AN', 'AS', 'AR', 'AT', 'IN', 'ID', 'ARS',
         'ART', 'ANTIRED', 'ARE', 'TRAINED', 'RANDIEST', 'STRAINED', 'DETRAINS']),
    'ETAOIN': set(['ATE', 'NAE', 'AIT', 'EON', 'TIN', 'OAT', 'TON', 'TIE', 'NET', 'TOE',
         'ANT', 'TEN', 'TAE', 'TEA', 'AIN', 'NE', 'ONE', 'TO', 'TI', 'TAN',
         'TAO', 'EAT', 'TA', 'EN', 'AE', 'ANE', 'AI', 'INTO', 'IT', 'AN', 'AT',
         'IN', 'ET', 'ON', 'OE', 'NO', 'ANI', 'NOTE', 'ETA', 'ION', 'NA', 'NOT',
         'NIT']),
    'SHRDLU': set(['URD', 'SH', 'UH', 'US']),
    'SHROUDT': set(['DO', 'SHORT', 'TOR', 'HO', 'DOR', 'DOS', 'SOUTH', 'HOURS', 'SOD',
         'HOUR', 'SORT', 'ODS', 'ROD', 'OUD', 'HUT', 'TO', 'SOU', 'SOT', 'OUR',
         'ROT', 'OHS', 'URD', 'HOD', 'SHOT', 'DUO', 'THUS', 'THO', 'UTS', 'HOT',
         'TOD', 'DUST', 'DOT', 'OH', 'UT', 'ORT', 'OD', 'ORS', 'US', 'OR',
         'SHOUT', 'SH', 'SO', 'UH', 'RHO', 'OUT', 'OS', 'UDO', 'RUT']),
    'TOXENSI': set(['TO', 'STONE', 'ONES', 'SIT', 'SIX', 'EON', 'TIS', 'TIN', 'XI', 'TON',
         'ONE', 'TIE', 'NET', 'NEXT', 'SIN', 'TOE', 'SOX', 'SET', 'TEN', 'NO',
         'NE', 'SEX', 'ION', 'NOSE', 'TI', 'ONS', 'OSE', 'INTO', 'SEI', 'SOT',
         'EN', 'NIT', 'NIX', 'IS', 'IT', 'ENS', 'EX', 'IN', 'ET', 'ES', 'ON',
         'OES', 'OS', 'OE', 'INS', 'NOTE', 'EXIST', 'SI', 'XIS', 'SO', 'SON',
         'OX', 'NOT', 'SEN', 'ITS', 'SENT', 'NOS'])}

def test_words():
    assert removed('LETTERS', 'L') == 'ETTERS'
    assert removed('LETTERS', 'T') == 'LETERS'
    assert removed('LETTERS', 'SET') == 'LTER'
    assert removed('LETTERS', 'SETTER') == 'L'
    t, results = timedcall(map, find_words, hands)
    for ((hand, expected), got) in zip(hands.items(), results):
        assert got == expected, "For %r: got %s, expected %s (diff %s)" % (
            hand, got, expected, expected ^ got)
    return t

print test_words()

11 Top N Hands (answer)

(我的答案:

def topn(hand, board_letters, n=10):
    "Return a list of the top n words that hand can play, sorted by word score."
    ###Your code here.
    words = word_plays(hand, board_letters)
    return sorted(words, reverse=True, key=word_score)[:n]

虽然这一小段代码写出来了, 并且也是正确的,但是总感觉 sorted() 函数里的第1个参数有点不对。。。
)
Again, pretty straight forward. We get the word plays, and we sort them in reverse order again so that bgigest are first, this time by word score, and then we just take the first n. By doing the subscripting(带下标) like that, it works when n is too big. It works when n equals none. Now, just an aside here, as the great American philosopher(哲学家;思想家;学者), Benjamin Parker once said, “With great power comes great responsibility.” We have a great power here to go through all the words in the dictionary and come up with all the best plays. Now, I could read in the official Scrabble dictionary and I could apply the board position that you saw in my game with Ken and I could come up with a bunch(束;串;捆) of good plays. But that wouldn’t be fair to my friend Ken, unless we had previously agreed that it was legal and fair to do so. I’m not going to do that. I got to resist(抵抗;忍耐;反对) that temptation(诱惑;引诱). And throughout your career as an engineer, these types of temptations or these types of possibilities are going to come up. Having strong ethics(伦理;道德;行为准则) is part of learning to be a good software engineer. So now, in terms of our pacing, we’ve achieved milestone #2. We can stop sprinting(sprint 冲刺) again. We can relax. You can have a drink. We can lay down. We can congratulate ourselves or do whatever we want to do.

12 Cross Words

Let’s go back to our list of concepts, go back to our diagram(图表;示意图) of where we were and say, “What should be the next step?” Well, what can we do now? We can take a hand and we can take a single letter on the board and we can say, “Yes, I can pick letters out of the hand and maybe do a S-I-D-E.” That would be a good play, except if there was an X here, then it would not be a good play. Similarly, if there were letters in the opposite(相对的;对面的;对立的) direction, that could be a bad play. But sometimes, there’s letters in the opposite direction and it makes a good play. Where here I’m forming two words at once, and the rules are the play I have to make has to be all in one direction and all adjacent(邻近的) to each other so forming one consecutive(连续的;连贯的) word. Then if it incidentally(偶然地) forms some other words in the other directions, that’s okay. But I can’t put some in this direction and then put some others down in that direction. I think my next goal will be to place words on a row while worrying about the crosswords(crossword 纵横字谜) in the opposite direction.

13 Anchors

Now let’s be a little bit more precise(清晰的;精确的) about what the rules are and what it means to play a word within a row and how that hooks up(hook up 连接) to the other columns. Now, the rules say that at least one letter that you play has to be adjacent to an existing letter on the board. We’ll mark with red asterisks(asterisk 星号) such squares. We call these anchor(锚;最后一棒的) squares. These are the squares that we can start from. Then we build out in each direction, forming consecutive(连续的;连贯的) letters into a single word. Now, the anchor squares do have to be adjacent(邻近的) to an existing letter, but they don’t have to be adjacent all within a row. They can be adjacent in either direction. Let’s expand the board beyond a single row and let’s populate(居住于;移民于) this with some more letters. Imagine that this board goes on in both directions. There’s probably an E here or something like that. If we restrict(限制;限定) our attention just to this row, notice that we’ve now introduced a new anchor point. This square is adjacent to an existing letter, and so that also counts as(count as 当做) an anchor. Now we want to find a word, which consists of a prefix plus a suffix. We get to define the game. We can say that for every anchor point, the prefix is going to be zero or more letters to the left of the anchor point, not counting the anchor point itself. Then the suffix will be the anchor point and everything to the right. Of course, we have to arrange(分类;整理) so that prefix plus suffix together form a word which is in the dictionary. Now here’s a cool play that comes from the dictionary. BACKBENCH is a word, and note that if we just have this rule of word equals prefix plus suffix where the suffix has to start with an anchor, then there’d be four possible ways of specifying(specify 指定;详述) this one move. We could anchor it here with no suffix(应该是prefix). We could anchor it here with these three letters as a suffix(应该是prefix). We could anchor it here with these letters as a suffix(应该是prefix). Or we could anchor it here with all these as a suffix(应该是prefix) and just H as the prefix(应该是suffix). Now, it seems wasteful to degenerate the same result four times, so we can arbitrarily(任意地) and without loss of completeness make up a rule which says there’s no anchor within a prefix. We couldn’t use this as a the anchor, because then there’d be anchors within the prefix. Likewise(同样地;而且), we couldnaEUt use this one or this one. We can only use this one as the prefix in order to generate this particular word. The anchor will also come from the hand, and the suffix can be a mix of hand and board. Here, this is the anchor. The prefix is empty. The anchor letter comes from the hand. Then there’s a mix of letters for the rest of the word. Now, what are the rules for a prefix. Let’s summarize. A prefix is zero or more characters, can’t cover up(盖起来;掩盖) an anchor square, and they can only cover empty squares. For example, for this anchor square here, the prefix can go backward, but it can’t cover this anchor. So the possible lengths for this prefix are zero to two characters. Any prefix can be zero characters, and here there’s room for two, but there’s not room for three, because then it would cover up an anchor. In that case, all the letters in the prefix come from the hand, but consider this anchor. For this anchor, we’re required to take these two letters as part of the prefix, because we can’t go without them because this abuts(abut 与…邻接). These two must be part of the prefix, and this one can’t be part of the prefix because it’s an anchor. If we wanted that we generate it from this anchor, rather than from this one. That means the length of a prefix for this anchor has to be exactly two. Similarly, the length of the prefix for this anchor has to be exactly one, has to include this character, because if we place a letter here, this is adjacent– it’s got to be part of the word–and this is an anchor so we can’t cover it. So we see that a prefix either the letters come all from the hand o or they come all from the board. What I want you to do is for the remaining anchors here, tell me what the possible lengths are. Either put a single number like this or a range of numbers–number-number.
(
My mistake: I confused the terms “prefix” and “suffix” between 1:40 and 1:55 in this video. Sorry about that.

-Peter
)
(
我的笔记:
红星锚定方块anchor square,必须与 1 个现存的字母邻接。
对于每 1 个 anchor point ,前缀必须是 0 个或者更多的字母。
)

13 Anchors (answer)

The answers are for this anchor the prefix has got to be one character–the A. This anchor–we can’t cover another anchor, so it’s got to be zero. This anchor–we conclude(结束;终止;推断出) this if we want, but we can’t go on to the other anchor, so it’s zero to one. Here we’ve got to include the D but nothing else, so it’s 1. Now, there’s one more thing about anchors I want to cover, which is how we deal with the words in the other direction. For these five anchors there are no letters in the other direction. So these are completely unconstrained(不受拘束的). We say that any letter can go into those spots. But in these two anchors, there’s adjacent letters, and it would be okay. We could form a word going in this direction. But we can do that only if we can also form a word going in this direction. Let’s say there are no more. This is either the edge of the board or the next row is all blanks. Then we can say, well, what letters can go here? Only the letters that form a word when the first letter is that word and the second letter is U. In our dictionary, it turns out that that possibility is the set of letters M, N, and X. MU, NU, and XU are all words in our dictionary, believe it or not. The Scrabble dictionaries are notorious(臭名昭著的;声名狼藉的) for having two- and three-letter words that you’ve never heard of. Similarly here–what are two-letter words that end in Y? It’ the set M, O, A, B. You’ve probably heard of most of those. When we go to place words on a particular row, we can pre-compute the crosswords and make that be part of the anchor. What we’re going to do is have a process that goes through, finds all the anchor points, and finds all the sets of letters–whether it’s any letter for these five anchors, or whether it’s a constrained(受约束的) set of anchor letters for these two anchors. Sounds complicated, but we can make it all work. Let me say that once you’ve got this concept, the concept of the anchor sets and the cross words, then basically we’re done. We’ve done it all. We can handle a complete board no matter how complicated, and we can get all the plays. It’s just a matter of implementing this idea and then just fleshing it out(flesh out 充实).

14 Bird By Bird

We’ve congratulated ourselves for getting this far. We’ve still got a ways to go. Now the question is what do we do next? It may seem a little bit daunting(令人畏惧的;使人气馁的), but there’s so much to do, and when I get that feeling, I remembered the book Bird by Bird by Anne Lamott, a very funny book. In it, she relates the story of how when she was in elementary(基本的;初级的) school and there was a big book report due where she had to write up descriptions of multiple different birds. And she was behind and it was due soon, and she went to her father and complained, “How am I ever gonna get done. I’m behind,” and her father just told her, “Bird by bird.” “Just go and take the first bird, write up a report on that, and then take the next bird off the list and keep continuing until you’re done.” Let’s go bird-by-bird and finish this up. What do we have left to do? Well, we got to figure out how to put letters on one particular row while dealing with the crosswords, then we got to expand from that to all the rows and then we got to do the columns as well, and then we got to worry about the scoring. There was a couple of minor(较小的;少数的) things to be put off like dealing with the blanks. That’s a lot to do, and let’s go bird-by-bird .

15 Anchor Class

The thing I want to do next is say let’s just deal with a single row at a time. Let’s not worry about the rest of the row. Let’s now worry about going in columns. Just deal with one row but also have that row handle the cross letters–the cross words. I’m going to need a representation for a row, and I think I’m going to make that be a list. There are many possible choices, but a list is good. I choose a list rather than a tuple, because I want to be able to change it in place. I want to be able to modify the row as we go, as the game evolves. Row is going to be a list of squares. If the square is a letter, we’ll just use that letter as the value of the square. If the square is an empty spot which has nothing in it, I think I’ll use dot just to say nothing is there. A trick that you learn once you’ve done this kind of thing a lot of times, is to say I’m going to be look at letters, I’m going to be looking at locations in the row, I’m going to be looking at their adjacent letters and going to the right and going to the left. If I’m trying to fill in from this anchor, I’ll move to the left to put in the prefix and I’ll move to the right to extend the word. It seems to me like I’m always going to have to be making checks of saying what’s the next character, is it a letter, is it an anchor, what is it? Also, oops, did I get past the end of the board. It seems like I’m going to have to duplicate the amount of code I have to write to check both the case when I go off the board and when I don’t go off the board. One way to avoid that is to make sure you never go off the board. It cuts the amount of code in half to some extent(程度;长度). A way to do that is to put in extra squares– to say here are the squares on the board, but let’s make another extra square on each side and just fill that in say the value of what’s in that square is a boarder, not a real square that you can play in, but if I’m here and I say what’s the value of square number i -1, I get an answer saying it’s a border rather than getting an answer that’s saying when you go i - 1 from position 0 you get an error. I think I’ll use a vertical(垂直的;竖立的) bar to indicate a border. I’ll have one there, and at the end of my row, I’ll have another border. Now I’ve sort of got everything. I got borders, letters, empty squares. The only thing left is anchors. I think what I’ll do here is I’ll introduce a special type for anchor. I could have used something like a tuple or a set of characters, but I want to make sure I know, and I want to have something in my code that says if the value of row[ i ] is an instance of anchor, then I want to do something. So we’ll make anchor be a class, and I want it to be a class that contains the set of letters. I can do that in a very easy way. I can use a class statement to say I’m going to define a new class. The class is called an anchor, and the class is a subset of the set class. Then I don’t need anything else for the definition of the class. All I have to know is that anchors are a type of set, but they’re a particular type of set. They’re a set of anchor letters. Here’s a code for that. I define a class of anchor. I have all of my allowable letters. Then I say any is an anchor, which allows you to put any letter onto that anchor spot. Now I want to represent this row where here are the borders, here are the empty spots, and here are the particular letters, and this representation– the schematic(示意的;严谨的) representation as a string does not account for a fact that after the A we’re going to have two restricted anchors that have to contain these characters. So we’ll define them–use the names mnx and moab to be the two anchors that are restricted to have only those letters. Now our row is equal to the border square is element number 0. Then the A is element number 1. Then we have these two restricted anchors, two more empty spots, another anchor where anything can go–the B and the E, and so on.

16 Row Plays

There’s our whole row, and while I’m at it I might as well define a hand. Now my next target, the next bird to cross off the list is to define a function row_plays, which takes a hand and a row in this format and returns a set of legal plays from the row. Now, rather than just return legal words, I’m using this notion of a play, where a play is a pair of location within the row and the word that we want to play. You can imagine it’s going to take the same general approach that we’ve used before, start with an empty set, do something to it, and then return the results that we built up. What is it that we want to do? We want to consider each possible allowable prefix, and to that we want to add all the suffixes, keeping the words. Now, prefixes of what? That’s the first thing to figure out. What I’m going to do is enumerate the row–enumerate actually just the good bits. The row from the first position to the last position, and that tells me I don’t want the borders. I don’t want to consider playing on the borders. I just want to consider playing on the interior(内部的;内地的) of the row. Enumerate that starting from position number 1. One would be where the A is. Now I have an index–a number 1, 2, 3–and I have the square, which is going to be a, and then an anchor and then an anchor and so on. Where do I want to consider my rows? We’re going to anchor them on an anchor so I can ask a square an instance of an anchor. If it is an anchor, then there’s two possibilities. If it’s an anchor like this, there’s only one allowable prefix. The prefix which is the letters that are already there just to the left of the anchor. We want to just consider that one prefix and then add all the suffixes. If it’s an anchor like this one, then there can be many prefixes. We want all possible prefixes that fit into these spots(场所;斑点) here, consider each one of those, and for each one of those consider adding on the suffixes. What I’m going to do is define a function, legal _prefix, which gives me a description of the legal prefix that can occur at position i within a row. There are two possibilities. I could combine the possibilities into one, but I’m going to have a tuple of two values returned. I’m going to have legal_prefix return the actual prefix as a string if there is one, like in this case, and return the maximum size otherwise. For this anchor here, this would be legal_prefix of one, two, three, four, five, six– that’s for legal_prefix when i = 6. The result would be that there are now characters to the left. It’ll be the empty string for the first element of the tuples. The maximum size of the prefix that I’m going to allow is two characters. Now, if I asked here–that’s index number one, two, three, four, five, six, seven, eight, nine– when i = 9, the result would be that the prefix is BE, and the maximum size is the same as the minimum size. It’s the exact size of 2. I define legal_prefix in order to tell me what to do next based on the two types of anchors. Now, I can go back to row plays. I can call legal_prefix, get my results, and say if there is a prefix, then I want to add to the letters already on the board. Otherwise, I have an empty space to the left, and I want to go through all possible prefixes. Here’s what we do if there is a prefix already there. Now we can calculate the start of our position. Remember a row play is going to return the starting location of the word. We can figure that out. It’s the i position of the anchor minus the length of the prefix. In fact, let me go and change this comment here. I is not very descriptive. Let’s just call that start. Now we know what the starting location is for the word. When we find any words we can return that. Then we go ahead and add suffixes. With the suffixes, some of the letters are going to come out of the hand. We’re adding suffixes to the prefix that’s already there on the board. Starting in the start location, going through the row, accumulating(accumulate 积累;逐渐增加) the results into the result set, and then I needed this one more argument. I actually made a mistake and left this out the first time, and it didn’t work. We’ll see in a bit what that’s there for. Now if we have empty space to the left of the anchor, now we’ve got to go through all the possible prefixes, but we already wrote that function–find_prefixes. That’s good. Looks like we’re converging(收敛的;趋同的). We’re not writing that much new stuff. Now, out of all the possible prefixes for the hand, we only want to look at the ones that are less than or equal to the maximum size. If the prefix is too big, it won’t fit into the empty spot. It will run into another word, and we don’t want to allow that. We can calculate the start position again. Then we do the same thing. We add suffixes. What do we add them to? We’ll the prefix that we just found from the hand. Since the prefix came from the hand, the remaining letters left in the hand we have to subtract(subtract 减去;扣除) out those prefix letters. Here we didn’t have to subtract them out, because they prefix letters were already on the board. We’re adding to the prefix from the start, from the row, results are accumulated, and we have this anchored equals false again. We’re almost there. Just two things left to do–add_suffixes and legal_prefix. Add_suffixes we had before, but it’s going to be a little bit more complicated now, because we’re dealing with the anchors. legal_prefix is just a matter of looking to the left and see how much space is there.

(本段视频+完整字幕 - YouTube)

Here is legal prefix. It’s pretty easy…we start out at position i within the row and we’re going to define s to be the starting position of any prefix to the left, we say while there is a letter just to the left of us by one decrement s by 1 s minus equals 1 is the sameas saying s equals s minus 1 if they’re any letters then s is going to be changed if s was changed if it’s less than I then there is a prefix on the board so we return those letters combined into a single word like be the value of the size of those which is I minus s which is two in that case if there are no letters already on the board to the left of the anchor then what we want to do is say how many squares are there that have empty squares but squares that are not anchors because remember we agreed that a prefix would’t span over an anchor because we don’t want to have duplicates we only want empty squares that are not anchors keep going backwards keep decrementing s while that’s true and then return no there are no letters on the board to the left of me this is the number of empty squares that I found and in this definition I introduced these predicates is letter and is empty they’re pretty straightforward empty square is one which is the empty character also can be there’s one character on the board that I’m going to mark with an asterisk that’s the starting location that’s usually in the middle of the board for anchors are also empty and I have to check specifically here if I want an empty thing that’s not an anchor and what’s a letter a letter is a string that’s one of the allowable letters Wow, So that was a lot I think it’s time to put it some tests to make sure that all the code we wrote is doing the right thing so I wrote one test here the legal prefix for location number two within the row should be just the letter A so that make sense here’s position two to the left of that is the prefix a we don’t want to go further and go into the boarder and what I want you to do is just fill in the rest of these tests put in the proper values for this and all these others of what legal prefix should return this example here and figure bring out what makes sense to you for what the legal prefixes should be you can also look at this definition ofa row and make sure that we got that right.

Here’s the answers of what they should be for each of these positions within a row.

18 Life is Good

(bump into 偶然碰见)
One more function to write–add_suffixes–and then we’ll be done with this bird. We’re given the start location. We recover the location in which we want to place the anchor letter, which is start at the start and then add in however much of the prefix we have. It could be the empty string. It could be more. If the prefix is already a word, then we want to add in a possible play. The play is starting in the start location we put in this word. But there is also this test. This test says we want to make sure that we’ve at least anchored the prefix. What does that mean? If we go back to our diagram, if we’re right here, we get passed in to add_suffixes the prefix B-E–that is, in fact, a word. B-E is in our dictionary, but we wouldn’t want to say the answer is B-E, because that’s not a play. That was already on the board. We can’t say our play is we’re making this word that’s already there. We’ve got to add at least one letter. That’s why when we were defining row plays, we said we’re adding suffixes, but we haven’t anchored it yet. Anchored is equal to False. We haven’t anchored our potential prefix so far so don’t go reporting that prefix as a valid play. Now in the definition of add_suffixes, we say, okay, if it’s the first time through, we’re not anchored so we’re not going return BE as a possible play, but when I do a recursive call to add_suffix, I’ll just have the default value and anchor will be true. I will have played on the anchor, and then I’ll be okay from then on. One additional test here is saying if there are existing letters already on the board, and you’re bumping into them, then you have to count them. If there’s a letter already there, don’t report everything up to that letter. That’s taken care of this case when we get up to this C, we can’t like say put a T here and have B-E-T and report that by itself as a word, because the T is running into the C. We’ve got to continue and see what words can we have–we have to account this C here. Okay. That takes care of adding in the play. Whether or not we found a valid play to add to our list of results, we still want to say are we going to continue? Can we keep going to the right adding on more letters? Well if the pre we have so far is within the prefixes, then yes we do want to try to continue. What we’ll do is say tell me what square is in the current position in the row. If it’s a letter, try to add suffixes to that–to the prefix we have so far plus the letter. If there is a letter already on the board, it’s mandatory(强制的;命令的) that we have to use it. Otherwise, if the square is empty, then we want to figure out what are all the possible letters that we could place into that empty square. If the square is an anchor, then the anchor will tell us what the possibilities are. Remember an anchor is a set of possible letters, so if the square is an anchor, let’s use that as a set of possibilities. Otherwise, if the square is empty and it’s not an anchor, then any letter is a possibility. Now we just go through the letters in our hand and say, if that letter is a possibility, then we want to add a new suffix by saying let’s place that letter onto the prefix, remove the letter form our hand, and continue adding suffixes from there. When we’re done that, return the results. That’s it. Now we’re done with that bird, but let’s go back, look at our test routine. We had these tests for legal prefix. Now, if we go into our interpreter and we run row_plays with the given hand and the row–that’s this row where the hand is maybe ABCEHKN. This is the result we get. That’s an awesome result. Look, we got the BACKBENCH that we saw before. We got all the smaller words. We can go through and we can check that each of these makes sense. They are the right ones. They don’t run into any letters or do anything wrong. It’s hard to check that we go all of them right, but we can still go ahead and make this an assertion. When we run this, all the tests pass and life is good.

19 Increasing Efficiency

(slap 猛打 用力放置)
However, here’s something that bothers me. In the definition of row_plays, we’re calling find_prefixes of hand inside this loop where we’re enumerating over the row, so it’s going to happen multiple times. It’s going to happen one time for every anchor. Eventually when we have multiple rows it’s going to happen for every anchor on every row, but notice that find_prefixes only depends on the hand. It’s not dependent on the row at all, so it seems wasteful to be recomputing find_prefixes of hand multiple times. If we were just dealing with row_plays, that would be easy enough. Up here we could say found prefixes equals find_prefixes of hand, assign it to a variable, and then just reference the variable down here. We’d be computing it once outside of the loop rather than many times inside the loop. Eventually, we’re going to have a bigger function that calls row_plays once for each row, and we wouldn’t want to have to compute that each time within row_plays. We want to compute it just once. We could pass into row_plays the set of prefixes, but that’s just complicating the interface. I’d like to cut down on the computation without complicating the interface. In other words, I want find_prefixes, when you pass it a hand, if it hasn’t seen the hand before, go ahead and compute all the prefixes. If it has seen the hand before, then I don’t have to re-compute it. I’ll just look it up and return it immediately. What I want you to do is take find_prefixes and make it more efficient. We could just slap a memo decorator on the front of find_prefixes, but that’s probably not exactly what we want, because note that find_prefixes is recursive, though it’s going to call itself on each subcomponent(子分量) of the hand. Really, we want to say I only want to remember the top level hand. If I’ve seen that exact hand before, then give me the answer. Don’t give me the answer for all the sub-parts of the hand. You could write a different decorator that works just on those parts, although that might be hard given that it’s recursive, or you could modify the function itself, or you could have two levels of functions. One top-level function–find_prefixes–that calls another function recursively. Your choice as to how you want to handle it. However you handle it, the idea is that if you call find_prefixes with a certain hand and you get back this result, then if you make the same top-level call again, it should immediately return the same result that it saved away rather than trying to recompute it.
题目:

# -----------------
# User Instructions
# 
# The find_prefixes function takes a hand, a prefix, and a 
# results list as input. 
# Modify the find_prefixes function to cache previous results
# in order to improve performance.

def prefixes(word):
    "A list of the initial sequences of a word, not including the complete word."
    return [word[:i] for i in range(len(word))]

def readwordlist(filename):
    "Return a pair of sets: all the words in a file, and all the prefixes. (Uppercased.)"
    wordset = set(file(filename).read().upper().split())
    prefixset = set(p for word in wordset for p in prefixes(word))
    return wordset, prefixset

WORDS, PREFIXES = readwordlist('words4k.txt')

class anchor(set):
    "An anchor is where a new word can be placed; has a set of allowable letters."

LETTERS = list('ABCDEFGHIJKLMNOPQRSTUVWXYZ')
ANY = anchor(LETTERS) # The anchor that can be any letter

def is_letter(sq):
    return isinstance(sq, str) and sq in LETTERS

def is_empty(sq):
    "Is this an empty square (no letters, but a valid position on board)."
    return sq  == '.' or sq == '*' or isinstance(sq, set) 

def add_suffixes(hand, pre, start, row, results, anchored=True):
    "Add all possible suffixes, and accumulate (start, word) pairs in results."
    i = start + len(pre)
    if pre in WORDS and anchored and not is_letter(row[i]):
        results.add((start, pre))
    if pre in PREFIXES:       
        sq = row[i]
        if is_letter(sq):
            add_suffixes(hand, pre+sq, start, row, results)        
        elif is_empty(sq):        
            possibilities = sq if isinstance(sq, set) else ANY
            for L in hand:
                if L in possibilities:
                    add_suffixes(hand.replace(L, '', 1), pre+L, start, row, results)
    return results

def legal_prefix(i, row):
    """A legal prefix of an anchor at row[i] is either a string of letters
    already on the board, or new letters that fit into an empty space.
    Return the tuple (prefix_on_board, maxsize) to indicate this.
    E.g. legal_prefix(a_row, 9) == ('BE', 2) and for 6, ('', 2)."""
    s = i
    while is_letter(row[s-1]): s -= 1
    if s < i: ## There is a prefix
        return ''.join(row[s:i]), i-s
    while is_empty(row[s-1]) and not isinstance(row[s-1], set): s -= 1
    return ('', i-s)

###Modify this function. You may need to modify 
# variables outside this function as well.

prev_hand, prev_results = '', set() # cache for find_prefixes

def find_prefixes(hand, pre='', results=None):
    """Find all prefixes (of words) that can be made from letters in hand."""
    if results is None: results = set()
    if pre in PREFIXES:
        for L in hand:
            find_prefixes(hand.replace(L, '', 1), pre+L, results)
    return results

19 Increasing Efficiency (answer)

Here’s what I did. I introduced two global variables–the previous hand and previous results. I am making a cache, like a memoization cache, but it’s only for one hand, because we’re only dealing with one hand at a time. Then I say, we then find_prefixes if the hand that you were given is equal to the previous hand, then return the previous results. I’m only going to update the previous hand and the previous results in the case where the prefix is the empty string. And that’s how I know I’m at the top level call when the prefix is the empty string. For all the recursive calls, the prefix will be something else. I’m only storing away the results when I’m at the top level call and I’m updating previous hand and previous results. With that, efficiency improvement to find prefixes, now when I do timedcalls of row plays for this fairly complex row, it’s only about a thousandth of a second. If I had a complete board that was similarly complex and say fifteen rows or so in the board, then it’d still be around one or two hundredths of a second and that’s pretty good performance.

20 Show and Spell

(engage in 参加;从事)

Now back to our diagram. Let’s figure out where we are. We did row_plays, so I can check off that bird. What’s left? Well, now I want to be able to do all the plays–all the plays in all the rows and all the columns. Another thing I want to be able to do is–notice that I cheated a little bit. I engaged in wishful thinking, which is always a good design strategy, in that when I called row_plays, I gave it a hand and a row, but I made the row myself– built that sample row that I called a row by making a list and saying, okay, I know A is here, I know an anchor called MNX is here, and so on. I didn’t have my program construct that row. All_plays is going to have to somehow do that type of construction. It’s going to somehow have to set the anchors within the row rather than having me give them explicitly as test. Then just one more thing to deal with, which is scoring. After I’ve got all the plays, I want to be able to figure out how much each one scores and pick out the top-scoring play. I talked about pacing at the beginning of this. Now I’m starting to pick up the pace. I’m feeling pretty good now. I’m saying it was a long way, we had to run hard, but now I can start to see the finish line. We can put together one final sprint to get it all done. What do I want to do next? I want to handle complete boards, not just individual rows. Just as we did with rows where we made up a sample row, let’s make a sample board. I define a function a_board, which returns a sample board. It’s the same one we were dealing with here. Note that I’m making this a function rather than a variable definition. The reason I’m doing this is because every time I reference a_board I want to create a new one, and I want to create a new one, because I’m going to be modifying the old one. I’m going to be placing letters onto the board. I’m going to be inserting anchors into the board and modifying the board structure itself. I don’t want to be dealing with the old one that I’ve already modified. I want to make sure I have a fresh one from scratch, and so I’m going to say the only way to access this is through the function. What it does is it takes these strings, maps list over each one. Rather than have this first row be a string, the first row will be then a list of characters. Same for all the other rows. There we can see when we call a_board we get this board, but that’s not very pretty to look at. I’d rather look at something like this where here I’ve printed the results. Notice that I put spaces between each letter to make the board more square-like. What I’d like you to do is define a function show, which takes a board as input, print out the results, looking just like that, returns None as a value.
题目:

# -----------------
# User Instructions
# 
# Write the function show that takes a board
# as input and outputs a pretty-printed 
# version of it as shown below.


## Handle complete boards    

def a_board():
    return map(list, ['|||||||||||||||||',
                      '|J............I.|',
                      '|A.....BE.C...D.|',
                      '|GUY....F.H...L.|',
                      '|||||||||||||||||'])

def show(board):
    "Print the board."
    ###Your code here.

# >>> a_board()
# [['|', '|', '|', '|', '|', '|', '|', '|', '|', '|', '|', '|', '|', '|', '|', '|', '|'], 
#  ['|', 'J', '.', '.', '.', '.', '.', '.', '.', '.', '.', '.', '.', '.', 'I', '.', '|'],
#  ['|', 'A', '.', '.', '.', '.', '.', 'B', 'E', '.', 'C', '.', '.', '.', 'D', '.', '|'],
#  ['|', 'G', 'U', 'Y', '.', '.', '.', '.', 'F', '.', 'H', '.', '.', '.', 'L', '.', '|'], 
#  ['|', '|', '|', '|', '|', '|', '|', '|', '|', '|', '|', '|', '|', '|', '|', '|', '|']]

# >>> show(a_board())
# | | | | | | | | | | | | | | | | |
# | J . . . . . . . . . . . . I . |
# | A . . . . . B E . C . . . D . |
# | G U Y . . . . F . H . . . L . |
# | | | | | | | | | | | | | | | | |

20 Show and Spell (answer)

(我的答案:

def show(board):
    "Print the board."
    ###Your code here.
    for row in board:
        for column in row:
            print column,
        print ''

)
Here’s my answer–very simple–iterate over over the rows and over the squares in each row print one out. A comma(逗号) at the end of the print statement says put in a space but not a new line. At the end of each row, that’s where I put in a new line.
Peter的答案:

def show(board):
    for row i board:
        for sq in row:
            print sq,
        print

21 Horizontal Plays

(为了对 enumerate() 进行一定的了解,下面以简单的实例来说明:

>>> enumerate([5, 6], 1)
0x7fb21bc79730>
>>> for e in enumerate([5, 6], 1):
...     print e
...
(1, 5)
(2, 6)

)
Now let’s do a little bit of planning. We did row plays. What is row play return? Well, it’s a set of plays where each play is an i-word pair, where i is the index into the row where the word starts. We eventually want to get all plays. Before we can get there, I’m going to introduce another function called horizontal(水平线;水平的) plays, which does row plays across all the possible rows, but only going in the across(横穿;横过) direction not in the down direction. That’ll take a hand and a board as input. A board is just a list of rows. It’ll return a set of plays where a play, like in a row play, is the position in the word except now the position is not going just to be i, the position is an i-j pair. It’s going to be at this column in this row along with the word. It’s a set of tuples that look like that. Let’s define horizontal plays. Well, you know the drill(操练;钻头) by know–familiar structure. We start out with an empty set of results. We’re going to build them up somehow and then get the results. Now, how are we going to do that? Let’s enumerate over all the rows in the board. We just want the good ones–the one from 1 to -1. We don’t want the rows at the top and the bottom, which are off the board or the border squares. For each good row, I’m going to write a function called set_anchors which takes the row and modifies that row and mutates(mutate 使…变化) the row to have all the anchors in it. Remember before when I called row plays I passed in manually(用手的;手工的) all the anchors. Here, I’m going to have the program do it for me. Now, for each word, I want to find all the plays within that row and properly add them in to results. I want to do something with the row plays of hand within that row. And I want you to tell me what code should go here. It could be a single line or it could be a loop over the results that come back from row plays. Figure out what goes here so that it can return the proper results.

21 Horizontal Plays (answer)

Here is my answer, I call row plays on the row and that gives me a set of results which are of the form i–index into the row– and a word, and I can’t just add that into my results set, because that doesn’t tell me what row number I’m in. Instead, I want to add the tuple i and j. I’ve already got the row number j. Add the tuple of the position i, j along with the word into the results. That’s all we have to do.

22 All Plays

Okay. Back here, check off another bird. Just one left. Well, okay, I lied. It’s not quite one left. There is also scoring we’ll have to do next, but one left for this part. So all plays, like horizontal plays, takes a hand and the board. What should it return? Well, it’s going to be a set, the position in which we start the word. Well, that can be the same as before, an i-j position is perfectly good. Now, we’re also going to have some words going across and some words going down. Now, we want our results to be a three tuple. It’s a set of an ij position followed by a direction–across or down, followed by a word, a set of those. Now onto the all_plays function–takes a hand in the board, it’s going to return all plays in both directions on any square so the play is a position, direction, and a word where the position is an ij pair, j picked up row number, i the column number. Direction is either across or down, and those will just be global variables. We don’t have to decide for now how to represent it. I used a trick here–I said all the horizontal plays, the cross plays, we get from calling horizontal plays directly on the hand in the board. The vertical plays–I didn’t have to write a separate function. All I have to do is transpose(进行变换;移项) the board–flop the i and the j, call horizontal plays on that, and that gives me a set of results, but they’re the results in the wrong direction. They’re ji pairs rather than ij pairs. Now your task is write the code to put that all together to take these two sets, one of them is in reversed order, so they have to be swapped around. Neither set has a direction associated with it. Assemble them all into the resulting set that should be returned by all plays.

22 All Plays (answer)

Here’s my answer, so I took all the i, j, w pairs from the horizontal plays and just reassembled(reassemble 重新组装) them with the i, j, putting in the indication that they’re going in the across direction and keeping the same word. Then I do the same thing for the vertical plays. They came out in the j, i order. I reassembled them back in the proper i, j order with an indication that we are going in the down direction. And then I took these two sets and just unioned them together. Now, I need some definition for across and down. I can do it this way. I could have just used strings or any unique value that could use a string across and down, but I’m going to say that across is equal to incrementing one at a time in the i direction and zero in the j direction. Down is incrementing zero in the i direction and one in the j direction.

23 Set Anchors

We’re almost there. Things are coming together very nicely. The only thing we’re missing is set anchors. We have to somehow take a row and the row number within the board and figure out where all the anchors are and what the values of those anchors are. To find the anchors we can do that within the row. An anchor is something that’s next to an existing letter. That will get most of the anchors but not quite all of them, because notice if we didn’t have this row here, we wouldn’t know that that spot is an anchor. To find all the anchors, we’re going to have to look at all the rows or at least the two adjacent rows on either side. To find what the anchors are in terms of the set–can anything go there as in this anchor, as in this anchor, any letter can appear there, or is it a restricted set like this one. That we’re also going to have to know–what are the other cross words? Here, if there’s only a U there and nothing down below–this is the edge of the board or there is empty stuff down below–then this anchor can only be the letters that fit in there to make a word going in this direction. Let’s dive right into defining set anchors. This is different than most of the functions written so far in that it actually mutates the row rather than returning a result. We start in the normal way. We’re going to iterate over the row–the good parts of the row. Then what I’m going to do is take the i-j position on the board and find all the neighbors for that board–that is all the squares in the in the north, south, east, and west location. Then I’m going to say what are the anchors? Well, if the square is the star, the starting square, then that’s an anchor by definition. Otherwise, if the square is empty and any of the neighbors is a letter, then that’s an anchor. Now, I’ve arranged that neighbors(board) is a function that returns the neighbors in this order–north, south, east, and west–and now I’m saying we’re operating on a row, if the neighbor to the north or the south is a letter, then we have a cross word that we have to deal with. If not, then it’s an unrestricted anchor. What do I want to do if I have these crosswords? I want to find the crossword on the board. What does that mean? In this location right here, which would be row 2 and column 2, I want to say that the word on the board is an empty square followed by a U. If I go into the interpreter, I want to be able to have this interaction where here is my board. Now if I find crosswords within that board from position 2, 2–that’s that position right there right after the A and above the U–what I want to say is that there is a word, and the word is a dot followed by a U–there it is. If we fill in that anchor, we’re going to have a word, which is a dot followed by a U. Where does the word start? Well, it starts in position 2. It could have started someplace else. If we had a big board, there might have been a word that started all the way up here and went down. For example, we’re not actually going to call find_cross_word on positions that aren’t anchors, but if we did, it would still work. If we find cross words from position 1, 2–that’s the where the A is–what’s the cross word that intersects through that A? Well, that’s JAG. It begins in row number 1. Now we’re up here, we found the cross words, we found the row that the cross word begins in, we found what the word is, and now we’re saying we’re going to fill in this location. It’s going to be an anchor. It’s anchor where the letters are all the letters with which we can replace the dot in dot U and make something which is a word. We can go back to our interpreter and test that out. We can say if W is the dot U, then what is this anchor of all the letters where W replaced by a letter is in WORDS. That’s the anchor with X, M, and N as possibilities. Now we’re going to say insert that anchor into row[i]. Insert into this spot to the right of the A above the U the anchor that says an X, an M, or an N can occur right in that location. That’s setting the anchor if there are cross letters above or below. Otherwise, we have an unrestricted anchor. For example, this anchor here to the left of the D–any letter can go in there. We’ve already defined the global variable any to be the anchor that allows any letter to occur.

24 Final Birds

Sprinting to the finish line. We’re almost there. Just a little bit left to go. Only one more slightly difficult part. Here is find_cross_word, and because I can see the finish line, I’m not going to explain it line-by-line, but you go ahead and read it. Two other small bits–we need this list of neighbors in the north, south, east, west order, and we’ve got to be able to transpose a matrix. It turns out that one way to transpose a matrix is an old trick. You map list to the zip of the application of all the rows in the matrix. If that makes your head hurt, don’t worry about it. You can play with it a little bit to see why it works or you could use this expression here–[ M [ j ][ i ] ] for j in the range and for i in the range. Now I’m excited to see it work. I ran it. It did, in fact, work. Solidified the results by putting them into a test–examples of finding cross words, finding neighbors, transpose–that looks like it works. Transpose of a transpose–you get back what you started with. If we set anchors on our sample board–we get back the sample row that we did by hand. If we call horizontal plays, we get back this list now. I feel a little bit bad here that I wrote this as a test, and yet I can’t really verify that this is exactly the right answer. I haven’t gone through every word in the dictionary to figure out if I got this exactly right. This serves not so much as a unit test–a unit test is something that verifies that your function is doing the right thing–rather this serves as a regression test. Regression test means that we’re just testing it to see if we broke something. We want to get the same result this time as we got next time if we make a small change to our program. We can verify that this looks reasonable–that we got the big word we were looking for, BACKBENCH, other components of it like BENCH and then a bunch of three-letter words, which always show up in Scrabble. That was the horizontal plays. We can also do the all_plays. It’s gratifying(悦人的;令人满足的) that all_plays is a bit longer. It’s gratifying that BACKBENCH is still there, and when we run it, it passes. Bird-by-bird–we can check off one more. Now there is only one left–scoring.

25 Scoring

I could take the results from all plays, so each play is a triple of position, direction, and a word, and then I could add a score to those, but it seems like I’m just taking this lists apart and putting them back together so many time. I’ve already done it three times. I did a row play–took it apart, added it back in j for the horizontal plays, and took that apart, added back in a direction for all_plays. I could do that one more time to insert the score. Maybe that would’ve been the right choice. Maybe I just got fatigued(疲乏的), and I made a mistake in my design sense, but what I decided to do was modify my horizontal_plays and all_plays functions. There are two modifications. Here in horizontal_plays, after I got the play, I calculate a score, and then I insert that into my result. Now my results are no longer just position, word play–they’re score, position, word play. Then I want to do the same thing in all_plays. I want to make my play be a score, position, direction, word tuple, and I’m just, as before, ripping(rip 扯破;撕坏;撕成) these things apart and putting them back together. Now, remember the board. It’s got these double letters, triple word scores, and so on. If you’re old-school it looks like this. This is on a piece of cardboard that’s a physical material. We’ve also written triple and double scores, so I need to come up with some ways of representing these spots on the board and how they’re special. Now, could I squeeze(挤;压迫;压榨) it into my existing representation of a row? Remember a row is a list, and it can have things like that border is a string. Then we have a letter. Then we can have anchors like ANY. Could I have room for putting information about bonus squares on the board–he double and triple letters? Could I have, say, 3W to mean triple-word score as an element of this row? I guess my intuition(直觉) is I don’t think that’s going to work very well. The problem is one of these squares–say this one–could be both an anchor and a triple-word score. So we’d need some representation that allowed both of those. That just doesn’t seem to be easy to extend what I already have. Let’s not override our row notation. Our rows, as we have it–I’m pretty happy with them. I’m not going to allow this. I’m going to keep row exactly as they are. I’m going to introduce another data structure–a parallel(平行的;并行的) board, I’m going to have two boards. One board that I play on and another board that holds the bonuses. Think of it as two layers. One representation of a two-dimensional matrix just holds these double words and triple letters scores and so on, and then on top of that there’s a second two-dimensional array that holds the values of the letters and also holds the anchors. Oh, I got a good score. Now I’m going to have board[j][i] will hold the letter or anchor, and then bonus[ j ][ i ] will hold corresponding bonus–a double word, a triple letter, or just nothing.

26 Scoring 2

Now, just to give you a review of scoring if you aren’t familiar with the rules. The letters score their value–the number that’s on them–times whatever you’ve placed them on if that’s a double or triple letter score. Here I don’t have any double or triple letter scores, so I get the individual values of the letters– 10, 11, 12, 22–times the total word multiplier, and the pink square are double word scores, so that’s a double multiplier–22 times 2 is 44 points for that. Then if the next play goes here, it has to connect. Now, this was a double word, but I don’t get any double, because I didn’t actually place a new tile over an existing double word, so nothing is doubled, but I do get credit for the Z, even though I didn’t place the Z. I get credit for that as 10, not as 20 as it originally scored, but as its face value–10 plus 1–this is a triple 1, so that’s plus 3 plus 1–10, 11, 14, 15. Then if the next play, say, was here, that would just a score 3, 4, 5. I didn’t have any bonuses whatsoever. This bonus here doesn’t count, because I didn’t play over it. If a bonus is already covered up, it doesn’t county any more, so this would just be 5. One more scoring rule. If this is the word on the board–great play putting the Q on a double letter and a double word as well–this would score 20–1, 2, 3, 4 times 2 is 48. The next play could be this, which simultaneously(同时地;一齐) forms 3 words–NO in this direction and then IN and NO in this direction. For NO, we’d get a double letter is 2 plus 1 is 3. For IN, we get just 1 plus 1, so that’s 3 plus 2 is 5. And for NO we get another 3, so that would be a score of 8 altogether. Here is calculate_score. It takes all these variables that we need to specify the play. We’re going to start out a total of my word multiplier–that is, have I got any double or triple letters, and there might be more than one of them. If I had a long word that matches up with some existing letters on the board, then I’d get credit, and if. I covered a double and a triple letter, then I’d multiply by 6 I need to keep track of that. Then I also want to keep track of the cross word totals. Not the word I’m playing but the other cross words, but they are separate from this word multiplier. Figure out where my starting position is from that position. Figure out the direction that I’m moving in–down or across–what the delta are. Figure out what the other direction is. If I’m moving in the cross direction, I want to know that the other direction is the down direction. Now just enumerate the words. Enumerate the letters in the word and the position within the word. Figure out the square on the board and the bonus squares, the word multiplier, if the square was already placed on the board, so if it is a letter–and I’ve got a function for that. I should be defining this here, so this is a bad piece code. I should be calling is_letter(sq) rather than testing directly, since is decided to make that more abstract. Figure out the word multiplier from the bonuses. You only get the bonus if the letter wasn’t already on the board. Figure out the letter multipler–same thing. Increment my points by the points from the letter times the letter multiplier, and now if the square is an anchor and the anchor is not one of these unrestricted anchors, then we want to look for the cross words. If there is a cross word, figure out the cross word score and add that into the total. Why do I have this direction is not down here? Because I’m going to do–cross word score is going to recursively calculate score, and we don’t want it to recurse infinitely. We just want it to recurse once. To explain that a little bit more, note up here in horizontal_plays we’re calling calculate_score, so the only place we call calculate_score is here, and we’re calling it with the across. The way we get the down is because we transpose the board. So for calculate score, we know we’re going to be called with the across direction the first time and then the down direction the second time. Although, I guess I feel a little bit bad that that assumption that we’re always going to be called with across the first time is kind of hardwired into this. That makes calculate_score a little bit brittle. Probably, I should refactor this to stop the recursion in some other way. But I’m so close to the finish line now I don’t want to stop to clean things up. I want to get to the end. Here is cross_word_score. Figure out the position. We find the cross word. That’s a function we already wrote. Then we recursively call calculate_score.

27 Making the Board

Now all that’s left is to set up this matrix bonus, say where the double and triple bonuses are. Here’s what I’ve done. I’ve just drawn a picture of the bonus and I called it the bonus_template. But I only drew one quadrant(四分之一圆), one quarter of the board, because I noticed that they were all symmetric(相称性的;均衡的) and so this function bonus_template takes a quadrant in terms of a string, mirrors each rows and then mirrors each set of rows, where mirror of a sequence is just sequence plus the rest of the sequence except for the last one, so there’s going to be a middle piece that we’ll reflect(反射) it around. I made one template for the Scrabble game and one for the Words With Friends game, and then you choose which bonus you want to use, and then I defined these constants for double words, triple words, double letters, and triple letters, and I wasn’t quite sure what to use. I know I didn’t want to use letters like d and t because I’d get confused with the letters in the hand. I used 2 and 3 for double and triple words, a colon(冒号) because it has 2 dots for double letters, and a semicolon(分号) because it’s a little bit bigger than a colon for triple letters. Even though we’re so close to the end, it’s still good hygiene(卫生;卫生学;保健法) to write tests, so I took some time to write some, ran them, all the tests pass. You can see here’s a tiny little bonus template–a quarter of an array, which looks like that. When you apply bonus_template to it, you get this nice symmetric array. Now what I’d like you to do is modify the show(board) function so that it prints out these bonus entries so that if there’s no letter over this 3 in the corner it should print the 3 rather than just printing a dot.

27 Making the Board (answer)

Here’s my solution–I capture the j and i coordinates by using the enumerate function. I print the square if it’s a letter or if it’s the border that’s off the square. Otherwise, I just print the bonus.

28 Making Plays

Now, one thing I like to be able to do is when I get a play, I want to be able to actually modify the board to indicate what that play is. I want you to write a function to do that for me. It takes a play, but remember a play is tuple of a score, a start position indicated by i, n, j, a direction indicating by delta i and delta j, and the actual word, a string. Let’s make this look a little bit better by making it be a tuple. Write the code that will modify the board and in addition to modifying it, let’s also return the board

28 Making Plays (answer)

Here’s my answer–I just enumerated the letters in the word and the position into the word, updated the board, marching down from the start position j, i, and multiplying n, the position into the word, by the deltas specified by the direction.

29 Best Play

Now, very exciting. We’re at the culmination. One more function to write. That’s the best play. Given a hand and a board, return the highest scoring play. If there are no plays at all, just return None.

29 Best Play (answer)

Here’s my answer. We got all the pieces. We call all plays. We get back. There is a collection of plays. We sort them and take the last one–that’ll be the highest. We don’t even have to specify what to sort by, because of the score was the first element of the play, so we’re automatically sorting by that, and then return the best one if there are any plays otherwise. Otherwise, then I specified no play here in case I change my mind, but I can say no play equals None. Now, I could write something that plays a complete game, but instead I’m just going to have a simple function show_best, which takes a hand and a board, displays the current board, and then displays the best play. When I type it in to the interpreter, this is what I get. It found the backbench that we had sort of laid out there, scored 64 points, and out of all the possible plays, it found the optimal one. So, we did it. We made it all the way through. Congratulations.


参考文献:

  1. Design of Computer Programs - 英文介绍 - Udacity;
  2. Design of Computer Programs - 中文介绍 - Udacity;
  3. Design of Computer Programs - 中文介绍 - 果壳;
  4. Design of Computer Programs - 视频列表 - Udacity;
  5. Design of Computer Programs - 视频列表 - YouTube;
  6. Design of Computer Programs - class wiki - Udacity。

你可能感兴趣的:(软件工程,SICP,Design,of,Computer,Programs,学习)