Chapter 1: Language Processing and Python

1. Try using the Python interpreter as a calculator by typing expressions such as 12/(4+1).

>>> 12/(4+1)
2
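
The result 2 reflects Python 2, where / between two integers discards the remainder. As a hedged aside, the sketch below assumes Python 3 (not the interpreter used in this transcript) to show how the same expression behaves under true division and under explicit floor division.

```python
# Assumes Python 3, where / is true division and // is floor division.
true_quotient = 12 / (4 + 1)
floor_quotient = 12 // (4 + 1)  # same value the Python 2 session printed

print(true_quotient)   # 2.4
print(floor_quotient)  # 2
```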

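2. Given an alphabet of 26 letters, there are 26 to the power 10, or 26**10, ten-letter strings we can form. That works out to 141167095653376L (the L at the end just indicates that this is Python's long-number format). How many hundred-letter strings are possible?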

>>> 26**100
3142930641582938830174357788501626427282669988762475256374173175398995908420104023465432599069702289330964075081611719197835869803511992549376L
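
As a quick check, the number of n-letter strings over a 26-letter alphabet is 26**n. The sketch below wraps that in a helper (the name count_strings is made up for illustration) and reports how many digits the 100-letter answer has. It assumes Python 3, where the trailing L no longer appears because there is a single int type.

```python
def count_strings(length, alphabet_size=26):
    """Number of distinct strings of the given length over the alphabet."""
    return alphabet_size ** length

ten = count_strings(10)
hundred = count_strings(100)

print(ten)                # 141167095653376
print(len(str(hundred)))  # 142 digits
```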

3. The Python multiplication operation can be applied to lists. What happens when you type ['Monty','Python']*20 or 3*sent1?
(1)

>>> ['Monty','Python']*20
['Monty', 'Python', 'Monty', 'Python', 'Monty', 'Python', 'Monty', 'Python', 'Monty', 'Python', 'Monty', 'Python', 'Monty', 'Python', 'Monty', 'Python', 'Monty', 'Python', 'Monty', 'Python', 'Monty', 'Python', 'Monty', 'Python', 'Monty', 'Python', 'Monty', 'Python', 'Monty', 'Python', 'Monty', 'Python', 'Monty', 'Python', 'Monty', 'Python', 'Monty', 'Python', 'Monty', 'Python']

(2)

>>> from nltk.book import *
*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
text1: Moby Dick by Herman Melville 1851
text2: Sense and Sensibility by Jane Austen 1811
text3: The Book of Genesis
text4: Inaugural Address Corpus
text5: Chat Corpus
text6: Monty Python and the Holy Grail
text7: Wall Street Journal
text8: Personals Corpus
text9: The Man Who Was Thursday by G . K . Chesterton 1908
>>> 3*sent1
['Call', 'me', 'Ishmael', '.', 'Call', 'me', 'Ishmael', '.', 'Call', 'me', 'Ishmael', '.']

4. Review Section 1.1 on computing with language. How many words are there in text2? How many distinct words are there?

>>> len(text2)
141576
>>> len(set(text2))
6833

5. Compare the lexical diversity scores for humor and romance fiction in Table 1-1. Which genre is more lexically diverse?
Humor.
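
For readers who want to see how such a score is computed, here is a minimal sketch. Table 1-1 is not reproduced here, so the token lists are made up; the function name and the choice of distinct words over total words as the score are assumptions. If a table reports the inverse ratio instead (total over distinct), a lower score means a richer vocabulary.

```python
def lexical_diversity(tokens):
    """Share of tokens that are distinct: distinct count / total count."""
    return len(set(tokens)) / float(len(tokens))

# Made-up token lists, only to illustrate the comparison.
repetitive = "the cat saw the cat and the cat ran".split()
varied = "a quick brown fox jumps over one lazy dog".split()

print(lexical_diversity(repetitive))
print(lexical_diversity(varied))  # 1.0, every token is distinct
```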

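6. Produce a dispersion plot of the four main protagonists in Sense and Sensibility: Elinor, Marianne, Edward, and Willoughby.
What can you observe about the different roles played by the males and females in this novel? Can you identify a couple?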

>>> text2
<Text: Sense and Sensibility by Jane Austen 1811>
>>> text2.dispersion_plot(["Elinor","Marianne","Edward","Willoughby"])

[Figure: lexical dispersion plot for Elinor, Marianne, Edward, and Willoughby in text2]
Elinor and Edward are a couple.

7. Find the collocations in text5.

>>> text5.collocations()
wanna chat; PART JOIN; MODE #14-19teens; JOIN PART; PART PART;
cute.-ass MP3; MP3 player; JOIN JOIN; times .. .; ACTION watches; guys
wanna; song lasts; last night; ACTION sits; -...)...- S.M.R.; Lime
Player; Player 12%; dont know; lez gurls; long time

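8. Consider the following Python expression: len(set(text4)). State the purpose of this expression. Describe the two steps involved in performing this computation.
Step 1: set(text4) builds the vocabulary of text4, that is, the set of its distinct tokens.
Step 2: len() counts the items in that vocabulary.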
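
The two steps can be made explicit on a small token list. The list below is invented for illustration and stands in for text4; note that set() treats 'The' and 'the' as different items.

```python
tokens = ["The", "people", "of", "the", "United", "States", "of", "America"]

vocabulary = set(tokens)           # step 1: collapse repeated tokens
vocabulary_size = len(vocabulary)  # step 2: count what is left

print(len(tokens))      # 8 tokens in total
print(vocabulary_size)  # 7 distinct tokens ('of' occurs twice)
```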

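9. Review Section 1.2 on lists and strings.
a. Define a string and assign it to a variable, e.g., my_string = 'My String' (but put something more interesting in the string). Print the contents of this variable in two ways, first by simply typing the variable name and pressing Enter, then by using the print statement.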

>>> my_string='My String'
>>> my_string
'My String'
>>> print my_string
My String

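b. Try adding the string to itself using my_string+my_string, or multiplying it by a number, e.g., my_string*3. Notice that the strings are joined together without any spaces. How could you fix this?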

>>> my_string+my_string
'My StringMy String'
>>> (my_string+' ')*3
'My String My String My String '
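
The parenthesised version leaves a trailing space. One alternative, sketched here, is to repeat the string inside a list and let str.join place the separator only between the copies.

```python
my_string = 'My String'

with_trailing_space = (my_string + ' ') * 3
joined = ' '.join([my_string] * 3)

print(repr(with_trailing_space))  # 'My String My String My String '
print(repr(joined))               # 'My String My String My String'
```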

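10. Define a variable my_sent to be a list of words, using the syntax my_sent = ["My", "sent"] (but with your own words, or a favorite saying).
a. Use ' '.join(my_sent) to convert this into a string.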

>>> my_sent = ["My", "sent"]
>>> ' '.join(my_sent)
'My sent'

b. Use split() to split the string back into the list form you had to start with.

>>> 'My sent'.split()
['My', 'sent']
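
join and split undo each other as long as the separator does not occur inside any word. A minimal sketch, using an invented semicolon-separated string to show split() with an explicit separator:

```python
my_sent = ["My", "sent"]

as_string = ' '.join(my_sent)
round_trip = as_string.split()  # no argument: split on runs of whitespace

custom = "My;sent;with;semicolons".split(';')  # explicit separator

print(round_trip == my_sent)  # True
print(custom)                 # ['My', 'sent', 'with', 'semicolons']
```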

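11. Define several variables containing lists of words, e.g., phrase1, phrase2, and so on. Join them together in various combinations (using the plus operator) to form whole sentences. What is the relationship between len(phrase1+phrase2) and len(phrase1)+len(phrase2)?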

>>> phrase1 = ['I','Love','dragon']
>>> phrase2 = ['I','Love','NLP and Python']
>>> len(phrase1+phrase2)
6
>>> len(phrase1)+len(phrase2)
6

They are equal: concatenating two lists yields a list whose length is the sum of the two lengths.

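12. Consider the following two expressions, which have the same value. Which one will typically be more relevant in NLP? Why?
a. "Monty Python"[6:12]
b. ["Monty", "Python"][1]
Form b is more common, because NLP usually processes text as a list of word tokens and not as raw character offsets.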
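
Both expressions evaluate to the same string, but the slice depends on character offsets that shift whenever the text changes, while the list index names a token directly. A small check:

```python
by_slice = "Monty Python"[6:12]    # characters 6 through 11 of the raw string
by_token = ["Monty", "Python"][1]  # second token of the word list

print(by_slice == by_token)  # True, both are the string Python
```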

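13. We have seen how to represent a sentence as a list of words, where each word is a sequence of characters. What does sent1[2][2] do? Why? Experiment with other index values.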

>>> sent1
['Call', 'me', 'Ishmael', '.']
>>> sent1[2][2]
'h'

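It is the third character of the third word in the sentence: sent1[2] is 'Ishmael', and indexing that string with [2] gives 'h'.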
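
To experiment with other index values without loading the corpus, the sentence can be typed in directly. The sketch below splits the double index into its two steps and tries a few more positions.

```python
sent1 = ['Call', 'me', 'Ishmael', '.']

word = sent1[2]   # first index picks a word from the list
letter = word[2]  # second index picks a character from that word

print(sent1[2][2] == letter)  # True
print(sent1[0][0])            # C, first character of the first word
print(sent1[2][-1])           # l, last character of the third word
```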

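14. The first sentence of text3 is provided to you in the variable sent3. The index of the in sent3 is 1, because sent3[1] gives us 'the'. What are the indexes of the other occurrences of 'the' in sent3?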

>>> sent3
['In', 'the', 'beginning', 'God', 'created', 'the', 'heaven', 'and', 'the', 'earth', '.']
>>> for i in range(len(sent3)):
...     if sent3[i] == "the":
...         print i
... 
1
5
8
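
The index loop works; a common alternative is enumerate, which yields each position together with its word. A sketch with sent3 typed in by hand so it runs without the NLTK data:

```python
sent3 = ['In', 'the', 'beginning', 'God', 'created', 'the',
         'heaven', 'and', 'the', 'earth', '.']

# enumerate pairs every word with its position in the list.
the_indexes = [i for i, word in enumerate(sent3) if word == 'the']

print(the_indexes)  # [1, 5, 8]
```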

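15. Review the discussion of conditionals in Section 1.4. Find all words in the Chat Corpus (text5) starting with the letter b. Show them in alphabetical order.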

>>> [w for w in set(text5) if w.startswith('b')]
[u'brought', u'brings', u'blade', u'babycakeses', u'bomb', u'busy', u'bust', u'bliss', u'blew', u'best', u'bachelorette', u'bigest', u'babies', u'bandito', u'boost', u'bloody', u'by', u'brrrrrrr', u'blowup', u'banjoes', u'bagels', u'besides', u'bitdh', u'bite', u'boyfriend', u'bright', u'beatles', u'breath', u'blah', u'being', u'buying', u'bedford', u'beautiful', u'bummer', u'blues', u'bluer', u'bumped', u'burried', u'brooklyn', u'barrel', u'beams', u'banned', u'barely', u'bloe', u'boed', u'b4', u'behave', u'be', u'bf', u'bc', u'bi', u'bj', u'baby', u'bird', u'babe', u'babi', u'balls', u'benz', u'bend', u'bothering', u'borderline', u'bother', u'babes', u'bred', u'bring', u'bedroom', u'buttons', u'btw', u'ball', u'become', u'betta', u'blowing', u'barfights', u'bike', u'bro', u'brb', u'bra', u'blueberry', u'blocking', u'beach', u'bares', u'break', u'band', u'ballin', u'beleive', u'babay', u'bied', u'brat', u'brad', u'bisexual', u'b-day', u'bunny', u'burito', u'board', u'book', u'born', u'bumber', u'bound', u'born-again', u'brothers', u'belive', u'brown', u'bikes', u'barbie', u'brain', u'blind', u'beachhhh', u'begin', u'between', u'boning', u'backup', u'bbiam', u'boing', u'bouncers', u'beckley', u'brunswick', u'broke', u'beeehave', u'bugs', u'bless', u'blank', u'base', u'birfday', u'boredom', u'both', u'battery', u'bruises', u'byeeeeeeeeeeeee', u'bonus', u'bahahahaa', u'beeeeehave', u'brakes', u'bossy', u'b/c', u'button', u'booty', u'boots', u'belongings', u'bitch', u'bears', u'blankie', u'ben', u'beg', u'bed', u'bare', u'bet', u'border', u'bases', u'bucks', u'baord', u'beats', u'bikini', u'barks', u'biatch', u'boooooooooooglyyyyyy', u'belong', u'before', u'better', u'bootay', u'bong', u'bone', u'bandsaw', u'bar', u'bay', u'bag', u'bad', u'ban', u'bak', u'balance', u'belly', u'butter', u'boinked', u'beattles', u'burp', u'builds', u'black', u'box', u'boy', u'bot', u'bow', u'boi', u'boo', u'bob', u'bites', u'basket', u'blooded', u'bein', u'butt', u'booted', u'burryed', 
u'behind', u'bottle', u'bread', u'buffalo', u'burger', u'bible', u'buses', u'blood', u'brady', u'bosom', u'brbbb', u'bulls', u'believe', u'b', u'boght', u'build', u'byeee', u'burned', u'bio', u'big', u'blowjob', u'biz', u'bit', u'blech', u'bacl', u'back', u'beans', u'bored', u'boys', u'blinks', u'body', u'boned', u'bones', u'bunch', u'balck', u'beuty', u'beat', u'bear', u'beam', u'busted', u'bull', u'babiess', u'brightened', u'buddyyyyyy', u'blessings', u'been', u'biggest', u'buy', u'bus', u'but', u'buh', u'bum', u'bug', u'backfrontsidewaysandallaroundtheworld', u'balad', u'booboo', u'baked', u'blue', u'beaten', u'breathe', u'bleach', u'bishes', u'bouts', u'biebsa', u'boss', u'bes', u'beer', u'beside', u'breaks', u'burns', u'beanbag', u'birthday', u'babblein', u'burps', u'boobs', u'blunt', u'betrayal', u'byes', u'bowl', u'bodies', u'bbbbbyyyyyyyeeeeeeeee', u'blow', u'byeeee', u'brass', u'basically', u'bumper', u'bout', u'brother', u'burpin', u'buff', u'babble', u'barn', u'blazed', u'biyatch', u'bwahahahahahahahahahaha', u'built', u'bouncer', u'bounced', u'butterscotch', u'bell', u'backatchya', u'bye', u'byb', u'bois', u'boring', u'brwn', u'bagel', u'bought', u'biiiatch', u'bbl', u'bbs', u'backroom', u'because', u'breeding', u'bitches', u'byeeeeeeee', u'boot', u'boom']
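
The list above comes out in arbitrary set order, whereas the exercise asks for alphabetical order. Wrapping the comprehension in sorted() addresses that. The sketch uses a short invented token list in place of text5 so that it runs without the NLTK data; note that startswith('b') is case-sensitive.

```python
# Stand-in for text5; the real corpus needs the NLTK data package.
tokens = ['brb', 'hi', 'bye', 'lol', 'b4', 'bye', 'back', 'Bye']

b_words = sorted(w for w in set(tokens) if w.startswith('b'))

print(b_words)  # ['b4', 'back', 'brb', 'bye']
```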

16. Type the expression range(10) at the interpreter prompt. Now try range(10,20), range(10,20,2), and range(20,10,-2). We will see a variety of uses for this built-in function in later chapters.

>>> range(10)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> range(20,10,-2)
[20, 18, 16, 14, 12]
>>> range(10,20,2)
[10, 12, 14, 16, 18]
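
The transcript is from Python 2, where range returns a list. Under Python 3 (assumed in this sketch) range is a lazy sequence, so list() is needed to see the values. range(10,20), which the session above skips, is included.

```python
print(list(range(10)))          # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(list(range(10, 20)))      # [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
print(list(range(10, 20, 2)))   # [10, 12, 14, 16, 18]
print(list(range(20, 10, -2)))  # [20, 18, 16, 14, 12]
```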

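17. Use text9.index() to find the index of the word sunset. You'll need to insert this word as an argument between the parentheses. By a process of trial and error, find the slice for the complete sentence that contains this word.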

>>> text9.index('sunset')
629
>>> text9[629:630]
[u'sunset']
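
The slice [629:630] returns only the word itself, not the sentence around it. Instead of pure trial and error, one could widen the slice in both directions until sentence-final punctuation is reached. The helper below is a sketch: its name, the choice of '.', '?' and '!' as sentence terminators, and the sample token list are all assumptions, and it has not been run against text9 here.

```python
def sentence_span(tokens, index, enders=('.', '?', '!')):
    """Return (start, end) so that tokens[start:end] is the sentence around index."""
    start = index
    # Walk left until the previous token ends the preceding sentence.
    while start > 0 and tokens[start - 1] not in enders:
        start -= 1
    end = index
    # Walk right until this sentence's own terminator (or the end of the text).
    while end < len(tokens) - 1 and tokens[end] not in enders:
        end += 1
    return start, end + 1

sample = ['It', 'rained', '.', 'The', 'sunset', 'was', 'red', '.', 'Night', 'fell', '.']
start, end = sentence_span(sample, sample.index('sunset'))
print(sample[start:end])  # ['The', 'sunset', 'was', 'red', '.']
```

With the NLTK data loaded, calling the helper on text9 with the index found above should give candidate slice bounds to inspect by eye.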

18. Using list addition, and the set and sorted operations, compute the vocabulary of the sentences sent1 ... sent8.

>>> sent = sent1+sent2+sent3+sent4+sent5+sent6+sent7+sent8
>>> word_set = set(sent)
>>> word_li = sorted([w for w in word_set])
>>> word_li
['!', ',', '-', '.', '1', '25', '29', '61', ':', 'ARTHUR', 'Call', 'Citizens', 'Dashwood', 'Fellow', 'God', 'House', 'I', 'In', 'Ishmael', 'JOIN', 'KING', 'MALE', 'Nov.', 'PMing', 'Pierre', 'Representatives', 'SCENE', 'SEXY', 'Senate', 'Sussex', 'The', 'Vinken', 'Whoa', '[', ']', 'a', 'and', 'as', 'attrac', 'been', 'beginning', 'board', 'clop', 'created', 'director', 'discreet', 'earth', 'encounters', 'family', 'for', 'had', 'have', 'heaven', 'in', 'join', 'lady', 'lol', 'long', 'me', 'nonexecutive', 'of', 'old', 'older', 'people', 'problem', 'seeks', 'settled', 'single', 'the', 'there', 'to', 'will', 'wind', 'with', 'years']
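
The same result can be had in one step with sorted(set(...)). The sketch below uses two of the sentences, typed in by hand, and shows why capitalised words come before lowercase ones in the output: sorted compares strings by character code, and uppercase letters have lower codes.

```python
sent_a = ['Call', 'me', 'Ishmael', '.']
sent_b = ['In', 'the', 'beginning', 'God', 'created', 'the',
          'heaven', 'and', 'the', 'earth', '.']

# Concatenate, drop duplicates, then sort by character code.
vocabulary = sorted(set(sent_a + sent_b))

print(vocabulary)
```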

19. What is the difference between the following two lines? Which one will give a larger value? Will this be the case for other texts?

>>> len(sorted(set([w.lower() for w in text1])))
17231
>>> len(sorted([w.lower() for w in set(text1)]))
19317

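The first line lowercases every token before deduplicating, so 'The' and 'the' are counted once. The second deduplicates first and lowercases afterwards, so the resulting list can contain repeated words. The first count is therefore always less than or equal to the second, for any text.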
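
The effect is easy to reproduce on a handful of tokens. The list below is invented for illustration:

```python
tokens = ['The', 'the', 'This', 'this', 'whale']

lower_then_dedupe = len(sorted(set([w.lower() for w in tokens])))
dedupe_then_lower = len(sorted([w.lower() for w in set(tokens)]))

print(lower_then_dedupe)  # 3: the, this, whale
print(dedupe_then_lower)  # 5: the and this each appear twice
```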

20. What is the difference between the following two tests: w.isupper() and not w.islower()?

>>> 'Hello'.isupper()
False
>>> not 'Hello'.islower()
True
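
The single example hints at the difference: w.isupper() holds only when every cased character is uppercase, while not w.islower() holds for anything that is not entirely lowercase, which includes mixed-case words and tokens with no letters at all. A small comparison over hand-picked samples makes this concrete:

```python
samples = ['HELLO', 'Hello', 'hello', '123']

# For each sample: (w.isupper(), not w.islower())
results = {w: (w.isupper(), not w.islower()) for w in samples}

for w in samples:
    print(w, results[w])
# Values per sample:
#   HELLO  (True, True)
#   Hello  (False, True)
#   hello  (False, False)
#   123    (False, True)
```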

21. Write the slice expression that extracts the last two words of text2.

>>> text2[-2:]
[u'THE', u'END']

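22. Find all the four-letter words in the Chat Corpus (text5). With the help of a frequency distribution (FreqDist), show these words in decreasing order of frequency.
p22.py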

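#coding=gbk
from nltk import FreqDist
from nltk.book import text5
# Keep only the tokens that are exactly four characters long.
word_li = [w for w in text5 if len(w)==4]
# Count how often each four-letter token occurs.
fdist = FreqDist(word_li)
# Order the distinct tokens by count, most frequent first.
sorted_word_li = sorted(fdist.keys(),key=lambda x:fdist[x],reverse=True)
for w in sorted_word_li:
    print "%s\t%d; "%(w,fdist[w]),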

JOIN    1021;  PART 1016;  that 274;  what  183;  here  181;  ....  170;  have  164;  like  156;  with  152;  chat  142;  your  137;  good  130;  just  125;  lmao  107;  know  103;  room  98;  from   92;  this   86;  well   81;  hiya   78;  back   78;  they   77;  yeah   75;  dont   75;  want   71;  love   60;  guys   58;  some   58;  been   57;  talk   56;  nice   52;  time   50;  when   48;  haha   44;  make   44;  girl   43;  need   43;  U122   42;  MODE   41;  then   40;  much   40;  will   40;  over   39;  work   38;  were   38;  take   37;  song   36;  U115   36;  U121   36;  even   35;  seen   35;  U105   35;  U156   35;  does   35;  more   34;  damn   34;  come   33;  only   33;  hell   29;  them   28;  long   28;  tell   27;  name   27;  call   26;  baby   26;  sure   26;  away   26;  look   26;  play   25;  U114   25;  U110   25;  cool   24;  NICK   24;  down   24;  hate   23;  sexy   23;  said   23;  many   23;  ever   22;  last   22;  hear   21;  life   21;  live   20;  very   19;  must   19;  give   19;  mean   19;  feel   19;  stop   19;  same   19;  LMAO   19;  hugs   18;  What   18;  find   18;  !!!!   18;  cant   18;  nite   17;  busy   17;  left   17;  ????   17;  lost   17;  hair   17;  shit   17;  U104   17;  fine   16;  real   16;  game   16;  fuck   15;  eyes   15;  heya   15;  sits   15;  kill   15;  lets   15;  goes   14;  wait   14;  shut   14;  keep   14;  true   14;  read   14;  U168   13;  pick   13;  free   13;  nope   13;  else   13;  near   13;  told   12;  male   12;  cold   12;  bout   12;  hehe   12;  This   12;  than   12;  U102   12;  hope   12;  awww   12;  gets   12;  used   12;  head   12;  stay   12;  yall   11;  kids   11;  perv   11;  babe   11;  wont   11;  year   11;  doin   11;  face   11;  U107   11;  U119   11;  home   11;  into   11;  .. .   
11;  U132   10;  help   10;  Liam   10;  hard   10;  U101   10;  show   10;  mind   10;  week   10;  Well   10;  Yeah   10;  once   10;  hmmm   9;  aint    9;  full    9;  pics    9;  crap    9;  type    9;  hour    9;  such    9;  neck    9;  soon    9;  rock    9;  care    9;  days    9;  dang    9;  mine    9;  runs    9;  ; ..    9;  best    9;  kiss    9;  dead    9;  nick    9;  book    9;  sick    9;  sang    8;  says    8;  word    8;  wana    8;  U139    8;  suck    8;  went    8;  blue    8;  U144    8;  case    8;  heyy    8;  hows    8;  lady    8;  made    8;  wife    8;  U169    7;  dude    7;  ahhh    7;  okay    7;  fast    7;  took    7;  U108    7;  Hiya    7;  That    7;  alot    7;  wear    7;  hand    7;  kick    7;  dear    7;  rule    7;  send    6;  Song    6;  U165    6;  list    6;  <---    6;  next    6;  thru    6;  ride    6;  pink    6;  U520    6;  main    6;  ball    6;  sock    6;  done    6;  part    6;  seem    6;  They    6;  most    6;  U103    6;  ))))    6;  comp    6;  sing    6;  U142    6;  blah    6;  food    6;  oops    6;  U116    6;  knew    6;  Last    6;  U197    6;  whos    6;  U129    6;  U120    6;  gone    6;  poor    6;  goin    6;  meds    5;  fall    5;  When    5;  cali    5;  warm    5;  soul    5;  meet    5;  till    5;  late    5;  heck    5;  feet    5;  miss    5;  legs    5;  lick    5;  also    5;  came    5;  kool    5;  boss    5;  both    5;  Lime    5;  wall    5;  beer    5;  fire    5;  fool    5;  hang    5;  ####    5;  Have    5;  easy    5;  ohhh    5;  joke    5;  caps    5;  xbox    5;  nose    5;  lose    5;  yoko    5;  luck    5;  idea    5;  boys    5;  wish    5;  U128    5;  roll    5;  felt    5;  land    5;  ouch    4;  lord    4;  kent    4;  jerk    4;  sigh    4;  pass    4;  ummm    4;  holy    4;  ,,,,    4;  glad    4;  none    4;  high    4;  lame    4;  U133    4;  U130    4;  U988    4;  U989    4;  huge    4;  fart    4;  date    4;  cute    4;  hook    4;  U820    4;  
team    4;  evil    4;  turn    4;  ways    4;  mmmm    4;  self    4;  pain    4;  U219    4;  ones    4;  pfft    4;  ROOM    4;  U146    4;  U154    4;  U819    4;  quit    4;  ugly    4;  open    4;  puff    4;  woot    4;  rest    4;  U117    4;  shes    4;  U196    4;  grrr    4;  each    4;  beat    4;  line    4;  U126    4;  U123    4;  door    4;  shot    4;  Like    4;  skin    3;  imma    3;  hump    3;  hola    3;  Elev    3;  elle    3;  U163    3;  slow    3;  jump    3;  Only    3;  roof    3;  hick    3;  nana    3;  hail    3;  army    3;  deop    3;  hurt    3;  town    3;  Your    3;  bend    3;  U136    3;  guyz    3;  road    3;  wine    3;  AKDT    3;  move    3;  Same    3;  isnt    3;  band    3;  half    3;  DING    3;  hank    3;  hawt    3;  ((((    3;  wazz    3;  wash    3;  CHAT    3;  vote    3;  ring    3;  butt    3;  rain    3;  orgy    3;  bare    3;  piff    3;  slap    3;  snow    3;  note    3;  U109    3;  U106    3;  gold    3;  yawn    3;  gawd    3;  toes    3;  yada    3;  amen    3;  U148    3;  U141    3;  swim    3;  walk    3;  rubs    3;  THAT    3;  ello    3;  itch    3;  tune    3;  Wind    3;  ahem    3;  soft    3;  clap    3;  deal    3;  lead    3;  wack    3;  U153    3;  died    3;  U145    3;  hiii    3;  mary    3;  toss    3;  2006    3;  hint    2;  luvs    2;  fits    2;  zone    2;  ciao    2;  humm    2;  Just    2;  1996    2;  sand    2;  U190    2;  Here    2;  porn    2;  cost    2;  cast    2;  cell    2;  haze    2;  >:->    2;  limp    2;  Nice    2;  john    2;  typo    2;  sort    2;  flaw    2;  club    2;  sore    2;  hold    2;  Down    2;  Lies    2;  root    2;  chip    2;  YOUR    2;  hmph    2;  spot    2;  wOOt    2;  eats    2;  meat    2;  Tisk    2;  Stop    2;  sooo    2;  WITH    2;  U138    2;  hall    2;  drop    2;  From    2;  Live    2;  yeas    2;  whip    2;  U170    2;  U175    2;  Cool    2;  cars    2;  argh    2;  Okay    2;  opps    2;  yard    2;  Ummm    2;  city    
2;  hott    2;  bite    2;  mama    2;  kewl    2;  park    2;  past    2;  kind    2;  Love    2;  rent    2;  mins    2;  sell    2;  tyvm    2;  John    2;  trip    2;  NONE    2;  plan    2;  wats    2;  lawl    2;  phil    2;  High    2;  aunt    2;  U100    2;  shop    2;  golf    2;  ltns    2;  Poor    2;  ages    2;  rich    2;  wooo    2;  Days    2;  bear    2;  rofl    2;  ohio    2;  Gosh    2;  ears    2;  blew    2;  HAVE    2;  dumb    2;  !!!.    2;  n9ne    2;  Lmao    2;  flow    2;  gays    2;  drew    2;  Dang    2;  U111    2;  newp    2;  hits    2;  <<<<    2;  twin    2;  Drew    2;  Sure    2;  whoa    2;  mike    2;  ??!!    2;  spin    2;  cash    2;  adds    2;  Tell    2;  gimp    2;  uses    2;  howz    2;  foot    2;  ewww    2;  U112    2;  O.k.    2;  five    2;  tick    2;  pies    2;  DOES    2;  tisk    2;  <333    2;  doll    2;  deaf    2;  born    2;  Ahhh    2;  any1    2;  moon    2;  corn    2;  ex's    2;  burp    2;  Heyy    2;  grrl    2;  ?!?!    
2;  KoOL    2;  side    2;  tock    2;  STOP    2;  lies    2;  DONT    2;  area    2;  U155    2;  Ohio    2;  Come    2;  babi    2;  heal    2;  FROM    2;  temp    2;  cmon    2;  deep    2;  Lets    2;  eric    2;  mass    2;  U172    2;  clue    2;  pool    2;  whud    2;  fawk    1;  NAME    1;  1200    1;  Nooo    1;  four    1;  disc    1;  Take    1;  bomb    1;  vega    1;  9:10    1;  pope    1;  COME    1;  :o *    1;  U181    1;  laid    1;  tail    1;  bike    1;  sent    1;  WHOA    1;  cyas    1;  7:45    1;  WHEN    1;  Teck    1;  45.5    1;  jack    1;  eeek    1;  Rang    1;  LAst    1;  NTMN    1;  WILL    1;  Does    1;  prep    1;  oooh    1;  anal    1;  pork    1;  pasa    1;  None    1;  crop    1;  sign    1;  sayn    1;  haaa    1;  kmph    1;  hide    1;  ssid    1;  wide    1;  feat    1;  dirt    1;  9.53    1;  addy    1;  ltnc    1;  daft    1;  Boyz    1;  tips    1;  bird    1;  junk    1;  Rush    1;  coem    1;  toke    1;  ELSE    1;  scum    1;  mkay    1;  sexs    1;  sext    1;  sink    1;  nawp    1;  dork    1;  News    1;  Ctrl    1;  tlak    1;  heee    1;  Back    1;  herE    1;  orta    1;  Kold    1;  MRIs    1;  Home    1;  king    1;  64.8    1;  smax    1;  ROFL    1;  offa    1;  hogs    1;  giva    1;  Evil    1;  VVil    1;  gosh    1;  1900    1;  plow    1;  Oops    1;  PM's    1;  ques    1;  2DAY    1;  hurr    1;  Rule    1;  Chop    1;  hgey    1;  Time    1;  pmsl    1;  z-ro    1;  sori    1;  QUIT    1;  givs    1;  Tiff    1;  worl    1;  pour    1;  fock    1;  Yoko    1;  6:53    1;  Male    1;  6:51    1;  slip    1;  YALL    1;  benz    1;  chit    1;  lol.    
1;  chik    1;  Kiss    1;  Lord    1;  scar    1;  Maps    1;  rape    1;  HUGE    1;  Dude    1;  pine    1;  nuff    1;  U137    1;  U134    1;  syck    1;  mess    1;  soda    1;  hong    1;  Will    1;  pigs    1;  numb    1;  Joey    1;  hawT    1;  beam    1;  coat    1;  Deep    1;  CALI    1;  jail    1;  tall    1;  AWAY    1;  dojn    1;  kong    1;  cook    1;  1.98    1;  1.99    1;  Hero    1;  18ST    1;  allo    1;  frst    1;  thnx    1;  LONG    1;  cepn    1;  tory    1;  Away    1;  ally    1;  poop    1;  pure    1;  gooo    1;  docs    1;  bein    1;  GrlZ    1;  nads    1;  mahn    1;  lois    1;  GUYS    1;  halo    1;  term    1;  Came    1;  tere    1;  Rofl    1;  boed    1;  hazy    1;  mode    1;  bong    1;  whou    1;  bone    1;  GIRL    1;  OOPS    1;  fish    1;  SOME    1;  bois    1;  ussy    1;  hooo    1;  Save    1;  cums    1;  Room    1;  yes.    1;  waaa    1;  yout    1;  Haha    1;  thot    1;  39.3    1;  Dood    1;  hill    1;  okey    1;  Hold    1;  akon    1;  U147    1;  shup    1;  Wyte    1;  dman    1;  Judy    1;  base    1;  icky    1;  lisa    1;  weed    1;  Meep    1;  card    1;  raed    1;  yesh    1;  100%    1;  Chat    1;  TIME    1;  loud    1;  ooer    1;  Even    1;  rang    1;  thje    1;  LoVe    1;  crib    1;  xmas    1;  dint    1;  Hugs    1;  Prof    1;  size    1;  hots    1;  dump    1;  mami    1;  !...    1;  mame    1;  dogs    1;  soup    1;  t he    1;  U164    1;  gear    1;  tooo    1;  2:55    1;  HERE    1;  cops    1;  febe    1;  Long    1;  toop    1;  thah    1;  Road    1;  gret    1;  kina    1;  ebay    1;  serg    1;  2Pac    1;  10th    1;  Rick    1;  este    1;  gals    1;  seth    1;  brwn    1;  yeee    1;  bugs    1;  Jane    1;  seee    1;  slam    1;  U158    1;  Then    1;  able    1;  sexi    1;  tenn    1;  barn    1;  noth    1;  buff    1;  surf    1;  "...    
1;  Drop    1;  HAHA    1;  paid    1;  aime    1;  wire    1;  mofo    1;  pair    1;  knee    1;  Bone    1;  GOOD    1;  cams    1;  wore    1;  salt    1;  Nova    1;  wind    1;  Slip    1;  teck    1;  Matt    1;  Need    1;  Hill    1;  Kent    1;  TEXT    1;  fear    1;  dick    1;  bust    1;  woof    1;  LIVE    1;  wood    1;  tape    1;  York    1;  mena    1;  geez    1;  lyin    1;  gees    1;  Turn    1;  sum1    1;  SExy    1;  gray    1;  Help    1;  pimp    1;  Over    1;  lala    1;  guns    1;  hiom    1;  CAPS    1;  body    1;  Hott    1;  fair    1;  Very    1;  Reub    1;  seat    1;  sean    1;  sips    1;  Kewl    1;  ladz    1;  jush    1;  Iowa    1;  gags    1;  lots    1;  Nope    1;  ohwa    1;  Rock    1;  tend    1;  caan    1;  wean    1;  bied    1;  mono    1;  grea    1;  grin    1;  blow    1;  lazy    1;  U542    1;  City    1;  otay    1;  Werd    1;  Were    1;  herd    1;  duet    1;  HALO    1;  pull    1;  wuts    1;  U113    1;  brat    1;  Hard    1;  o.k.    
1;  nawt    1;  drug    1;  pray    1;  asss    1;  brad    1;  Dawn    1;  wubs    1;  vent    1;  guts    1;  bell    1;  <3's    1;  6:38    1;  lapd    1;  anti    1;  THEY    1;  poll    1;  ok'd    1;  puts    1;  it's    1;  bloe    1;  mark    1;  VBox    1;  SSRI    1;  Sexy    1;  byes    1;  TALK    1;  tjhe    1;  spit    1;  King    1;  Phil    1;  bull    1;  dotn    1;  firs    1;  Cute    1;  lool    1;  wins    1;  3:45    1;  woah    1;  TYPR    1;  ahah    1;  abou    1;  wild    1;  mauh    1;  cock    1;  scuk    1;  MORE    1;  Fade    1;  Kick    1;  goof    1;  Good    1;  cuss    1;  U143    1;  lust    1;  Kids    1;  Hail    1;  SEEN    1;  Eyes    1;  samn    1;  Born    1;  Uhhh    1;  bowl    1;  Seee    1;  nerd    1;  inch    1;  nada    1;  MUAH    1;  urls    1;  keys    1;  mang    1;  Pour    1;  puke    1;  dust    1;  ruff    1;  moms    1;  safe    1;  HOTT    1;  kept    1;  tthe    1;  Mine    1;  Tide    1;  Food    1;  acid    1;  sets    1;  owww    1;  Girl    1;  LOUD    1;  howl    1;  lube    1;  pm's    1;  boot    1;  wrek    1;  jude    1;  vamp    1;  pm'n    1;  waht    1;  U118    1;  Paul    1;  PMSL    1;  ghet    1;  rose    1;  Eggs    1;  jeff    1;  lake    1;  ther    1;  twit    1;  1299    1;  Talk    1;  !???    1;  boom    1;  tits    1;  Mono    1;  98.6    1;  dark    1;  JUST    1;  loss    1;  Show    1;  nude    1;  clay    1;  saME    1;  LATE    1;  Troy    1;  tart    1;  page    1;  KNOW    1;  yoll    1;  West    1;  bacl    1;  fake    1;  Holy    1;  kold    1;  Damn    1;  Tina    1;  <~~~ 1;  FINE    1;  Mary    1;  EVEN    1;  quiz    1;  Life    1;  outs    1;  bred    1;  outa    1;  Awww    1;  exit    1;  prob    1;  U149    1;  enuf    1;  peek    1;  Look    1;  peel    1;  poem    1;  Heya    1;  1930    1;  Ruth    1;  post    1;  mite    1;  SIZE    1;  choc    1;  asks    1;  jeep    1;  ribs    1;  Elle    1;  http    1;  perk    1;  Lion    1;  plus    1;  west    1;  out.   
 1;  rats    1;  eeww    1;  tiff    1;  arms    1;  lung    1;  yw's    1;  wrap    1;  RN's    1;  east    1;  1985    1;  span    1;  1980    1;  U150    1;  gift    1;  Hand    1;  4:03    1;  uyes    1;  whoo    1;  DAMN    1;  grew    1;  spat    1;  calm    1;  6:41    1;  form    1;  .op.    1;  heat    1;  Been    1;  AKST    1;  rush    1;  Sat.    1;  whys    1;  dawg    1;  site    1;  Care    1;  cure    1;  dyed    1;  Ohhh    1;  FACE    1;  Swim    1;  Heys    1;  Type    1;  Fort    1;  menu    1;  wher    1;  98.5    1;  whew    1;  ogan    1;  test    1;  draw    1;  star    1;  poot    1;  pwns    1;  dies    1;  1cos    1;  evah    1;  poof    1;  nods    1;  4.20    1;  yess    1;  idnt    1;  Jess    1;  push    1;  caca    1;  yell    1; 
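
Sorting fdist.keys() by count works; most_common() is a more direct route. In NLTK 3, FreqDist is built on collections.Counter, so the sketch below uses Counter with a small invented token list, which keeps it runnable without the corpus. Treat it as an illustration of the idea, not as the script that produced the output above.

```python
from collections import Counter

# Stand-in for text5; invented tokens.
tokens = ['JOIN', 'hi', 'PART', 'JOIN', 'that', 'lol', 'JOIN', 'PART', 'hello']

four_letter = [w for w in tokens if len(w) == 4]
counts = Counter(four_letter)

# most_common() yields (word, count) pairs, highest count first.
for word, count in counts.most_common():
    print("%s\t%d" % (word, count))
```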

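23. Review the discussion of looping with conditions in Section 1.4. Use a combination of for and if statements to loop over the words of the movie script for Monty Python and the Holy Grail (text6) and print all the uppercase words, one per line.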

>>> for word in [w for w in text6 if w.isupper()]:
...     print "%s;"%word,
... 
SCENE; KING; ARTHUR; SOLDIER; ARTHUR; I; SOLDIER; ARTHUR; I; I; SOLDIER; ARTHUR; SOLDIER; ARTHUR; SOLDIER; ARTHUR; SOLDIER; ARTHUR; SOLDIER; ARTHUR; SOLDIER; ARTHUR; SOLDIER; ARTHUR; SOLDIER; A; ARTHUR; SOLDIER; A; ARTHUR; SOLDIER; ARTHUR; SOLDIER; I; ARTHUR; I; SOLDIER; SOLDIER; SOLDIER; I; ARTHUR; SOLDIER; SOLDIER; SOLDIER; SOLDIER; SOLDIER; SOLDIER; SOLDIER; SOLDIER; SCENE; CART; MASTER; CUSTOMER; CART; MASTER; DEAD; PERSON; I; CART; MASTER; CUSTOMER; DEAD; PERSON; I; CART; MASTER; CUSTOMER; DEAD; PERSON; I; CART; MASTER; CUSTOMER; DEAD; PERSON; I; CUSTOMER; CART; MASTER; I; DEAD; PERSON; I; CUSTOMER; CART; MASTER; I; DEAD; PERSON; I; CUSTOMER; CART; MASTER; I; CUSTOMER; CART; MASTER; I; CUSTOMER; CART; MASTER; DEAD; PERSON; I; I; CUSTOMER; DEAD; PERSON; I; I; CUSTOMER; CART; MASTER; CUSTOMER; CART; MASTER; I; CUSTOMER; CART; MASTER; SCENE; ARTHUR; DENNIS; ARTHUR; DENNIS; I; ARTHUR; I; DENNIS; I; I; ARTHUR; I; DENNIS; ARTHUR; I; DENNIS; ARTHUR; I; DENNIS; I; ARTHUR; I; DENNIS; WOMAN; ARTHUR; I; WOMAN; ARTHUR; WOMAN; ARTHUR; I; WOMAN; I; I; DENNIS; A; WOMAN; DENNIS; ARTHUR; I; WOMAN; ARTHUR; WOMAN; ARTHUR; DENNIS; I; ARTHUR; DENNIS; ARTHUR; I; DENNIS; ARTHUR; DENNIS; ARTHUR; I; WOMAN; ARTHUR; I; WOMAN; I; ARTHUR; WOMAN; ARTHUR; I; I; DENNIS; ARTHUR; DENNIS; ARTHUR; DENNIS; I; I; I; ARTHUR; DENNIS; ARTHUR; DENNIS; I; ARTHUR; DENNIS; I; SCENE; BLACK; KNIGHT; BLACK; KNIGHT; GREEN; KNIGHT; BLACK; KNIGHT; GREEN; KNIGHT; BLACK; KNIGHT; BLACK; KNIGHT; GREEN; KNIGHT; GREEN; KNIGHT; BLACK; KNIGHT; GREEN; KNIGHT; BLACK; KNIGHT; ARTHUR; I; I; BLACK; KNIGHT; ARTHUR; BLACK; KNIGHT; ARTHUR; I; I; BLACK; KNIGHT; ARTHUR; I; BLACK; KNIGHT; I; ARTHUR; ARTHUR; BLACK; KNIGHT; ARTHUR; BLACK; KNIGHT; ARTHUR; BLACK; KNIGHT; ARTHUR; A; BLACK; KNIGHT; ARTHUR; BLACK; KNIGHT; I; ARTHUR; BLACK; KNIGHT; ARTHUR; BLACK; KNIGHT; ARTHUR; BLACK; KNIGHT; ARTHUR; BLACK; KNIGHT; ARTHUR; BLACK; KNIGHT; ARTHUR; BLACK; KNIGHT; I; ARTHUR; BLACK; KNIGHT; ARTHUR; BLACK; KNIGHT; ARTHUR; I; ARTHUR; BLACK; 
KNIGHT; BLACK; KNIGHT; I; ARTHUR; BLACK; KNIGHT; ARTHUR; BLACK; KNIGHT; I; ARTHUR; BLACK; KNIGHT; ARTHUR; BLACK; KNIGHT; BLACK; KNIGHT; ARTHUR; BLACK; KNIGHT; I; I; SCENE; MONKS; CROWD; A; A; A; A; MONKS; CROWD; A; A; A; A; A; A; A; A; A; A; A; A; A; VILLAGER; CROWD; BEDEVERE; VILLAGER; CROWD; BEDEVERE; WITCH; I; I; BEDEVERE; WITCH; CROWD; WITCH; BEDEVERE; VILLAGER; BEDEVERE; VILLAGER; VILLAGER; CROWD; BEDEVERE; VILLAGER; VILLAGER; VILLAGER; VILLAGER; VILLAGERS; VILLAGER; VILLAGER; VILLAGER; VILLAGER; A; VILLAGERS; A; VILLAGER; A; VILLAGER; RANDOM; BEDEVERE; VILLAGER; BEDEVERE; A; VILLAGER; I; VILLAGER; VILLAGER; CROWD; BEDEVERE; VILLAGER; VILLAGER; VILLAGER; CROWD; BEDEVERE; VILLAGER; VILLAGER; CROWD; BEDEVERE; VILLAGER; VILLAGER; VILLAGER; BEDEVERE; VILLAGER; B; BEDEVERE; CROWD; BEDEVERE; VILLAGER; BEDEVERE; VILLAGER; RANDOM; BEDEVERE; VILLAGER; VILLAGER; VILLAGER; CROWD; BEDEVERE; VILLAGER; VILLAGER; VILLAGER; VILLAGER; VILLAGER; VILLAGER; VILLAGER; VILLAGER; VILLAGER; ARTHUR; A; CROWD; BEDEVERE; VILLAGER; BEDEVERE; VILLAGER; A; VILLAGER; A; CROWD; A; A; VILLAGER; BEDEVERE; CROWD; BEDEVERE; CROWD; A; A; A; WITCH; VILLAGER; CROWD; BEDEVERE; ARTHUR; I; BEDEVERE; ARTHUR; BEDEVERE; I; ARTHUR; BEDEVERE; ARTHUR; I; NARRATOR; SCENE; SIR; BEDEVERE; ARTHUR; BEDEVERE; SIR; LAUNCELOT; ARTHUR; SIR; GALAHAD; LAUNCELOT; PATSY; ARTHUR; I; KNIGHTS; PRISONER; KNIGHTS; MAN; I; ARTHUR; KNIGHTS; SCENE; GOD; I; ARTHUR; GOD; I; I; ARTHUR; I; O; GOD; ARTHUR; GOD; ARTHUR; O; GOD; LAUNCELOT; A; A; GALAHAD; SCENE; ARTHUR; FRENCH; GUARD; ARTHUR; FRENCH; GUARD; ARTHUR; FRENCH; GUARD; I; I; ARTHUR; GALAHAD; ARTHUR; FRENCH; GUARD; I; ARTHUR; FRENCH; GUARD; ARTHUR; FRENCH; GUARD; I; I; GALAHAD; FRENCH; GUARD; ARTHUR; FRENCH; GUARD; I; GALAHAD; ARTHUR; FRENCH; GUARD; I; I; GALAHAD; FRENCH; GUARD; I; ARTHUR; I; FRENCH; GUARD; OTHER; FRENCH; GUARD; FRENCH; GUARD; ARTHUR; I; KNIGHTS; ARTHUR; KNIGHTS; FRENCH; GUARD; FRENCH; GUARD; ARTHUR; KNIGHTS; FRENCH; GUARD; FRENCH; GUARDS; LAUNCELOT; I; 
ARTHUR; BEDEVERE; I; FRENCH; GUARDS; C; A; ARTHUR; BEDEVERE; I; ARTHUR; BEDEVERE; U; I; ARTHUR; BEDEVERE; ARTHUR; KNIGHTS; CRASH; FRENCH; GUARDS; SCENE; VOICE; DIRECTOR; HISTORIAN; KNIGHT; KNIGHT; HISTORIAN; HISTORIAN; S; WIFE; SCENE; NARRATOR; MINSTREL; O; SIR; ROBIN; DENNIS; WOMAN; ALL; HEADS; MINSTREL; ROBIN; I; ALL; HEADS; MINSTREL; ROBIN; I; ALL; HEADS; I; ROBIN; W; I; I; ALL; HEADS; ROBIN; I; LEFT; HEAD; I; MIDDLE; HEAD; I; RIGHT; HEAD; I; MIDDLE; HEAD; I; LEFT; HEAD; I; RIGHT; HEAD; LEFT; HEAD; ROBIN; I; LEFT; HEAD; I; RIGHT; HEAD; MIDDLE; HEAD; LEFT; HEAD; RIGHT; HEAD; MIDDLE; HEAD; LEFT; HEAD; MIDDLE; HEAD; LEFT; HEAD; I; MIDDLE; HEAD; RIGHT; HEAD; LEFT; HEAD; MIDDLE; HEAD; RIGHT; HEAD; LEFT; HEAD; ALL; HEADS; MIDDLE; HEAD; RIGHT; HEAD; MINSTREL; ROBIN; MINSTREL; ROBIN; I; MINSTREL; ROBIN; MINSTREL; ROBIN; I; MINSTREL; ROBIN; I; MINSTREL; ROBIN; MINSTREL; ROBIN; I; CARTOON; MONKS; CARTOON; CHARACTER; CARTOON; MONKS; CARTOON; CHARACTERS; CARTOON; MONKS; CARTOON; CHARACTER; VOICE; CARTOON; CHARACTER; SCENE; NARRATOR; GALAHAD; GIRLS; ZOOT; GALAHAD; ZOOT; GALAHAD; ZOOT; GALAHAD; ZOOT; MIDGET; CRAPPER; O; ZOOT; MIDGET; CRAPPER; ZOOT; GALAHAD; I; I; ZOOT; GALAHAD; ZOOT; GALAHAD; ZOOT; GALAHAD; I; ZOOT; GALAHAD; I; I; ZOOT; I; GALAHAD; ZOOT; PIGLET; GALAHAD; ZOOT; GALAHAD; B; ZOOT; WINSTON; GALAHAD; PIGLET; GALAHAD; PIGLET; GALAHAD; I; PIGLET; GALAHAD; I; PIGLET; GALAHAD; I; I; I; GIRLS; GALAHAD; GIRLS; GALAHAD; DINGO; I; GALAHAD; I; DINGO; GALAHAD; I; I; DINGO; GALAHAD; DINGO; I; GALAHAD; DINGO; I; LEFT; HEAD; DENNIS; OLD; MAN; TIM; THE; ENCHANTER; ARMY; OF; KNIGHTS; DINGO; I; GOD; DINGO; GIRLS; A; A; DINGO; AMAZING; STUNNER; LOVELY; DINGO; GIRLS; A; A; DINGO; GIRLS; GALAHAD; I; LAUNCELOT; GALAHAD; LAUNCELOT; GALAHAD; LAUNCELOT; GALAHAD; LAUNCELOT; DINGO; LAUNCELOT; GALAHAD; LAUNCELOT; GALAHAD; I; LAUNCELOT; GIRLS; GALAHAD; I; DINGO; GIRLS; LAUNCELOT; GALAHAD; I; I; DINGO; GIRLS; LAUNCELOT; GALAHAD; I; DINGO; GIRLS; DINGO; LAUNCELOT; GALAHAD; I; I; LAUNCELOT; 
GALAHAD; LAUNCELOT; GALAHAD; I; LAUNCELOT; GALAHAD; LAUNCELOT; GALAHAD; I; LAUNCELOT; I; NARRATOR; I; I; CROWD; NARRATOR; I; SCENE; OLD; MAN; ARTHUR; OLD; MAN; ARTHUR; OLD; MAN; ARTHUR; OLD; MAN; ARTHUR; OLD; MAN; ARTHUR; OLD; MAN; ARTHUR; OLD; MAN; SCENE; HEAD; KNIGHT; OF; NI; KNIGHTS; OF; NI; ARTHUR; HEAD; KNIGHT; RANDOM; ARTHUR; HEAD; KNIGHT; BEDEVERE; HEAD; KNIGHT; RANDOM; ARTHUR; HEAD; KNIGHT; ARTHUR; HEAD; KNIGHT; KNIGHTS; OF; NI; ARTHUR; HEAD; KNIGHT; ARTHUR; HEAD; KNIGHT; ARTHUR; A; KNIGHTS; OF; NI; ARTHUR; PARTY; ARTHUR; HEAD; KNIGHT; ARTHUR; O; HEAD; KNIGHT; ARTHUR; HEAD; KNIGHT; ARTHUR; HEAD; KNIGHT; CARTOON; CHARACTER; SUN; CARTOON; CHARACTER; SUN; CARTOON; CHARACTER; SUN; CARTOON; CHARACTER; SCENE; NARRATOR; FATHER; PRINCE; HERBERT; FATHER; HERBERT; FATHER; HERBERT; B; I; FATHER; I; I; I; I; I; I; HERBERT; I; I; FATHER; HERBERT; I; FATHER; I; HERBERT; B; I; FATHER; HERBERT; FATHER; HERBERT; I; FATHER; HERBERT; I; I; I; FATHER; I; GUARD; GUARD; FATHER; I; GUARD; FATHER; GUARD; GUARD; FATHER; GUARD; FATHER; GUARD; FATHER; GUARD; GUARD; FATHER; GUARD; FATHER; GUARD; FATHER; GUARD; FATHER; GUARD; FATHER; GUARD; I; FATHER; N; GUARD; FATHER; GUARD; FATHER; GUARD; GUARD; FATHER; GUARD; FATHER; GUARD; GUARD; FATHER; GUARD; FATHER; GUARD; FATHER; GUARD; GUARD; GUARD; I; FATHER; GUARD; GUARD; FATHER; GUARD; FATHER; I; GUARD; I; HERBERT; FATHER; GUARD; FATHER; SCENE; LAUNCELOT; CONCORDE; LAUNCELOT; CONCORDE; LAUNCELOT; I; I; A; A; CONCORDE; I; I; LAUNCELOT; CONCORDE; I; I; I; I; I; LAUNCELOT; I; CONCORDE; I; I; LAUNCELOT; I; I; CONCORDE; LAUNCELOT; CONCORDE; I; LAUNCELOT; CONCORDE; I; I; I; SCENE; PRINCESS; LUCKY; GIRLS; GUEST; SENTRY; SENTRY; SENTRY; LAUNCELOT; SENTRY; LAUNCELOT; PRINCESS; LUCKY; GIRLS; LAUNCELOT; GUESTS; LAUNCELOT; GUARD; LAUNCELOT; O; I; I; HERBERT; LAUNCELOT; I; I; HERBERT; LAUNCELOT; I; HERBERT; I; I; LAUNCELOT; I; HERBERT; FATHER; HERBERT; I; FATHER; LAUNCELOT; I; HERBERT; LAUNCELOT; FATHER; LAUNCELOT; FATHER; LAUNCELOT; I; I; HERBERT; I; 
FATHER; LAUNCELOT; I; FATHER; I; HERBERT; FATHER; LAUNCELOT; I; FATHER; LAUNCELOT; FATHER; LAUNCELOT; I; I; I; FATHER; HERBERT; LAUNCELOT; I; FATHER; LAUNCELOT; HERBERT; I; FATHER; LAUNCELOT; HERBERT; I; LAUNCELOT; I; HERBERT; LAUNCELOT; I; I; I; FATHER; HERBERT; SCENE; GUESTS; FATHER; GUEST; FATHER; LAUNCELOT; FATHER; LAUNCELOT; I; I; I; GUEST; GUESTS; FATHER; LAUNCELOT; GUEST; GUESTS; FATHER; GUESTS; FATHER; I; I; GUEST; FATHER; GUEST; FATHER; BRIDE; S; FATHER; GUEST; FATHER; I; I; LAUNCELOT; GUEST; GUESTS; CONCORDE; HERBERT; I; FATHER; HERBERT; I; FATHER; HERBERT; I; FATHER; GUESTS; FATHER; GUESTS; FATHER; GUESTS; FATHER; GUESTS; FATHER; GUESTS; CONCORDE; GUESTS; CONCORDE; GUESTS; LAUNCELOT; GUESTS; LAUNCELOT; I; GUESTS; CONCORDE; LAUNCELOT; GUESTS; LAUNCELOT; GUESTS; LAUNCELOT; SCENE; ARTHUR; OLD; CRONE; ARTHUR; CRONE; ARTHUR; I; CRONE; ARTHUR; CRONE; ARTHUR; CRONE; BEDEVERE; ARTHUR; BEDEVERE; ARTHUR; BEDEVERE; ARTHUR; BEDEVERE; ARTHUR; BEDEVERE; ARTHUR; ARTHUR; BEDEVERE; CRONE; BEDEVERE; ARTHUR; CRONE; BEDEVERE; ARTHUR; BEDEVERE; ARTHUR; BEDEVERE; ROGER; THE; SHRUBBER; ARTHUR; ROGER; ARTHUR; ROGER; I; I; BEDEVERE; ARTHUR; SCENE; ARTHUR; O; HEAD; KNIGHT; I; ARTHUR; HEAD; KNIGHT; KNIGHTS; OF; NI; HEAD; KNIGHT; RANDOM; HEAD; KNIGHT; ARTHUR; O; HEAD; KNIGHT; ARTHUR; RANDOM; HEAD; KNIGHT; KNIGHTS; OF; NI; A; A; A; HEAD; KNIGHT; ARTHUR; HEAD; KNIGHT; ARTHUR; KNIGHTS; OF; NI; HEAD; KNIGHT; ARTHUR; HEAD; KNIGHT; I; ARTHUR; KNIGHTS; OF; NI; HEAD; KNIGHT; ARTHUR; KNIGHTS; OF; NI; HEAD; KNIGHT; KNIGHTS; OF; NI; BEDEVERE; MINSTREL; ARTHUR; ROBIN; HEAD; KNIGHT; ARTHUR; MINSTREL; ROBIN; HEAD; KNIGHT; KNIGHTS; OF; NI; ROBIN; I; KNIGHTS; OF; NI; ROBIN; ARTHUR; KNIGHTS; OF; NI; HEAD; KNIGHT; ARTHUR; KNIGHTS; OF; NI; HEAD; KNIGHT; ARTHUR; HEAD; KNIGHT; I; I; I; KNIGHTS; OF; NI; NARRATOR; KNIGHTS; NARRATOR; MINSTREL; NARRATOR; KNIGHTS; NARRATOR; A; CARTOON; CHARACTER; NARRATOR; CARTOON; CHARACTER; NARRATOR; CARTOON; CHARACTER; NARRATOR; CARTOON; CHARACTER; NARRATOR; CARTOON; 
CHARACTER; NARRATOR; SCENE; KNIGHTS; ARTHUR; TIM; THE; ENCHANTER; I; ARTHUR; TIM; ARTHUR; TIM; ARTHUR; TIM; I; ARTHUR; O; TIM; ROBIN; ARTHUR; KNIGHTS; ARTHUR; BEDEVERE; GALAHAD; ROBIN; BEDEVERE; ROBIN; BEDEVERE; ARTHUR; GALAHAD; ARTHUR; I; I; TIM; A; ARTHUR; A; TIM; A; ARTHUR; I; ROBIN; Y; ARTHUR; GALAHAD; KNIGHTS; TIM; ROBIN; ARTHUR; ROBIN; GALAHAD; ARTHUR; ROBIN; KNIGHTS; ARTHUR; TIM; I; KNIGHTS; TIM; ARTHUR; O; TIM; ARTHUR; SCENE; GALAHAD; ARTHUR; TIM; ARTHUR; GALAHAD; ARTHUR; W; TIM; ARTHUR; TIM; ARTHUR; TIM; ARTHUR; TIM; ARTHUR; TIM; ARTHUR; TIM; ARTHUR; TIM; ROBIN; I; I; TIM; GALAHAD; TIM; GALAHAD; ROBIN; TIM; I; ROBIN; TIM; ARTHUR; BORS; TIM; BORS; ARTHUR; TIM; I; ROBIN; I; TIM; I; I; ARTHUR; TIM; ARTHUR; TIM; KNIGHTS; KNIGHTS; ARTHUR; KNIGHTS; TIM; ARTHUR; LAUNCELOT; GALAHAD; ARTHUR; GALAHAD; ARTHUR; ROBIN; ARTHUR; GALAHAD; ARTHUR; GALAHAD; LAUNCELOT; ARTHUR; LAUNCELOT; ARTHUR; MONKS; ARTHUR; LAUNCELOT; I; ARTHUR; BROTHER; MAYNARD; SECOND; BROTHER; O; MAYNARD; SECOND; BROTHER; MAYNARD; KNIGHTS; ARTHUR; GALAHAD; ARTHUR; SCENE; ARTHUR; LAUNCELOT; GALAHAD; ARTHUR; MAYNARD; GALAHAD; LAUNCELOT; ARTHUR; MAYNARD; ARTHUR; MAYNARD; BEDEVERE; MAYNARD; LAUNCELOT; MAYNARD; ARTHUR; MAYNARD; GALAHAD; ARTHUR; MAYNARD; LAUNCELOT; ARTHUR; BEDEVERE; GALAHAD; BEDEVERE; I; LAUNCELOT; ARTHUR; LAUNCELOT; KNIGHTS; BEDEVERE; LAUNCELOT; BEDEVERE; N; LAUNCELOT; BEDEVERE; I; ARTHUR; GALAHAD; MAYNARD; BROTHER; MAYNARD; BEDEVERE; ARTHUR; KNIGHTS; BEDEVERE; KNIGHTS; NARRATOR; ANIMATOR; NARRATOR; SCENE; GALAHAD; ARTHUR; ROBIN; ARTHUR; BEDEVERE; ARTHUR; GALAHAD; ARTHUR; GALAHAD; ARTHUR; ROBIN; ARTHUR; ROBIN; I; GALAHAD; ARTHUR; ROBIN; ARTHUR; ROBIN; I; LAUNCELOT; I; I; ARTHUR; GALAHAD; ARTHUR; LAUNCELOT; I; ARTHUR; BRIDGEKEEPER; LAUNCELOT; I; BRIDGEKEEPER; LAUNCELOT; BRIDGEKEEPER; LAUNCELOT; BRIDGEKEEPER; LAUNCELOT; BRIDGEKEEPER; LAUNCELOT; ROBIN; BRIDGEKEEPER; ROBIN; I; BRIDGEKEEPER; ROBIN; BRIDGEKEEPER; ROBIN; BRIDGEKEEPER; ROBIN; I; BRIDGEKEEPER; GALAHAD; BRIDGEKEEPER; GALAHAD; I; 
BRIDGEKEEPER; GALAHAD; BRIDGEKEEPER; ARTHUR; BRIDGEKEEPER; ARTHUR; BRIDGEKEEPER; ARTHUR; BRIDGEKEEPER; I; I; BEDEVERE; ARTHUR; SCENE; ARTHUR; BEDEVERE; ARTHUR; BEDEVERE; ARTHUR; FRENCH; GUARD; ARTHUR; I; FRENCH; GUARD; I; I; ARTHUR; FRENCH; GUARD; I; ARTHUR; FRENCH; GUARDS; ARTHUR; FRENCH; GUARD; ARTHUR; FRENCH; GUARD; FRENCH; GUARDS; ARTHUR; BEDEVERE; ARTHUR; FRENCH; GUARDS; ARTHUR; FRENCH; GUARDS; ARTHUR; FRENCH; GUARDS; ARTHUR; ARMY; OF; KNIGHTS; HISTORIAN; S; WIFE; I; INSPECTOR; OFFICER; HISTORIAN; S; WIFE; OFFICER; INSPECTOR; OFFICER; BEDEVERE; INSPECTOR; OFFICER; INSPECTOR; OFFICER; OFFICER; RANDOM; RANDOM; OFFICER; OFFICER; OFFICER; OFFICER; INSPECTOR; OFFICER; CAMERAMAN;

24. Write expressions that find all words in text6 meeting the conditions below. Each result should be a list of words: ['word1', 'word2', …].
a. Ending in ize

>>> [w for w in text6 if w.endswith('ize')]
[]

b. Containing the letter z

>>> [w for w in text6 if 'z' in w]
[u'zone', u'amazes', u'Fetchez', u'Fetchez', u'zoop', u'zoo', u'zhiv', u'frozen', u'zoosh']

c. Containing the letter sequence pt

>>> [w for w in text6 if 'pt' in w]
[u'empty', u'aptly', u'Thpppppt', u'Thppt', u'Thppt', u'empty', u'Thppppt', u'temptress', u'temptation', u'ptoo', u'Chapter', u'excepting', u'Thpppt']

d. All lowercase except for an initial capital (i.e., titlecase)

>>> list(set([w for w in text6 if w.istitle()]))
[u'Welcome', u'Winter', u'Lead', u'Uugh', u'Does', u'Saint', u'Until', u'Today', u'Thou', u'Burn', u'Lucky', u'Uhh', u'Not', u'Now', u'Twenty', u'Where', u'Just', u'Course', u'Go', u'Erbert', u'Uther', u'Actually', u'Cherries', u'Thpppt', u'Bloody', u'Aramaic', u'Mmm', u'Put', u'Haw', u'True', u'Pull', u'Fiends', u'Agh', u'Yup', u'We', u'Arthur', u'Zoot', u'English', u'Alright', u'My', u'Silence', u'Clark', u'Bedevere', u'Bors', u'Back', u'Maynard', u'Fetchez', u'Seek', u'Exactly', u'Doctor', u'Rather', u'When', u'Three', u'Providence', u'Book', u'Therefore', u'Huh', u'Stay', u'Umhm', u'Aaaaaaaah', u'Huy', u'Those', u'Dingo', u'Cider', u'Chop', u'Aauuugh', u'So', u'Found', u'Guy', u'Oui', u'Anarcho', u'Torment', u'Our', u'Your', u'Lie', u'Almighty', u'Galahad', u'Britons', u'Lord', u'Who', u'Beast', u'Loimbard', u'Why', u'A', u'Don', u'Guards', u'Oooh', u'All', u'Aaauugh', u'Assyria', u'Yeaaah', u'One', u'Farewell', u'Greetings', u'Beyond', u'Blue', u'What', u'Ayy', u'His', u'Recently', u'Here', u'Hic', u'Away', u'Wait', u'Concorde', u'Herbert', u'Ere', u'Bad', u'She', u'Mother', u'Shh', u'Erm', u'Tower', u'Robin', u'Summer', u'Chaste', u'Enchanter', u'Skip', u'Four', u'Say', u'Anthrax', u'Mud', u'Armaments', u'Build', u'Which', u'Nador', u'Hiyaah', u'Woa', u'More', u'Picture', u'Holy', u'Very', u'Practice', u'Packing', u'Uuh', u'Hold', u'Huyah', u'Throw', u'Must', u'None', u'This', u'Leaving', u'Ives', u'Nine', u'Stand', u'W', u'Firstly', u'Brother', u'Oooo', u'Eh', u'Amen', u'Jesus', u'Camaaaaaargue', u'Divine', u'Speak', u'Even', u'Hallo', u'Dappy', u'Yay', u'Iiiives', u'Prepare', u'There', u'Please', u'Black', u'Pure', u'Quoi', u'Excalibur', u'Iesu', u'Hmm', u'Midget', u'Angnor', u'B', u'Splendid', u'Aggh', u'Lancelot', u'Victory', u'See', u'Will', u'Shrubberies', u'Court', u'Aauuuves', u'God', u'Father', u'Patsy', u'It', u'Peng', u'Other', u'Then', u'Halt', u'Thee', u'Ridden', u'Aaaah', u'Knight', u'Antioch', u'They', u'Ask', u'With', u'Gallahad', u'Off', 
u'Thy', u'Well', u'Didn', u'Anybody', u'Isn', u'Grail', u'Neee', u'The', u'Bridge', u'Thsss', u'Hiyah', u'Yapping', u'Robinson', u'Hah', u'Explain', u'Aauuggghhh', u'Hill', u'Forward', u'Behold', u'European', u'Shut', u'Meanwhile', u'Chickennn', u'French', u'Psalms', u'Auuuuuuuugh', u'Ector', u'Aah', u'Keep', u'Quick', u'Once', u'Right', u'Help', u'Over', u'Anyway', u'Aaaugh', u'For', u'France', u'Umm', u'Walk', u'Dramatically', u'Good', u'Run', u'That', u'Arimathea', u'Forgive', u'Ecky', u'King', u'C', u'Could', u'Quiet', u'Hooray', u'S', u'Himself', u'African', u'Launcelot', u'Gable', u'Bravest', u'Bring', u'Shrubber', u'Aaah', u'Yes', u'Death', u'Christ', u'Would', u'Hey', u'Waa', u'Hee', u'Sorry', u'Heh', u'Get', u'Crapper', u'But', u'Hiyya', u'Aaaaaaaaah', u'Schools', u'Hurry', u'Princess', u'Together', u'N', u'Honestly', u'Caerbannog', u'Action', u'Knights', u'Round', u'And', u'Old', u'How', u'Winston', u'Mercea', u'Battle', u'Follow', u'Aaaaugh', u'Open', u'Ahh', u'Bedwere', u'Hya', u'Tis', u'Til', u'Tim', u'Charge', u'Wood', u'You', u'Nay', u'Tell', u'Stop', u'Aaaaaah', u'Excuse', u'Riiight', u'Supposing', u'Aaauggh', u'Attila', u'Do', u'I', u'Clear', u'Alice', u'Apples', u'Bristol', u'Y', u'Order', u'Try', u'Piglet', u'Tall', u'Spring', u'Is', u'Mind', u'Mine', u'Have', u'In', u'Table', u'Dennis', u'If', u'Wayy', u'Thank', u'Ninepence', u'Said', u'Hyy', u'Churches', u'Be', u'Augh', u'Ewing', u'Far', u'Oooohoohohooo', u'Surely', u'Consult', u'By', u'On', u'Unfortunately', u'Oh', u'Did', u'Of', u'Supreme', u'Morning', u'Tale', u'Ow', u'England', u'Or', u'Dis', u'Brave', u'Ohh', u'Pin', u'Pendragon', u'Are', u'Bones', u'Fine', u'Prince', u'Too', u'Iiiiives', u'Since', u'Pie', u'Idiom', u'Between', u'Whoa', u'Listen', u'Monsieur', u'Oooooooh', u'Frank', u'Quite', u'Let', u'Ho', u'Hm', u'Nothing', u'Ha', u'He', u'Chapter', u'Look', u'O', u'Thppppt', u'Um', u'Un', u'Uh', u'Bon', u'Hello', u'First', u'Ages', u'Autumn', u'Looks', u'Olfin', u'Message', u'Really', 
u'Ni', u'Use', u'Cut', u'No', u'Make', u'Aauuuuugh', u'Two', u'Quickly', u'Everything', u'Thpppppt', u'Nu', u'Rheged', u'Most', u'Hang', u'Ooh', u'Hand', u'Gawain', u'Every', u'Aaagh', u'Come', u'Bread', u'Peril', u'Steady', u'Thppt', u'Ulk', u'Silly', u'Defeat', u'Eee', u'Castle', u'Grenade', u'Camelot', u'Aagh', u'Britain', u'Joseph', u'Badon', u'Sir', u'Hoa', u'Perhaps', u'Hoo', u'Saxons', u'Lake', u'Thursday', u'To', u'Shall', u'May', u'Never', u'Eternal', u'As', u'Cornwall', u'Running', u'Five', u'Gorge', u'Lady', u'Man', u'Great', u'Like', u'Yeaah', u'Remove', u'Swamp', u'U', u'Heee', u'Dragon', u'Ah', u'Am', u'Yeah', u'An', u'Bravely', u'Allo', u'At', u'Ay', u'Roger', u'Chicken']
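
The four filters above can be tried without loading the NLTK corpus. The following is a minimal sketch written for Python 3; `tokens` is a small made-up list standing in for text6, not data from the corpus. It uses `sorted(set(...))` rather than `list(set(...))` so that the titlecase result comes out in a stable order.

```python
# A tiny hand-made token list standing in for text6 (hypothetical sample).
tokens = ['Arthur', 'empty', 'zone', 'KNIGHT', 'Chapter', 'realize',
          'zone', 'the', 'Arthur', 'aptly']

# a. words ending in "ize"
ends_ize = [w for w in tokens if w.endswith('ize')]

# b. words containing the letter z (duplicates are kept)
has_z = [w for w in tokens if 'z' in w]

# c. words containing the letter sequence "pt"
has_pt = [w for w in tokens if 'pt' in w]

# d. titlecase words, de-duplicated and sorted for a stable order
titlecase = sorted(set(w for w in tokens if w.istitle()))

print(ends_ize)
print(has_z)
print(has_pt)
print(titlecase)
```

Note that `istitle()` rejects all-uppercase tokens such as 'KNIGHT', which is why speaker names do not show up in part d.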

25. Define sent to be the list of words ['she', 'sells', 'sea', 'shells', 'by', 'the', 'sea', 'shore'].
Write code to perform the following tasks:
a. Print all words beginning with sh

>>> sent = ['she','sells','sea','shells','by','the','sea','shore']
>>> print [w for w in sent if w.startswith('sh')]
['she', 'shells', 'shore']

b. Print all words longer than four characters

>>> print [w for w in sent if len(w)>4]
['sells', 'shells', 'shore']

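26. What does the following Python code do? sum([len(w) for w in text1]) Can you use it to work out the average word length of a text?
It adds up the lengths of all tokens in text1, giving the total number of characters in the text's tokens.
Dividing that total by the number of tokens gives the average word length: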

>>> sum([len(w) for w in text1])*1.0/len(text1)
3.830411128023649
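
The same calculation can be wrapped in a function. This is a sketch for Python 3, where `/` is already true division, so the `*1.0` trick is not needed; the name `average_word_length` and the sample list are illustrative assumptions, not part of the exercise.

```python
def average_word_length(tokens):
    """Return the mean number of characters per token (0.0 for an empty list)."""
    if not tokens:
        return 0.0
    total_chars = sum(len(w) for w in tokens)  # same sum as in the exercise
    return total_chars / len(tokens)           # true division in Python 3

sample = ['Call', 'me', 'Ishmael', '.']
print(average_word_length(sample))
```

Punctuation marks are tokens too, so they count as one-character "words" and pull the average down.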

27. Define a function called vocab_size(text) that takes the text as its only parameter and returns the vocabulary size of the text.

def vocab_size(text):
    return len(set(text))

28. Define a function percent(word, text) that calculates how often a given word occurs in a text and expresses the result as a percentage.

def percent(word,text):
    # Share of tokens equal to word, scaled to a percentage
    freq = 100.0*len([w for w in text if w == word])/len(text)
    print "%.2f%%"%freq
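
A variant that returns the number instead of printing it is easier to reuse in later calculations. The sketch below is written for Python 3 and uses the word list from exercise 25 as sample data. It counts occurrences with the list method `count`, and the guard for an empty text is an added assumption about how that case should behave.

```python
def vocab_size(text):
    """Number of distinct tokens in text."""
    return len(set(text))

def percent(word, text):
    """Percentage of tokens in text that are exactly equal to word."""
    if not text:
        return 0.0
    return 100.0 * text.count(word) / len(text)

sample = ['she', 'sells', 'sea', 'shells', 'by', 'the', 'sea', 'shore']
print(vocab_size(sample))
print('%.2f%%' % percent('sea', sample))
```

Here 'sea' appears twice among eight tokens, so the sample has seven distinct words and 'sea' makes up a quarter of the tokens.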

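29. We have been using sets to store vocabularies. Try the following Python expression: set(sent3) < set(text1). Experiment with different arguments to set(). What does it do?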

>>> set(text3) < set(text1)
False
>>> s1 = ['I','Love']
>>> s2 = ['I','Love','dragon']
>>> set(s1) < set(s2)
True

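The expression set1 < set2 tests whether set1 is a proper subset of set2: every element of set1 is in set2, and the two sets are not equal.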
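
The sketch below, written for Python 3, contrasts `<` with `<=` and shows one possible practical use: finding the words of a sentence that are missing from a known vocabulary. The `vocabulary` and `sentence` values are made-up samples.

```python
s1 = ['I', 'Love']
s2 = ['I', 'Love', 'dragon']

proper = set(s1) < set(s2)        # proper subset
self_proper = set(s2) < set(s2)   # a set is not a proper subset of itself
self_subset = set(s2) <= set(s2)  # the plain subset test allows equality
print(proper, self_proper, self_subset)

# One possible application: list the words of a sentence that are
# missing from a known vocabulary (both values are made-up samples).
vocabulary = set(['I', 'Love', 'dragon'])
sentence = ['I', 'Love', 'unicorns']
unknown = sorted(set(sentence) - vocabulary)
print(unknown)
```

If `unknown` is empty, every word of the sentence is covered by the vocabulary, which is the same as `set(sentence) <= vocabulary` being true.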
