Word frequency count of a text file (with an excludes list)

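def getTxt():
    # Read the whole file, lowercase it, and turn punctuation into spaces
    with open("hamlet.txt","r") as f:
        txt=f.read()
    txt=txt.lower()
    for ch in '!"#$%&()*+,-./:;<=>?@[\\]^_`{}|~':
        txt=txt.replace(ch," ")
    return txt
hamletTxt=getTxt()
words=hamletTxt.split()          # split on any whitespace
counts={}
for word in words:
    counts[word]=counts.get(word,0)+1    # get() returns 0 for a word not seen yet
items=list(counts.items())
items.sort(key=lambda x:x[1],reverse=True)    # sort (word, count) pairs by count, highest first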
# Common function words (and a few frequent verbs) to leave out of the ranking
excludes=['the','and','to','of','you','i','a','my','in',\
          'it','that','is','his','this','but',\
          'with','for','not','your','me','be','as','he',\
          'what','him','so','have','will','do','no','we',\
          'are','on','all','our','by','or','shall','if',\
          'o','thou','they','good','come','now','more',\
          'let','from','her','how','at','thy']

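# Print the 10 most frequent words that are not in excludes.
# Excluded entries are deleted from items, so index i always
# points at the next candidate.
i=0
while i<10:
    word, count = items[i]
    if word not in excludes:
        print("{0:<10}{1:>5}".format(word,count))
        i+=1
    else:
        del items[i]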
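'''Incorrect version, kept for comparison.
Bug: j is advanced only when a word is excluded. Once a non-excluded word
is found, j stays put, so the same word is printed 10 times.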
i=0
j=0
while i<10:
    word,count=items[j]
    if word not in excludes:
        print("{0:<10}{1:5}".format(word,count))
        i=i+1
    else:j=j+1
'''
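The incorrect version above only needs j to advance on every iteration. The sketch below wraps the fixed two-index loop in a hypothetical helper, top_n_excluding, that takes any list of (word, count) pairs already sorted by count; the small items list is made-up sample data, not output from hamlet.txt. It also stops when items runs out, which neither while loop in the post checks.

```python
def top_n_excluding(items, excludes, n=10):
    """Return the first n (word, count) pairs in items whose word is not excluded.

    items is assumed to be sorted by count, highest first.
    """
    result = []
    i = 0  # how many words have been kept so far
    j = 0  # current position in items
    while i < n and j < len(items):
        word, count = items[j]
        if word not in excludes:
            result.append((word, count))
            i += 1
        j += 1  # the fix: move on to the next pair whether or not it was kept
    return result


# Made-up sample data, already sorted by count
items = [("the", 5), ("hamlet", 4), ("and", 3), ("lord", 2), ("king", 1)]
for word, count in top_n_excluding(items, {"the", "and"}, n=2):
    print("{0:<10}{1:>5}".format(word, count))
```

Unlike the del-based loop, this leaves items unchanged, so the full sorted list is still available afterwards.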


Download the Hamlet text (hamlet.txt): click to open the link

Output

hamlet      462
lord        309
king        194
horatio     157
claudius    120
queen       117
polonius    116
laertes     103
gertrude     95
ophelia      86
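The count, sort and filter steps can also be condensed with collections.Counter from the standard library. This is a sketch, not the author's code: word_freq is a hypothetical helper that takes the text as a string (so it can be tried without hamlet.txt), reuses the punctuation set from getTxt(), and drops excluded words before counting. Counter.most_common keeps first-seen order for equal counts, like the stable sort above, so for the same input and excludes it should produce the same ranking.

```python
from collections import Counter

# Same punctuation set as getTxt(); the apostrophe is not in it,
# so words like "lord's" stay intact
PUNCT = '!"#$%&()*+,-./:;<=>?@[\\]^_`{}|~'
# Translation table that maps every punctuation character to a space
TABLE = str.maketrans(PUNCT, " " * len(PUNCT))


def word_freq(text, excludes, n=10):
    """Return the n most common (word, count) pairs, skipping excluded words."""
    words = text.lower().translate(TABLE).split()
    counts = Counter(w for w in words if w not in excludes)
    return counts.most_common(n)


sample = "The king! The king, and Hamlet; hamlet? HAMLET."
for word, count in word_freq(sample, {"the", "and"}, n=2):
    print("{0:<10}{1:>5}".format(word, count))
```

str.translate replaces all punctuation in a single pass over the text, instead of one full replace() pass per character as in getTxt().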
