How to Count Word Occurrences in an English Text

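def getText():
    # Read the whole file; the with-block closes it afterwards
    with open('hamlet.txt', 'r') as f:
        txt = f.read()
    # Lowercase so that "The" and "the" count as the same word
    txt = txt.lower()
    # Replace punctuation with spaces so it does not stick to words
    for ch in '!"#$%&()*+,-./:;<=>?@[\\]^_{|}~':
        txt = txt.replace(ch, ' ')
    return txt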

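txt = getText()
words = txt.split()
# Tally how many times each word occurs
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1
# Sort the (word, count) pairs by count, most frequent first
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)
# Print the 10 most frequent words; slicing avoids an IndexError
# when the text has fewer than 10 distinct words
for word, count in items[:10]:
    print('{0:<10}{1:>5}'.format(word, count))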

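The English text I counted here is Hamlet. To count a different text, copy the file into the project's root directory and change the filename in the code to match.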
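
If you switch texts often, editing the filename inside the function each time gets tedious. Below is a minimal sketch of one way to pass the file path in as a parameter instead. The names `top_words`, `top_words_in_file`, and `PUNCTUATION` are my own and not part of the script above, and `collections.Counter` from the standard library is swapped in for the hand-written dictionary tally.

```python
from collections import Counter

# Same punctuation set as the script above (assumed to be what you want stripped)
PUNCTUATION = '!"#$%&()*+,-./:;<=>?@[\\]^_{|}~'


def top_words(text, n=10):
    """Return the n most frequent words in text as (word, count) pairs."""
    text = text.lower()
    for ch in PUNCTUATION:
        text = text.replace(ch, ' ')
    # Counter does the tallying; most_common sorts by count, highest first
    return Counter(text.split()).most_common(n)


def top_words_in_file(path, n=10):
    """Hypothetical helper: read the file at path and count its words."""
    with open(path, 'r') as f:
        return top_words(f.read(), n)
```

With this in place, something like `top_words_in_file('hamlet.txt')` should give the same ranking as the original script, and any other text only needs a different path argument.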
