Basics 11: Word-frequency counting with the jieba library (jieba precise mode, deleting multiple specified strings)

Key points
jieba precise mode; deleting multiple specified strings from the counts


import jieba

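# Read the whole text; pass encoding=... to open() if the file is not in the system default encoding
with open("C://Users/Administrator/Desktop/"+"三国演义(前四回).txt", "r") as f:
    txt = f.read()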

words = jieba.lcut(txt)                        # jieba.lcut(): precise-mode segmentation, returns a list of words
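counts = {}
for word in words:
    if len(word) == 1:                            # skip single-character tokens (particles, punctuation)
        continue
    # Merge the different names of one character into a single canonical name
    elif word == "诸葛亮" or word == "孔明曰":
        rword = "孔明"
    elif word == "关公" or word == "云长":
        rword = "关羽"
    elif word == "玄德" or word == "玄德曰":
        rword = "刘备"
    elif word == "孟德" or word == "丞相":
        rword = "曹操"
    else:
        rword = word
    counts[rword] = counts.get(rword, 0) + 1      # increment, starting from 0 for a new word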
    
excludes = {"将军", "却说", "荆州", "二人", "不可", "不能", "如此","朝廷","天下","陈留王"}
for word in excludes:
    counts.pop(word, None)                        # delete multiple specified words; pop() avoids a KeyError if a word is absent
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)

for word, count in items[:10]:                    # top 10; slicing is safe even with fewer than 10 entries
    print("{0:<10}{1:>5}".format(word, count))
    

Output:


Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\ADMINI~1\AppData\Local\Temp\jieba.cache
Loading model cost 1.211 seconds.
Prefix dict has been built succesfully.
刘备           53
董卓           23
何进           18
黄巾           17
张飞           15
张宝           14
关羽           13
曹操           13
张让           13
太后           13
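
The counting loop above can also be written more compactly. The following is only a sketch, not the method used in this post: it assumes the text has already been segmented (for example with jieba.lcut) and replaces the elif chain with a lookup table and collections.Counter. The names ALIASES, EXCLUDES and top_words are made up for this illustration; the alias pairs and excluded words are the ones from the code above.

```python
from collections import Counter

# Hypothetical lookup table: variant token -> canonical name (same pairs as the elif chain above)
ALIASES = {
    "诸葛亮": "孔明", "孔明曰": "孔明",
    "关公": "关羽", "云长": "关羽",
    "玄德": "刘备", "玄德曰": "刘备",
    "孟德": "曹操", "丞相": "曹操",
}

# Words to drop from the final counts (same set as in the post)
EXCLUDES = {"将军", "却说", "荆州", "二人", "不可", "不能", "如此", "朝廷", "天下", "陈留王"}


def top_words(words, aliases=ALIASES, excludes=EXCLUDES, n=10):
    """Return the n most frequent (word, count) pairs from an already segmented word list."""
    # Skip single-character tokens, then map each remaining token to its canonical name
    counts = Counter(aliases.get(w, w) for w in words if len(w) > 1)
    # Remove excluded words; pop() with a default does nothing if the word is absent
    for w in excludes:
        counts.pop(w, None)
    # most_common() sorts by count, highest first
    return counts.most_common(n)


# Small hand-made token list standing in for the result of jieba.lcut(txt)
sample = ["玄德", "曰", "玄德曰", "将军", "云长", "关公", "刘备", "却说"]
for word, count in top_words(sample):
    print("{0:<10}{1:>5}".format(word, count))
```

With the real text, the call would be top_words(jieba.lcut(txt)). Keeping the aliases in a dict means a new name variant is one more entry in the table rather than another elif branch.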
