Python study notes: using wordCloud to show the 50 most frequent names in Romance of the Three Kingdoms (《三国演义》)

jieba word segmentation

Reference: https://github.com/fxsjy/jieba
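`jieba.posseg.cut()` yields `(word, flag)` pairs, where the flag is a part-of-speech tag and `nr` is jieba's tag for person names. The counting step used later can be sketched without jieba by passing in a hand-written list of pairs. The `sample` pairs and the helper name `count_names` are hypothetical stand-ins for real `pseg.cut()` output, not something jieba provides.

```python
from collections import Counter

def count_names(tagged_words):
    """Count words tagged "nr" (person name) in an iterable of (word, flag) pairs."""
    counts = Counter()
    for word, flag in tagged_words:
        if flag == "nr":
            counts[word] += 1
    return counts

# Hypothetical sample standing in for pseg.cut(text) output
sample = [("玄德", "nr"), ("曰", "v"), ("孔明", "nr"), ("玄德", "nr"), ("兵", "n")]
print(count_names(sample))  # Counter({'玄德': 2, '孔明': 1})
```

With the real library, the same function would be called as `count_names(pseg.cut(text))`.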

Word cloud

Reference: https://github.com/amueller/word_cloud

Other notes

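1. When calling open(), pass encoding='UTF-8'. Otherwise Python falls back to the platform's default encoding (e.g. GBK on Chinese Windows), and the text is garbled or fails to decode.
2. When calling WordCloud(), set font_path="msyh.ttc" (Microsoft YaHei) so that a font containing Chinese glyphs is used. Otherwise the Chinese words do not render in the word cloud.
3. The code sorts a dictionary by its values.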
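Note 3 refers to sorting a dictionary by value. A minimal sketch with made-up counts (not taken from the novel): `sorted()` with `key=lambda x: x[1]` orders the `(name, count)` pairs by count, and a slice keeps the top N, which avoids a manual counter and `break`. `collections.Counter.most_common(n)` does the same in one call.

```python
from collections import Counter

# Illustrative counts only
counts = {"曹操": 5, "孔明": 9, "玄德": 7, "关公": 3}

# Sort (name, count) pairs by count, highest first, then keep the top 3
top3 = sorted(counts.items(), key=lambda x: x[1], reverse=True)[:3]
print(top3)  # [('孔明', 9), ('玄德', 7), ('曹操', 5)]

# Equivalent shortcut once the counts live in a Counter
print(Counter(counts).most_common(3))  # [('孔明', 9), ('玄德', 7), ('曹操', 5)]
```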

Code

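import matplotlib.pyplot as plt
from wordcloud import WordCloud
import jieba.posseg as pseg

# Read the whole novel; encoding='UTF-8' prevents garbled Chinese text,
# and the with-block closes the file afterwards
with open(r'C:\Users\dell\Desktop\31878\all.txt', 'r', encoding='UTF-8') as f:
    text = f.read()

# pseg.cut() yields (word, flag) pairs; the flag "nr" marks person names
nameDict = dict()
for word, flag in pseg.cut(text):
    if flag == "nr":
        nameDict[word] = nameDict.get(word, 0) + 1

# Sort names by count (descending) and keep the 50 most frequent
nameDictSorted = sorted(nameDict.items(), key=lambda x: x[1], reverse=True)
top50 = dict(nameDictSorted[:50])

# Weight each name by its count. Calling generate() on a comma-joined string
# of the names would see every name exactly once, so all 50 names would get
# the same weight regardless of how often they occur in the novel.
wordcloud = WordCloud(background_color="white",
                      font_path="msyh.ttc",  # Microsoft YaHei, has Chinese glyphs
                      width=1000,
                      height=860,
                      margin=2).generate_from_frequencies(top50)

plt.imshow(wordcloud)
plt.axis("off")
plt.show()
wordcloud.to_file('test.png')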

Result

[Figure 1: word cloud of the 50 most frequent names]
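The result is not entirely satisfactory: some words that merely look like names were tagged as names (e.g. 曹兵 "Cao's troops" and 魏兵 "Wei troops"), and some names were segmented badly (e.g. 孔明曰 "Kongming said").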
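One possible post-processing step for these problems, sketched under assumptions: drop tagged words ending in 兵 ("troops") and strip a trailing 曰 ("said"), merging counts that collapse onto the same name. The function `clean_names`, its suffix rules, and the `raw` counts are hypothetical heuristics for illustration; they are not part of jieba and will not catch every mis-tagged word.

```python
from collections import Counter

def clean_names(counts, drop_suffixes=("兵",), strip_suffixes=("曰",)):
    """Hypothetical heuristic cleanup of name counts produced by jieba."""
    cleaned = Counter()
    for word, n in counts.items():
        # Drop non-names such as 曹兵 / 魏兵 ("Cao's troops", "Wei troops")
        if word.endswith(drop_suffixes):
            continue
        # Turn 孔明曰 ("Kongming said") back into 孔明
        for suffix in strip_suffixes:
            if word.endswith(suffix) and len(word) > len(suffix):
                word = word[: -len(suffix)]
        # Merge counts that now map to the same name
        cleaned[word] += n
    return cleaned

raw = {"孔明": 10, "孔明曰": 4, "曹兵": 6, "魏兵": 2, "玄德": 8}
print(clean_names(raw).most_common())  # [('孔明', 14), ('玄德', 8)]
```

A more robust option is to fix the segmentation itself, for example by registering words with `jieba.add_word()` or loading a user dictionary with `jieba.load_userdict()`, both described in the jieba README.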
