Required libraries:
jieba
wordcloud
matplotlib
scipy
Installation: from the command line, run pip install jieba wordcloud matplotlib scipy (or install each package separately, e.g. pip install jieba).
First, use jieba to segment a simple sentence:
import jieba
sentence = "我来到了异世界,转生成一只史莱姆。萌王万岁!"
print("Default Mode: " + "/".join(jieba.cut(sentence, cut_all=False, HMM=True)))
print("Full Mode: " + "/".join(jieba.cut(sentence, cut_all=True)))
print("HMM OFF: " + "/".join(jieba.cut(sentence, cut_all=False, HMM=False)))
print("Search Engine Mode: " + "/".join(jieba.cut_for_search(sentence, HMM=False)))  # cut_for_search has no cut_all parameter
In the output above, the word "异世界" is split apart. By tuning the frequency of an individual word you can make it segmentable (or not) as a single token; alternatively, you can adjust the dictionary itself.
jieba.suggest_freq("异世界", tune=True)  # raise the word's frequency so it is kept whole
jieba.add_word("萌王")  # add a new word to the in-memory dictionary
Next, let's segment a longer text and extract keywords.
The sample text is a novel I have been reading recently.
import jieba.analyse as anl
text = open("textfile.txt", encoding="utf-8").read()  # specify the encoding explicitly
keyword = anl.extract_tags(text, topK=200, withWeight=True)  # TF-IDF keywords as (word, weight) pairs
print(keyword)
keyword = anl.textrank(text, topK=200, withWeight=True, allowPOS=('v', 'vd', 'n', 'nr', 'ns', 'nt', 'nz'))  # TextRank, filtered by part of speech
The usage is the same as above; output omitted.
Finally, turn the jieba results into a word cloud. Prepare an image to serve as the mask/background.
Note: to render a Chinese word cloud you must point WordCloud at a Chinese font (font_path), otherwise the words show up as empty boxes.
from wordcloud import WordCloud, ImageColorGenerator
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image  # scipy.misc.imread was removed in SciPy 1.2; read the image with Pillow instead
d = {k: v for (k, v) in keyword}  # continuing from the keyword extraction above
back_coloring = np.array(Image.open("picfile.png"))
image_colors = ImageColorGenerator(back_coloring)
wordcloud = WordCloud(font_path="simhei.ttf",  # path to a Chinese font on your system; adjust as needed
                      background_color="white", mask=back_coloring,
                      max_words=2000, max_font_size=100,
                      width=1000, height=860, margin=2)
wordcloud.generate_from_frequencies(d)
plt.figure()
plt.imshow(wordcloud.recolor(color_func=image_colors))
plt.axis("off")
plt.show()
wordcloud.to_file("outfile.png")