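I made a word cloud of Douluo Dalu (斗罗大陆):
Preparation:
1. Install Python 3.
2. Install the packages jieba, wordcloud, imageio, and matplotlib.
3. Prepare a base image for the result. I used a picture of an apple, for example; the words fill the non-white regions of the image.
4. Prepare a TTF font file. I have a few that I downloaded from around the web, link: https://pan.baidu.com/s/1u-9V03tNBgRJAQ-227yGvQ extraction code: 5qyi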
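A note on item 3: as far as I understand it, wordcloud treats only pure white pixels (255, 255, 255) of the mask as background, so the off-white pixels that JPEG compression tends to leave behind still count as drawable. Below is a minimal pure-Python sketch of that rule on a tiny hand-made "image". The helper names and the `threshold` cutoff are my own assumptions for illustration, not part of wordcloud; on a real image you would do the same thing on the array returned by imread.

```python
WHITE = (255, 255, 255)


def is_masked_out(pixel):
    """Return True if words may NOT be drawn on this RGB pixel (pure white)."""
    return tuple(pixel[:3]) == WHITE


def whiten_near_white(rows, threshold=240):
    """Snap near-white pixels to pure white so they count as background.

    `threshold` is a hypothetical cutoff chosen for this sketch,
    not a wordcloud parameter.
    """
    return [
        [WHITE if min(p[:3]) >= threshold else tuple(p[:3]) for p in row]
        for row in rows
    ]


# A tiny 2x3 "image": pure white, JPEG-style off-white, and red pixels.
image = [
    [(255, 255, 255), (250, 251, 249), (200, 30, 30)],
    [(254, 254, 254), (200, 30, 30), (255, 255, 255)],
]

drawable_before = sum(not is_masked_out(p) for row in image for p in row)
cleaned = whiten_near_white(image)
drawable_after = sum(not is_masked_out(p) for row in cleaned for p in row)
print(drawable_before, drawable_after)
```

If words spill outside the shape you expected, cleaning up the background of the base image this way is one thing to try.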
Now for the code:
import jieba
from wordcloud import WordCloud
from imageio import imread
import matplotlib.pyplot as plt
path = '/wordcloud/douluo.txt'  # path to the text file to process
with open(path, 'r', encoding='gbk') as fh:
    f = fh.read()
seg_list = jieba.cut(f)
dict1 = {}  # word -> number of occurrences
dict2 = {}  # words dropped for exceeding the frequency cap
# To filter out words that repeat very often but carry little meaning,
# uncomment the commented-out lines below
for seg in seg_list:
    #if seg in dict2.keys():
    #    continue
    if seg in dict1.keys():
        dict1[seg] += 1
        #if dict1[seg] > 5000:  # highest frequency a displayed word may have
        #    dict1.pop(seg)
        #    dict2[seg] = 1
    else:
        dict1[seg] = 1
tuple1 = sorted(dict1.items(), key=lambda x: x[1], reverse=True)
cut_text = ' '.join(['%s'%(k[0]) for k in tuple1])
color_mask = imread('/wordcloud/apple.jpg')  # path to the base image
cloud = WordCloud(font_path='/wordcloud/bb4134.ttf',  # path to the font file
                  background_color="white",
                  mask=color_mask,
                  max_words=1000,
                  max_font_size=100)
word_cloud = cloud.generate(cut_text)
plt.axis('off')
plt.imshow(word_cloud)
plt.show()
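The counting and filtering step can also be written more compactly. Here is a rough sketch using collections.Counter from the standard library. The function names `count_words` and `render` and the `max_count` parameter are my own, chosen for illustration. Unlike the listing above, this version skips whitespace-only tokens and hands the counts to WordCloud.generate_from_frequencies instead of joining the words back into a string. As far as I can tell, generate() recounts words from the text it receives, so passing the counts directly is one way to make sure font sizes follow the real frequencies.

```python
from collections import Counter


def count_words(tokens, max_count=None):
    """Count tokens; optionally drop every word seen more than max_count times."""
    counts = Counter(t for t in tokens if t.strip())
    if max_count is not None:
        counts = Counter({w: n for w, n in counts.items() if n <= max_count})
    return counts


def render(counts, font_path, mask=None):
    """Build a WordCloud from word counts (requires the wordcloud package)."""
    # Imported lazily so the counting part works without wordcloud installed.
    from wordcloud import WordCloud
    cloud = WordCloud(font_path=font_path, background_color="white",
                      mask=mask, max_words=1000, max_font_size=100)
    return cloud.generate_from_frequencies(counts)


# Stand-in for the output of jieba.cut(); any iterable of strings works.
tokens = ["唐三", "的", "小舞", "的", "唐三", "的", " "]
print(count_words(tokens).most_common(2))
print(count_words(tokens, max_count=2))
```

With the real data this would be called as something like render(count_words(jieba.cut(f), max_count=5000), '/wordcloud/bb4134.ttf', color_mask), and the result passed to plt.imshow as before.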
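For example, when I uncomment the filtering lines in the code above and set the maximum word frequency to 5000, the resulting word cloud looks like this: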