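Contents
1. Installing jieba
2. Installing wordcloud
3. Hamlet word cloud program
4. Generating the hamletwordcloud.png word cloud
4.1 The shaanxi.png background image
4.2 The hamlet.txt file
5. Chinese word segmentation of Romance of the Three Kingdoms, with exclusions
6. The 三国演义.txt file
7. Results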
1. Installing jieba
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple jieba
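2. Installing wordcloud
wordcloud is a third-party library for rendering word clouds. It treats the word as its basic unit and turns a text into a picture, which presents the content more intuitively and attractively than the raw text does.
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple wordcloud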
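After running both pip commands, it can be worth confirming that the packages are importable from the interpreter you actually run your scripts with, since an IDE may use a different environment from the terminal. The following is a minimal sketch using only the standard library; the helper name `is_installed` is an illustrative assumption, not part of jieba or wordcloud.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the current interpreter can locate the top-level module."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # PIL is the import name of the Pillow package.
    for name in ("jieba", "wordcloud", "PIL", "numpy"):
        status = "ok" if is_installed(name) else "missing"
        print(f"{name:<10}{status}")
```

Any module reported as missing can be installed with the same `pip install -i ...` form used above.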
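3. Hamlet word cloud program
# -*- coding: utf-8 -*-
"""
Created on Mon Jul 18 21:06:34 2022
@author: zcq
"""
import jieba
import wordcloud
from PIL import Image
import numpy as np

# Read the whole text; "with" closes the file even if reading fails.
with open("hamlet.txt", "r", encoding="utf-8") as f:
    t = f.read()

# Segment the text and rejoin the tokens with spaces, the form WordCloud expects.
# jieba matters mainly for Chinese; on English text it splits on spaces and punctuation.
ls = jieba.lcut(t)
txt = " ".join(ls)

# The mask image defines the shape of the cloud; pure white areas are left empty.
mask = np.array(Image.open("shaanxi.png"))
w = wordcloud.WordCloud(
    mask=mask,
    width=1000, height=700,    # ignored while mask is set; the mask's shape is used
    background_color="white",
    font_path="msyh.ttc",      # Microsoft YaHei; use a font file available on your system
)
w.generate(txt)
w.to_file("hamletwordcloud.png")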
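WordCloud sizes each word according to how often it appears in the text passed to `generate()`. To get a rough idea of which words will dominate the picture, the frequencies can be counted directly. The sketch below uses only the standard library and a simple regular expression rather than wordcloud's own tokenizer and stopword list, so its counts are an approximation. The function name `top_words` and the sample sentence are illustrative assumptions.

```python
import re
from collections import Counter


def top_words(text, n=10, min_len=3):
    """Return the n most common words of at least min_len letters, lowercased."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    words = [w.strip("'") for w in words]
    return Counter(w for w in words if len(w) >= min_len).most_common(n)


if __name__ == "__main__":
    # A made-up sample sentence; pass the contents of hamlet.txt instead for real use.
    sample = "To be, or not to be, that is the question: to be is to ask."
    print(top_words(sample, n=3, min_len=2))
```

Raising `min_len` is a crude way to drop short function words such as "to" and "is", which would otherwise top the list.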
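4. Generating the hamletwordcloud.png word cloud
Running the Hamlet program writes the finished word cloud to hamletwordcloud.png in the working directory.
4.1 The shaanxi.png background image
shaanxi.png is the image passed to WordCloud as the mask; it determines the shape of the cloud.
4.2 The hamlet.txt file
hamlet.txt is the UTF-8 text file that the program reads as input.
5. Chinese word segmentation of Romance of the Three Kingdoms, with exclusions
import jieba

# Read and segment the novel.
with open("三国演义.txt", "r", encoding="utf-8") as f:
    ls = jieba.lcut(f.read())

# Frequent words that are not character names.
excludes = {"将军", "却说", "二人", "不可", "荆州", "不能", "如此"}
counts = {}
for word in ls:
    if len(word) == 1:
        continue  # skip single characters and punctuation
    elif word == "诸葛亮" or word == "孔明曰":
        rword = "孔明"
    elif word == "关公" or word == "云长":
        rword = "关羽"
    elif word == "玄德" or word == "玄德曰":
        rword = "刘备"
    elif word == "孟德" or word == "丞相":
        rword = "曹操"
    else:
        rword = word
    counts[rword] = counts.get(rword, 0) + 1

# pop() with a default avoids a KeyError when an excluded word never occurs.
for word in excludes:
    counts.pop(word, None)

items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)
# Slicing avoids an IndexError when there are fewer than 20 distinct words.
for word, count in items[:20]:
    print("{0:<10}{1:>5}".format(word, count))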
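The if/elif chain above grows by one branch for every new alias. One possible alternative, sketched here, keeps the aliases in a dictionary and counts with `collections.Counter`, which also lets the logic be tried without the novel or jieba. The names `ALIASES`, `EXCLUDES` and `count_names` are illustrative assumptions; the alias and exclusion lists are copied from the program above, and the token list in the usage example is made up.

```python
from collections import Counter

# Alias -> canonical name, copied from the if/elif chain.
ALIASES = {
    "诸葛亮": "孔明", "孔明曰": "孔明",
    "关公": "关羽", "云长": "关羽",
    "玄德": "刘备", "玄德曰": "刘备",
    "孟德": "曹操", "丞相": "曹操",
}
EXCLUDES = {"将军", "却说", "二人", "不可", "荆州", "不能", "如此"}


def count_names(tokens, top=20):
    """Count tokens longer than one character, merging aliases and dropping excludes."""
    counts = Counter()
    for word in tokens:
        if len(word) == 1:
            continue
        word = ALIASES.get(word, word)
        if word not in EXCLUDES:
            counts[word] += 1
    return counts.most_common(top)


if __name__ == "__main__":
    # Hypothetical tokens standing in for jieba.lcut(f.read()).
    tokens = ["玄德", "曰", "孔明", "诸葛亮", "将军", "刘备", "孔明曰", ",", "曹操"]
    for word, count in count_names(tokens):
        print("{0:<10}{1:>5}".format(word, count))
```

Adding a new alias or exclusion then means editing one of the two tables rather than the loop.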
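6. The 三国演义.txt file
The program reads 三国演义.txt (Romance of the Three Kingdoms), encoded as UTF-8, from the working directory.
7. Results
runfile('E:/Oliver学Python/program/threeking.py', wdir='E:/Oliver学Python/program')
曹操         1385
孔明         1342
刘备         1236
关羽          759
张飞          343
商议          335
如何          326
主公          318
军士          300
吕布          296
军马          284
左右          283
引兵          273
次日          262
大喜          259
孙权          256
天下          252
赵云          252
东吴          244
于是          242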
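The format string "{0:<10}{1:>5}" pads by character count, but many terminals draw a Chinese character about twice as wide as an ASCII one, so the count column can drift when the words differ in length. One possible fix, sketched here with the standard library's `unicodedata` module, is to pad by estimated display width instead. The helper names `display_width` and `pad_right` are illustrative assumptions, and the two-column estimate for wide characters depends on the font and terminal in use.

```python
import unicodedata


def display_width(text):
    """Estimate the columns used: 2 for wide or fullwidth characters, 1 otherwise."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in text)


def pad_right(text, width):
    """Pad text with spaces so that it occupies at least `width` columns."""
    return text + " " * max(0, width - display_width(text))


if __name__ == "__main__":
    # The first row comes from the results above; the other two are made-up test rows.
    for word, count in [("曹操", 1385), ("诸葛孔明", 12), ("abc", 7)]:
        print(pad_right(word, 12) + "{:>5}".format(count))
```

In the program, `pad_right(word, 12) + "{:>5}".format(count)` would take the place of the single `format` call.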
Resource download link: https://pan.baidu.com/s/1bbIk8ElMtfF10-TTT4B8rg
Extraction code: ttxs