Drawing simple word clouds with pyecharts and wordcloud

txt= '''
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
'''

pyecharts

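# Lowercase, replace punctuation with spaces, split into words
txt=txt.lower()
sig='~!@#$%^&*_+|?<>:"\\]\'[{}-.,'
for i in sig:
    txt=txt.replace(i," ")
words = txt.split()
counts={}
# Count word frequencies; dict.get(key, default) returns default when the key is missing
for i in words:
    counts[i]=counts.get(i,0) + 1
# dict.items() yields (key, value) tuples; convert them to a list
items = list(counts.items())
# Draw the word cloud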
import pyecharts.options as opt
from pyecharts.charts import WordCloud
c = WordCloud()
c.add(series_name="",data_pair=items,
     shape='circle')
c.render()
# render() writes an interactive HTML file; in a Jupyter notebook, c.render_notebook() displays the chart inline instead
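
The counting loop above can also be written with the standard library's collections.Counter. What follows is a minimal sketch rather than the code used for the figure; the names word_frequencies and top_n are chosen here for illustration. It returns a list of (word, count) tuples, the same shape that data_pair receives above, optionally limited to the most frequent words.

```python
import string
from collections import Counter

def word_frequencies(text, top_n=None):
    """Return (word, count) pairs, most frequent first."""
    # Map every ASCII punctuation character to a space, then lowercase.
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    words = text.lower().translate(table).split()
    # most_common(None) returns every pair, sorted by descending count.
    return Counter(words).most_common(top_n)

sample = "Simple is better than complex. Complex is better than complicated."
pairs = word_frequencies(sample, top_n=3)
print(pairs)
```

Limiting the list with top_n keeps the rendered chart readable when the input text is long.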


WordCloud

from wordcloud import WordCloud
import matplotlib.pyplot as plt
wc = WordCloud(background_color='white',
              scale=32,
              max_words=100).generate(txt)
plt.imshow(wc, interpolation='bilinear')
plt.axis('off')  # hide the axes
wc.to_file("python_this.png")

English word clouds rarely run into encoding problems, but Chinese word clouds do. Compared with wordcloud, pyecharts is friendlier to Chinese text.
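
One encoding detail that shows up in the Chinese example is the '\ufeff' character, which is the byte order mark (BOM) some editors write at the start of a UTF-8 file. The sketch below is an illustration with a temporary file created only for the demonstration, not part of the original workflow. It shows that reading with encoding='utf-8' keeps the BOM as the first character, while Python's 'utf-8-sig' codec drops it.

```python
import os
import tempfile

# Write a small UTF-8 file that starts with a byte order mark (BOM).
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "wb") as f:
    f.write(b"\xef\xbb\xbf" + "老人与海".encode("utf-8"))

# Plain utf-8 keeps the BOM as a leading '\ufeff' character.
with open(path, encoding="utf-8") as f:
    raw = f.read()

# utf-8-sig removes the BOM if one is present.
with open(path, encoding="utf-8-sig") as f:
    clean = f.read()

print(repr(raw[0]), repr(clean[0]))
```

If the input files may or may not carry a BOM, 'utf-8-sig' handles both cases when reading.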

Chinese word segmentation

from wordcloud import WordCloud,ImageColorGenerator
import matplotlib.pyplot as plt
import jieba
%matplotlib inline
# Segment the text and remove stop words
txt=open('老人与海.txt',encoding='utf-8').read()
txt_depart=jieba.lcut(txt)
stopwords = [line.strip() for line in open('stopwords.txt', encoding = 'utf-8').readlines()]
# Manually add a few more stop words
stopwords= stopwords+['\n','\u3000','\ufeff']
# Drop the stop words with a list comprehension
sentence=[word for word in txt_depart if word not in stopwords]
# Join the tokens into one space-separated string
string=' '.join(sentence)
# Draw
# Load the background image used as the mask
background_mask = plt.imread('老人与海背景图2.jpg')
cloud=WordCloud(font_path='HYQiHei-25J.ttf',
                background_color='white',
                random_state=42,
                max_font_size=100,
                max_words=2000,
                mask=background_mask)
# Generate the word cloud
word_cloud=cloud.generate(string)
# Recolor the words to match the image
img_colors = ImageColorGenerator(background_mask)
word_cloud.recolor(color_func=img_colors)
plt.axis('off')
plt.imshow(word_cloud)
plt.show()
word_cloud.to_file('old_man_and_sea.jpg')
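
The stop-word filtering step above can be sketched on its own with a set and collections.Counter. This is a minimal illustration under stated assumptions: the tokens list stands in for the output of jieba.lcut(txt), and both it and the stop-word set are made-up sample data, not taken from the novel or from stopwords.txt.

```python
from collections import Counter

# Hypothetical tokens, standing in for the output of jieba.lcut(txt).
tokens = ["老人", "与", "海", "\n", "老人", "的", "船", "\u3000", "海"]
# Hypothetical stop words; a set makes each membership test constant time.
stopwords = {"与", "的", "\n", "\u3000", "\ufeff"}

# Keep tokens that are not stop words and are not pure whitespace.
kept = [t for t in tokens if t not in stopwords and t.strip()]
frequencies = Counter(kept)
print(frequencies.most_common(2))
```

A dictionary of counts like this can be passed to wordcloud's generate_from_frequencies instead of joining the tokens back into a string. That is an alternative route, not what the code above does.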

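With wordcloud, recoloring from the image works better when the picture has strong color contrast; in most other cases the result, in my experience, is not very good.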

 

 
