Alice in Wonderland: a Python word cloud

Contents

What WordCloud does

Source of the text and mask image

Word cloud without a mask

Word cloud with a mask

Chinese word cloud


What WordCloud does

(1) Text preprocessing

(2) Word frequency counting

(3) Rendering the high-frequency words as a colored image
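The first two steps above can be sketched in plain Python. This is only an illustration of the idea, not WordCloud's actual implementation; the function name `word_frequencies`, the tiny `SAMPLE_STOPWORDS` set, and the sample sentence are all made up for this example.

```python
import re
from collections import Counter

# Hypothetical, tiny stopword list for illustration only;
# the wordcloud package ships its own, much larger STOPWORDS set.
SAMPLE_STOPWORDS = {"the", "a", "and", "to", "of", "said"}


def word_frequencies(text, stopwords=SAMPLE_STOPWORDS):
    """Steps (1) and (2): lowercase, tokenize, drop stopwords, count."""
    tokens = re.findall(r"[a-z']+", text.lower())
    kept = [t for t in tokens if t not in stopwords and len(t) > 1]
    return Counter(kept)


sample = "Alice said the Rabbit was late. The Rabbit said nothing to Alice."
freqs = word_frequencies(sample)
top_words = freqs.most_common(2)
print(top_words)
```

For step (3), a frequency mapping like `freqs` can be handed to `WordCloud().generate_from_frequencies(freqs)`, which sizes each word according to its count.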

Source of the text and mask image

https://github.com/amueller/word_cloud/tree/master/examples

Word cloud without a mask

import matplotlib.pyplot as plt
from wordcloud import WordCloud

# Raw string so the backslashes in the Windows path are not treated as escapes.
# Assumes the file is UTF-8 encoded.
text_path = r'D:\python\CompatingD\Cloud_map\1.txt'
with open(text_path, encoding='utf-8') as f:
    text = f.read()

# Generate a word cloud with the default settings (no mask).
wordcloud = WordCloud().generate(text)

plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()

[Image 1: word cloud generated without a mask]

Word cloud with a mask

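from os import path

import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
from wordcloud import WordCloud, STOPWORDS

# Directory of this script; the output image is written here.
d = path.dirname(path.abspath(__file__))
# Raw string so the backslashes in the Windows path are not treated as escapes.
data_dir = r'D:\python\CompatingD\Cloud_map'

# Assumes the text file is UTF-8 encoded.
with open(path.join(data_dir, 'alice.txt'), encoding='utf-8') as f:
    text = f.read()

# Load the mask image as an array; the words are drawn inside its shape.
alice_mask = np.array(Image.open(path.join(data_dir, 'alice_mask.png')))

# Extend the default stopwords with "said", which is very frequent in the novel.
stopwords = set(STOPWORDS)
stopwords.add("said")

wc = WordCloud(background_color="white", max_words=2000, mask=alice_mask, stopwords=stopwords)
wc.generate(text)

# Save the rendered cloud next to the script.
wc.to_file(path.join(d, "alice.png"))

# Show the word cloud and the mask in two separate figures.
plt.imshow(wc, interpolation='bilinear')
plt.axis("off")
plt.figure()
plt.imshow(alice_mask, cmap=plt.cm.gray, interpolation='bilinear')
plt.axis("off")
plt.show()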

Alice!!!

[Image 2: word cloud in the shape of the Alice mask]
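A mask does not have to come from an image file. In WordCloud, pure white (255) entries of the mask are treated as masked out, and words are drawn on everything else. The sketch below builds a circular mask with NumPy under that rule; the function name `circular_mask` and the default size and radius are arbitrary choices for illustration.

```python
import numpy as np


def circular_mask(size=300, radius=130):
    """Return a size x size uint8 array:
    0 inside the circle (drawable), 255 outside (masked out)."""
    y, x = np.ogrid[:size, :size]
    center = size // 2
    # True for every pixel that lies outside the circle.
    outside = (x - center) ** 2 + (y - center) ** 2 > radius ** 2
    return (255 * outside).astype(np.uint8)


mask = circular_mask()
print(mask.shape, mask[150, 150], mask[0, 0])
```

The resulting array can be passed as `mask=mask` to `WordCloud(...)` in the same way as `alice_mask`.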

Chinese word cloud

simsun.ttf is a Chinese font file (SimSun); just place it in the same directory as the script.

import jieba
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# Raw string for the Windows path; assumes the file is UTF-8 encoded.
with open(r'D:\python\CompatingD\Cloud_map\Chinese.txt', encoding='utf-8') as f:
    text = f.read()

# Segment the Chinese text and join the words with spaces so WordCloud can split them.
text = ' '.join(jieba.cut(text))

# A font with Chinese glyphs is required; otherwise the characters render as empty boxes.
wordcloud = WordCloud(font_path="simsun.ttf").generate(text)
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()

[Image 3: word cloud generated from the Chinese text]
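Why segment with jieba and join the pieces with spaces? Chinese text has no spaces between words, so a regex-based tokenizer sees a whole sentence as one token. The sketch below shows the effect using only the standard library. The pattern is assumed to resemble WordCloud's default token pattern, and the `segments` list is a hand-written stand-in for what `jieba.cut` might return, not real jieba output.

```python
import re

# Assumed to resemble WordCloud's default token pattern
# (a run of two or more word characters).
TOKEN_PATTERN = r"\w[\w']+"

raw = "我喜欢读书"
# Hypothetical segmentation result, standing in for list(jieba.cut(raw)).
segments = ["我", "喜欢", "读书"]

unsegmented_tokens = re.findall(TOKEN_PATTERN, raw)
segmented_tokens = re.findall(TOKEN_PATTERN, " ".join(segments))

print(unsegmented_tokens)  # the whole sentence comes back as a single token
print(segmented_tokens)    # separate words; this pattern drops the one-character token
```

Without segmentation, the cloud would be made of whole sentences rather than words, which is why the text is joined with spaces before calling `generate`.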

Note: for the Chinese text, please translate the article yourself.
