Word Frequency Analysis of 购物狂 Forum Post Titles

Taking the post titles scraped earlier from the parenting board of the 购物狂 forum, we segment the text with jieba, extract keywords, filter out meaningless stopwords, and render the resulting word frequencies as a word cloud image, to see which baby-related topics people currently care about most.
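The core filtering step can be sketched in isolation: `jieba.analyse.textrank(..., withWeight=True)` returns a list of (word, weight) pairs, and keeping only non-stopwords is a one-line dict comprehension. A minimal sketch, with hypothetical pairs standing in for the real TextRank output:

```python
# Hypothetical (word, weight) pairs, shaped like the output of
# jieba.analyse.textrank(text, topK=200, withWeight=True)
result = [("奶粉", 1.0), ("推荐", 0.8), ("辅食", 0.6)]

# Membership tests against a set are O(1), unlike a list
stopwords = {"推荐", "求助"}

# Keep only the words that carry analytical meaning
keywords = {word: weight for word, weight in result if word not in stopwords}
print(keywords)  # {'奶粉': 1.0, '辅食': 0.6}
```

The same comprehension scales unchanged to the full 200-keyword result below.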

import jieba.analyse
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
from wordcloud import WordCloud, ImageColorGenerator

# Read the scraped post titles; specify the encoding explicitly and
# use a context manager so the file handle is closed afterwards
with open('c:/1.txt', 'r', encoding='utf-8') as f:
    text = f.read()


# Extract the top 200 keywords with their TextRank weights
result = jieba.analyse.textrank(text, topK=200, withWeight=True)

# Board-specific stopwords that carry no analytical meaning
# (a set for O(1) lookup; duplicate entries removed)
stopwords = {"推荐", "求助", "请问", "知道", "儿童", "请教", "没有", "问题", "需要", "记录", "大家", "分享",
             "适合", "方法", "重庆", "有没有", "麻麻", "小朋友", "看看", "牌子", "宝妈", "摄影",
             "开始", "地方", "时间", "小儿", "经验", "不吃", "妈妈", "娃儿", "孩子",
             "爸爸", "咨询", "体验", "不能", "时候", "还有", "活动", "起来", "成长", "婴儿", "育儿",
             "母婴", "进来", "父母", "新手", "家长", "亲们", "喜欢", "东西", "出生", "妹妹",
             "帮忙", "小孩", "好用", "照片", "有点", "感觉", "免费", "应该", "准备", "娃娃",
             "妈咪", "没得", "注意", "看到", "支招", "选择", "购物狂", "不会", "出来", "婆子",
             "日记", "参加", "遇到", "辣妈", "生育", "新生儿", "美妈", "情况", "觉得", "发现",
             "台历", "添加", "幼儿", "转让", "座椅", "了解", "归来", "报告", "急求", "跪求",
             "朋友", "纠结", "办法", "经历"}

keywords = {word: weight for word, weight in result if word not in stopwords}
print(keywords)


# Load the mask image that gives the cloud its shape
image = Image.open('c:/1.jpg')
graph = np.array(image)

# A Chinese font is required, or CJK characters render as empty boxes
wc = WordCloud(font_path='./fonts/simhei.ttf', background_color='white',
               max_words=50, mask=graph)
wc.generate_from_frequencies(keywords)

# Recolor the words using the colors of the mask image
image_color = ImageColorGenerator(graph)

# Optional: preview with matplotlib (note: call savefig before show,
# otherwise the saved figure is blank)
# plt.imshow(wc.recolor(color_func=image_color))
# plt.axis("off")
# plt.savefig('test.jpg', dpi=600)
# plt.show()

wc.to_file('gwk.jpg')
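One caveat worth noting: TextRank weights measure graph centrality, not raw counts, so despite the title this is not a literal frequency count. If plain word frequencies are wanted instead, `collections.Counter` over the segmented tokens works with the same stopword filter. A sketch with a hypothetical token list standing in for the output of `jieba.lcut(text)`:

```python
from collections import Counter

# Hypothetical tokens, shaped like the output of jieba.lcut(text)
tokens = ["宝宝", "奶粉", "宝宝", "推荐", "奶粉", "宝宝"]
stopwords = {"推荐"}

# Count each non-stopword token
freq = Counter(t for t in tokens if t not in stopwords)
print(freq.most_common(2))  # [('宝宝', 3), ('奶粉', 2)]
```

The resulting `Counter` is itself a dict of frequencies, so it can be passed directly to `WordCloud.generate_from_frequencies`.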
(Generated word cloud image: gwk.jpg)
