Sentiment Analysis of Netizens' Reactions to the Li Xiaolu Incident

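As of this writing, the Li Xiaolu "getting her hair done" incident post has been reposted 780,000 times and has drawn 1,000,000 comments and 66,533 likes.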



So let's use data analysis to explore how netizens feel about it.

Materials and tools

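  1. 73,083 comments on Li Xiaolu's Weibo post
  2. Python 3.6
  3. WordCloud (word cloud generation)
  4. jieba (Chinese word segmentation), plus urllib3 and pyquery for crawling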

Implementation

1. Crawl the comment data
2. Clean and process the text data
3. Generate the word cloud

  1. Crawl the comment data
    First we need to get the comment data. The code is as follows:
import re
import json
import urllib3
from pyquery import PyQuery

headers = {
    'Accept': '*/*',
    # urllib3 decodes gzip/deflate automatically; 'br' is left out because
    # decoding it requires the optional brotli package
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8,zh-TW;q=0.7',
    'Connection': 'keep-alive',
    'Content-Type': 'application/x-www-form-urlencoded',
    # Placeholder: paste the Cookie header from your own logged-in weibo.com session
    'Cookie': 'YOUR_WEIBO_COOKIE',
    'Host': 'weibo.com',
    'Referer': 'https://weibo.com/1537790411/Frishwdoh?filter=hot&root_comment_id=0&type=comment',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
    'X-Requested-With': 'XMLHttpRequest'
}

# One connection pool is reused for every page instead of creating one per request
http = urllib3.PoolManager()

with open('comment', 'w', encoding='utf8') as output:
    for i in range(200):
        url = "https://weibo.com/aj/v6/comment/big?ajwvr=6&id=4165058017677973&root_comment_max_id=4197304577625422&root_comment_max_id_type=0&root_comment_ext_param=&page=" + str(i + 1) + "&filter=all"
        print(url)
        res = http.request("GET", url, headers=headers)

        # The endpoint returns JSON; data.html holds the rendered comment list
        result = json.loads(res.data)
        p = PyQuery(result['data']['html'])

        # Each .WB_text node reads "nickname: comment" (ASCII or full-width colon);
        # split only once so colons inside the comment itself are kept
        for item in p('.WB_text').items():
            parts = re.split(r'[:：]', item.text(), maxsplit=1)
            if len(parts) < 2:
                continue  # no nickname separator, skip the node
            text = parts[1].strip() + "\n"
            output.write(text)
            print(text)

The crawl took a little over an hour and collected 73,083 comments.
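The crawler keeps only the text after the nickname separator in each comment node. Pulled out on its own, that step looks roughly like the sketch below. It assumes, as the original `split(":")` implies, that every comment is rendered as "nickname: comment" with either an ASCII or a full-width colon; `extract_comment` is a hypothetical helper name, not part of pyquery or Weibo's API.

```python
import re

# Assumed format: "nickname：comment text", with a full-width (：) or ASCII (:) colon
_SEPARATOR = re.compile(r"[:：]")


def extract_comment(raw):
    """Return the comment body after the first colon, or None if there is none.

    maxsplit=1 keeps colons that appear inside the comment itself, which
    raw.split(":")[1] would cut off (and which raises IndexError when no
    colon is present at all).
    """
    parts = _SEPARATOR.split(raw, maxsplit=1)
    if len(parts) < 2:
        return None
    return parts[1].strip()


if __name__ == "__main__":
    for raw in ["网友A：恶心", "网友B: 时间:2018年", "no separator here"]:
        print(repr(extract_comment(raw)))
```

Returning `None` rather than an empty string lets the caller tell "no separator" apart from "empty comment".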
  2. Segment and clean the text, then visualize netizens' sentiment
    The code is as follows:

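from os import path

import jieba
import matplotlib.pyplot as plt
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator


def make_wordcloud(file_path):
    # Read all crawled comments (one per line)
    with open(file_path, 'r', encoding='UTF-8') as f:
        text = f.read()

    # Chinese has no spaces between words: segment with jieba (precise mode)
    # and join the words with spaces so WordCloud can tokenize them
    wordlist_after_jieba = jieba.cut(text, cut_all=False)
    wl_space_split = " ".join(wordlist_after_jieba)

    # The heart-shaped picture is used both as the mask and as the colour source
    background_image = plt.imread('心1.jpg')
    print('Mask image loaded!')

    # Start from WordCloud's built-in (English) stopwords and add noise words
    # from this data set: laughter, the "reply" marker and the celebrity's name
    stopwords = STOPWORDS.copy()
    stopwords.add('哈哈')
    stopwords.add('回复')
    stopwords.add('李小璐')

    wc = WordCloud(
        width=1024,
        height=768,
        background_color='white',
        mask=background_image,   # with a mask, its shape replaces width/height
        font_path='simsun.ttf',  # a CJK font is needed to render Chinese characters
        max_words=600,
        stopwords=stopwords,
        max_font_size=400,
        random_state=50,
    )
    wc.generate_from_text(wl_space_split)  # build the cloud from the segmented text

    # Recolour the words with the colours of the mask image
    image_colors = ImageColorGenerator(background_image)
    wc.recolor(color_func=image_colors)

    plt.imshow(wc)
    plt.axis('off')  # hide the x/y axes and ticks
    plt.show()

    # Save next to this script; 'wordcloud.png' is a new output name chosen so
    # the mask image '心1.jpg' is not overwritten
    d = path.dirname(__file__)
    wc.to_file(path.join(d, 'wordcloud.png'))
    print('Word cloud generated!')


# The crawler above writes to 'comment'; point this at wherever that file was saved
make_wordcloud('微博评论/李小璐')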
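The script hands WordCloud a space-joined string plus a stopword set and lets the library do the counting. Doing the filtering and counting explicitly makes the cleaning step visible and lets you inspect the ranking before drawing anything. Below is a minimal sketch, assuming `tokens` is the output of `jieba.cut(...)`. `count_words` and its `min_length` cutoff (which drops one-character particles such as 了) are hypothetical choices, not the author's code.

```python
from collections import Counter


def count_words(tokens, stopwords, min_length=2):
    """Count segmented words, skipping stopwords and words shorter than min_length.

    `tokens` is assumed to come from a Chinese segmenter such as jieba.cut();
    whitespace-only tokens become empty after strip() and are dropped too.
    """
    counts = Counter()
    for token in tokens:
        word = token.strip()
        if len(word) < min_length or word in stopwords:
            continue
        counts[word] += 1
    return counts


if __name__ == "__main__":
    tokens = ["李小璐", "出轨", " ", "了", "恶心", "哈哈", "出轨", "回复", "贾乃亮"]
    counts = count_words(tokens, {"哈哈", "回复", "李小璐"})
    print(counts.most_common(3))
```

Since `Counter` is a dict of word to frequency, the result can be passed to `WordCloud.generate_from_frequencies` in place of `generate_from_text`, so the picture is drawn from exactly the counts you inspected.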

Here is the word cloud of these 73,083 comments:


[Figure: word cloud of the 73,083 comments]

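The word cloud broadly reflects how netizens feel about the incident: the largest, and therefore most frequent, words are 出轨 (cheating), 恶心 (disgusting) and 贾乃亮 (Jia Nailiang).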
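This reading comes from eyeballing word sizes. Word sizes track total occurrences, so a single comment that repeats a word many times can inflate it. A complementary check is the share of comments that mention each keyword at least once. The sketch below assumes the comments are loaded one per line, as the crawler writes them. `keyword_share` is a hypothetical helper, and it uses plain substring matching rather than jieba segmentation.

```python
def keyword_share(comments, keywords):
    """Return, for each keyword, the fraction of comments that contain it.

    Each comment counts at most once per keyword, no matter how often the
    keyword is repeated inside it.
    """
    total = len(comments)
    if total == 0:
        return {kw: 0.0 for kw in keywords}
    return {kw: sum(1 for c in comments if kw in c) / total for kw in keywords}


if __name__ == "__main__":
    comments = ["出轨恶心", "贾乃亮加油", "出轨出轨出轨", "路过"]
    for kw, share in keyword_share(comments, ["出轨", "恶心", "贾乃亮"]).items():
        print(f"{kw}: {share:.0%}")
```

On real data you would build `comments` from the crawled file, for example with `[line.strip() for line in open('comment', encoding='utf8') if line.strip()]`.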
That is a rough analysis of this public-opinion event. It is an amateur personal project and is not meant to mock anyone.
