Crawling Bilibili Video Danmaku to Generate a Word Cloud

Preview

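As an example, this post crawls the danmaku (bullet comments) posted on 2020-08-07 for the Naruto episodes 1-720 commentary video by Bilibili uploader 可乐三太, then generates a word cloud from them.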

Code

# -*- coding: utf-8 -*-
# @Time    : 2020/8/8 22:11
# @Author  : 马拉小龙虾
# @FileName: B站弹幕.py
# @Software: PyCharm Community Edition
# @Blog    :https://blog.csdn.net/weixin_43636302

import requests
import re
import csv
import jieba
import wordcloud
import imageio


url='https://api.bilibili.com/x/v2/dm/history?type=1&oid=221043705&date=2020-08-07'
headers={
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.79 Safari/537.36",
    "cookie": "CURRENT_FNVAL=16; _uuid=B326CD2D-ADC8-3C72-334C-4A2A40721CC147846infoc; buvid3=A24B3812-B914-4A54-B0E7-961056380B82155813infoc; LIVE_BUVID=AUTO8815874584533113; DedeUserID=384518184; DedeUserID__ckMd5=83c10ef34d2c30d2; SESSDATA=94aef4fd%2C1603010487%2C36962*41; bili_jct=b0d997baee67db6eda7444a7291f275f; rpdid=|(J~R)uR|Jk)0J'ul)~Rl)Rml; PVID=1; sid=ix435wln; bfe_id=fdfaf33a01b88dd4692ca80f00c2de7f"
}
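# Fetch the danmaku XML for the given date (the history API needs a logged-in cookie)
res=requests.get(url=url,headers=headers)
res.raise_for_status()
txt=res.content.decode(encoding='utf-8')
# Each comment is expected in the form <d p="...">text</d>; capture only the text
danmu=re.findall(r'<d p=".*?">(.*?)</d>',txt)
print(danmu)
# Save one comment per row so the data can be reused without crawling again
with open('danmu.csv','w',newline='',encoding='utf-8-sig') as f:
    writer=csv.writer(f)
    for i in danmu:
        writer.writerow([i])
# Read the saved comments back and segment them into words with jieba
with open('danmu.csv','r',newline='',encoding='utf-8-sig') as f2:
    txt2=f2.read()
txt_list=jieba.lcut(txt2)
# WordCloud splits its input on whitespace, so join the tokens with spaces
string=" ".join(txt_list)
print(string)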

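# Load the mask image; wordcloud leaves the pure-white areas of the mask empty
mk=imageio.imread('鸣人3.jpg')
w=wordcloud.WordCloud(
    width=1000,              # ignored when a mask is given; the mask's shape is used
    height=800,
    background_color='white',
    font_path='msyh.ttc',    # a font with Chinese glyphs (Microsoft YaHei) is needed
    scale=15,
    mask=mk,
    stopwords={" "},
    contour_width=5,
    contour_color='red'
)

# Build the word cloud from the space-separated tokens and save it as an image
w.generate(string)
w.to_file('b_danmu.png')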
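
The script pulls the comment text out of the response with a regular expression. Since the response is XML, an XML parser is a possible alternative that also decodes entities such as `&amp;`. The following is a minimal sketch, not the author's method. It assumes each comment is a `<d p="...">` element; the sample document and the helper name `extract_danmaku` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the assumed <d p="...">text</d> layout
SAMPLE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<i>
  <chatserver>chat.bilibili.com</chatserver>
  <d p="12.5,1,25,16777215,1596790000,0,abcd1234,1">first comment</d>
  <d p="30.0,1,25,16777215,1596790100,0,efgh5678,2">second &amp; third</d>
  <d p="45.0,1,25,16777215,1596790200,0,ijkl9012,3"></d>
</i>
"""


def extract_danmaku(xml_text):
    """Return the text of every non-empty <d> element, in document order."""
    # Parse from bytes so the encoding declaration in the XML header is honoured
    data = xml_text.encode("utf-8") if isinstance(xml_text, str) else xml_text
    root = ET.fromstring(data)
    return [d.text.strip() for d in root.iter("d") if d.text and d.text.strip()]


print(extract_danmaku(SAMPLE_XML))
```

In the script above, `extract_danmaku(res.content)` could stand in for the `re.findall` call, provided the response really has this layout.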

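The `stopwords={" "}` setting in the script filters almost nothing, so punctuation and one-character particles can end up in the picture. One possible refinement is to count the tokens yourself and drop the noise first. The sketch below assumes a small, hypothetical stop-word list and hand-written tokens in place of real `jieba.lcut` output, so it runs without any third-party package.

```python
from collections import Counter

# Hypothetical stop-word list; extend it for real data
STOPWORDS = {"的", "了", "是", "我", "啊"}


def count_tokens(tokens, stopwords=STOPWORDS, min_len=2):
    """Count tokens, skipping stop words, blanks, and tokens shorter than min_len."""
    stripped = (t.strip() for t in tokens)
    return Counter(t for t in stripped if len(t) >= min_len and t not in stopwords)


# Tokens written by hand to resemble segmenter output
tokens = ["鸣人", "的", "螺旋丸", "\n", "鸣人", "，", "佐助", "了", "鸣人", "螺旋丸"]
freq = count_tokens(tokens)
print(freq.most_common(2))
```

The resulting counts could then be passed to `w.generate_from_frequencies(freq)` in place of `w.generate(string)`.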