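Scraping the danmaku (bullet comments) of any Bilibili video: text and timestamps

The script below does four things:
1. It asks for a video URL.
2. It finds the video's `cid` (the id of its danmaku pool) in the page HTML.
3. It downloads the danmaku XML from `https://comment.bilibili.com/{cid}.xml`.
4. For every `<d>` element, it saves the `p` attribute, which begins with the comment's time in the video, and the comment text to an Excel file.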

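```python
import re

import pandas as pd
import requests
from bs4 import BeautifulSoup

time_nature = []  # raw "p" attribute of each danmaku (its first field is the time in the video)
comments = []     # danmaku text

# Bilibili may reject requests that lack a browser-like User-Agent
headers = {'User-Agent': 'Mozilla/5.0'}

url = input('Enter a Bilibili video URL: ')
res = requests.get(url, headers=headers)
res.raise_for_status()
# cid is the id of the danmaku pool; accept only digits after "cid":
cid = re.findall(r'"cid":(\d+)', res.text)[0]

url = f'https://comment.bilibili.com/{cid}.xml'
print(url)

request = requests.get(url, headers=headers)  # download the danmaku XML
request.raise_for_status()
# Decode explicitly as UTF-8; otherwise requests may guess the wrong charset
# and the Chinese text comes out garbled
request.encoding = 'utf-8'
soup = BeautifulSoup(request.text, 'lxml')
results = soup.find_all('d')  # each <d> element is one danmaku
for t in results:
    time_nature.append(t.attrs['p'])
    comments.append(t.text)
    print(t.attrs['p'])
    print(t.text)

df = pd.DataFrame({'time_attributes': time_nature, 'comment': comments})

# pandas 2.0+ can no longer write legacy .xls files; write .xlsx (needs openpyxl)
df.to_excel('bilibili_danmaku.xlsx', index=False)
```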
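
The script takes the first `"cid":` match it finds in the page HTML. Below is a minimal sketch of that step that can be tested offline and fails with a clear error when nothing matches. It rests on a few assumptions:
- The helper names `extract_bvid`, `extract_cid` and `danmaku_xml_url` are hypothetical.
- BV ids are assumed to be `BV` followed by 10 alphanumeric characters.
- Multi-part videos have one cid per part, so the first match in the page may not belong to the part you opened.

```python
import re
from typing import Optional

# Assumption: BV ids look like "BV" + 10 alphanumeric characters, e.g. BV1xx411c7mD
BVID_RE = re.compile(r'BV[0-9A-Za-z]{10}')
# Only digits are accepted after "cid":, instead of a lazy (.*?) match
CID_RE = re.compile(r'"cid"\s*:\s*(\d+)')


def extract_bvid(url: str) -> Optional[str]:
    """Hypothetical helper: pull the BV id out of a video URL, or return None."""
    m = BVID_RE.search(url)
    return m.group(0) if m else None


def extract_cid(html: str) -> int:
    """Hypothetical helper: return the first numeric cid in the page HTML."""
    m = CID_RE.search(html)
    if m is None:
        raise ValueError('no "cid" found; the page layout may have changed')
    return int(m.group(1))


def danmaku_xml_url(cid: int) -> str:
    # Same endpoint the script above uses
    return f'https://comment.bilibili.com/{cid}.xml'


if __name__ == '__main__':
    page = '{"aid":170001,"cid":279786,"page":1}'  # made-up sample, not real page content
    print(extract_bvid('https://www.bilibili.com/video/BV1xx411c7mD?p=1'))
    print(danmaku_xml_url(extract_cid(page)))
```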

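The downloaded XML contains one `<d>` element per danmaku. The `p` attribute carries the metadata, and the element text is the comment itself. The standard-library `xml.etree.ElementTree` can parse this directly, as an alternative to BeautifulSoup.

In the sketch below, passing the raw response bytes (for example `request.content`) lets the parser read the encoding from the XML declaration. The function name `parse_danmaku_xml` is hypothetical, and the sample document is made up.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple, Union


def parse_danmaku_xml(data: Union[bytes, str]) -> List[Tuple[str, str]]:
    """Hypothetical helper: return (p attribute, text) pairs for every <d> element."""
    root = ET.fromstring(data)
    pairs = []
    for d in root.iter('d'):    # iter() finds <d> elements at any depth
        p = d.get('p', '')      # metadata string such as "12.5,1,25,..."
        text = d.text or ''     # an empty <d></d> has text None
        pairs.append((p, text))
    return pairs


if __name__ == '__main__':
    # Made-up sample shaped like the danmaku XML, encoded to bytes like a real response body
    sample = ('<?xml version="1.0" encoding="UTF-8"?>'
              '<i><chatid>123</chatid>'
              '<d p="12.5,1,25,16777215,1600000000,0,abc123,1">前方高能</d>'
              '<d p="3.0,1,25,16777215,1600000100,0,def456,2">hello</d>'
              '</i>').encode('utf-8')
    for p, text in parse_danmaku_xml(sample):
        print(p.split(',')[0], text)
```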

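The script stores the whole `p` string. To get a usable timestamp, the string can be split on commas. The field order below follows widely circulated community documentation rather than an official specification, so treat it as an assumption:
1. Appearance time in seconds.
2. Display mode code.
3. Font size.
4. Color as a decimal RGB integer.
5. Send time as a Unix timestamp.

Any later fields are ignored. `DanmakuMeta`, `parse_p` and `format_offset` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class DanmakuMeta:
    offset: float       # seconds from the start of the video when the comment appears
    mode: int           # display mode code (assumed field position)
    font_size: int
    color: int          # decimal RGB, e.g. 16777215 == 0xFFFFFF (white)
    sent_at: datetime   # when the comment was posted, in UTC


def parse_p(p: str) -> DanmakuMeta:
    """Hypothetical helper: decode the first five comma-separated fields of p."""
    fields = p.split(',')
    if len(fields) < 5:
        raise ValueError(f'unexpected p attribute: {p!r}')
    return DanmakuMeta(
        offset=float(fields[0]),
        mode=int(fields[1]),
        font_size=int(fields[2]),
        color=int(fields[3]),
        sent_at=datetime.fromtimestamp(int(fields[4]), tz=timezone.utc),
    )


def format_offset(seconds: float) -> str:
    """Render a video offset as MM:SS, truncating fractions of a second."""
    minutes, secs = divmod(int(seconds), 60)
    return f'{minutes:02d}:{secs:02d}'


if __name__ == '__main__':
    meta = parse_p('75.3,1,25,16777215,1600000000,0,abc123,1')  # made-up sample
    print(format_offset(meta.offset), f'#{meta.color:06X}', meta.sent_at.isoformat())
```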

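Exporting to Excel from pandas requires an extra engine package. A dependency-free alternative is a CSV file written with the `utf-8-sig` encoding, whose byte-order mark helps Excel recognize the Chinese text as UTF-8.

The sketch below uses the standard `csv` module in place of pandas and sorts rows by their time in the video. `save_danmaku_csv` is a hypothetical name.

```python
import csv
from typing import Iterable, Tuple


def save_danmaku_csv(rows: Iterable[Tuple[float, str]], path: str) -> int:
    """Hypothetical helper: write (offset_seconds, text) rows sorted by offset.

    Returns the number of data rows written.
    """
    ordered = sorted(rows, key=lambda r: r[0])  # order by appearance time in the video
    # utf-8-sig prepends a BOM so Excel detects UTF-8; newline='' is what csv expects
    with open(path, 'w', newline='', encoding='utf-8-sig') as f:
        writer = csv.writer(f)
        writer.writerow(['offset_seconds', 'comment'])
        writer.writerows(ordered)
    return len(ordered)
```

With the script's own lists, a call might look like this: `save_danmaku_csv(((float(p.split(',')[0]), c) for p, c in zip(time_nature, comments)), 'danmaku.csv')`.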

