Bilibili Danmaku Scraper

API endpoints:
http://comment.bilibili.com/72036817.xml
https://api.bilibili.com/x/v1/dm/list.so?oid=9931722
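The number in each URL is the video's cid (the ID of its danmaku pool, passed as oid in the second URL), not its av number.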

These endpoints do not return every danmaku, though: a request yields only about 1,000 entries.
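
Both endpoints listed above take the same kind of numeric ID, so the URLs can be built from it. The following is a minimal sketch that only formats the two URL patterns shown above; the function name `danmaku_urls`, the parameter name `oid` (borrowed from the query string of the second URL), and the dictionary keys are hypothetical names chosen for illustration.

```python
def danmaku_urls(oid):
    """Return both danmaku URLs for one numeric ID.

    The URL patterns are copied from the two endpoints listed above;
    the function name and the dictionary keys are illustrative only.
    """
    oid = int(oid)  # accept "72036817" as well as 72036817
    if oid <= 0:
        raise ValueError("oid must be a positive integer")
    return {
        "xml": f"http://comment.bilibili.com/{oid}.xml",
        "list_so": f"https://api.bilibili.com/x/v1/dm/list.so?oid={oid}",
    }


if __name__ == "__main__":
    for name, url in danmaku_urls(72036817).items():
        print(name, url)
```

Either URL can then be passed to `requests.get` in place of a hard-coded string.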

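import pandas as pd
import requests
from bs4 import BeautifulSoup

# Danmaku XML for one video; the number in the URL identifies the video
url = 'http://comment.bilibili.com/72036817.xml'
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early on an HTTP error
html_data = response.content.decode('utf-8')
soup = BeautifulSoup(html_data, 'lxml')
results = soup.find_all('d')  # each <d> element holds one danmaku

comments = [comment.text for comment in results]
comments_dict = {'comments': comments}

# Save the danmaku text to a one-column CSV file
df = pd.DataFrame(comments_dict)
df.to_csv('bilibili.csv', encoding='utf-8')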
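
The parsing step can also be tried without a network request. The following is a minimal sketch, assuming the response is an XML document in which each `<d>` element carries one danmaku, as the script above expects. It uses the standard library's `xml.etree.ElementTree` in place of BeautifulSoup; the function name `extract_danmaku` and the sample XML (including its `<i>` root element) are made up for illustration and are not real data.

```python
import xml.etree.ElementTree as ET

# Invented sample shaped like the response the script expects:
# a root element containing one <d> element per danmaku.
SAMPLE_XML = """<i>
  <d>first comment</d>
  <d>second comment</d>
  <d></d>
</i>"""


def extract_danmaku(xml_data):
    """Return the text of every non-empty <d> element.

    xml_data may be a str or bytes object, e.g. response.content.
    """
    root = ET.fromstring(xml_data)
    # iter("d") walks the whole tree, so nesting depth does not matter
    return [d.text for d in root.iter("d") if d.text]


if __name__ == "__main__":
    for text in extract_danmaku(SAMPLE_XML):
        print(text)
```

With a live response, `extract_danmaku(response.content)` would give the same kind of list that the `comments` variable holds in the script above, which can then be handed to `pd.DataFrame`.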
