Scrapers for well-known sites: Douban

0. URL categories

  • Top-level categories:
    • https://book.douban.com/
    • https://music.douban.com/
    • https://movie.douban.com/
      • https://movie.douban.com/subject/<movie ID>/
  • Subcategories:
    • Comments: https://movie.douban.com/subject/xxx/comments
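
The URL patterns above can be captured in a couple of small helpers. This is only a minimal sketch: the names MOVIE_BASE, subject_url, and comments_url are made up for illustration, and the patterns are taken directly from the list above rather than from any official Douban documentation.

```python
# Hypothetical helpers; the URL patterns come from the list above.
MOVIE_BASE = "https://movie.douban.com"


def subject_url(movie_id):
    """Return the subject page URL for a numeric movie ID."""
    return f"{MOVIE_BASE}/subject/{movie_id}/"


def comments_url(movie_id):
    """Return the comments page URL for the same movie."""
    return f"{MOVIE_BASE}/subject/{movie_id}/comments"


print(subject_url(25953429))
print(comments_url(25953429))
```

Keeping the ID as the only parameter makes it easy to move from a subject page to its comments page without string surgery on a full URL.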

1. Scraping "People who liked this series also liked"

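import requests
from bs4 import BeautifulSoup

url = "https://movie.douban.com/subject/25953429/"
# Douban tends to reject requests that lack a browser-like User-Agent.
headers = {"User-Agent": "Mozilla/5.0"}
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()
soup = BeautifulSoup(response.text, "html.parser")

# As in the original approach, treat every <dd> as one recommendation
# and take the first link found inside it.
also_likes = set()
for dd in soup.find_all("dd"):
    link = dd.find("a", href=True)
    if link is not None:
        also_likes.add(link["href"])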
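
The links collected in also_likes are full URLs, so a natural next step is to reduce them to subject IDs that can be fed back into the crawler. The sketch below assumes the links follow the /subject/<movie ID>/ pattern from section 0, possibly with a query string appended. The helper name subject_ids and the sample links are hypothetical.

```python
import re

# Matches the numeric ID in URLs shaped like /subject/<movie ID>/
SUBJECT_ID = re.compile(r"/subject/(\d+)")


def subject_ids(links):
    """Return the set of subject IDs found in an iterable of URLs."""
    ids = set()
    for link in links:
        match = SUBJECT_ID.search(link)
        if match:
            ids.add(match.group(1))
    return ids


# Made-up sample links, standing in for the contents of also_likes.
sample = {
    "https://movie.douban.com/subject/1234567/?from=subject-page",
    "https://movie.douban.com/subject/1234567/",
    "https://www.douban.com/about",
}
print(sorted(subject_ids(sample)))
```

Working with IDs instead of raw links also removes duplicates that differ only in their query string.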

2. Movie comments

https://mp.weixin.qq.com/s/uTIhyNVE7W6mGMneSKQNlw

Reposted from: https://www.cnblogs.com/mtcnn/p/9421077.html
