2019-01-25 Scraping Douban Book Reviews

https://www.douban.com/robots.txt

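robots.txt is the site's crawler policy (the robots exclusion protocol). Before scraping, check it for pages that crawlers are not allowed to fetch.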
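The robots.txt check can also be done in code rather than by eye. The following is a minimal sketch using the standard library's urllib.robotparser. The rules in SAMPLE_ROBOTS and the helper name is_allowed are hypothetical and only illustrate the mechanism; they are not the actual contents of Douban's robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; NOT Douban's real robots.txt.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /search
"""


def is_allowed(robots_text, url, user_agent="*"):
    """Return True if robots_text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(SAMPLE_ROBOTS, "https://book.douban.com/subject/4923621/"))
print(is_allowed(SAMPLE_ROBOTS, "https://book.douban.com/search?q=python"))
```

To check the live file instead, one option is to call set_url() with the robots.txt address followed by read() on the parser, which downloads and parses it in one step.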

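import requests
from bs4 import BeautifulSoup

# Assumption: a browser-like User-Agent; Douban may reject the default one.
headers = {"User-Agent": "Mozilla/5.0"}
r = requests.get("https://book.douban.com/subject/4923621/", headers=headers)
# Parse the HTML with the lxml parser.
soup = BeautifulSoup(r.text, "lxml")
# Collect every <span class="short"> tag (the short review snippets).
pattern = soup.find_all("span", "short")
for item in pattern:
    print(item.string)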

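Import the requests and BeautifulSoup modules. requests.get fetches the page, and BeautifulSoup(r.text, "lxml") parses the HTML into a tree of Tag objects. find_all returns a list of the matching tags (not a dictionary), and the loop prints the text of each one via item.string.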
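To see what find_all actually returns without hitting the network, here is a small offline sketch. The HTML in SAMPLE_HTML is made up for illustration and is not Douban's real markup. It uses the built-in "html.parser" instead of "lxml" so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment for illustration; not Douban's real page.
SAMPLE_HTML = """
<div>
  <span class="short">A great read.</span>
  <span class="short">Good, but <b>long</b>.</span>
  <span class="other">Not a review.</span>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The second positional argument filters on the CSS class.
spans = soup.find_all("span", "short")

for tag in spans:
    # .string is None when a tag has more than one child node;
    # get_text() joins all nested text instead.
    print(repr(tag.string), "|", tag.get_text())
```

The result is a list-like collection of Tag objects, so it supports len() and indexing. When a review contains nested markup, item.string is None, and get_text() is the safer choice for printing.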
