Urllib + BeautifulSoup (Scraping Douban Book Reviews)

For more crawler examples, see https://blog.csdn.net/weixin_39777626/article/details/81564819

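from urllib.error import URLError
from urllib.request import urlopen
from bs4 import BeautifulSoup

# Collected comment texts (named `reviews` to avoid shadowing the built-in `list`)
reviews = []

def getUrl(url):
    """Fetch one comment page and append each comment's text to `reviews`."""
    try:
        douban = urlopen(url)
        # The 'lxml' parser requires the lxml package to be installed
        soup = BeautifulSoup(douban, 'lxml')
        for comment in soup.find_all('div', class_='comment'):
            content = comment.find('p', class_='comment-content')
            if content is not None:  # skip blocks that have no comment body
                reviews.append(content.text)
    except URLError as e:
        # Report the failure instead of silently returning a placeholder
        print('Failed to fetch %s: %s' % (url, e))
    return reviews

# Pages 1 through 8 of the hot comments for this book
for i in range(1, 9):
    getUrl('https://book.douban.com/subject/26829016/comments/hot?p=%d' % i)

for review in reviews:
    print(review)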
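
The loop above requests eight pages using urllib's default headers. Some sites reject requests that do not carry a browser-like User-Agent, and if that happens here the script simply prints nothing. The following is a minimal sketch, not part of the original script, of how the page requests could be prepared with an explicit header. The names `BASE_URL`, `HEADERS`, and `build_requests` and the User-Agent string are illustrative assumptions. Nothing is sent over the network.

```python
from urllib.request import Request

# Same URL pattern as the script above
BASE_URL = "https://book.douban.com/subject/26829016/comments/hot?p=%d"

# Hypothetical header value chosen for illustration; not from the original post
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; example-scraper)"}

def build_requests(first_page=1, last_page=8):
    """Return one Request per comment page, without sending anything."""
    return [Request(BASE_URL % page, headers=HEADERS)
            for page in range(first_page, last_page + 1)]

page_requests = build_requests()
print(len(page_requests))
print(page_requests[0].full_url)
```

Each `Request` object can be passed to `urlopen` in place of the plain URL string, for example `urlopen(req, timeout=10)`. Whether the header is actually needed depends on the site's current behavior.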
