Web Scraper --- Practice Source Code

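This script scrapes online comments about a few players (from a Baidu Tieba thread), as data for judging which of them is greater.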

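import csv
import re
import time

import requests

def main(page, csvwriter):
    url = f'https://tieba.baidu.com/p/7882177660?pn={page}'
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36'
    }
    resp = requests.get(url, headers=headers)
    html = resp.text
    # Comment text. The closing tags in these three patterns were lost in
    # extraction and are restored here from Tieba's post markup (assumption)
    comments = re.findall(r'style="display:;">\s*(.*?)</div>', html)
    # Commenter user names
    users = re.findall(r'class="p_author_name j_user_card" href=".*?" target="_blank">(.*?)</a>', html)
    # Comment times (the text that follows the "N楼" floor label)
    comment_times = re.findall(r'楼</span><span class="tail-info">(.*?)</span>', html)
    for u, t, c in zip(users, comment_times, comments):
        # Skip abnormal rows. Only the "> 50" threshold survived extraction,
        # so applying it to the comment length is an assumption
        if len(c) > 50:
            continue
        csvwriter.writerow((u, t, c))
        print(u, t, c)
    print(f'Finished scraping page {page}')

if __name__ == '__main__':
    # newline='' keeps the csv module from writing blank rows on Windows
    with open('01.csv', 'a', encoding='utf-8', newline='') as f:
        csvwriter = csv.writer(f)
        csvwriter.writerow(('Commenter', 'Comment time', 'Comment text'))
        for page in range(1, 8):  # scrape the first 7 pages
            main(page, csvwriter)
            time.sleep(2)  # pause between requests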
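Because parts of the three patterns had to be restored, it helps to check the parsing step offline before hitting the live thread. The sketch below is a minimal test harness, not the author's code: it assumes Tieba renders each post roughly like the hypothetical SAMPLE_HTML snippet (invented for illustration, not copied from the real page), and it moves the extraction into a hypothetical parse_posts() function so it runs without network access. The max_len=50 filter on comment length carries over the same assumption as the script above.

```python
import re

# Hypothetical markup modeled on one Tieba post; NOT taken from the real page
SAMPLE_HTML = (
    '<a class="p_author_name j_user_card" href="/home/main?un=fan1" target="_blank">fan1</a>'
    '<div id="post_content_1" class="d_post_content j_d_post_content " style="display:;">'
    '            Player A is greater</div>'
    '<span class="tail-info">1楼</span><span class="tail-info">2022-07-01 10:00</span>'
)

# The same three patterns used by the scraper
COMMENT_RE = re.compile(r'style="display:;">\s*(.*?)</div>')
USER_RE = re.compile(r'class="p_author_name j_user_card" href=".*?" target="_blank">(.*?)</a>')
TIME_RE = re.compile(r'楼</span><span class="tail-info">(.*?)</span>')


def parse_posts(html, max_len=50):
    """Return (user, time, comment) tuples, skipping comments longer than max_len."""
    users = USER_RE.findall(html)
    times = TIME_RE.findall(html)
    comments = COMMENT_RE.findall(html)
    rows = []
    # zip() stops at the shortest list, so if one pattern misses a post the
    # remaining rows are misaligned or dropped; compare the lengths when debugging
    for u, t, c in zip(users, times, comments):
        if len(c) > max_len:
            continue
        rows.append((u, t, c))
    return rows


if __name__ == '__main__':
    print(parse_posts(SAMPLE_HTML))
```

Running it should print the single parsed row. If the live page yields empty or mismatched lists, the restored closing tags are the first thing to check.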
