Browser Disguise

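Servers often inspect the User-Agent request header to decide whether a request comes from a crawler or from a real browser. By default urllib identifies itself as Python, so sending a browser's User-Agent string instead makes the request look like it came from a browser.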
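To see why the disguise is needed, it can help to look at what urllib sends when nothing is changed. The short sketch below only inspects a freshly built opener and makes no network request; the exact version number in the value depends on the local Python installation.

```python
from urllib import request

# A fresh opener carries urllib's default headers.
opener = request.build_opener()

# addheaders is a list of (name, value) tuples sent with every request.
name, value = opener.addheaders[0]

# The value has the form "Python-urllib/<version>", which a server
# can easily recognize as a script rather than a browser.
print(name, value)
```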

from urllib import request

url = 'https://blog.csdn.net/liona_koukou/article/details/74391977'
header = ('User-Agent',
          'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36')
# Create an opener with build_opener and attach the header to it
opener = request.build_opener()
opener.addheaders = [header]
# Open the URL with the opener so the request carries the custom header
data = opener.open(url).read()
with open('test04.html', 'wb') as f:
    f.write(data)
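The same header can also be attached to a single request rather than to an opener, using request.Request. The following is a minimal sketch of that alternative, not the approach used above: the helper names make_browser_request and fetch are hypothetical, and the timeout of 10 seconds is an assumed value. Only the Request object is built at import time; fetch performs the download when called.

```python
from urllib import request

# Same browser User-Agent string as in the example above.
BROWSER_UA = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36')


def make_browser_request(url, user_agent=BROWSER_UA):
    """Return a Request that carries a browser-like User-Agent header.

    Hypothetical helper, named here for illustration only.
    """
    return request.Request(url, headers={'User-Agent': user_agent})


def fetch(url, timeout=10):
    """Download the page body as bytes using the disguised request."""
    with request.urlopen(make_browser_request(url), timeout=timeout) as resp:
        return resp.read()


req = make_browser_request(
    'https://blog.csdn.net/liona_koukou/article/details/74391977')

# urllib stores header names via str.capitalize(), hence 'User-agent'.
print(req.get_header('User-agent'))
```

Calling fetch(url) and writing the result to a file in 'wb' mode would then mirror the opener-based version. A per-request header is convenient when different URLs need different headers, while opener.addheaders applies to every request made through that opener.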
