Python Crawler Study Notes (2) ----- Proxies

I. UserAgent

UserAgent, or User Agent (UA for short), is a special string header that allows the server to identify the client making the request.
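You can check what urllib will actually send by reading the header back from a `Request` object, with no network access needed. Note that urllib normalizes stored header names so that only the first letter is capitalized, so the lookup key is `'User-agent'`:

```python
from urllib import request

# Build a Request with a custom User-Agent header (no request is sent yet).
req = request.Request("http://www.baidu.com",
                      headers={'User-Agent': 'Mozilla/5.0 (test)'})

# urllib stores header names via str.capitalize(), so read it back
# with the key 'User-agent', not 'User-Agent'.
print(req.get_header('User-agent'))  # Mozilla/5.0 (test)
```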

There are two ways to set the UA:

1. headers

```python
from urllib import request, error

if __name__ == '__main__':
    url = "http://www.baidu.com"
    try:
        headers = {}
        headers['User-Agent'] = "Mozilla/5.0 (compatible; MSIE 9.0; Windows Phone OS 7.5; Trident/5.0; IEMobile/9.0; HTC; Titan)"
        req = request.Request(url, headers=headers)
        rsp = request.urlopen(req)
        html = rsp.read().decode()
        print(html)
    except error.HTTPError as e:
        print(e)
    except error.URLError as e:
        print(e)
    except Exception as e:
        print(e)
```

2. Using add_header

```python
from urllib import request, error

if __name__ == '__main__':
    url = "http://www.baidu.com"
    try:
        req = request.Request(url)
        req.add_header('User-Agent', "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.3 (KHTML, like Gecko) Chrome/6.0.472.33 Safari/534.3 SE 2.X MetaSr 1.0")
        rsp = request.urlopen(req)
        html = rsp.read().decode()
        print(html)
    except error.HTTPError as e:
        print(e)
    except error.URLError as e:
        print(e)
    except Exception as e:
        print(e)
```

II. ProxyHandler (Proxy Server)

Many websites monitor how many times a given IP accesses them within a time window; if the requests come too often and too fast, that IP gets banned. By setting up proxy servers, we can switch to a different IP and keep crawling even after one IP is blocked. A proxy hides the real client, but a proxy must not hit one fixed URL too frequently either, so you need plenty of proxies.
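Since no single proxy should hammer one URL, a common pattern is to keep a pool of proxies and pick one at random for each opener. A minimal sketch, assuming a hypothetical pool (the IP addresses below are placeholders, not live proxies):

```python
import random
from urllib import request

# Hypothetical proxy pool; replace these placeholders with live
# addresses collected from a proxy listing site.
proxy_pool = [
    {'http': '27.203.245.212:8060'},
    {'http': '101.236.35.98:8866'},
    {'http': '118.190.95.35:9001'},
]

def make_opener():
    # Pick a random proxy so successive openers rotate IPs.
    proxy = random.choice(proxy_pool)
    handler = request.ProxyHandler(proxy)
    return request.build_opener(handler)

opener = make_opener()  # then fetch with opener.open(url)
```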

Basic usage steps:

1. Set the proxy address

2. Create a ProxyHandler

3. Create an Opener

4. Install the Opener

Where to get proxy server addresses:

- www.xicidaili.com

- www.goubanjia.com

```python
from urllib import request, error

if __name__ == '__main__':
    url = "http://www.baidu.com"
    # 1. Set the proxy address
    proxy = {'http': '27.203.245.212:8060'}
    # 2. Create the ProxyHandler
    proxy_handler = request.ProxyHandler(proxy)
    # 3. Create the Opener
    opener = request.build_opener(proxy_handler)
    # 4. Install the Opener
    request.install_opener(opener)

    try:
        req = request.Request(url)
        rsp = request.urlopen(req)
        html = rsp.read().decode('utf-8')
        print(html)
    except error.HTTPError as e:
        print(e)
    except error.URLError as e:
        print(e)
    except Exception as e:
        print(e)
```
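Note that `install_opener` makes the proxy global for every subsequent `urlopen` call. If you only want the proxy for some requests, you can skip step 4 and call the opener directly. A sketch using the same placeholder proxy address (with a timeout, since a dead proxy would otherwise hang):

```python
from urllib import request, error

proxy = {'http': '27.203.245.212:8060'}  # placeholder address
proxy_handler = request.ProxyHandler(proxy)
opener = request.build_opener(proxy_handler)

try:
    # opener.open() routes through the proxy without touching
    # the global urlopen behavior.
    rsp = opener.open("http://www.baidu.com", timeout=3)
    print(rsp.read().decode('utf-8'))
except Exception as e:
    # URLError, timeout, etc. -- expected with a dead placeholder proxy
    print(e)
```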

If the output shows an error, simply replace the proxy IP address with a different one.
