Python之亚马逊反爬虫User-Agent和IP

#1.User-Agent

    user_agent={"user-agent":"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322)"}
    user_agents =['Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.8 (KHTML, like Gecko) Beamrise/17.2.0.9 Chrome/17.0.939.0 Safari/535.8','Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.7 (KHTML, like Gecko) Chrome/16.0.912.36 Safari/535.7','Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/532.5 (KHTML, like Gecko) Chrome/4.0.249.0 Safari/532.5','Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322)','Mozilla/5.0(Macintosh;U;IntelMacOSX10_6_8;en-us)AppleWebKit/534.50(KHTML,likeGecko)Version/5.1Safari/534.50','Mozilla/5.0(Windows;U;WindowsNT6.1;en-us)AppleWebKit/534.50(KHTML,likeGecko)Version/5.1Safari/534.50','Opera/9.80(Macintosh;IntelMacOSX10.6.8;U;en)Presto/2.8.131Version/11.11','Opera/9.80(WindowsNT6.1;U;en)Presto/2.8.131Version/11.11']
    #user_agent=random.choice(user_agents)
    i=random.randint(0,7)
    user_agent['user-agent']=user_agents[i]
    data=None
    request = urllib2.Request(url,data,user_agent)
    html = urllib2.urlopen(request).read()

利用一个数组随机调用不同的UA

常见UA:

1.Firefox

  • Mozilla/5.0 (Windows NT 6.2; WOW64; rv:21.0) Gecko/20100101 Firefox/21.0
  • Mozilla/5.0 (Android; Mobile; rv:14.0) Gecko/14.0 Firefox/14.0

2.Google Chrome

  • Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.94 Safari/537.36
  • Mozilla/5.0 (Linux; Android 4.0.4; Galaxy Nexus Build/IMM76B) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.133 Mobile Safari/535.19

#2.代理IP

代理ip可从网上自行爬取

    url = 'http://www.ip138.com/'
    #代理IP
    proxy = {'http':'111.111.111.111:8080'}
    #创建ProxyHandler
    proxy_support = request.ProxyHandler(proxy)
    #创建Opener
    opener = request.build_opener(proxy_support)
    #添加User Angent
    opener.addheaders = [('User-Agent','Mozilla/5.0')]
    #安装OPener
    request.install_opener(opener)
    #使用安装好的Opener
    response = request.urlopen(url)
    #读取
    html = response.read()

你可能感兴趣的:(python,web,windows)