Python Crawler Notes: urllib2

Using urllib2 to fetch pages from a site that restricts crawlers

  1. Fetch the page directly:

    # coding:utf-8
    import urllib2

    url = "http://blog.csdn.net/troubleshooter"

    html = urllib2.urlopen(url)
    print html.read()

      The server responds with a 403 Forbidden error (urllib2 raises an HTTPError), because the site rejects requests carrying the default Python-urllib User-Agent.
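      A minimal sketch (not from the original post): the 403 can be caught explicitly
      with urllib2.HTTPError, so the script reports the status code instead of
      crashing with a traceback:

        # coding:utf-8
        import urllib2

        url = "http://blog.csdn.net/troubleshooter"

        try:
            html = urllib2.urlopen(url)
            print html.read()
        except urllib2.HTTPError as e:
            # the site rejects the default Python-urllib User-Agent, so this prints 403
            print e.code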

  2. Simulate a normal browser visit by adding request headers:
    # coding:utf-8
    import urllib2

    url = "http://blog.csdn.net/troubleshooter"

    # headers copied from a real browser request; the User-Agent is the key one here
    url_headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.111 Safari/537.36',
                   'Referer': 'http://www.cnblogs.com/evilxr/p/4038902.html',
                   'Host': 'blog.csdn.net',
                   'GET': url  # not a real HTTP header; harmless, but it can be omitted
                   }

    req = urllib2.Request(url, headers=url_headers)
    html = urllib2.urlopen(req)
    print html.getcode()

      Running the script now prints:

    200
    [Finished in 0.4s]
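      As a further sketch (not from the original post), the same User-Agent can also be
      installed once on a global opener via urllib2.build_opener(), so every later
      urllib2.urlopen() call sends it automatically:

        # coding:utf-8
        import urllib2

        url = "http://blog.csdn.net/troubleshooter"

        # replace the default 'Python-urllib/2.x' User-Agent on a global opener
        opener = urllib2.build_opener()
        opener.addheaders = [('User-Agent',
                              'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
                              '(KHTML, like Gecko) Chrome/38.0.2125.111 Safari/537.36')]
        urllib2.install_opener(opener)

        print urllib2.urlopen(url).getcode()  # prints 200 once the browser UA is accepted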
    
    

      

      
