1. The Problem
While running my code, I suddenly hit the following error:
'2017-04-25 13:15:13 PM' HttpUtil.py[line:30] ERROR HttpUtil:Header value 1 must be of type str or bytes, not <class 'int'>
The code that triggered it:
response = requests.get(url, headers=headers, cookies=cookies)
The core of the code uses requests to fetch a page over HTTP.
To get more detail, import traceback and add a traceback print statement to the exception handler:
traceback.print_exc()
This prints the full exception stack to the console:
Traceback (most recent call last):
  File "/usr/local/python3/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/usr/local/python3/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/crawler/web-crawler/dyproxy/IPProxyPool/spider/ProxyCrawl.py", line 34, in startProxyCrawl
    crawl.run()
  File "/opt/crawler/web-crawler/dyproxy/IPProxyPool/spider/ProxyCrawl.py", line 49, in run
    parserList = self.urltaker.parse_list()
  File "/opt/crawler/web-crawler/dyproxy/IPProxyPool/spider/urls/URLRetriever.py", line 35, in parse_list
    site_config['urls'] = getattr(self, 'retrieve_' + site.xpath("@name")[0])(site.xpath("@url")[0], site.xpath('./setting/@url')[0])
  File "/opt/crawler/web-crawler/dyproxy/IPProxyPool/spider/urls/URLRetriever.py", line 110, in retrieve_mimiip
    response = HttpUtil.get(site_url % proxy, headers=all_headers['mimiip'])
  File "/opt/crawler/web-crawler/dyproxy/IPProxyPool/util/HttpUtil.py", line 31, in get
    traceback.print_exc()
  File "/usr/local/python3/lib/python3.6/traceback.py", line 159, in print_exc
    print_exception(*sys.exc_info(), limit=limit, file=file, chain=chain)
  File "/usr/local/python3/lib/python3.6/traceback.py", line 101, in print_exception
    print(line, file=file, end="")
Together with the error message, this makes it fairly certain that the problem lies in either headers or cookies.
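The exception handling described above could look roughly like the sketch below. The real HttpUtil.get is not shown in this post, so the helper name call_with_traceback and its structure are assumptions made purely for illustration. It wraps any callable, logs the short message, and prints the full stack.

```python
import logging
import traceback

logger = logging.getLogger("HttpUtil")


def call_with_traceback(func, *args, **kwargs):
    """Call func; on failure, log the message and print the full stack.

    Hypothetical helper, not the original HttpUtil code.
    """
    try:
        return func(*args, **kwargs)
    except Exception as exc:
        # Short one-line message, similar to the log line in this post.
        logger.error("HttpUtil:%s", exc)
        # Full exception stack, written to stderr.
        traceback.print_exc()
        return None


# Intended usage (requires the requests package, not run here):
#   response = call_with_traceback(
#       requests.get, url, headers=headers, cookies=cookies)
```

Returning None on failure is only one possible choice; re-raising after printing would work just as well if the caller needs to know what went wrong.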
2. Analysis
Judging from the error message, this looks like a data type problem in headers: a string is expected, but the actual value is an int. A check of the custom headers turned up the following:
"mimiip": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
    "Accept-Encoding":"gzip, deflate",
    "Accept-Language":"zh-CN,zh;q=0.8,de;q=0.6,en;q=0.4,zh-TW;q=0.2",
    "Cache-Control":"Max-Age=0",
    "Connection":"keep-alive",
    "DNT":1,
    "Host":"www.mimiip.com",
    "Referer":"http://www.mimiip.com/gngao/",
    "If-None-Match":"169bceb52ed4a052e3b0f82bda5b0462",
    "Upgrade-Insecure-Requests":1,
    "User-Agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3063.4 Safari/537.36"
}
Two of the values, DNT and Upgrade-Insecure-Requests, are numbers rather than strings, which is most likely where the problem comes from.
After changing those int values to strings (1 becomes "1"), the request goes through normally.
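Rather than editing each value by hand, the check and the fix can also be done in code. The following is a minimal sketch; the function names find_non_string_headers and stringify_headers are hypothetical and not part of requests. It relies only on what the error message states, namely that header values must be str or bytes.

```python
def find_non_string_headers(headers):
    """Return the names of headers whose values are neither str nor bytes."""
    return [
        name for name, value in headers.items()
        if not isinstance(value, (str, bytes))
    ]


def stringify_headers(headers):
    """Return a copy of headers in which every value is str or bytes."""
    return {
        name: value if isinstance(value, (str, bytes)) else str(value)
        for name, value in headers.items()
    }


# A shortened version of the "mimiip" headers from this post.
mimiip_headers = {
    "Connection": "keep-alive",
    "DNT": 1,
    "Host": "www.mimiip.com",
    "Upgrade-Insecure-Requests": 1,
}

offending = find_non_string_headers(mimiip_headers)
fixed = stringify_headers(mimiip_headers)

print(offending)  # ['DNT', 'Upgrade-Insecure-Requests']
print(fixed["DNT"], type(fixed["DNT"]).__name__)  # 1 str
```

Passing the result of stringify_headers to requests.get instead of the raw dict should avoid the error, though fixing the values at the source, as done above, keeps the configuration honest.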
3. Summary
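In an HTTP request, every header value is a string; there is no numeric type. Keep this in mind, especially when building request headers by hand.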
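To see why there is no numeric type, it helps to look at how an HTTP/1.1 request head is laid out on the wire: a request line followed by "Name: value" text lines. The sketch below is an illustration only, not how requests builds its messages internally, and render_request_head is a hypothetical name.

```python
def render_request_head(method, path, headers):
    """Render the request line and header lines as HTTP/1.1 text."""
    lines = ["{} {} HTTP/1.1".format(method, path)]
    for name, value in headers.items():
        # Every header ends up as plain text on its own line.
        lines.append("{}: {}".format(name, value))
    # A blank line terminates the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


head = render_request_head(
    "GET", "/gngao/", {"Host": "www.mimiip.com", "DNT": "1"}
)
print(head)
```

By the time the request is sent, DNT is just the characters "DNT: 1", so a library that insists on str or bytes up front is only making that explicit.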