Pitfalls of multipart/form-data with Python urllib3
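################################3. A header passed with a file upload conflicts with urllib3's request.py
When passing Content-Type in headers, mind the capitalization: the key must be spelled exactly 'Content-Type'.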
def request_encode_body(self, method, url, fields=None, headers=None,
                        encode_multipart=True, multipart_boundary=None,
                        **urlopen_kw):
    ...
    extra_kw['body'] = body
    # The key here is spelled exactly 'Content-Type'. If the caller's headers
    # already contain 'content-type', the dict ends up holding both keys, the
    # header is sent twice, and the endpoint returns an error.
    extra_kw['headers'] = {'Content-Type': content_type}
    extra_kw['headers'].update(headers)
    ...
# Headers actually sent:
# 'content-type': 'multipart/form-data; boundary=----WebKitFormBoundary0b1047518e88296',
# 'Content-Type': 'multipart/form-data; boundary=----WebKitFormBoundary0b1047518e88296'
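To make the duplicate-key behavior concrete, here is a minimal sketch using only plain dicts. The helper name drop_header is my own and is not part of urllib3; it simply removes a header regardless of how its name is capitalized, so that only the generated 'Content-Type' remains after the merge.

```python
def drop_header(headers, name):
    """Return a copy of headers without `name`, compared case-insensitively."""
    target = name.lower()
    return {k: v for k, v in (headers or {}).items() if k.lower() != target}


# Headers as a caller might pass them (lower-case spelling).
user_headers = {'content-type': 'multipart/form-data', 'X-Token': 'secret'}
# What the code above generates (assumed boundary value, for illustration).
generated = {'Content-Type': 'multipart/form-data; boundary=abc123'}

# Plain dict.update treats the two spellings as different keys.
naive = dict(generated)
naive.update(user_headers)

# Dropping the caller's Content-Type first leaves a single entry.
safe = dict(generated)
safe.update(drop_header(user_headers, 'Content-Type'))

print(len(naive), len(safe))
```

Calling drop_header(headers, 'Content-Type') on your own headers before handing them to http.request is one way to avoid the duplicate, assuming you want urllib3's generated value to be the one that is sent.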
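################################2. Fixing the uploaded file's extension always being null
Note that example.jpeg here must be a bare file name, not a path. If it is written as "\\image\\example.jpeg", the extension of the uploaded file becomes null.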
fields={
    'text': ('None', "upload a string",'None'),
    'file': ('example.jpeg',file_data,"image/jpeg"),
}
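If the file name comes from a full path, one option is to strip the directories before building fields. The sketch below uses the standard-library ntpath module, which splits on both '\\' and '/' on any platform; the helper name upload_filename and the placeholder bytes are assumptions for illustration.

```python
import ntpath


def upload_filename(path):
    """Return only the last path component, for both '\\' and '/' separators."""
    return ntpath.basename(path)


file_data = b'placeholder bytes'  # stands in for the real file content

fields = {
    'file': (upload_filename('\\image\\example.jpeg'), file_data, 'image/jpeg'),
}

print(fields['file'][0])
```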
##################################1
In one project I used Python's urllib3 to upload a file:
import urllib3

urllib3.disable_warnings()
http = urllib3.PoolManager()
with open('example.jpeg','rb') as fp:
    file_data = fp.read()
r = http.request(
    'POST',
    'http://httpbin.org/post',
    # Content-Type is passed here without a boundary
    headers = {'Content-Type':"multipart/form-data"},
    fields={
        'text': ('None', "upload a string",'None'),
        'file': ('example.jpeg',file_data,"image/jpeg"),
    })
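The upload failed. To see the POST body that urllib3 builds, I opened the request_encode_body function in D:\python27\Lib\site-packages\urllib3\request.py and added print extra_kw just before the return. The printed body was as follows:
Uploading the file to the same endpoint with Postman worked correctly:
Clicking the red Code link on the line below Send shows this HTTP request:
The boundary here differs from the one urllib3 produces in Python. It turns out that when urllib3 posts form-data and the multipart_boundary parameter is not passed, it generates a boundary automatically, in D:\python27\Lib\site-packages\urllib3\filepost.py: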
def encode_multipart_formdata(fields, boundary=None):
    ...
    if boundary is None:
        boundary = choose_boundary()
def choose_boundary():
    """
    Our embarrassingly-simple replacement for mimetools.choose_boundary.
    """
    return uuid4().hex  # a unique 32-character hex string
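To show why the boundary matters, here is a simplified, hand-rolled encoder. It is not urllib3's implementation, and the name build_multipart is made up for this sketch. It only illustrates that the same boundary string has to appear both in the body (as the part delimiter) and in the Content-Type header.

```python
from uuid import uuid4


def build_multipart(fields, boundary=None):
    """Encode {name: (filename, data, content_type)} as multipart/form-data.

    Simplified illustration only: names and filenames are not escaped.
    """
    if boundary is None:
        boundary = uuid4().hex  # 32 hex characters, as in choose_boundary()
    lines = []
    for name, (filename, data, content_type) in fields.items():
        lines.append(('--' + boundary).encode('ascii'))
        disposition = 'Content-Disposition: form-data; name="%s"; filename="%s"' % (name, filename)
        lines.append(disposition.encode('utf-8'))
        lines.append(('Content-Type: ' + content_type).encode('ascii'))
        lines.append(b'')      # blank line separates part headers from data
        lines.append(data)
    lines.append(('--' + boundary + '--').encode('ascii'))  # closing delimiter
    lines.append(b'')
    body = b'\r\n'.join(lines)
    return body, 'multipart/form-data; boundary=' + boundary


body, content_type = build_multipart(
    {'file': ('example.jpeg', b'\xff\xd8', 'image/jpeg')},
    boundary='demo',
)
print(content_type)
```

A server locates the parts by reading the boundary from the header, so a header without a boundary, or with a different one, leaves it unable to parse the body.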
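I changed the Python code to use a boundary starting with WebKitFormBoundary and ran it again, but it still failed. Using the same debugging method, I found that the Content-Type header carried no boundary, even though Content-Type: multipart/form-data was clearly set in headers. Debugging showed that when Content-Type is passed in headers, the Content-Type that urllib3 generates (the one with the boundary) is overwritten by the value passed in, as shown here: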
D:\python27\Lib\site-packages\urllib3\request.py
def request_encode_body(self, method, url, fields=None, headers=None,
                        encode_multipart=True, multipart_boundary=None,
                        **urlopen_kw):
    if headers is None:
        headers = self.headers
    extra_kw = {'headers': {}}
    if fields:
        if 'body' in urlopen_kw:
            raise TypeError(
                "request got values for both 'fields' and 'body', can only specify one.")
        if encode_multipart:
            # a new content type (with boundary) is generated here
            body, content_type = encode_multipart_formdata(fields, boundary=multipart_boundary)
        else:
            body, content_type = urlencode(fields), 'application/x-www-form-urlencoded'
        extra_kw['body'] = body
        # the generated content type is put into the headers
        extra_kw['headers'] = {'Content-Type': content_type}
    # update() then overwrites it with the headers passed in by the caller
    extra_kw['headers'].update(headers)
    extra_kw.update(urlopen_kw)
    return self.urlopen(method, url, **extra_kw)
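The overwrite can be reproduced without any network call. The function below, with the hypothetical name merge_like_request_encode_body, only mirrors the two header lines from the listing above. The boundary value is made up.

```python
def merge_like_request_encode_body(content_type, headers):
    """Mirror the header merge: generated value first, caller's headers on top."""
    extra_headers = {'Content-Type': content_type}
    extra_headers.update(headers)
    return extra_headers


generated = 'multipart/form-data; boundary=0123456789abcdef'

# Caller passes a Content-Type without a boundary: the generated one is lost.
lost = merge_like_request_encode_body(generated, {'Content-Type': 'multipart/form-data'})

# Caller passes unrelated headers only: the generated one survives.
kept = merge_like_request_encode_body(generated, {'User-Agent': 'demo'})

print(lost['Content-Type'])
print(kept['Content-Type'])
```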
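Summary of workable approaches:
1. When passing fields to http.request, also pass a specific boundary (one starting with WebKitFormBoundary; I don't know why this is needed, so if you do, please leave a comment), and do not put Content-Type in headers;
2. If Content-Type must be passed in the headers of http.request, it has to include the boundary, in this format: Content-Type: multipart/form-data; boundary=----WebKitFormBoundary7MA4YWxkTrZu0gW
The final code changes: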
import urllib3

urllib3.disable_warnings()
http = urllib3.PoolManager()
with open('example.jpeg','rb') as fp:
    file_data = fp.read()
boundary = '----WebKitFormBoundary7MA4YWxkTrZu0gW'
# Approach 1: no Content-Type in headers; urllib3 adds it with the boundary
r = http.request(
    'POST',
    'http://httpbin.org/post',
    headers = {},
    multipart_boundary = boundary,
    fields={
        'text': ('None', "upload a string",'None'),
        'file': ('example.jpeg',file_data,"image/jpeg"),
    })
# Approach 2: Content-Type in headers, including the same boundary
r = http.request(
    'POST',
    'http://httpbin.org/post',
    headers = {'Content-Type':"multipart/form-data; boundary=%s"%boundary},
    multipart_boundary = boundary,
    fields={
        'text': ('None', "upload a string",'None'),
        'file': ('example.jpeg',file_data,"image/jpeg"),
    })