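Original article. If you repost it, please credit nashuiliang at the top of the post and link to the original.
Problem statement
With the Python requests package (version 2.9.1), I use the requests.post interface: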
def post(url, data=None, json=None, **kwargs):
    """Sends a POST request.
    :param url: URL for the new :class:`Request` object.
    :param data: (optional) Dictionary, bytes, or file-like object to send in the body of the :class:`Request`.
    :param json: (optional) json data to send in the body of the :class:`Request`.
    :param **kwargs: Optional arguments that ``request`` takes.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response
    """
    return request('post', url, data=data, json=json, **kwargs)
I upload a file whose name contains Chinese characters, 我们chuangwang.txt:
# -*- coding: utf-8 -*-
import requests

url = 'http://jianxun.io/f'
# The file name is passed as a UTF-8 encoded byte string.
files = {'file1': (u'我们chuangwang.txt'.encode("utf-8"),
                   open('/tmp/a.txt', 'rb'), 'text/plain')}
r = requests.post(url, files=files)
print r.text
This raises a UnicodeDecodeError with the following traceback:
File "/home/vagrant/.buildout/requests-2.9.1-py2.7.egg/requests/api.py", line 107, in post
return request('post', url, data=data, json=json, **kwargs)
File "/home/vagrant/.buildout/requests-2.9.1-py2.7.egg/requests/api.py", line 53, in request
return session.request(method=method, url=url, **kwargs)
File "/home/vagrant/.buildout/requests-2.9.1-py2.7.egg/requests/sessions.py", line 454, in request
prep = self.prepare_request(req)
File "/home/vagrant/.buildout/requests-2.9.1-py2.7.egg/requests/sessions.py", line 388, in prepare_request
hooks=merge_hooks(request.hooks, self.hooks),
File "/home/vagrant/.buildout/requests-2.9.1-py2.7.egg/requests/models.py", line 296, in prepare
self.prepare_body(data, files, json)
File "/home/vagrant/.buildout/requests-2.9.1-py2.7.egg/requests/models.py", line 447, in prepare_body
(body, content_type) = self._encode_files(files, data)
File "/home/vagrant/.buildout/requests-2.9.1-py2.7.egg/requests/models.py", line 153, in _encode_files
rf.make_multipart(content_type=ft)
File "/home/vagrant/.buildout/requests-2.9.1-py2.7.egg/requests/packages/urllib3/fields.py", line 174, in make_multipart
(('name', self._name), ('filename', self._filename))
File "/home/vagrant/.buildout/requests-2.9.1-py2.7.egg/requests/packages/urllib3/fields.py", line 134, in _render_parts
parts.append(self._render_part(name, value))
File "/home/vagrant/.buildout/requests-2.9.1-py2.7.egg/requests/packages/urllib3/fields.py", line 114, in _render_part
return format_header_param(name, value)
File "/home/vagrant/.buildout/requests-2.9.1-py2.7.egg/requests/packages/urllib3/fields.py", line 38, in format_header_param
result.encode('ascii')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe7 in position 10: ordinal not in range(128)
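By default, requests tries to encode name (file1 in the example) and filename (a UTF-8 encoded byte string; in the traceback above it is \xe7\xae\x80\xe4\xb9\xa6.txt, the UTF-8 encoding of 简书.txt) as ASCII, but the bytes \xe7\xae\x80\xe4\xb9\xa6 fall outside the ASCII range (0x00 to 0x7f; see an ASCII table). Because the file name is already a byte string, Python 2 first decodes it implicitly with the ascii codec before encoding it, which is why the exception is a UnicodeDecodeError rather than a UnicodeEncodeError. I will not go into the details of Python 2 encoding issues here. (They are a hurdle every engineer has to get over.)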
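The failure can be reproduced without requests. The following is a minimal sketch, assuming the header fragment is the UTF-8 byte string seen in the traceback; the variable names raw and error are illustrative only.

```python
# The header fragment as a UTF-8 byte string (bytes taken from the traceback).
raw = b'filename="\xe7\xae\x80\xe4\xb9\xa6.txt"'

error = None
try:
    # Python 2 performs this decode implicitly when .encode('ascii')
    # is called on a byte string; here it is done explicitly.
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    error = exc

# 'filename="' is 10 characters long, so the first non-ASCII byte is at index 10.
print(error)
# 'ascii' codec can't decode byte 0xe7 in position 10: ordinal not in range(128)
```

The position reported here matches the one in the traceback, which suggests the error comes from the file name itself and not from the file content.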
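Used this way, requests 2.9.1 cannot upload a file with a raw UTF-8 file name, so below I use the urllib2 library and write the HTTP body by hand to upload a file with a Unicode name.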
Python HTTP form-based file upload
See RFC 1867 for the specification.
The final HTTP body and the relevant HTTP headers look like this (an explanation follows):
01> Content-Type: multipart/form-data; boundary=hLSZW1P5ozZjetVIbDWwihCQwNVVTz\r\n
02> Content-Length: 424\r\n
03> \r\n
04> --hLSZW1P5ozZjetVIbDWwihCQwNVVTz\r\n
05> Content-Disposition: form-data; name="fromname"\r\n
06> \r\n
07> chuangwang\r\n
08> --hLSZW1P5ozZjetVIbDWwihCQwNVVTz\r\n
09> Content-Disposition: form-data; name="from"\r\n
10> \r\n
11> [email protected]\r\n
12> --hLSZW1P5ozZjetVIbDWwihCQwNVVTz\r\n
13> Content-Disposition: form-data; name="file"; filename="\xe6\x88\x91\xe4\xbb\xacchuangwang.txt"\r\n
14> Content-Type: text/plain\r\n
15> Content-Transfer-Encoding: binary\r\n
16> \r\n
17> chuangwang\n\r\n
18> --hLSZW1P5ozZjetVIbDWwihCQwNVVTz--\r\n
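Explanation:
- Lines 01 to 02: part of the HTTP header.
- Line 01: the value of Content-Type must be set to multipart/form-data.
- Lines 01 to 18: CRLF (\r\n) is the line separator.
- Lines 01 to 18: boundary (hLSZW1P5ozZjetVIbDWwihCQwNVVTz) is a randomly generated delimiter; it only has to be a string that does not occur in your data. --boundary\r\n is the part separator, which splits the body into several parts. Each part starts with Content-Disposition. If the part is not a file, the prefix is Content-Disposition: form-data; name="name". If the part is a file, the prefix is Content-Disposition: form-data; name="name"; filename="filename". This is followed by optional headers such as Content-Type, then a CRLF, the data, and another CRLF.
- Line 18: --hLSZW1P5ozZjetVIbDWwihCQwNVVTz--\r\n is the closing line, in the form --{boundary}--\r\n.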
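To check a hand-written body against the layout described above, it can help to split it back into its parts. The sketch below is only an illustration: split_multipart is a hypothetical helper, the boundary B is made up, and it assumes the boundary never occurs inside the data.

```python
def split_multipart(body, boundary):
    """Split a multipart/form-data body (bytes) into (header_lines, content) pairs."""
    delimiter = b"--" + boundary
    parts = []
    for chunk in body.split(delimiter)[1:]:  # element 0 is the empty preamble
        if chunk.startswith(b"--"):
            break                            # reached the closing --{boundary}--
        chunk = chunk[2:]                    # drop the CRLF that ends the delimiter line
        if chunk.endswith(b"\r\n"):
            chunk = chunk[:-2]               # drop the CRLF before the next delimiter
        # A blank line (CRLF CRLF) separates the part headers from the data.
        head, _, content = chunk.partition(b"\r\n\r\n")
        parts.append((head.split(b"\r\n"), content))
    return parts


sample = (
    b'--B\r\n'
    b'Content-Disposition: form-data; name="fromname"\r\n'
    b'\r\n'
    b'chuangwang\r\n'
    b'--B\r\n'
    b'Content-Disposition: form-data; name="file"; '
    b'filename="\xe6\x88\x91\xe4\xbb\xacchuangwang.txt"\r\n'
    b'Content-Type: text/plain\r\n'
    b'\r\n'
    b'chuangwang\n\r\n'
    b'--B--\r\n'
)
parts = split_multipart(sample, b"B")
for header_lines, content in parts:
    print(header_lines, content)
```

If a CRLF is missing or the closing line is malformed, the number of parts or their content will not match what was intended, which makes format mistakes easier to spot.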
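Notes:
- There is also an alternative notation for non-ASCII parameter values, of the form title*=UTF-8''%c2%a3%20and%20%e2%82%ac%20rates; see RFC 5987 for details.
- The format must be exact: spaces must be present where they are required, and so must CRLFs. See RFC 1867 for the detailed rules.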
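The title*=UTF-8''... notation from RFC 5987 can be produced by percent-encoding the UTF-8 bytes of the value. This is a rough sketch; rfc5987_param is a hypothetical helper name, and whether a given server accepts this form for filename is an assumption that has to be checked against that server.

```python
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote        # Python 2


def rfc5987_param(name, value, charset="utf-8"):
    """Render name*=CHARSET''percent-encoded-value for a unicode value."""
    # Percent-encode every byte that is not an unreserved character.
    encoded = quote(value.encode(charset), safe="")
    return "%s*=%s''%s" % (name, charset.upper(), encoded)


# The pound sign and euro sign are written as escapes to keep the source ASCII-only.
print(rfc5987_param("title", u"\u00a3 and \u20ac rates"))
# title*=UTF-8''%C2%A3%20and%20%E2%82%AC%20rates
```

quote emits uppercase hex digits, whereas the RFC example uses lowercase; percent-encoded octets are case-insensitive, so the two forms are equivalent.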
Python implementation
encode_files_multipart is implemented as follows:
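import random
import string

_BOUNDARY_ALLOWED_CHARS = string.digits + string.ascii_letters


def encode_files_multipart(data, files):
    # Random 30-character boundary; it must not occur anywhere in the payload.
    boundary = ''.join(random.choice(_BOUNDARY_ALLOWED_CHARS) for i in range(30))
    special_separator = "--" + boundary

    lines = []
    # Plain form fields.
    for name, value in data.items():
        # Encode unicode values only; byte strings are passed through unchanged,
        # so non-ASCII byte strings do not trigger an implicit ascii decode.
        if isinstance(value, unicode):
            value = value.encode("utf-8")
        lines.extend((
            special_separator,
            'Content-Disposition: form-data; name="%s"' % str(name),
            '',
            str(value),
        ))

    # File fields; filename is expected to be a UTF-8 encoded byte string.
    for name, value in files.items():
        filename = value["filename"]
        lines.extend((
            special_separator,
            'Content-Disposition: form-data; name="%s"; filename="%s"' % (str(name), str(filename)),
            'Content-Type: %s' % value["mimetype"],
            'Content-Transfer-Encoding: binary',
            '',
            value['content'],
        ))

    # Closing delimiter; the empty string yields the final CRLF after the join.
    lines.extend((
        special_separator + "--",
        '',
    ))

    body = '\r\n'.join(lines)
    headers = {
        'Content-Type': 'multipart/form-data; boundary=%s' % boundary,
        'Content-Length': str(len(body)),
    }
    return (body, headers)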
The function encode_files_multipart returns the encoded body and the headers.
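The code in this article targets Python 2, where str is a byte string. For readers on Python 3, the same idea can be sketched with explicit bytes. This is not part of the original code: encode_multipart_py3 is a hypothetical name, the optional boundary argument exists only to make the output reproducible, and field values and file names are assumed to be text (str) while file content is assumed to be bytes.

```python
import random
import string

_BOUNDARY_CHARS = string.digits + string.ascii_letters


def encode_multipart_py3(data, files, boundary=None):
    """Build a multipart/form-data body as bytes, sending file names as raw UTF-8."""
    if boundary is None:
        boundary = "".join(random.choice(_BOUNDARY_CHARS) for _ in range(30))
    separator = b"--" + boundary.encode("ascii")

    lines = []
    for name, value in data.items():
        lines.extend((
            separator,
            ('Content-Disposition: form-data; name="%s"' % name).encode("utf-8"),
            b"",
            value.encode("utf-8"),
        ))
    for name, value in files.items():
        disposition = 'Content-Disposition: form-data; name="%s"; filename="%s"' % (
            name, value["filename"])
        lines.extend((
            separator,
            disposition.encode("utf-8"),  # the file name goes out as UTF-8 bytes
            ("Content-Type: %s" % value["mimetype"]).encode("ascii"),
            b"Content-Transfer-Encoding: binary",
            b"",
            value["content"],             # already bytes
        ))
    lines.extend((separator + b"--", b""))  # closing delimiter plus final CRLF

    body = b"\r\n".join(lines)
    headers = {
        "Content-Type": "multipart/form-data; boundary=%s" % boundary,
        "Content-Length": str(len(body)),  # length in bytes, not characters
    }
    return body, headers


body, headers = encode_multipart_py3(
    {"fromname": "chuangwang"},
    {"file": {"filename": "\u6211\u4eecchuangwang.txt",
              "content": b"chuangwang\n",
              "mimetype": "text/plain"}},
    boundary="B",
)
```

Working in bytes throughout keeps Content-Length correct, since it must count encoded bytes rather than characters.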
Calling code:
import urllib2

fields = {
    'from': '[email protected]',
    'fromname': 'chuangwang',
}
files = {
    "file": {
        "filename": u"我们chuangwang.txt".encode("utf-8"),
        "content": open("/tmp/a.txt", "rb").read(),
        "mimetype": "text/plain",
    },
}
data, headers = encode_files_multipart(fields, files)
headers["Accept"] = "*/*"
headers["User-Agent"] = "python-requests/2.7.10"
r = urllib2.Request("https://jianxun.io/f", data=data, headers=headers)
response = urllib2.urlopen(r)
print response.read()
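Note: the function encode_files_multipart implements only a small amount of functionality and does little for compatibility; developers can extend it to suit their own needs.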
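As one example of such an extension, the MIME type could be guessed from the file name instead of being passed in by hand. This is only a sketch of one possible direction: guess_mimetype is a hypothetical helper, and falling back to application/octet-stream is an assumption, not something the original code does.

```python
import mimetypes


def guess_mimetype(filename, default="application/octet-stream"):
    """Guess a MIME type from the file extension, with a generic fallback."""
    mimetype, _encoding = mimetypes.guess_type(filename)
    return mimetype or default


print(guess_mimetype("a.txt"))
print(guess_mimetype("README"))
```

The result could then be used as the mimetype entry of the files dictionary.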
If you have other approaches or ideas, feel free to get in touch (email: [email protected])