Python HTTP Form-based File Upload

Original article. If you repost it, you must credit nashuiliang at the beginning of the article and include a link to the original.


Problem statement

Using the Python requests package (version 2.9.1) and its requests.post API:

def post(url, data=None, json=None, **kwargs):
    """Sends a POST request.

    :param url: URL for the new :class:`Request` object.
    :param data: (optional) Dictionary, bytes, or file-like object to send in the body of the :class:`Request`.
    :param json: (optional) json data to send in the body of the :class:`Request`.
    :param **kwargs: Optional arguments that ``request`` takes.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response
    """

    return request('post', url, data=data, json=json, **kwargs)

Upload a file whose name contains Chinese characters, 我们chuangwang.txt:

# -*- coding: utf-8 -*-
import requests

url = 'http://jianxun.io/f'
files = {'file1': (u'我们chuangwang.txt'.encode("utf-8"),
                  open('/tmp/a.txt', 'rb'), 'text/plain')}
r = requests.post(url, files=files)
print r.text

This raises a UnicodeDecodeError with the following traceback:

  File "/home/vagrant/.buildout/requests-2.9.1-py2.7.egg/requests/api.py", line 107, in post
    return request('post', url, data=data, json=json, **kwargs)
  File "/home/vagrant/.buildout/requests-2.9.1-py2.7.egg/requests/api.py", line 53, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/vagrant/.buildout/requests-2.9.1-py2.7.egg/requests/sessions.py", line 454, in request
    prep = self.prepare_request(req)
  File "/home/vagrant/.buildout/requests-2.9.1-py2.7.egg/requests/sessions.py", line 388, in prepare_request
    hooks=merge_hooks(request.hooks, self.hooks),
  File "/home/vagrant/.buildout/requests-2.9.1-py2.7.egg/requests/models.py", line 296, in prepare
    self.prepare_body(data, files, json)
  File "/home/vagrant/.buildout/requests-2.9.1-py2.7.egg/requests/models.py", line 447, in prepare_body
    (body, content_type) = self._encode_files(files, data)
  File "/home/vagrant/.buildout/requests-2.9.1-py2.7.egg/requests/models.py", line 153, in _encode_files
    rf.make_multipart(content_type=ft)
  File "/home/vagrant/.buildout/requests-2.9.1-py2.7.egg/requests/packages/urllib3/fields.py", line 174, in make_multipart
    (('name', self._name), ('filename', self._filename))
  File "/home/vagrant/.buildout/requests-2.9.1-py2.7.egg/requests/packages/urllib3/fields.py", line 134, in _render_parts
    parts.append(self._render_part(name, value))
  File "/home/vagrant/.buildout/requests-2.9.1-py2.7.egg/requests/packages/urllib3/fields.py", line 114, in _render_part
    return format_header_param(name, value)
  File "/home/vagrant/.buildout/requests-2.9.1-py2.7.egg/requests/packages/urllib3/fields.py", line 38, in format_header_param
    result.encode('ascii')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe7 in position 10: ordinal not in range(128)

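requests (through its bundled urllib3) checks whether the name (file1 in the example) and the filename (in the example u'我们chuangwang.txt'.encode("utf-8"), i.e. '\xe6\x88\x91\xe4\xbb\xacchuangwang.txt') can be encoded as ASCII. Bytes such as \xe6\x88\x91\xe4\xbb\xac lie outside the ASCII range (0x00 to 0x7F; see any ASCII table). Because the filename is a UTF-8 byte string, Python 2 first decodes it implicitly with the ASCII codec before calling encode('ascii'), which is why the error is a UnicodeDecodeError rather than a UnicodeEncodeError. I won't go into Python 2 encoding issues in detail here (though they are a hurdle every engineer has to clear).
requests 2.9.1 offers no straightforward way to send such a filename as raw UTF-8 bytes. (Passing the filename as a unicode object instead may make urllib3 fall back to the RFC 2231 filename*= form, which is not what browsers send and which some servers do not parse.) So below, the HTTP body is written by hand with the urllib2 library to upload a file with a Unicode filename.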
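
To see where "position 10" in the traceback comes from, here is a small Python 3 sketch (the article itself uses Python 2). It rebuilds the filename="..." parameter from the UTF-8 bytes and decodes it as ASCII, which is roughly the implicit step Python 2 performs when .encode('ascii') is called on a byte string. The helpers render_filename_param and first_non_ascii are hypothetical names for illustration. The traceback above reports byte 0xe7 at the same position, presumably because it was captured with a different Chinese filename; any UTF-8 encoded CJK character starts with a byte at or above 0x80.

```python
def render_filename_param(filename_bytes: bytes) -> bytes:
    # Hypothetical helper: mirrors the 'filename="%s"' string urllib3 builds
    return b'filename="' + filename_bytes + b'"'


def first_non_ascii(data: bytes):
    # Return (position, byte value) of the first byte outside ASCII, or None
    for pos, byte in enumerate(data):
        if byte >= 0x80:
            return pos, byte
    return None


raw = "我们chuangwang.txt".encode("utf-8")
param = render_filename_param(raw)
print(raw[:6])               # b'\xe6\x88\x91\xe4\xbb\xac'
print(first_non_ascii(param))  # (10, 230)

try:
    # Python 2's str.encode('ascii') decodes with ASCII first; this is that step
    param.decode("ascii")
except UnicodeDecodeError as exc:
    print(exc)  # 'ascii' codec can't decode byte 0xe6 in position 10: ordinal not in range(128)
```

The prefix b'filename="' is exactly 10 bytes long, so the first non-ASCII byte of the filename always lands at index 10.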

The HTTP form-based upload format

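See RFC 1867 for the specification (multipart/form-data itself is now defined by RFC 7578).
The resulting HTTP body and the relevant HTTP headers look like this (explained below):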

01> Content-Type: multipart/form-data; boundary=hLSZW1P5ozZjetVIbDWwihCQwNVVTz\r\n
02> Content-Length: 424\r\n
03> \r\n
04> --hLSZW1P5ozZjetVIbDWwihCQwNVVTz\r\n
05> Content-Disposition: form-data; name="fromname"\r\n
06> \r\n
07> chuangwang\r\n
08> --hLSZW1P5ozZjetVIbDWwihCQwNVVTz\r\n
09> Content-Disposition: form-data; name="from"\r\n
10> \r\n
11> [email protected]\r\n
12> --hLSZW1P5ozZjetVIbDWwihCQwNVVTz\r\n
13> Content-Disposition: form-data; name="file"; filename="\xe6\x88\x91\xe4\xbb\xacchuangwang.txt"\r\n
14> Content-Type: text/plain\r\n
15> Content-Transfer-Encoding: binary\r\n
16> \r\n
17> chuangwang\n\r\n
18> --hLSZW1P5ozZjetVIbDWwihCQwNVVTz--\r\n

Explanation:

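  • Lines 01 to 02: part of the HTTP header (line 03, an empty line, separates the headers from the body).
  • Line 01: the value of Content-Type must be multipart/form-data, with the boundary passed as a parameter.
  • Lines 01 to 18: CRLF (\r\n) is the line separator.
  • Lines 01 to 18: the boundary (hLSZW1P5ozZjetVIbDWwihCQwNVVTz) is a randomly generated separator; the only requirement is that it does not occur anywhere in your data. The delimiter line --boundary\r\n splits the body into parts, and every part starts with a Content-Disposition header. For a non-file field the header is Content-Disposition: form-data; name="name"; for a file field it is Content-Disposition: form-data; name="name"; filename="filename", optionally followed by headers such as Content-Type. Then come a CRLF (an empty line), the data itself, and a final CRLF.
  • Line 18: --hLSZW1P5ozZjetVIbDWwihCQwNVVTz--\r\n is the closing line, of the form --{boundary}--\r\n.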
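
The layout described above can be checked mechanically. Below is a minimal Python 3 sketch (not from the original article) that splits a body on the delimiter and separates each part's headers from its content at the first empty line. It assumes well-formed input like the example: no preamble text, CRLF line endings, and simple "Name: value" headers. The function name split_multipart is hypothetical, and user@example.com is a placeholder address.

```python
def split_multipart(body: bytes, boundary: bytes):
    """Split a multipart/form-data body into a list of (headers, content) pairs."""
    # The CRLF in front of "--boundary" belongs to the delimiter, so prefix one
    # to make the first delimiter look like all the others.
    delimiter = b"\r\n--" + boundary
    body = b"\r\n" + body
    # Keep only what precedes the closing delimiter "--boundary--"
    body, found, _ = body.partition(delimiter + b"--")
    if not found:
        raise ValueError("closing delimiter not found")
    parts = []
    for chunk in body.split(delimiter)[1:]:  # index 0 is the (empty) preamble
        # Each chunk is: CRLF, header lines, an empty line, then the raw content
        head, _, content = chunk[2:].partition(b"\r\n\r\n")
        headers = {}
        for line in head.split(b"\r\n"):
            key, _, value = line.partition(b":")
            headers[key.strip().lower().decode("ascii")] = value.strip()
        parts.append((headers, content))
    return parts


BOUNDARY = b"hLSZW1P5ozZjetVIbDWwihCQwNVVTz"
FILENAME = "我们chuangwang.txt".encode("utf-8")
SAMPLE = b"".join([
    b"--" + BOUNDARY + b"\r\n",
    b'Content-Disposition: form-data; name="fromname"\r\n',
    b"\r\n",
    b"chuangwang\r\n",
    b"--" + BOUNDARY + b"\r\n",
    b'Content-Disposition: form-data; name="from"\r\n',
    b"\r\n",
    b"user@example.com\r\n",  # placeholder address
    b"--" + BOUNDARY + b"\r\n",
    b'Content-Disposition: form-data; name="file"; filename="' + FILENAME + b'"\r\n',
    b"Content-Type: text/plain\r\n",
    b"Content-Transfer-Encoding: binary\r\n",
    b"\r\n",
    b"chuangwang\n\r\n",
    b"--" + BOUNDARY + b"--\r\n",
])

parts = split_multipart(SAMPLE, BOUNDARY)
for headers, content in parts:
    print(headers["content-disposition"], content)
```

Note that the file content keeps its own trailing \n: only the CRLF that precedes the next delimiter is stripped.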

Notes:

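  • Non-ASCII parameter values can also be written in an extended form such as title*=UTF-8''%c2%a3%20and%20%e2%82%ac%20rates; see RFC 5987 for details. Note, however, that RFC 7578 (section 4.2) says the filename* parameter from RFC 5987 must not be used in multipart/form-data.
  • The format must be exact: spaces where spaces belong, and CRLF wherever CRLF is required. See RFC 1867 for the full details.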
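
The extended notation from the first note can be produced with Python's standard library: email.utils.encode_rfc2231 percent-encodes the UTF-8 bytes and prepends the charset and an (empty) language tag. This is a minimal Python 3 sketch; ext_param is a hypothetical helper, and whether a particular server understands a name*= parameter has to be checked against that server.

```python
from email.utils import encode_rfc2231


def ext_param(name: str, value: str) -> str:
    # Hypothetical helper: render name*=<charset>''<percent-encoded UTF-8 bytes>
    return "%s*=%s" % (name, encode_rfc2231(value, "utf-8"))


print(ext_param("filename", "我们chuangwang.txt"))
# filename*=utf-8''%E6%88%91%E4%BB%ACchuangwang.txt
print(ext_param("title", "£ and € rates"))
# title*=utf-8''%C2%A3%20and%20%E2%82%AC%20rates
```

The charset name and the hex digits are case-insensitive, so the second result matches the title example in the note.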

Python implementation

encode_files_multipart is implemented as follows:

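import random
import string

_BOUNDARY_ALLOWED_CHARS = string.digits + string.ascii_letters


def _to_bytes(value):
    # Python 2: encode unicode objects as UTF-8; pass byte strings through
    if isinstance(value, unicode):
        return value.encode("utf-8")
    return str(value)


def encode_files_multipart(data, files):
    # 30 random characters; the boundary must not occur anywhere in the payload
    boundary = ''.join(random.choice(_BOUNDARY_ALLOWED_CHARS) for i in range(30))
    special_separator = "--" + boundary
    lines = []

    # Ordinary (non-file) form fields
    for name, value in data.items():
        lines.extend((
            special_separator,
            'Content-Disposition: form-data; name="%s"' % _to_bytes(name),
            '',
            _to_bytes(value),
        ))

    # File fields: the filename is written as raw UTF-8 bytes
    for name, value in files.items():
        lines.extend((
            special_separator,
            'Content-Disposition: form-data; name="%s"; filename="%s"' % (
                _to_bytes(name), _to_bytes(value["filename"])),
            'Content-Type: %s' % _to_bytes(value["mimetype"]),
            'Content-Transfer-Encoding: binary',
            '',
            value['content'],
        ))

    # Closing delimiter "--boundary--" followed by a final CRLF
    lines.extend((
        special_separator + "--",
        '',
    ))
    # All elements are byte strings, so len(body) is the length in bytes
    body = '\r\n'.join(lines)

    headers = {
        'Content-Type': 'multipart/form-data; boundary=%s' % boundary,
        'Content-Length': str(len(body)),
    }

    return (body, headers)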

encode_files_multipart returns a (body, headers) tuple: the encoded body and the matching headers.
Calling code:

# -*- coding: utf-8 -*-
import urllib2

fields = {
    'from': '[email protected]',
    'fromname': 'chuangwang',
}
with open("/tmp/a.txt", "rb") as f:
    content = f.read()
files = {"file": {"filename": u"我们chuangwang.txt".encode("utf-8"),
                  "content": content,
                  "mimetype": "text/plain"}}
data, headers = encode_files_multipart(fields, files)
headers["Accept"] = "*/*"
headers["User-Agent"] = "python-requests/2.7.10"
r = urllib2.Request("https://jianxun.io/f", data=data, headers=headers)
response = urllib2.urlopen(r)
print response.read()

Note: encode_files_multipart implements only the bare minimum and makes no attempt at broad compatibility; developers can extend it to suit their own needs.
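
Python 2 and urllib2 have reached end of life, so here is a Python 3 sketch of the same idea. It is not the author's code; encode_files_multipart_py3 and _make_boundary are hypothetical names. Two small changes from the version above: the boundary comes from the secrets module and is regenerated if "--boundary" happens to occur in a field value or file content (the rule from the format section that the boundary must not appear in the data), and everything is assembled as bytes, so Content-Length is a byte count.

```python
import secrets
import string

_BOUNDARY_ALLOWED_CHARS = string.digits + string.ascii_letters


def _make_boundary(payloads):
    # Regenerate until "--boundary" appears in none of the payloads
    while True:
        boundary = "".join(secrets.choice(_BOUNDARY_ALLOWED_CHARS) for _ in range(30))
        marker = b"--" + boundary.encode("ascii")
        if not any(marker in p for p in payloads):
            return boundary


def encode_files_multipart_py3(fields, files):
    """Return (body, headers); names/values/filenames are str, contents are bytes."""
    payloads = [str(v).encode("utf-8") for v in fields.values()]
    payloads += [f["content"] for f in files.values()]
    boundary = _make_boundary(payloads)
    delimiter = b"--" + boundary.encode("ascii")
    lines = []
    for name, value in fields.items():
        lines += [
            delimiter,
            b'Content-Disposition: form-data; name="%s"' % name.encode("utf-8"),
            b"",
            str(value).encode("utf-8"),
        ]
    for name, f in files.items():
        lines += [
            delimiter,
            # The filename goes out as raw UTF-8 bytes, like the Python 2 version
            b'Content-Disposition: form-data; name="%s"; filename="%s"'
            % (name.encode("utf-8"), f["filename"].encode("utf-8")),
            b"Content-Type: %s" % f["mimetype"].encode("ascii"),
            b"Content-Transfer-Encoding: binary",
            b"",
            f["content"],
        ]
    # Closing delimiter plus the final CRLF
    lines += [delimiter + b"--", b""]
    body = b"\r\n".join(lines)
    headers = {
        "Content-Type": "multipart/form-data; boundary=%s" % boundary,
        "Content-Length": str(len(body)),
    }
    return body, headers
```

To send the result with the standard library, pass it to urllib.request.Request(url, data=body, headers=headers, method="POST") and open that request with urllib.request.urlopen.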

If you have other approaches or ideas, feel free to get in touch (email: [email protected]).
