First, be clear that a value written as b'xxx' is not a string (str) but binary data (bytes):
In [1]: s = 'hello world'
In [2]: s.encode('ascii')
Out[2]: b'hello world'
In [3]: type(s)
Out[3]: str
In [4]: type(s.encode('ascii'))
Out[4]: bytes
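The distinction in the session above can be sketched as a small script. The variable names are illustrative only, not from the original post.

```python
text = 'hello world'
data = text.encode('ascii')          # str -> bytes

# The two objects are different types, even though they print similarly.
is_str = isinstance(text, str)
is_bytes = isinstance(data, bytes)

# Indexing bytes yields an integer byte value, not a one-character string.
first_byte = data[0]                 # ASCII code of 'h'

# Decoding with the same codec recovers the original text.
round_trip = data.decode('ascii')    # bytes -> str
```

Indexing is a quick way to tell the two types apart in practice: text[0] is 'h', while data[0] is an integer.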
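If the data contains no Chinese characters (ASCII only), both str() and decode() produce something readable. Note, however, that str() returns the repr of the bytes object, so the result still carries the b'...' wrapper; decode() is the one that yields the actual text: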
In [5]: b = s.encode('ascii')
In [6]: b
Out[6]: b'hello world'
In [7]: str(b)
Out[7]: "b'hello world'"
In [8]: b.decode('utf-8')
Out[8]: 'hello world'
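As a minimal sketch of the difference: calling str() with only a bytes argument gives the repr, while passing an encoding to str() decodes, just like bytes.decode(). The variable names below are illustrative.

```python
data = b'hello world'

# One-argument str() on bytes returns the repr, b prefix and quotes included.
as_repr = str(data)

# Passing an encoding makes str() decode the bytes instead.
as_text = str(data, 'utf-8')

# Equivalent and usually clearer.
decoded = data.decode('utf-8')
```

Running Python with the -b flag emits a BytesWarning for the one-argument form, which can help catch this mistake.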
b'\xe5\x93\x88\xe5\x96\xbd'
Readers familiar with encodings will recognize this as a UTF-8 encoded byte string, in which case it can simply be decoded:
In [15]: b
Out[15]: b'\xe5\x93\x88\xe5\x96\xbd'
In [16]: b.decode('utf-8')
Out[16]: '哈喽'
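When the bytes come from an external source, they may be truncated or not valid UTF-8 at all. The following is a sketch only; to_text is a hypothetical helper, not something from the original post, and falling back to replacement characters is just one possible policy.

```python
def to_text(data: bytes, encoding: str = 'utf-8') -> str:
    """Hypothetical helper: decode bytes, substituting U+FFFD for bad input."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        # errors='replace' inserts U+FFFD instead of raising.
        return data.decode(encoding, errors='replace')

greeting = to_text(b'\xe5\x93\x88\xe5\x96\xbd')   # complete UTF-8 sequence
truncated = to_text(b'\xe5\x93\x88\xe5\x96')      # last character cut short
```

Here greeting decodes cleanly, while truncated keeps the first character and marks the incomplete tail with a replacement character.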
b'{"errno":0,"data":[{"k":"\\u5468\\u6770\\u4f26","v":"\\u540d. Jay Chou; The New King of Asian Pop \\u4ee3. \\u65e0\\u4e0e\\u4f26\\u6bd4"}]}'
The \u-prefixed sequences may look less familiar, but a quick search turns up a way to decode them:
In [17]: b'{"errno":0,"data":[{"k":"\\u5468\\u6770\\u4f26","v":"\\u540d. Jay Chou; The New King of Asian Pop \\u4ee3. \\u65e0\\u4e0e\\u4f26\\u6bd4"}]}'.decode('unicode_escape')
Out[17]: '{"errno":0,"data":[{"k":"周杰伦","v":"名. Jay Chou; The New King of Asian Pop 代. 无与伦比"}]}'
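One caveat worth knowing: the unicode_escape codec treats any raw non-ASCII bytes as Latin-1, so it is only safe when the input is pure ASCII with escapes, as it is here. A small sketch of that behavior, with made-up sample values:

```python
# Pure ASCII input with \u escapes decodes as expected.
escaped = b'\\u5468\\u6770\\u4f26'
name = escaped.decode('unicode_escape')

# Mixing raw UTF-8 bytes with escapes does not: the raw bytes are
# interpreted as Latin-1, so the first character comes out garbled.
mixed = '哈'.encode('utf-8') + b'\\u55bd'
garbled = mixed.decode('unicode_escape')
```

If the data may contain raw UTF-8 bytes alongside escapes, a format-aware parser is the safer route.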
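Since this payload is JSON, json.loads (after import json) handles the \u escapes and parses the structure in one step:
In [20]: json.loads(b'{"errno":0,"data":[{"k":"\\u5468\\u6770\\u4f26","v":"\\u540d. Jay Chou; The New King of Asian Pop \\u4ee3. \\u65e0\\u4e0e\\u4f26\\u6bd4"}]}')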
Out[20]:
{'errno': 0,
'data': [{'k': '周杰伦', 'v': '名. Jay Chou; The New King of Asian Pop 代. 无与伦比'}]}
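As a self-contained sketch of the JSON route: json.loads accepts bytes directly (Python 3.6+), and json.dumps with ensure_ascii=False keeps the characters readable when serializing back. The shortened payload below is an illustrative stand-in for the one above.

```python
import json

# Shortened sample payload; the \u escapes are part of the JSON text itself.
payload = b'{"errno":0,"data":[{"k":"\\u5468\\u6770\\u4f26"}]}'

parsed = json.loads(payload)          # bytes in, Python objects out
name = parsed['data'][0]['k']

# By default json.dumps re-escapes non-ASCII characters as \uXXXX;
# ensure_ascii=False writes them out as-is.
readable = json.dumps(parsed, ensure_ascii=False)
escaped_again = json.dumps(parsed)
```

This also explains where the \u sequences came from: the server most likely serialized its response with ASCII-only escaping turned on.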