Environment: Python 2.7.*
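How to do it:
Add the following declaration at the top of the .py file (it has to be on the first or second line):
# -*- coding: utf-8 -*-
To handle Chinese text reliably, prefix every string literal that contains Chinese with u so that it becomes a unicode object, e.g. u'我是pythoner'.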
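As a minimal sketch of the two steps above, a source file might look like the following. The file name demo.py and the variable names text and raw are only illustrative assumptions.

```python
# -*- coding: utf-8 -*-
# Hypothetical file demo.py; the declaration above must sit on line 1 or 2.

text = u"我是pythoner"        # unicode object: 2 Chinese characters + 8 ASCII letters
raw = text.encode("utf-8")   # byte string: 3 + 3 bytes for the Chinese, 8 for the ASCII

print(len(text))  # 10 characters
print(len(raw))   # 14 bytes
```

The u prefix is what makes len() count characters rather than bytes; without it, a Python 2.7 literal holds the raw bytes of the source file.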
Explanation (interactive session; the terminal encoding here is UTF-8):
>>> test1 = "我是pythoner"
>>> test2 = u"我是pythoner"
>>> test1
'\xe6\x88\x91\xe6\x98\xafpythoner'
>>> test2
u'\u6211\u662fpythoner'
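>>> test11 = test1.encode('gb2312')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 0: ordinal not in range(128)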
>>> test15 = test1.decode('utf-8').encode('utf-8')
>>> test15
'\xe6\x88\x91\xe6\x98\xafpythoner'
>>> test16 = test1.decode('utf-8').encode('gb2312')
>>> test16
'\xce\xd2\xca\xc7pythoner'
>>> test21 = test2.encode('gb2312')
>>> test21
'\xce\xd2\xca\xc7pythoner'
>>> test22 = test21.decode('gb2312')
>>> test22
u'\u6211\u662fpythoner'
>>> test23 = test2.encode('gb18030')
>>> test23
'\xce\xd2\xca\xc7pythoner'
>>> test24 = test23.decode('gb2312')
>>> test24
u'\u6211\u662fpythoner'
>>> test25 = test23.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xce in position 0: invalid continuation byte
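The failure at the end of the session can also be handled in code instead of crashing. The sketch below assumes a hypothetical helper named try_decode; it is not part of the standard library.

```python
# GB2312 bytes for the sample string, copied from the session above.
gb_bytes = b"\xce\xd2\xca\xc7pythoner"

def try_decode(data, encoding):
    """Return the decoded text, or None if data is not valid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None

as_utf8 = try_decode(gb_bytes, "utf-8")   # None: these bytes are not valid UTF-8
as_gb = try_decode(gb_bytes, "gb2312")    # the original unicode text
```

Note that a successful decode does not prove the guess was right; some byte sequences are valid in more than one encoding, so it is better to know the source encoding than to probe for it.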
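Conclusion:
A byte string encoded as GB2312 has to be decoded with gb2312 (or its superset gb18030) into unicode first; only then can it be encoded as utf-8, gb18030, or anything else. Decoding it as utf-8 fails, as test25 shows.
Likewise, text that started out as UTF-8 and was re-encoded into another encoding has to be decoded with that same encoding. Calling .encode() directly on a byte string, as in test1.encode('gb2312'), does not work either: Python 2 first decodes the bytes implicitly as ASCII, which fails on any non-ASCII byte.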
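The rule in the conclusion (decode with the encoding the bytes are actually in, then encode to the target) can be wrapped in a small function. This is only a sketch; transcode is a hypothetical name, and the byte values are the ones from the session above.

```python
def transcode(data, src, dst):
    """Re-encode a byte string from encoding src to encoding dst via unicode."""
    return data.decode(src).encode(dst)

utf8_bytes = b"\xe6\x88\x91\xe6\x98\xafpythoner"      # same bytes as test1
gb_bytes = transcode(utf8_bytes, "utf-8", "gb2312")   # same bytes as test16
round_trip = transcode(gb_bytes, "gb2312", "utf-8")   # back to the UTF-8 bytes
```

Going through unicode explicitly avoids the implicit ASCII decode that Python 2 performs when .encode() is called on a byte string.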