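Chinese Text Support in Python

Environment: Python 2.7.*

How to:

At the top of the .py file, add the encoding declaration:

# -*- coding: utf-8 -*-

Any string literal that contains Chinese should carry the u prefix so that it becomes a unicode object, e.g. u'我是pythoner'. Without the prefix, the literal is a plain byte string in the file's encoding.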
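
As a minimal sketch, a complete source file that follows both rules might look like the one below. The variable names are illustrative, and `print` is called with a single parenthesized argument only so that the same file also runs on Python 3.

```python
# -*- coding: utf-8 -*-

# The u prefix makes this a unicode object: a sequence of characters.
text = u"我是pythoner"

# Encoding turns the characters into bytes.
# UTF-8 needs 3 bytes for each of the two Chinese characters here.
encoded = text.encode("utf-8")

print(len(text))     # number of characters
print(len(encoded))  # number of bytes
```

The character count and the byte count differ, which is the practical difference between a unicode object and an encoded byte string.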


Explanation:

>>> test1 = "我是pythoner"

>>> test2 = u"我是pythoner"

>>> test1
'\xe6\x88\x91\xe6\x98\xafpythoner'
>>> test2
u'\u6211\u662fpythoner'


>>> test11 = test1.encode('gb2312')

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 0: ordinal not in range(128)
>>> test12 = test1.encode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 0: ordinal not in range(128)
>>> test13 = test1.decode('utf-8')
>>> test13
u'\u6211\u662fpythoner'
>>> test14 = test1.decode('gb2312')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'gb2312' codec can't decode bytes in position 0-1: illegal multibyte sequence
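
The two `encode` failures above come from a hidden step: in Python 2, calling `encode` on a byte string first decodes it with the default `ascii` codec, and it is that decode which raises. The sketch below makes the hidden step explicit. `raw` and `hidden_decode_failed` are illustrative names, and the code is written so that it also runs on Python 3, where byte strings have no `encode` method at all.

```python
# -*- coding: utf-8 -*-

# A UTF-8 byte string with the same content as test1.
raw = u"我是pythoner".encode("utf-8")

# Python 2 evaluates raw.encode(X) roughly as raw.decode("ascii").encode(X).
# The first byte, 0xe6, is outside the ASCII range, so the decode step fails.
try:
    raw.decode("ascii")
    hidden_decode_failed = False
except UnicodeDecodeError as exc:
    hidden_decode_failed = True
    print(exc)
```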

>>> test15 = test1.decode('utf-8').encode('utf-8')
>>> test15
'\xe6\x88\x91\xe6\x98\xafpythoner'
>>> test16 = test1.decode('utf-8').encode('gb2312')
>>> test16
'\xce\xd2\xca\xc7pythoner'

>>> test21 = test2.encode('gb2312')
>>> test21
'\xce\xd2\xca\xc7pythoner'
>>> test22 = test21.decode('gb2312')
>>> test22
u'\u6211\u662fpythoner'
>>> test23 = test2.encode('gb18030')
>>> test23
'\xce\xd2\xca\xc7pythoner'
>>> test24 = test23.decode('gb2312')
>>> test24
u'\u6211\u662fpythoner'
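
Decoding GB18030 bytes with the gb2312 codec works here because GB18030 is a superset of GB2312: characters that GB2312 covers get the same bytes in both. The sketch below illustrates this under the assumption that the sample character 镕 (U+9555) is a fair example of a character outside GB2312; the variable names are illustrative.

```python
# -*- coding: utf-8 -*-

common = u"我是pythoner"  # every character here exists in GB2312
rare = u"\u9555"          # 镕: available in GB18030 but not in GB2312

# For characters that GB2312 covers, both codecs produce identical bytes.
same_bytes = common.encode("gb2312") == common.encode("gb18030")

# A character outside GB2312 still encodes with GB18030 ...
rare_bytes = rare.encode("gb18030")

# ... but the gb2312 codec rejects it.
try:
    rare.encode("gb2312")
    gb2312_rejected = False
except UnicodeEncodeError:
    gb2312_rejected = True

print(same_bytes)
print(gb2312_rejected)
```

So the shortcut of decoding GB18030 data as gb2312 is only safe while the text stays within the GB2312 character set.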

>>> test25 = test23.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xce in position 0: invalid continuation byte


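Conclusion:

A byte string must first be decoded to unicode with the encoding it is actually in (for example, GB2312 bytes with gb2312); only then can it be encoded into another encoding such as GB18030.

Likewise, text that started out as UTF-8 and was then encoded into another encoding must be decoded with that same encoding.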
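
The decode-then-encode rule can be wrapped in a small helper. This is only a sketch: `transcode` is a hypothetical function name, not something from the standard library, and it assumes the caller knows the real encoding of the input bytes.

```python
# -*- coding: utf-8 -*-

def transcode(data, source_encoding, target_encoding):
    """Re-encode a byte string by going through unicode (hypothetical helper)."""
    text = data.decode(source_encoding)  # bytes -> unicode, using the bytes' real encoding
    return text.encode(target_encoding)  # unicode -> bytes in the new encoding


utf8_bytes = u"我是pythoner".encode("utf-8")
gb_bytes = transcode(utf8_bytes, "utf-8", "gb2312")
round_trip = transcode(gb_bytes, "gb2312", "utf-8")

print(repr(gb_bytes))
print(round_trip == utf8_bytes)
```

Passing the wrong `source_encoding` raises UnicodeDecodeError, as the gb2312 and utf-8 decode failures in the transcript show.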
