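Character set settings in Python
In Python 2 the default encoding is ASCII. To declare a different encoding for a source file, add the following on the first line of the file, or on the line immediately after the "#!" line: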
# -*- coding: codetype -*-
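codetype can be any encoding that Python recognizes. For Chinese text it can be gbk, gb2312, gb18030, or big5, provided the corresponding codec support is installed.
Display issues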
ss="python问题"
str(ss)
'python\xce\xca\xcc\xe2'
repr(ss)
"'python\\xce\\xca\\xcc\\xe2'"
The latter shows the form in which ss is stored.
print ss
This writes ss to the terminal, which decodes the bytes and displays “python问题”.
Encoding and decoding in Python
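The difference between the stored bytes and the displayed text can be sketched as follows. This is a minimal illustration, not part of the original session: it assumes the string was entered in a GBK environment (which matches the bytes shown above), and the variable names are made up for the example. It is written to run on both Python 2.7 and Python 3.

```python
# Sketch only; names are illustrative. Runs on Python 2.7 and 3.
raw = b"python\xce\xca\xcc\xe2"  # the GBK bytes shown in the session above

stored = repr(raw)               # escaped, byte-level view of the string
text = raw.decode("gbk")         # the characters a GBK terminal would display

byte_count = len(raw)            # 6 ASCII bytes + 2 bytes per Chinese character
char_count = len(text)           # 6 ASCII characters + 2 Chinese characters
```

Here byte_count is 10 while char_count is 8, which is why the escaped form looks longer than what print displays.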
print str.decode.__doc__
S.decode([encoding[,errors]]) -> object
Decodes S using the codec registered for encoding. encoding defaults
to the default encoding. errors may be given to set a different error
handling scheme. Default is 'strict' meaning that encoding errors raise
a UnicodeDecodeError. Other possible values are 'ignore' and 'replace'
as well as any other name registerd with codecs.register_error that is
able to handle UnicodeDecodeErrors.
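decode interprets the string using the specified encoding. For text encodings such as gbk the result is a unicode object, not a string in the default encoding.
errors specifies how decoding errors are handled:
errors = 'strict': raise UnicodeError (or a subclass) on any error; this is the default.
errors = 'ignore': skip malformed data without reporting an error.
errors = 'replace': substitute a replacement marker for malformed data (U+FFFD when decoding, "?" when encoding).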
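As a rough illustration of the three decoding error handlers, the sketch below decodes a GBK byte string whose last byte has been cut off. The truncated input and the variable names are assumptions made for this example; the code is written to run on both Python 2.7 and Python 3.

```python
# Sketch only; names are illustrative. Runs on Python 2.7 and 3.
truncated = b"python\xce\xca\xcc"  # GBK bytes of the example string, last byte cut off

try:
    truncated.decode("gbk")        # errors defaults to 'strict'
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True           # the incomplete character is reported

ignored = truncated.decode("gbk", "ignore")    # drops the incomplete character
replaced = truncated.decode("gbk", "replace")  # marks it with U+FFFD
```

With 'ignore' the damage is silent, so 'replace' is often the safer choice when the output will be inspected by a person.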
print str.encode.__doc__
S.encode([encoding[,errors]]) -> object
Encodes S using the codec registered for encoding. encoding defaults
to the default encoding. errors may be given to set a different error
handling scheme. Default is 'strict' meaning that encoding errors raise
a UnicodeEncodeError. Other possible values are 'ignore', 'replace' and
'xmlcharrefreplace' as well as any other name registered with
codecs.register_error that is able to handle UnicodeEncodeErrors.
...
errors = 'xmlcharrefreplace': replace with the corresponding XML character reference (only for encoding).
errors = 'backslashreplace': replace with backslashed escape sequences (only for encoding).
To convert a string from gb2312 to utf-8:
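The encoding-side handlers can be compared by forcing the example text into ASCII, which cannot represent the two Chinese characters. This is a small sketch with illustrative variable names; the target codec "ascii" is chosen only to trigger the errors. It runs on both Python 2.7 and Python 3.

```python
# Sketch only; names are illustrative. Runs on Python 2.7 and 3.
text = u"python\u95ee\u9898"  # the example text as a unicode string

ignored = text.encode("ascii", "ignore")             # unencodable characters dropped
replaced = text.encode("ascii", "replace")           # each one becomes "?"
xml = text.encode("ascii", "xmlcharrefreplace")      # each one becomes &#NNNNN;
escaped = text.encode("ascii", "backslashreplace")   # each one becomes \uXXXX
```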
ss.decode("gb2312","ignore").encode("utf-8")
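The same conversion can be wrapped in a small helper. The function name gb2312_to_utf8 and its errors parameter are hypothetical, introduced only for this sketch; the sample bytes are the GBK/GB2312 form of the string used earlier. It runs on both Python 2.7 and Python 3.

```python
# Sketch only; gb2312_to_utf8 is a hypothetical helper, not a library function.
def gb2312_to_utf8(data, errors="strict"):
    """Decode GB2312 bytes to unicode, then re-encode them as UTF-8."""
    return data.decode("gb2312", errors).encode("utf-8")

gb = b"python\xce\xca\xcc\xe2"   # GB2312 bytes of the example string
utf8 = gb2312_to_utf8(gb)        # two bytes per character become three
```

The helper defaults to 'strict' rather than "ignore" so that malformed input is noticed; passing "ignore" reproduces the one-liner above.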
print unicode.__doc__
unicode(string [, encoding[, errors]]) -> object
Create a new Unicode object from the given encoded string.
encoding defaults to the current default string encoding.
errors can be 'strict', 'replace' or 'ignore' and defaults to 'strict'.
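As a brief sketch, calling unicode with an explicit encoding gives the same result as calling decode on the byte string. The fallback to str below is an assumption added so the snippet also runs on Python 3, where the unicode built-in no longer exists and str(bytes, encoding, errors) plays the same role.

```python
# Sketch only; names are illustrative.
try:
    unicode                      # Python 2 built-in
except NameError:
    unicode = str                # assumption: Python 3 stand-in for this sketch

raw = b"python\xce\xca\xcc\xe2"  # GBK bytes of the example string

via_constructor = unicode(raw, "gbk", "strict")
via_method = raw.decode("gbk", "strict")
same = via_constructor == via_method
```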