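Bytes per character in each encoding:
ascii:   English: 8 bit (1 B)   (Chinese cannot be represented)
utf-8:   English: 8 bit (1 B)
         Chinese: 24 bit (3 B)
GBK:     English: 8 bit (1 B)
         Chinese: 16 bit (2 B)
unicode (as UCS-4 / UTF-32): English: 32 bit (4 B)
         Chinese: 32 bit (4 B)
(CPython 3.3+ does not store every str as fixed 4-byte units: per PEP 393 it uses 1, 2 or 4 bytes per character, depending on the widest character in the string.)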
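A quick way to check the table is to encode one English and one Chinese character and count the bytes. In this sketch, `encoded_size` is a hypothetical helper. The `utf-32-be` codec stands in for the 4-byte "unicode" row because the `-be` variant adds no BOM. The last line is CPython-specific and only compares sizes, without assuming exact numbers.

```python
import sys

def encoded_size(text, codec):
    """Bytes `text` occupies after encoding with `codec`, or None if it cannot be encoded."""
    try:
        return len(text.encode(codec))
    except UnicodeEncodeError:
        return None  # e.g. '中' has no ASCII representation

# 'utf-32-be' plays the role of the 4-byte "unicode" row (no BOM prepended).
for text in ('a', '中'):
    for codec in ('ascii', 'utf-8', 'gbk', 'utf-32-be'):
        print(f'{text!r:5} {codec:10} {encoded_size(text, codec)}')

# CPython-specific (PEP 393): a str uses 1, 2 or 4 bytes per character depending
# on the widest character it contains, so the in-memory size grows in this order.
print(sys.getsizeof('a' * 100) < sys.getsizeof('中' * 100) < sys.getsizeof('\U0001F600' * 100))  # True
```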
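How Python 3 runs a source file:
- The interpreter locates the source file (stored on disk as utf8/GBK.. bytes);
- it decodes the file's bytes into memory using the encoding declared in the file header (UTF-8 if there is no declaration), turning them into Unicode;
- every string in the program is therefore held as Unicode (a str is a sequence of Unicode characters)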
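The decoding step can be imitated on raw bytes. In this sketch the GBK "source file" is made up for illustration. `tokenize.detect_encoding` reads the header following the PEP 263 rules, and `compile()` accepts bytes and decodes them with the declared encoding. The string literal therefore arrives in memory as an ordinary str.

```python
import io
import tokenize

# Hypothetical source file saved in GBK, with a PEP 263 coding declaration on line 1.
source_bytes = "# -*- coding: gbk -*-\ns = '中文'\n".encode('gbk')

# Step 1: read the encoding the file header declares.
encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
print(encoding)  # gbk

# Step 2: compile() decodes the bytes with that encoding; the literal becomes a str.
namespace = {}
exec(compile(source_bytes, '<gbk-file>', 'exec'), namespace)
s = namespace['s']
print(s, type(s), len(s))  # 中文 <class 'str'> 2
```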
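Unicode text exists only in memory, where it is displayed and processed. Transmission and storage need a byte encoding such as utf8/GBK.., so text must be converted to utf8/GBK.. first.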
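Writing text to a file is one place where this conversion happens. In the sketch below, the file name `demo.txt` and the choice of GBK are assumptions for illustration only. Once written, the file holds bytes, and reading it back as text works only if the same encoding is named again.

```python
import os
import tempfile

text = '中文'  # lives in memory as a str (Unicode)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.txt')  # hypothetical file name

    # Storing the str forces a byte encoding; GBK is chosen here for illustration.
    with open(path, 'w', encoding='gbk') as f:
        f.write(text)

    # On disk there are only bytes: 2 per Chinese character in GBK.
    with open(path, 'rb') as f:
        raw = f.read()
    print(raw)  # b'\xd6\xd0\xce\xc4'

    # Decoding back to str requires the same encoding.
    with open(path, encoding='gbk') as f:
        restored = f.read()
    print(restored == text)  # True
```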
The difference between str and bytes is the encoding:
str (Unicode) ==> bytes (utf8/GBK..) ==> storage, transmission
b = s.encode('utf-8')  # encode: str -> bytes
s = b.decode('utf-8')  # decode: bytes -> str
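A round trip through `encode`/`decode` makes the two directions concrete. The sketch avoids naming variables `str` or `bytes`, which would shadow the built-in types. It also shows that the same text produces different bytes under different codecs, and that decoding with an unsuitable codec raises an error.

```python
text = '中文'

data = text.encode('utf-8')           # str -> bytes (encode)
print(type(data), data)               # <class 'bytes'> b'\xe4\xb8\xad\xe6\x96\x87'
print(data.decode('utf-8') == text)   # True: decoding with the same codec restores the str

# The same text yields different bytes under a different encoding.
print(text.encode('gbk'))             # b'\xd6\xd0\xce\xc4'

# Decoding with a codec that cannot interpret the bytes fails loudly.
try:
    data.decode('ascii')
except UnicodeDecodeError as exc:
    print('ascii cannot decode:', exc.reason)
```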
How str and bytes look and are encoded in Python 3:
English:
    str:   representation ==> 'a'
           encoding ==> 0101 unicode
    bytes: representation ==> b'a'
           encoding ==> 0101 utf8/GBK..
Chinese:
    str:   representation ==> '中'
           encoding ==> 0101 unicode
    bytes: representation ==> b'\xe4\xb8\xad' (utf-8)
           encoding ==> 0101 utf8/GBK..
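The `repr` of each value shows the difference. A bytes object displays ASCII-range bytes as plain characters, so `b'a'` looks like `'a'`. Bytes outside that range appear as `\x..` escapes. The str side is a sequence of code points, while the bytes side is a sequence of integers from 0 to 255.

```python
# Same text, two types: str holds code points, bytes holds encoded bytes.
print(repr('a'), repr('a'.encode('utf-8')))   # 'a' b'a'
print(repr('中'), repr('中'.encode('utf-8')))  # '中' b'\xe4\xb8\xad'
print(repr('中'.encode('gbk')))                # b'\xd6\xd0'

# The str side: Unicode code points.
print(hex(ord('a')), hex(ord('中')))            # 0x61 0x4e2d
# The bytes side: integers 0-255.
print(list('中'.encode('utf-8')))               # [228, 184, 173]
```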
In Python 2:
- u'xxx' is a unicode object, which corresponds to str in Python 3
- bytes and str are the same type
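# -*- coding: utf-8 -*-
s = 'a'
print (s, type(s))  # ('a', <type 'str'>)
s = u'中文'
print(s, type(s))  # (u'\u4e2d\u6587', <type 'unicode'>)
# Encode to utf-8: each Chinese character becomes three bytes
s1 = s.encode('utf-8')
print(s1, type(s1))  # ('\xe4\xb8\xad\xe6\x96\x87', <type 'str'>)
# bytes and str are the same type (bytes is an alias of str)
s1 = 'a'
s2 = bytes('a')
print(type(s1) is type(s2))  # True
print(bytes is str)  # True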
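For contrast, here is a sketch of the same checks in Python 3, where the two types were split. `bytes` is no longer an alias of `str`, so text and bytes never compare equal. Calling `bytes()` on a str also requires an explicit encoding.

```python
# Python 3: str and bytes are distinct types.
print(bytes is str)         # False
print('a' == b'a')          # False: text and bytes never compare equal

# bytes() no longer accepts a str by itself; the encoding must be explicit.
try:
    bytes('a')
except TypeError as exc:
    print('TypeError:', exc)

print(bytes('a', 'utf-8'))  # b'a'
```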