Python encoding issues

鍚勭缂栫爜鍦ㄥ唴瀛樹腑鎵�鍗犵殑澶у皬:

ascii:    鑻辨枃:8bit (1B)

uft-8:    鑻辨枃:8bit (1B)
          涓枃:24bit (3B)

GBK:      鑻辨枃:8bit (1B)
          涓枃:16bit (2B)

unicode:  鑻辨枃:32bit (4B)
          涓枃:32bit (4B)

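How Python 3 executes code:

  1. The interpreter locates the source file (stored on disk as utf8/GBK.. bytes).
  2. It decodes the source bytes into memory using the encoding declared in the file header (utf-8 when nothing is declared), producing unicode.
  3. Every string value in the program is then held as unicode (str is a sequence of Unicode code points).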
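
The three steps can be imitated by hand in Python 3. The sketch below is an approximation, not what the interpreter literally runs: the in-memory `source_bytes` stands in for a hypothetical GBK-encoded script, and `tokenize.detect_encoding` from the standard library is used to read the declaration in its header.

```python
import io
import tokenize

# Hypothetical source file: a GBK-encoded script with a coding declaration
source_bytes = "# -*- coding: gbk -*-\ns = '中文'\n".encode('gbk')

# Steps 1-2: read the declared encoding from the raw bytes, then decode to str
encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
source_text = source_bytes.decode(encoding)

# Step 3: after execution, the variable is an ordinary Unicode str
namespace = {}
exec(compile(source_text, '<example>', 'exec'), namespace)
print(encoding, type(namespace['s']), namespace['s'])
```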

unicode鍙湪鍐呭瓨涓繘琛屾樉绀�, 浼犺緭鍜屽瓨鍌ㄩ渶瑕佺敤鍒皍tf8/GBK.., 鎵�浠ュ繀椤昏浆鎴恥tf8/GBK..

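The difference between str and bytes is what they hold: str holds Unicode text, while bytes holds that text encoded as raw bytes:


str (unicode)      ==>     bytes (utf8/GBK..)       ==>         storage, transmission
bytes = str.encode('utf-8')               # encode
str = bytes.decode('utf-8')               # decode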
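
A small Python 3 round trip makes the encode and decode directions concrete. The variable names are illustrative only. The last part shows what can go wrong when the bytes are decoded with a different codec than the one used to encode them.

```python
text = '中文'

data = text.encode('utf-8')          # str -> bytes
print(data)                          # b'\xe4\xb8\xad\xe6\x96\x87'
print(data.decode('utf-8') == text)  # bytes -> str, prints True

# The same text encoded as GBK is not valid UTF-8
gbk_data = text.encode('gbk')
try:
    gbk_data.decode('utf-8')
except UnicodeDecodeError as exc:
    print('wrong codec:', exc.reason)
```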

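How str and bytes are displayed and stored in Python 3:

English:
    str:    displayed as ==> 'a'
            stored as    ==> 0101      unicode

    bytes:  displayed as ==> b'a'
            stored as    ==> 0101      utf8/GBK..


Chinese:
    str:    displayed as ==> '中'
            stored as    ==> 0101      unicode

    bytes:  displayed as ==> b'\xe4\xb8\xad' (utf-8) or b'\xd6\xd0' (GBK)
            stored as    ==> 0101      utf8/GBK..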
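
The same comparison can be made in a Python 3 session. This is a short sketch with illustrative variable names. It shows that a str is counted in characters while bytes is counted in bytes, and that indexing bytes yields integers.

```python
s = '中'
b = s.encode('utf-8')

print(repr(s), len(s))          # '中' 1: one character
print(repr(b), len(b))          # b'\xe4\xb8\xad' 3: three bytes
print(hex(ord(s)))              # 0x4e2d: the Unicode code point
print(list(b))                  # [228, 184, 173]: the raw byte values
print(type(s[0]), type(b[0]))   # indexing str gives str, indexing bytes gives int
```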

In Python 2:

  1. u'xxx' is a unicode object, the equivalent of str in Python 3
  2. bytes and str are the same type
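# -*- coding: utf-8 -*-
# (Python 2 needs the declaration above for non-ASCII literals in a source file)
s = 'a'
print(s, type(s))               # ('a', <type 'str'>)


s = u'中文'
print(s, type(s))               # (u'\u4e2d\u6587', <type 'unicode'>)
# Encode to utf-8: each of these Chinese characters takes three bytes
s1 = s.encode('utf-8')
print(s1, type(s1))             # ('\xe4\xb8\xad\xe6\x96\x87', <type 'str'>)


# bytes and str are the same type
s1 = 'a'
s2 = bytes('a')
print(bytes is str)             # True: bytes is an alias for str in Python 2
print(s1 is s2)                 # True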
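
For contrast, here is a sketch of the same check under Python 3, where str and bytes are distinct types. Unlike the Python 2 snippet above, `bytes()` needs an explicit encoding to convert a str. The variable names simply mirror that snippet.

```python
s1 = 'a'
s2 = bytes('a', 'ascii')        # Python 3 requires an encoding here

print(bytes is str)             # False
print(type(s1) is type(s2))     # False
print(s1 == s2)                 # False: text never equals raw bytes

# Mixing the two types in one operation is an error
try:
    s1 + s2
except TypeError as exc:
    print('cannot mix str and bytes:', exc)
```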
