Converting between Chinese text and Unicode escapes (UTF-8)

Experiment 1:

# Plain Chinese literal. The u prefix is optional in Python 3, where every str is Unicode.
text = u'你好,今天天气不错'
text
print(text)

# \uXXXX escapes in a normal literal are resolved by the parser, so this is already Chinese text.
text = '\u4f60\u597d\uff0c\u4eca\u5929\u5929\u6c14\u4e0d\u9519'
text
print(text)

text = u'\u4f60\u597d\uff0c\u4eca\u5929\u5929\u6c14\u4e0d\u9519'
text
print(text)

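# Doubled backslashes keep each escape as six literal characters: backslash, u, four hex digits.
text = '\\u4f60\\u597d\\uff0c\\u4eca\\u5929\\u5929\\u6c14\\u4e0d\\u9519'
text
print(text)
# unicode_escape interprets those literal escapes. This works here because the string is pure ASCII.
text = text.encode('utf-8').decode('unicode_escape')
text
print(text)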

# Mixed input: literal escapes followed by real Chinese characters. It prints unchanged.
text = '\\u4f60\\u597d\\uff0c今天天气不错'
text
print(text)

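# Single backslashes here: the parser resolves the escapes, so text is already fully decoded.
text = '新增\u4f60\u597d\uff0c今天天气不错'
text
print(text)

import re
# Decode each literal \uXXXX sequence on its own. This text contains none, so the call changes nothing;
# it only does real work on strings with doubled backslashes, such as the mixed one above.
text = re.sub(r'(\\u[0-9a-fA-F]{4})', lambda matched: matched.group(1).encode('utf-8').decode('unicode_escape'), text)
text
print(text)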

Output:

你好,今天天气不错
你好,今天天气不错
你好,今天天气不错
\u4f60\u597d\uff0c\u4eca\u5929\u5929\u6c14\u4e0d\u9519
你好,今天天气不错
\u4f60\u597d\uff0c今天天气不错
新增你好,今天天气不错
新增你好,今天天气不错

Experiment 2:

s1 = "\u4f60"
print(s1)
s2 = "\u4f60".encode("utf-8")
print(s2)
s3 = "你".encode("utf-8")
print(s3)

Output:

你
b'\xe4\xbd\xa0'
b'\xe4\xbd\xa0'
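The two byte strings match because "\u4f60" and "你" are the same one-character str; only the spelling in the source code differs. As a rough illustration of how the hex digits relate to those three bytes, the sketch below looks up the code point with ord() and hand-encodes it as UTF-8. The function utf8_three_bytes is a hypothetical name, and it covers only the three-byte range U+0800 to U+FFFF (it does not reject surrogates the way a real encoder would).

```python
ch = '你'
code_point = ord(ch)
print(hex(code_point))   # 0x4f60
print('\u4f60' == ch)    # True


def utf8_three_bytes(cp):
    """Hand-encode a code point in U+0800..U+FFFF as three UTF-8 bytes."""
    if not 0x0800 <= cp <= 0xFFFF:
        raise ValueError('this sketch only covers the three-byte range')
    return bytes([
        0xE0 | (cp >> 12),          # 1110xxxx carries the top 4 bits
        0x80 | ((cp >> 6) & 0x3F),  # 10xxxxxx carries the middle 6 bits
        0x80 | (cp & 0x3F),         # 10xxxxxx carries the low 6 bits
    ])


print(utf8_three_bytes(code_point))                        # b'\xe4\xbd\xa0'
print(utf8_three_bytes(code_point) == ch.encode('utf-8'))  # True
```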

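Conclusion: a \uXXXX escape and the Chinese character it names are interchangeable. The four hex digits after \u are the character's Unicode code point (你 is U+4F60); they are not its UTF-8 encoding. UTF-8 is a separate, byte-level encoding of that code point, which is why 你 encodes to the three bytes e4 bd a0.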
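The experiments above only go from escapes to Chinese. For the opposite direction, one possible approach (a minimal sketch, not the only way) is the standard unicode_escape codec: encoding with it yields ASCII bytes containing \uXXXX sequences, and decoding those bytes restores the original text.

```python
text = '你好'

# Chinese text to literal \uXXXX escapes (the result is an ASCII-only str).
escaped = text.encode('unicode_escape').decode('ascii')
print(escaped)  # \u4f60\u597d

# And back again. This is safe because escaped contains only ASCII characters.
restored = escaped.encode('ascii').decode('unicode_escape')
print(restored == text)  # True
```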
