UTF-8

http://code.alexreisner.com/articles/character-encoding.html

For each byte of UTF-8 encoded text, the leading bits tell you its role:

if it starts with 0 it’s an ASCII character
if it starts with 10 it’s a continuation of a multi-byte character
if it starts with 110 it’s the first byte of a 2-byte character
if it starts with 1110 it’s the first byte of a 3-byte character
if it starts with 11110 it’s the first byte of a 4-byte character
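
The rules above can be sketched as a small bit-mask check. This is only an illustration, not code from the linked article: the function name `classify_utf8_byte` and the labels it returns are made up here. It inspects the prefix bits only, so it does not do full UTF-8 validation (for example, it does not reject overlong encodings or check that the right number of continuation bytes follows a lead byte).

```python
def classify_utf8_byte(byte: int) -> str:
    """Describe the role of a single byte (0-255) from its leading bits."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value in the range 0-255")
    if (byte & 0b1000_0000) == 0b0000_0000:   # 0xxxxxxx
        return "ascii"
    if (byte & 0b1100_0000) == 0b1000_0000:   # 10xxxxxx
        return "continuation"
    if (byte & 0b1110_0000) == 0b1100_0000:   # 110xxxxx
        return "lead-2"
    if (byte & 0b1111_0000) == 0b1110_0000:   # 1110xxxx
        return "lead-3"
    if (byte & 0b1111_1000) == 0b1111_0000:   # 11110xxx
        return "lead-4"
    return "invalid"                          # 11111xxx matches none of the rules


# Print each byte of a short string in binary next to its role.
for b in "a\u00e9\u20ac".encode("utf-8"):
    print(f"{b:08b} {classify_utf8_byte(b)}")
```

Here `"a\u00e9\u20ac"` is "a", "é" and "€", which encode to one, two and three bytes respectively, so the output shows one ASCII byte, then a 2-byte lead with one continuation byte, then a 3-byte lead with two continuation bytes.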
