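A Thorough Guide to Unicode Encoding
Preface
Why do we need encodings?
The first thing to be clear about is that a computer stores and processes all data as bytes. We need those bytes to represent information, but the bytes themselves carry no meaning, so we have to assign meaning to them. That is why encoding standards exist.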
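Encoding models
There are two encoding models to distinguish.
A: Simple character sets
In this model, a character set defines which characters it contains and, at the same time, how each character maps to bits in the computer. ASCII is the typical example: it directly defines A -> 0100 0001. In other words, ASCII does in one go what takes the first three steps of the modern encoding model.
B: The modern encoding model
In the modern encoding model, working out how a character maps to bits takes the following steps:
1. Decide which characters the system needs to support. The set of these characters is called the character repertoire.
2. Assign a number to each abstract character in the repertoire, that is, map the set of characters to a set of integers. This mapping is called a coded character set (CCS). Unicode is a concept at this layer. It has nothing to do with binary or any other machine representation; it is a purely mathematical abstraction.
3. Convert the integer that the CCS assigns to a character into a bit pattern of finite length, so that the computer can represent the integer in a fixed-width binary form. This mapping is called a character encoding form (CEF). UTF-8 and UTF-16 belong to this layer.
4. Decide how the bit patterns produced by the CEF are actually stored and transmitted as bytes. Because of big-endian versus little-endian byte order, this step depends on the specific platform. The solution at this layer is called a character encoding scheme (CES).
What we usually call "encoding" is finished by the third step and never touches the CES, so the CES is outside the scope of this article.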
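The layers can be traced for a single character. The following is a minimal sketch, assuming a runtime that provides the standard TextEncoder and DataView globals (modern browsers and Node.js). The helper name unitToBytes is made up for illustration, and the last step is shown only to make the role of byte order visible.

```javascript
// Trace one character through the layers of the modern encoding model.
const ch = "\u6C49"; // the character 汉

// CCS: abstract character -> integer (code point).
const codePoint = ch.codePointAt(0); // 0x6C49

// CEF: code point -> code units. UTF-8 uses 8-bit units, UTF-16 uses 16-bit units.
const utf8Units = Array.from(new TextEncoder().encode(ch)); // [0xE6, 0xB1, 0x89]
const utf16Unit = ch.charCodeAt(0); // 0x6C49, a single 16-bit unit

// CES: code units -> bytes. Only here does byte order matter.
// unitToBytes is a hypothetical helper, not a standard API.
function unitToBytes(unit, littleEndian) {
  const view = new DataView(new ArrayBuffer(2));
  view.setUint16(0, unit, littleEndian);
  return [view.getUint8(0), view.getUint8(1)];
}

const bigEndian = unitToBytes(utf16Unit, false); // [0x6C, 0x49]
const littleEndian = unitToBytes(utf16Unit, true); // [0x49, 0x6C]
console.log(codePoint.toString(16), utf8Units, bigEndian, littleEndian);
```

The UTF-8 units are already bytes, so byte order never comes up for UTF-8; it only matters for encoding forms whose units are wider than one byte.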
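You may be wondering why a modern encoding model is needed at all. Why split it into so many concepts? Why not specify everything in one place, as the original model did? The history of encodings below answers these questions.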
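ASCII
As we know, all information in a computer ends up as a string of binary digits. Each bit has two states, 0 and 1, and by combining enough of them we can represent anything in the world.
One byte is 8 bits, and each bit has two states, so one byte can take 256 different values. If each of the 256 values stands for a symbol, one byte of data can represent 256 characters. So the Americans drew up an encoding (really just a dictionary) that describes how the characters of English correspond to these 8-bit values. This is known as ASCII.
ASCII defines 128 characters: the English letters A-Z and a-z, the digits 0-9, some punctuation marks, and control characters. These 128 characters use only the low 7 bits of the byte; the highest bit is always 0.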
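As a quick check that ASCII characters fit in 7 bits, the sketch below prints the 8-bit pattern of a few characters. The helper name toBits is hypothetical.

```javascript
// Show the 8-bit pattern of an ASCII character (toBits is an illustrative helper).
function toBits(ch) {
  return ch.charCodeAt(0).toString(2).padStart(8, "0");
}

for (const ch of ["A", "z", "0", "~"]) {
  console.log(ch, toBits(ch)); // the leading bit is always 0
}
```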
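GB2312
128 characters are plenty for English but nowhere near enough for other languages. So some European countries decided to put the unused high bit of ASCII to work, which allows 256 characters. This created a new problem: different countries needed different characters, so even when 256 characters were enough for each of them, the same code point (that is, the same 8-bit value) could stand for different characters. For example, 144 is گ in the Arabic extension of ASCII and ђ in the Russian one.
So the trouble with ASCII was that although everyone agreed on characters 0 - 127, there were many different interpretations of characters 128 - 255. Meanwhile, Asian languages have far more characters to store, and a single byte was no longer enough.
But that was no match for the ingenuity of the Chinese people. We bluntly threw out all those odd symbols above 127 and laid down the following rules:
A byte of 127 or less keeps its original meaning, but two bytes greater than 127 in a row together represent one Chinese character;
The first byte (called the high byte) runs from 0xA1 to 0xF7, and the second byte (the low byte) runs from 0xA1 to 0xFE;
That gives us room for roughly 7,000 simplified Chinese characters.
These codes also cover mathematical symbols, Roman and Greek letters, and Japanese kana. Even the digits, punctuation, and letters that ASCII already had were all re-encoded as two-byte codes. These are the familiar full-width characters.
The original ones at 127 and below are called half-width characters.
The Chinese people saw that this was good and named the scheme GB2312. GB2312 is a Chinese extension of ASCII.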
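The byte ranges in the GB2312 rules translate directly into code. The following is a minimal sketch of the range check only, not a GB2312 decoder; both function names are hypothetical. It also counts the two-byte positions the ranges allow, which is an upper bound on what the scheme can hold.

```javascript
// Byte ranges taken from the GB2312 rules.
const HIGH_MIN = 0xA1, HIGH_MAX = 0xF7;
const LOW_MIN = 0xA1, LOW_MAX = 0xFE;

// True if (high, low) is a valid two-byte position under those ranges.
function isDoubleBytePosition(high, low) {
  return high >= HIGH_MIN && high <= HIGH_MAX && low >= LOW_MIN && low <= LOW_MAX;
}

// Number of two-byte positions available: 87 high-byte values times 94 low-byte values.
function doubleByteCapacity() {
  return (HIGH_MAX - HIGH_MIN + 1) * (LOW_MAX - LOW_MIN + 1);
}

console.log(doubleByteCapacity()); // 8178
console.log(isDoubleBytePosition(0xB0, 0xA1)); // true
console.log(isDoubleBytePosition(0x41, 0x42)); // false: plain ASCII bytes
```

The capacity is somewhat larger than the number of characters quoted in the text because the ranges describe available positions, and not every position has to be assigned.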
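GBK
But Chinese has too many characters, and we soon found that many people's names could not be typed. So we had to go back, dig out the code positions GB2312 had left unused, and put them to work without further ado.
That still was not enough, so we simply dropped the requirement that the low byte be above 127. As long as the first byte is greater than 127, it marks the start of a Chinese character, no matter what follows. The extended scheme is called GBK. It contains everything in GB2312 and adds nearly 20,000 new Chinese characters (including traditional ones) and symbols.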
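GB18030 / DBCS
Later the ethnic minorities needed to use computers too, so we extended it again and added several thousand characters for minority scripts. GBK grew into GB18030, and from then on the culture of the Chinese nation could be carried into the computer age.
Chinese programmers saw that this family of Chinese encodings was good, and collectively called them DBCS.
Double Byte Character Set.
The defining feature of the DBCS family is that two-byte Chinese characters and one-byte English characters coexist in the same encoding. To handle Chinese, a program therefore has to inspect every byte in a string: if the value is greater than 127, a double-byte character has started.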
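The byte-by-byte check that DBCS programs need might look like the sketch below. It assumes the simplified rule from this section (any byte above 127 starts a two-byte character) and does not validate the second byte; splitDbcs is a hypothetical name.

```javascript
// Split a DBCS byte sequence into characters, each returned as an array of bytes.
// Simplified rule: a byte > 127 is a lead byte and consumes the next byte too.
function splitDbcs(bytes) {
  const chars = [];
  let i = 0;
  while (i < bytes.length) {
    if (bytes[i] > 127 && i + 1 < bytes.length) {
      chars.push([bytes[i], bytes[i + 1]]);
      i += 2;
    } else {
      chars.push([bytes[i]]); // ASCII byte, or a dangling lead byte at the end
      i += 1;
    }
  }
  return chars;
}

// "A", then one two-byte character (0xBA 0xBA), then "B".
const pieces = splitDbcs([0x41, 0xBA, 0xBA, 0x42]);
console.log(pieces.length); // 3
```

Note that the scan only works from the start of the data: dropping into the middle of a buffer gives no way to tell a lead byte from a second byte, which is one reason these encodings are awkward to process.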
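At that time every country was cooking up its own encoding standard the way China did. As a result nobody understood anybody else's encoding, and nobody supported it either.
Unicode
Eventually the Americans realized they should propose a standard that could represent every character of every language in the world, and for this purpose Unicode was born.
Unicode grew out of a simple idea: put all the characters in the world into one set. As long as a computer supports that one character set, it can display every character, and garbled text becomes a thing of the past.
It starts at 0 and assigns each symbol a number, called a code point. For example, the symbol at code point 0 is null (all bits are 0).
U+0000 = null
Here U+ means that the hexadecimal number that follows is a Unicode code point.
With so many symbols, Unicode is not defined in one piece but in sections. Each section holds 65,536 (2^16) characters and is called a plane. There are currently 17 planes, so the Unicode code space holds 17 × 2^16 = 1,114,112 code points, which takes 21 bits to address.
The first 65,536 code points make up the Basic Multilingual Plane (BMP). Its code points run from 0 to 2^16-1, or in hexadecimal from U+0000 to U+FFFF. All of the most common characters live in this plane, which was the first plane Unicode defined and published.
The remaining characters are placed in the supplementary planes (planes 1 to 16), with code points from U+010000 to U+10FFFF.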
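The plane of a code point is simply its value divided by the plane size. A small sketch (planeOf is an illustrative helper name):

```javascript
const PLANE_SIZE = 0x10000; // 65,536 code points per plane
const PLANE_COUNT = 17;

// Plane number of a code point.
function planeOf(codePoint) {
  return Math.floor(codePoint / PLANE_SIZE);
}

console.log(PLANE_SIZE * PLANE_COUNT); // 1114112, i.e. 0x110000
console.log(planeOf(0x6C49));   // 0 (basic plane)
console.log(planeOf(0x1D306));  // 1
console.log(planeOf(0x20BB7));  // 2
console.log(planeOf(0x10FFFF)); // 16 (last plane)
```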
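Unicode only specifies the code point of each character. Which byte sequence represents that code point is a matter for the encoding method.
Unicode encoding schemes
As mentioned, Unicode does not say how the binary value of a character is stored. Take the Chinese character "汉": its Unicode code point is 0x6c49, or 110110001001001 in binary. That is 15 bits, so it needs at least 2 bytes. You can imagine that characters further along in the Unicode dictionary may need 3 or even 4 bytes.
This raises a question: how does the computer know that these 2 bytes represent one character rather than two separate ones? One idea is to go with the maximum: if 4 bytes are enough for the largest Unicode character, encode every character in 4 bytes and pad with leading zeros. That does solve the problem, but it wastes a huge amount of space. An English document would grow to four times its ASCII size, which is clearly unacceptable.
So, to solve the Unicode encoding problem properly, UTF-8 and UTF-16, the two most popular encodings today, were born. There is also UTF-32, the fixed-width encoding just described, in which every character takes 4 bytes. It looks convenient, but it is far less widely used than the other two.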
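To make the space trade-off concrete, the sketch below counts the bytes the same text needs in each of the three encodings. It assumes a runtime with the standard TextEncoder global; encodedSizes is a hypothetical helper name.

```javascript
// Byte counts of the same text in the three encoding forms.
function encodedSizes(text) {
  return {
    utf8: new TextEncoder().encode(text).length,
    utf16: text.length * 2,             // length counts 16-bit code units
    utf32: Array.from(text).length * 4, // Array.from splits by code point
  };
}

console.log(encodedSizes("hello"));        // { utf8: 5, utf16: 10, utf32: 20 }
console.log(encodedSizes("\u6C49\u5B57")); // { utf8: 6, utf16: 4, utf32: 8 }
```

For English text UTF-8 is the most compact, while for text that is mostly Chinese UTF-16 comes out smaller, which is part of why both remain in use.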
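UTF-8
UTF-8 is a remarkably elegant encoding. It is cleanly backward compatible with ASCII, which helped Unicode win broad acceptance.
UTF-8 is the most widely used Unicode encoding on the Internet today. Its defining feature is variable length: it uses 1 to 4 bytes per character, depending on the character. The encoding rules are:
For a single-byte character, the first bit is 0 and the remaining 7 bits hold the character's Unicode code point. Characters 0 - 127 are therefore encoded exactly as in ASCII, which means documents from the ASCII era open in UTF-8 without any problem.
For a character that needs N bytes (N > 1), the first N bits of the first byte are set to 1 and bit N + 1 is set to 0. The first two bits of each of the remaining N - 1 bytes are set to 10. All the other bits are filled with the character's Unicode code point.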
In table form:
Unicode code point range (hex) | UTF-8 byte sequence (binary)
0000 0000 - 0000 007F | 0xxxxxxx
0000 0080 - 0000 07FF | 110xxxxx 10xxxxxx
0000 0800 - 0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx
0001 0000 - 0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
With this table, UTF-8 encoding and decoding become much simpler. Let's use the Chinese character "汉" to show how it is done.
The Unicode code point of "汉" is 0x6c49 (110 1100 0100 1001). The table shows that 0x0000 6c49 falls in the range of the third row, so its format is 1110xxxx 10xxxxxx 10xxxxxx. Next, starting from the last bit of the code point, fill in the x positions from right to left, and pad any leftover x with 0. The result is 11100110 10110001 10001001, or 0xE6 0xB1 0x89 in hexadecimal.
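The table-driven procedure can be written down in a few lines. This is a minimal sketch that follows the four rows of the table and does no validation (it does not reject surrogates or values above 0x10FFFF); encodeUtf8 and toHex are hypothetical helper names, and real code would normally use the platform's TextEncoder instead.

```javascript
// Encode one code point as UTF-8 bytes by following the table row by row.
function encodeUtf8(codePoint) {
  if (codePoint <= 0x7F) {
    return [codePoint]; // 0xxxxxxx
  }
  if (codePoint <= 0x7FF) {
    return [
      0xC0 | (codePoint >> 6),   // 110xxxxx
      0x80 | (codePoint & 0x3F), // 10xxxxxx
    ];
  }
  if (codePoint <= 0xFFFF) {
    return [
      0xE0 | (codePoint >> 12),         // 1110xxxx
      0x80 | ((codePoint >> 6) & 0x3F), // 10xxxxxx
      0x80 | (codePoint & 0x3F),        // 10xxxxxx
    ];
  }
  return [
    0xF0 | (codePoint >> 18),          // 11110xxx
    0x80 | ((codePoint >> 12) & 0x3F), // 10xxxxxx
    0x80 | ((codePoint >> 6) & 0x3F),  // 10xxxxxx
    0x80 | (codePoint & 0x3F),         // 10xxxxxx
  ];
}

const toHex = (bytes) => bytes.map((b) => b.toString(16).toUpperCase()).join(" ");
console.log(toHex(encodeUtf8(0x6C49))); // E6 B1 89
console.log(toHex(encodeUtf8(0x41)));   // 41
```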
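Decoding is just as simple. If the first bit of a byte is 0, the byte is a character on its own. If the first bit is 1, the number of consecutive leading 1s tells you how many bytes the character occupies (a byte that starts with 10 is a continuation byte, not the start of a character).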
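The decoding rule can be sketched the same way. The code below assumes well-formed input and performs no checks beyond the first byte; sequenceLength and decodeUtf8 are hypothetical names, and in practice the standard TextDecoder does this job.

```javascript
// Number of bytes in a UTF-8 sequence, judged from the leading 1 bits of its first byte.
function sequenceLength(firstByte) {
  if ((firstByte & 0x80) === 0x00) return 1; // 0xxxxxxx
  if ((firstByte & 0xE0) === 0xC0) return 2; // 110xxxxx
  if ((firstByte & 0xF0) === 0xE0) return 3; // 1110xxxx
  if ((firstByte & 0xF8) === 0xF0) return 4; // 11110xxx
  throw new RangeError("not a valid first byte: " + firstByte);
}

// Decode well-formed UTF-8 bytes into an array of code points.
function decodeUtf8(bytes) {
  const codePoints = [];
  let i = 0;
  while (i < bytes.length) {
    const n = sequenceLength(bytes[i]);
    // Payload bits of the first byte: 7 for n = 1, otherwise 7 - n.
    let cp = n === 1 ? bytes[i] : bytes[i] & (0xFF >> (n + 1));
    for (let k = 1; k < n; k++) {
      cp = (cp << 6) | (bytes[i + k] & 0x3F); // append 6 bits per continuation byte
    }
    codePoints.push(cp);
    i += n;
  }
  return codePoints;
}

const decoded = decodeUtf8([0x41, 0xE6, 0xB1, 0x89]);
console.log(decoded.map((cp) => cp.toString(16))); // 41 and 6c49
```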
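UTF-16
The Windows kernel, Java, Objective-C (Foundation), and JavaScript all use a two-byte data type as the basic unit of a character, such as wchar_t in C / C++ on Windows or char in Java. These types occupy two bytes of memory. Since the commonly used Unicode characters all fall within 0x0 - 0xFFFF, two bytes cover most of them.
UTF-16 sits between UTF-32 and UTF-8 and combines features of fixed-width and variable-width encodings. Its rule is simple: characters in the basic plane take 2 bytes, and characters in the supplementary planes take 4 bytes. In other words, a UTF-16 encoding is either 2 bytes long (U+0000 to U+FFFF) or 4 bytes long (U+010000 to U+10FFFF). So here is the question: when we come across two bytes, should we treat them as a character on their own, or read them together with the next two bytes?
There is a clever trick here. Within the basic plane, the range U+D800 to U+DFFF is left empty: these code points do not correspond to any character. This empty range can therefore be used to map the characters of the supplementary planes.
The supplementary planes hold 2^20 code points, so at least 20 bits are needed to represent them. UTF-16 splits these 20 bits in half. The first 10 bits are mapped into U+D800 to U+DBFF (2^10 values) and called the high surrogate (H); the last 10 bits are mapped into U+DC00 to U+DFFF (2^10 values) and called the low surrogate (L). This means that one supplementary-plane character is split into two basic-plane code units.
So when we come across two bytes whose value lies between U+D800 and U+DBFF, we can conclude that the value of the following two bytes should lie between U+DC00 and U+DFFF, and the four bytes must be read together.
Next, let's take the Chinese character "𠮷" as an example of how UTF-16 works.
The code point of "𠮷" is 0x20BB7, which is clearly outside the basic plane (0x0000 - 0xFFFF), so it needs four bytes. First compute the excess, 0x20BB7 - 0x10000, and write it as 20 bits (padding with leading zeros): 0001000010 1110110111. Then map the first 10 bits into U+D800 to U+DBFF and the last 10 bits into U+DC00 to U+DFFF. U+D800 is 1101100000000000 in binary; fill its last 10 bits with the first half and you get 1101100001000010, which is 0xD842 in hexadecimal. In the same way the low surrogate comes out as 0xDFB7. The UTF-16 encoding of "𠮷" is therefore 0xD842 0xDFB7.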
Unicode 3.0 gives the conversion formulas for supplementary-plane characters:
H = Math.floor((c - 0x10000) / 0x400) + 0xD800
L = (c - 0x10000) % 0x400 + 0xDC00
With these formulas it is easy to compute a character's UTF-16 encoding.
Take the character 𝌆: it is a supplementary-plane character with code point U+1D306. Converting it to UTF-16 goes as follows.
H = Math.floor((0x1D306 - 0x10000) / 0x400) + 0xD800 = 0xD834
L = (0x1D306 - 0x10000) % 0x400 + 0xDC00 = 0xDF06
So the UTF-16 encoding of the character is 0xD834 0xDF06, four bytes long.
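The two formulas, together with their inverse, fit in a short sketch. The function names are hypothetical, and the code assumes its input really is a supplementary-plane code point (or a valid pair) without checking.

```javascript
// Split a supplementary-plane code point into its UTF-16 surrogate pair.
function toSurrogatePair(codePoint) {
  const offset = codePoint - 0x10000;               // 20-bit value
  const high = Math.floor(offset / 0x400) + 0xD800; // top 10 bits
  const low = (offset % 0x400) + 0xDC00;            // bottom 10 bits
  return [high, low];
}

// Inverse: combine a surrogate pair back into a code point.
function fromSurrogatePair(high, low) {
  return (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000;
}

const hex = (n) => n.toString(16).toUpperCase();
console.log(toSurrogatePair(0x1D306).map(hex));      // D834 and DF06
console.log(toSurrogatePair(0x20BB7).map(hex));      // D842 and DFB7
console.log(hex(fromSurrogatePair(0xD834, 0xDF06))); // 1D306
```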
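UTF-32
UTF-32 is the most straightforward encoding: every code point takes four bytes, and the byte content corresponds one-to-one to the code point. For example, code point 0 is four zero bytes, and code point 597D is preceded by two zero bytes.
U+0000 = 0x0000 0000
U+597D = 0x0000 597D
The advantage of UTF-32 is that the conversion rule is simple and lookups are fast. The drawback is wasted space: the same English text is four times as large as in ASCII. This drawback is serious enough that the encoding is rarely used in practice, and the HTML 5 standard explicitly states that web pages must not be encoded as UTF-32.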
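A sketch of UTF-32 shows both sides of the trade-off: the encoder is trivial and any character can be reached by index, at the cost of four bytes each. Big-endian byte order is an assumption here (it matches the way the values are written in the text), and both function names are hypothetical.

```javascript
// Encode code points as UTF-32 (big-endian): four bytes per code point.
function encodeUtf32BE(codePoints) {
  const view = new DataView(new ArrayBuffer(codePoints.length * 4));
  codePoints.forEach((cp, i) => view.setUint32(i * 4, cp, false));
  return Array.from(new Uint8Array(view.buffer));
}

// Fixed width makes random access trivial: the n-th code point starts at byte 4 * n.
function codePointAtIndex(bytes, n) {
  return new DataView(Uint8Array.from(bytes).buffer).getUint32(n * 4, false);
}

const utf32 = encodeUtf32BE([0x597D, 0x1D306]);
console.log(utf32.length);                              // 8
console.log(codePointAtIndex(utf32, 1).toString(16));   // 1d306
```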
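How JavaScript encodes characters
JavaScript uses the Unicode character set, but it supports only one encoding.
That encoding is not UTF-16, not UTF-8, and not UTF-32 either. JavaScript uses none of the encodings above.
JavaScript uses UCS-2!
UCS-2 encoding
Where did UCS-2 suddenly come from? That takes a little history.
Before the Internet existed, two teams independently set out to build a unified character set. One was the Unicode team, formed in 1988; the other was the UCS team, formed in 1989. Once they discovered each other, they quickly agreed that the world does not need two unified character sets.
In October 1991 the two teams decided to merge their character sets. From then on only one character set, Unicode, would be published, and the previously published character set would be revised so that UCS code points match Unicode exactly.
UCS was developed faster than Unicode. Its first encoding, UCS-2, was published in 1990 and used 2 bytes for every character that had a code point. (There was only one plane at the time, the basic plane, so 2 bytes were enough.) UTF-16 was not published until July 1996, and it was explicitly declared a superset of UCS-2: basic-plane characters keep their UCS-2 encoding, and supplementary-plane characters get a 4-byte representation.
Put simply, UTF-16 replaced UCS-2, or UCS-2 was absorbed into UTF-16. So today there is only UTF-16, and no UCS-2.
Then why did JavaScript not choose the more advanced UTF-16 instead of the obsolete UCS-2?
The answer is simple: it is not that it did not want to, it could not. When the JavaScript language appeared, UTF-16 did not exist yet.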
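Limitations of JavaScript's character functions
Because JavaScript can only handle UCS-2, every character in the language is 2 bytes long, and a 4-byte character is treated as two double-byte characters. All of JavaScript's character functions are affected by this and fail to return correct results.
Take the character "𝌆": its UTF-16 encoding is the 4 bytes 0xD834 0xDF06. Here is the problem: a 4-byte encoding is not part of UCS-2, so JavaScript does not recognize it and sees two separate characters, U+D834 and U+DF06. As explained earlier, these two code points are empty, so JavaScript treats the string as two "�" characters!
"𝌆".length
// 2
'\u1D306' === '𝌆'
// false
"𝌆".charAt(0)
// "�"
"𝌆".charCodeAt(0)
// 55348 (0xD834)
"𝌆" === '\uD834\uDF06'
// true
The code above shows that JavaScript thinks the length of the character is 2, that the first character it returns is an empty one, and that the code point of that first character is 0xD834. None of these results is correct!
To solve this, you have to check the code point and adjust by hand. Here is a correct way to traverse a string.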
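// Split a string into symbols, keeping surrogate pairs together (helper name is illustrative).
function getSymbols(string) {
  const output = [];
  const length = string.length;
  let index = -1;
  while (++index < length) {
    const character = string.charAt(index);
    const charCode = character.charCodeAt(0);
    if (charCode >= 0xD800 && charCode <= 0xDBFF) {
      output.push(character + string.charAt(++index));
    } else {
      output.push(character);
    }
  }
  return output;
}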
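Similar problems affect all of JavaScript's string functions.
String.prototype.replace()
String.prototype.substring()
String.prototype.slice()
...
These functions only work correctly for 2-byte code points. To handle 4-byte code points properly, you have to write your own version of each one that checks the code point range of the current character.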
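As an illustration of what such replacement versions might look like, here is a minimal sketch of code-point-aware counterparts to length and slice. It assumes an ES6 runtime, where Array.from walks a string by code point; the helper names are hypothetical.

```javascript
// Code-point-aware counterparts of length and slice (helper names are hypothetical).
function codePointLength(str) {
  return Array.from(str).length; // the string iterator yields whole code points
}

function codePointSlice(str, start, end) {
  return Array.from(str).slice(start, end).join("");
}

const sample = "a\uD834\uDF06b"; // "a", U+1D306, "b"
console.log(sample.length);                                    // 4 code units
console.log(codePointLength(sample));                          // 3 code points
console.log(sample.slice(1, 2) === "\uD834");                  // true: slice cuts the pair in half
console.log(codePointSlice(sample, 1, 2) === "\uD834\uDF06");  // true
```

Building an array for every call is wasteful for long strings; the point here is only to show where the code unit and code point views diverge.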
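Unicode extensions in ECMAScript 6
Unicode notation for characters
ES6 strengthens Unicode support. JavaScript allows a character to be written as \uxxxx, where xxxx is the character's Unicode code point.
"\u0061"
// "a"
However, this notation only works for characters with code points between \u0000 and \uFFFF. Characters outside that range must be written as two double-byte units.
"\uD842\uDFB7"
// "𠮷"
"\u20BB7"
// "₻7"
The code above shows that if a value above 0xFFFF follows \u directly (such as \u20BB7), JavaScript reads it as \u20BB followed by 7. So it displays a different character followed by a 7.
ES6 improves on this: put the code point in curly braces and the character is read correctly.
"\u{20BB7}"
// "𠮷"
"\u{41}\u{42}\u{43}"
// "ABC"
let hello = 123;
hell\u{6F} // 123
'\u{1F680}' === '\uD83D\uDE80'
// true
The last example shows that the curly-brace notation is equivalent to the four-byte UTF-16 encoding.
With this notation, JavaScript has six ways to write a character: the literal character itself and the five escape forms below.
'\z' === 'z' // true
'\172' === 'z' // true
'\x7A' === 'z' // true
'\u007A' === 'z' // true
'\u{7A}' === 'z' // true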
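The string iterator interface
ES6 adds an iterator interface to strings, so a string can be traversed with a for...of loop.
for (let codePoint of 'foo') {
  console.log(codePoint)
}
// "f"
// "o"
// "o"
Besides traversing strings, the biggest advantage of this iterator is that it recognizes code points above 0xFFFF, which a traditional for loop cannot.
let text = String.fromCodePoint(0x20BB7);
for (let i = 0; i < text.length; i++) {
  console.log(text[i]);
}
// " "
// " "
for (let i of text) {
  console.log(i);
}
// "𠮷"
In the code above, the string text contains only one character, but the for loop treats it as two characters (neither of them printable), while the for...of loop correctly recognizes the single character.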
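Entering U+2028 and U+2029 directly
JavaScript strings accept characters typed directly as well as their escaped forms. For example, the Unicode code point of "中" is U+4e2d. You can type the character itself in a string or write its escape \u4e2d; the two are equivalent.
'中' === '\u4e2d' // true
However, JavaScript specifies five characters that cannot appear directly in a string and must be written in escaped form.
U+005C: backslash (reverse solidus)
U+000D: carriage return
U+2028: line separator
U+2029: paragraph separator
U+000A: line feed
For example, a string cannot contain a bare backslash; it must be escaped as \\ or \u005c.
The rule itself is fine. The trouble is that the JSON format allows U+2028 (line separator) and U+2029 (paragraph separator) to appear directly in strings. As a result, JSON output by a server can be perfectly valid, and accepted by JSON.parse, yet fail with a syntax error when an engine reads the same text as JavaScript source, for example through eval.
const json = '"\u2028"';
JSON.parse(json); // accepted: this is valid JSON
The JSON format is frozen (RFC 7159) and cannot be changed. To remove the mismatch, ES2019 allows U+2028 (line separator) and U+2029 (paragraph separator) to appear directly in JavaScript strings.
const PS = eval("'\u2029'");
Under this proposal, the code above does not throw.
Note that template strings already allow these two characters to be entered directly. Regular expressions still do not allow them, which is fine, because JSON never allowed regular expressions in the first place.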
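For code that still has to run on engines older than ES2019, a common workaround is to escape the two characters before JSON text is embedded in JavaScript source. A minimal sketch; toJsSafeJson is a hypothetical helper name.

```javascript
// Serialize a value to JSON that is also safe to embed in pre-ES2019 JavaScript source.
function toJsSafeJson(value) {
  return JSON.stringify(value)
    .replace(/\u2028/g, "\\u2028")
    .replace(/\u2029/g, "\\u2029");
}

const safe = toJsSafeJson({ note: "line\u2028separator" });
console.log(safe.includes("\u2028"));                           // false: the raw character is gone
console.log(JSON.parse(safe).note === "line\u2028separator");   // true: the data survives the round trip
```

The escaped form is still valid JSON, so consumers that use JSON.parse are unaffected.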
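Changes to JSON.stringify()
According to the standard, JSON data must be encoded in UTF-8. However, JSON.stringify() could return strings that cannot be represented as valid UTF-8.
Specifically, the code points from 0xD800 to 0xDFFF are reserved as surrogates: they cannot stand alone and must be used in pairs. For example, \uD834\uDF06 is two code units that must be used together as a pair to represent the character 𝌆. This is a workaround for representing characters whose code points are above 0xFFFF. Using \uD834 or \uDF06 on its own is invalid, and so is reversing the order, because \uDF06\uD834 does not correspond to any character.
The problem with JSON.stringify() was that it could return a lone code point between 0xD800 and 0xDFFF.
JSON.stringify('\u{D834}') // "\u{D834}"
To make sure the result is valid UTF-8, ES2019 changed the behavior of JSON.stringify(). When it meets a lone code point between 0xD800 and 0xDFFF, or a pairing that does not exist, it returns an escape sequence instead and leaves the next step to the application.
JSON.stringify('\u{D834}') // '"\\ud834"'
JSON.stringify('\uDF06\uD834') // '"\\udf06\\ud834"'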
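An application that wants to catch such strings before serializing them can test for unpaired surrogates itself. The sketch below uses a regular expression with a lookbehind, so it assumes an engine that supports ES2018 regular expressions; hasLoneSurrogate is a hypothetical helper name.

```javascript
// Matches a high surrogate not followed by a low one,
// or a low surrogate not preceded by a high one.
const LONE_SURROGATE = /[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/;

function hasLoneSurrogate(str) {
  return LONE_SURROGATE.test(str);
}

console.log(hasLoneSurrogate("\uD834\uDF06")); // false: a proper pair
console.log(hasLoneSurrogate("\uD834"));       // true
console.log(hasLoneSurrogate("\uDF06\uD834")); // true: wrong order
```

The pattern deliberately omits the u flag so that it works on individual 16-bit code units.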
String functions
ES6 adds several functions specifically for handling 4-byte code points.
String.fromCodePoint(): returns the character for a Unicode code point
Because fromCodePoint() is a static method of String, it can only be called as String.fromCodePoint(), not on a String instance you have created.
String.fromCodePoint(42);       // "*"
String.fromCodePoint(65, 90);   // "AZ"
String.fromCodePoint(0x404);    // "\u0404"
String.fromCodePoint(0x2F804);  // "\uD87E\uDC04"
String.fromCodePoint(194564);   // "\uD87E\uDC04"
String.fromCodePoint(0x1D306, 0x61, 0x1D307) // "\uD834\uDF06a\uD834\uDF07"
String.fromCodePoint('_');      // RangeError
String.fromCodePoint(Infinity); // RangeError
String.fromCodePoint(-1);       // RangeError
String.fromCodePoint(3.14);     // RangeError
String.fromCodePoint(3e-2);     // RangeError
String.fromCodePoint(NaN);      // RangeError
String.prototype.codePointAt(): returns the code point for a character
It returns undefined if there is no element at the given position. If no UTF-16 surrogate pair begins at the index, the code unit at that index is returned.
A surrogate pair is the mechanism UTF-16 uses to reach characters beyond the basic plane: four bytes (two UTF-16 code units) that together represent a single character.
'ABC'.codePointAt(1);          // 66
'\uD800\uDC00'.codePointAt(0); // 65536
'XYZ'.codePointAt(42); // undefined
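One detail worth seeing in action is that the index passed to codePointAt() still counts code units, so it can land in the middle of a pair. The sketch below shows this and one way to collect the code points safely; codePoints is a hypothetical helper name.

```javascript
const s = "a\uD834\uDF06"; // "a" followed by U+1D306

console.log(s.codePointAt(1).toString(16)); // "1d306": index 1 starts a surrogate pair
console.log(s.codePointAt(2).toString(16)); // "df06": index 2 is the low surrogate on its own

// Collect all code points without landing in the middle of a pair.
function codePoints(str) {
  return Array.from(str, (ch) => ch.codePointAt(0));
}

console.log(codePoints(s).map((cp) => cp.toString(16))); // 61 and 1d306
```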
Regular expressions
ES6 provides the u flag, which adds support for 4-byte code points to regular expressions.
/^.$/.test('𝌆')
// false
/^.$/u.test('𝌆')
// true
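A few more checks make the effect of the u flag easier to see. This is a small sketch using only standard regular expression syntax; the variable name is arbitrary.

```javascript
const tetragram = "\uD834\uDF06"; // U+1D306

console.log(/^.{2}$/.test(tetragram));       // true: without u, "." matches one code unit
console.log(/^.$/u.test(tetragram));         // true: with u, "." matches one code point
console.log(/^\u{1D306}$/u.test(tetragram)); // true: the \u{...} escape needs the u flag
console.log(tetragram.match(/./gu).length);  // 1
```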
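Unicode normalization
Some characters carry a diacritical mark in addition to the base letter. In the Hanyu Pinyin letter Ǒ, for example, the tone mark above the letter is such a mark. For many European languages these marks are very important.
Unicode offers two ways to represent such characters. One is a single precomposed character, where one code point stands for one character: the code point of Ǒ is U+01D1. The other treats the mark as a separate code point that combines with the base character for display, so two code points stand for one character: Ǒ can be written as O (U+004F) + ˇ (U+030C).
// Method 1
'\u01D1'
// 'Ǒ'
// Method 2
'\u004F\u030C'
// 'Ǒ'
The two representations look the same and mean the same, so they ought to be treated as equal. But JavaScript cannot tell.
'\u01D1' === '\u004F\u030C'
// false
ES6 provides the normalize method for Unicode normalization, which converts both forms into the same sequence.
'\u01D1'.normalize() === '\u004F\u030C'.normalize()
// true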
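The normalize method takes an optional form name; "NFC" (compose) is the default and "NFD" (decompose) is its counterpart. The sketch below shows both directions and a comparison helper; sameText is a hypothetical name.

```javascript
const composed = "\u01D1";         // single code point
const decomposed = "\u004F\u030C"; // O + combining caron

console.log(composed.normalize("NFD") === decomposed); // true: NFD decomposes
console.log(decomposed.normalize("NFC") === composed); // true: NFC composes
console.log(composed.length, decomposed.length);       // 1 2

// Compare two strings regardless of which form they use.
function sameText(a, b) {
  return a.normalize("NFC") === b.normalize("NFC");
}
console.log(sameText(composed, decomposed)); // true
```

Normalizing at the boundaries of a program (on input, before storage or comparison) is usually simpler than normalizing at every comparison.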
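Storing emoji
The input-method keyboards on Android phones and iPhones come with built-in emoji, such as those on the iPhone system keyboard.
Suppose text posted from a mobile client contains such emoji, is passed to the server through an API, and the server then stores it in a MySQL database:
With a gbk database, the data is written, but it comes back as '口口' and cannot be displayed;
With a utf8 database, the data cannot be written at all, and the program throws an exception: java.io.exception xxxxxxxx.
Cause:
The exception occurs because the character set does not support these characters: an emoji takes four bytes in UTF-8, while MySQL's utf8 encoding allows at most three bytes per character, so the insert fails.
Real UTF-8 (the standard everyone uses) allows up to 4 bytes per character. Because MySQL's utf8 is one byte short, some rare Chinese characters and emoji cannot be stored or displayed properly. MySQL's real UTF-8 is utf8mb4, which was added in version 5.5.3; what is currently called "utf8" is in fact utf8mb3. So avoid the default utf8 where possible; utf8mb4 is the right choice.
From MySQL 5.5.3 on, upgrading to the utf8mb4 character set is essentially seamless. utf8mb4 is also compatible with utf8: the encoding, position, and storage of utf8 characters are the same in utf8mb4 as in utf8, so existing data is not damaged.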
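Before writing text to a column that still uses the three-byte utf8 (utf8mb3) character set, an application can check whether the text would fit. The sketch below relies on the fact that UTF-8 needs four bytes exactly for code points above U+FFFF; needsUtf8mb4 is a hypothetical helper name, not part of any MySQL driver.

```javascript
// True if the string contains a code point that takes four bytes in UTF-8.
function needsUtf8mb4(text) {
  for (const ch of text) {            // for...of iterates by code point
    if (ch.codePointAt(0) > 0xFFFF) {
      return true;
    }
  }
  return false;
}

console.log(needsUtf8mb4("hello \u6C49\u5B57")); // false: at most three bytes each
console.log(needsUtf8mb4("launch \u{1F680}"));   // true: U+1F680 is outside the basic plane
```

A check like this is only a stopgap; switching the column or database to utf8mb4 removes the problem at its source.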