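A few days ago a project requirement came up: the client had to download a text file from the server and display it. The bug was that Chinese text showed up garbled on the Mac. After digging through the project's old code, I found the cause: the code decided the text encoding by checking for a BOM header, and whenever there was no BOM it treated the text as GBK. On Windows, UTF-8 files are usually saved with a BOM by default, although they can also be saved without one. On the Mac, UTF-8 files are saved without a BOM by default. So the code took BOM-less UTF-8 for GBK, decoded it as GBK, and produced garbage. I used the occasion to brush up on encodings, and at the end I describe how the bug was fixed. If you only want to know how to detect the encoding, skip to the end.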
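The Unicode character set
Unicode is a character set: it defines nearly all of the world's scripts and symbols. Other character sets include ASCII, ISO 8859, GB2312, Big5, and GB18030.
Most of those other character sets are regional, whereas Unicode covers nearly every script and symbol.
Computers represent characters as bytes. The rule that says which bytes, and how many, represent a given character is what we usually call the encoding format. Common ones are UTF-8, UTF-16, UTF-32, GBK, and ANSI.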
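The UTF family
Start with the UTF family. The UTF encodings are encodings of the Unicode character set: UTF-8, UTF-16, and UTF-32. UTF-32 is fixed width and always uses four bytes per character. UTF-16 uses two bytes for most characters and four bytes (a surrogate pair) for characters outside the Basic Multilingual Plane, so it is only fixed width for the common range. UTF-8 is a variable-length encoding that uses one to four bytes per character depending on the code point. Compared with the other two it saves a lot of space for text that is mostly ASCII. Because ASCII characters are encoded as the same single byte they have in ASCII, UTF-8 is compatible with ASCII, and it can still represent every character in Unicode. That combination is why UTF-8 is the most popular encoding.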
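To make the byte counts concrete, here is a minimal sketch that encodes a few sample characters with the JDK's built-in charsets. The class name `EncodedLengths` and the helper `byteLength` are made up for this illustration; the BOM-free `UTF-16BE` and `UTF-32BE` variants are used so that a BOM does not inflate the counts.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodedLengths {
    // Number of bytes the text occupies in the given charset.
    static int byteLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // Written as escapes so the source file's own encoding does not matter.
        String[] samples = {
            "A",            // U+0041, ASCII letter
            "\u00e9",       // U+00E9, Latin small e with acute
            "\u4e2d",       // U+4E2D, a common Chinese character
            "\uD83D\uDE00"  // U+1F600, outside the Basic Multilingual Plane
        };
        Charset utf32 = Charset.forName("UTF-32BE");
        for (String s : samples) {
            System.out.printf("U+%04X  UTF-8=%d  UTF-16=%d  UTF-32=%d%n",
                    s.codePointAt(0),
                    byteLength(s, StandardCharsets.UTF_8),
                    byteLength(s, StandardCharsets.UTF_16BE),
                    byteLength(s, utf32));
        }
    }
}
```

The last sample is the interesting one: it needs four bytes in UTF-16 as well, which is why UTF-16 cannot be treated as a strictly two-byte encoding.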
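GBK
GBK is an encoding designed specifically for Chinese. It was built as an extension of the ASCII table, so GBK is also compatible with ASCII. The family evolved over time, from GB2312 to GBK to GB18030, and each step stays compatible with the earlier ones, so GB18030 can read text written in any of the previous encodings. Big5, mentioned above, is the encoding for Traditional Chinese.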
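ANSI
Now for ANSI. It is not actually one fixed encoding. It stands for whatever default encoding the operating system's language setting selects. On a Simplified Chinese system, files are saved as GBK by default; on an English system the default is a Western code page such as Windows-1252, which is a superset of ASCII; systems in other languages default to the encoding for that language. Linux likewise takes its default from the system locale, which on most modern distributions is UTF-8.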
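Big-endian and little-endian
Next, big-endian and little-endian byte order.
Different CPUs lay out multi-byte values differently. A big-endian machine puts the high-order byte first, and a little-endian machine puts the low-order byte first. This matters when data is exchanged between machines. For example, the Unicode code point of "奎" is 594E and the code point of "乙" is 4E59. If we receive the UTF-16 byte stream 59 4E, is it "奎" or "乙"? If the stream is big-endian, the code point is 594E and the character is "奎". If the stream is little-endian, the code point is 4E59 and the character is "乙".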
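The ambiguity can be reproduced directly with the JDK's two BOM-free UTF-16 charsets. This is a small sketch; the class name `ByteOrderDemo` and its two helpers are invented for the example.

```java
import java.nio.charset.StandardCharsets;

class ByteOrderDemo {
    // Interprets each pair of bytes with the high-order byte first.
    static String decodeBigEndian(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16BE);
    }

    // Interprets each pair of bytes with the low-order byte first.
    static String decodeLittleEndian(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        byte[] stream = {0x59, 0x4E};
        // Same two bytes, two different code points.
        System.out.printf("big-endian:    U+%04X%n", (int) decodeBigEndian(stream).charAt(0));
        System.out.printf("little-endian: U+%04X%n", (int) decodeLittleEndian(stream).charAt(0));
    }
}
```

Read as big-endian the bytes give U+594E, and read as little-endian they give U+4E59, so the receiver needs to be told which order was used.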
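Without extra information, this ambiguity causes errors in transmission. To solve it, the multi-byte Unicode encodings define a "Byte Order Mark", or BOM. It is a special non-printing character that you can place at the start of a document to indicate the byte order you used, so the reader knows which byte is the high one and which is the low one. For UTF-16 the byte order mark is U+FEFF. If a UTF-16 document starts with the bytes FF FE, its byte order is little-endian; if it starts with FE FF, its byte order is big-endian.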
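BOM-based detection of the kind the old project code relied on can be sketched as follows. The class and method names are hypothetical. The UTF-8 signature EF BB BF is included because it is the "UTF-8 with BOM" case this post is about; UTF-32 BOMs are deliberately left out to keep the sketch short.

```java
class BomSniffer {
    // Returns the charset name implied by a leading BOM, or null when there is none.
    // Note: a UTF-32LE BOM (FF FE 00 00) would be reported as UTF-16LE by this sketch.
    static String charsetFromBom(byte[] b) {
        if (b.length >= 3 && (b[0] & 0xFF) == 0xEF && (b[1] & 0xFF) == 0xBB && (b[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF) {
            return "UTF-16BE";
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE) {
            return "UTF-16LE";
        }
        return null; // no BOM: the encoding has to be determined some other way
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        byte[] withoutBom = {'h', 'i'};
        System.out.println(charsetFromBom(withBom));
        System.out.println(charsetFromBom(withoutBom));
    }
}
```

The `null` branch is exactly where the original bug lived: falling back to GBK there is wrong for BOM-less UTF-8 files.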
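If you were reading carefully, you noticed a qualifier in the discussion of byte order above: "multi-byte Unicode encodings". What about an encoding whose unit is not multi-byte, such as UTF-8?
UTF-8 is a variable-length encoding. Some characters, such as English letters, are encoded in a single byte, and others take several bytes. Yet UTF-8 has no big-endian or little-endian variants. Why? Because UTF-8's code unit is a single byte, and the byte is the smallest unit in which data is read, so there is simply no ordering question for UTF-8. You may object: didn't you just say UTF-8 has multi-byte sequences too? Yes, but even a multi-byte sequence is processed one byte at a time, in a fixed order defined by the encoding itself. UTF-16 and UTF-32, by contrast, have code units of 2 bytes and 4 bytes respectively.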
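A quick way to see the difference is to dump the bytes of one character in each encoding. In this sketch the class name `Utf8NoByteOrder` and the `hex` helper are made up for the illustration.

```java
import java.nio.charset.StandardCharsets;

class Utf8NoByteOrder {
    // Formats bytes as space-separated uppercase hex pairs, e.g. "E4 B8 AD".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String ch = "\u4e2d"; // U+4E2D
        // UTF-8 has one byte sequence for this character on every machine.
        System.out.println("UTF-8:    " + hex(ch.getBytes(StandardCharsets.UTF_8)));
        // UTF-16 has two, depending on the order of the bytes inside the 2-byte unit.
        System.out.println("UTF-16BE: " + hex(ch.getBytes(StandardCharsets.UTF_16BE)));
        System.out.println("UTF-16LE: " + hex(ch.getBytes(StandardCharsets.UTF_16LE)));
    }
}
```

The UTF-8 line prints `E4 B8 AD`, while the two UTF-16 lines print `4E 2D` and `2D 4E`: the same code unit with its bytes swapped.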
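Finally, the bug fix. In our actual usage there are really only two encodings, UTF-8 and GBK. UTF-8 comes with or without a BOM, and GBK never has one, so what has to be told apart is BOM-less UTF-8 and GBK. I searched Baidu and Google for a long time without finding a method. In the end I asked a colleague who works in C and learned that there is a library that does this directly: you pass in the bytes of the text and it returns the encoding. I could not find it on GitHub at the time; it is available from the Unicode site. It is called ICU4J, and there is a corresponding C library called ICU4C. As a junior programmer who only knows Java, I could only use the first one.
With this library it is easy to determine the encoding of a text. I wrapped it lightly, so that passing in the text's byte array returns the encoding name: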
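import com.ibm.icu.text.CharsetDetector;
import com.ibm.icu.text.CharsetMatch;

// Returns the name of the most likely charset for the given bytes, or null if nothing matched.
public String getEncode(byte[] bytes) {
    CharsetDetector detector = new CharsetDetector();
    detector.setText(bytes);
    CharsetMatch match = detector.detect();
    return match == null ? null : match.getName();
}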
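If pulling in a library is not an option, a cruder alternative for the specific "BOM-less UTF-8 or GBK" question is to check whether the bytes are well-formed UTF-8 using only the JDK. This is not what ICU4J does and not the fix used in this project; it is a heuristic sketch with invented names (`Utf8OrGbk`, `isValidUtf8`, `guessCharsetName`), and it assumes, as above, that GBK is the only other candidate.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Utf8OrGbk {
    // True when the bytes decode as UTF-8 without any malformed sequence.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Assumption: anything that is not well-formed UTF-8 is treated as GBK.
    static String guessCharsetName(byte[] bytes) {
        return isValidUtf8(bytes) ? "UTF-8" : "GBK";
    }

    public static void main(String[] args) {
        byte[] utf8Bytes = {(byte) 0xE4, (byte) 0xB8, (byte) 0xAD}; // U+4E2D in UTF-8
        byte[] gbkBytes = {(byte) 0xD6, (byte) 0xD0};               // the same character in GBK
        System.out.println(guessCharsetName(utf8Bytes));
        System.out.println(guessCharsetName(gbkBytes));
    }
}
```

The heuristic has known limits: pure ASCII text is valid in both encodings (harmless, since it decodes identically), and a short GBK text can occasionally happen to be well-formed UTF-8. A statistical detector such as ICU4J is more robust for those cases.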
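Pitfalls I ran into
One: converting the stream to a String first, then to a byte array
When preparing the byte array, I first read the stream into a String and then converted that String to a byte[] to pass to the API. The encoding detected this way was wrong, because decoding to a String already applies some charset and the original bytes are lost. After I changed the code to read the stream straight into a byte[], it worked. The conversion code is as follows: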
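import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Reads the whole stream into a byte array without any charset decoding, then closes the stream.
private static byte[] readInputStream(InputStream in) throws IOException {
    try (InputStream input = in) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        int len;
        while ((len = input.read(buffer)) != -1) {
            baos.write(buffer, 0, len);
        }
        return baos.toByteArray();
    }
}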
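Why the detour through a String breaks detection can be shown with two bytes. The sketch below assumes the intermediate String was decoded as UTF-8, which is a guess about the original code rather than something stated above; `StringRoundTrip` and `throughString` are names made up for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class StringRoundTrip {
    // Decodes the bytes to a String and encodes them again, as the faulty code path did.
    // Assumption: UTF-8 is used for both steps.
    static byte[] throughString(byte[] raw) {
        String decoded = new String(raw, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] gbkBytes = {(byte) 0xD6, (byte) 0xD0};               // U+4E2D encoded in GBK
        byte[] utf8Bytes = {(byte) 0xE4, (byte) 0xB8, (byte) 0xAD}; // U+4E2D encoded in UTF-8

        // Bytes that are not valid UTF-8 are replaced by U+FFFD while decoding,
        // so the original GBK bytes cannot be recovered afterwards.
        System.out.println("GBK survives:   " + Arrays.equals(gbkBytes, throughString(gbkBytes)));
        System.out.println("UTF-8 survives: " + Arrays.equals(utf8Bytes, throughString(utf8Bytes)));
    }
}
```

Once the GBK bytes have been replaced, the detector only ever sees UTF-8 encoded replacement characters, so its answer says nothing about the original file.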
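Two: passing the stream directly to the library
ICU4J does have a way to take an InputStream for detection: the setText method used above has an overload that accepts an input stream. But whenever I turned the text into an input stream and passed it in directly, it threw an exception, and I never figured out why. Since passing a byte array worked, I did not dwell on it. A likely cause, though I did not confirm it: the ICU4J documentation says the stream given to setText must support mark/reset, so wrapping it in a BufferedInputStream may help. If you use this overload, keep this in mind.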