Using the Japanese Encodings SHIFT_JIS and MS932

Summary: in most cases, using MS932 instead of SHIFT_JIS reduces garbled characters (mojibake).

-----------------------------------------------------------------------------


Reference: http://www.asteria.com/tutorial/asbook320_application_read.html



(6) Differences Between Shift_JIS and Windows-31J

 

 

As we mentioned earlier, Shift_JIS and Windows-31J employ different character sets and codes. This means that you must use different mapping converters when converting between them and Unicode.

The table below gives you the differences between Shift_JIS and Windows-31J at a glance:

・Mapping from Shift_JIS/Windows-31J to Unicode

JIS X 0208 character              Shift_JIS/Windows-31J code   Shift_JIS→Unicode   Windows-31J→Unicode
~ (1-33, WAVE DASH)               8160                         U+301C              U+FF5E
∥ (1-34, DOUBLE VERTICAL LINE)    8161                         U+2016              U+2225
- (1-61, MINUS SIGN)              817C                         U+2212              U+FF0D
¢ (1-81, CENT SIGN)               8191                         U+00A2              U+FFE0
£ (1-82, POUND SIGN)              8192                         U+00A3              U+FFE1
¬ (2-44, NOT SIGN)                81CA                         U+00AC              U+FFE2
IBM extensions                                                 No                  Yes
NEC extensions                                                 No                  Yes
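
The rows above can be checked directly against the JDK's own converters. The following is a minimal sketch, assuming a JDK that ships both the Shift_JIS and windows-31j charsets; the class name SjisDecodeDemo and the helper decodeToCodePoints are made up for this illustration.

```java
import java.nio.charset.Charset;

class SjisDecodeDemo {
    // Hypothetical helper: decode raw bytes with the named charset and
    // return the resulting code points formatted as "U+XXXX".
    static String decodeToCodePoints(String charsetName, int... bytes) {
        byte[] raw = new byte[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            raw[i] = (byte) bytes[i];
        }
        String decoded = new String(raw, Charset.forName(charsetName));
        StringBuilder sb = new StringBuilder();
        decoded.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // The six double-byte codes listed in the table.
        int[][] samples = {
            {0x81, 0x60}, {0x81, 0x61}, {0x81, 0x7C},
            {0x81, 0x91}, {0x81, 0x92}, {0x81, 0xCA}
        };
        for (int[] s : samples) {
            System.out.printf("%02X%02X  Shift_JIS -> %s  windows-31j -> %s%n",
                    s[0], s[1],
                    decodeToCodePoints("Shift_JIS", s),
                    decodeToCodePoints("windows-31j", s));
        }
    }
}
```

The same two bytes produce different Unicode code points depending on the charset name, which is why text decoded with one converter and encoded with the other tends to end up as "?".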

User-defined characters are mapped into the Unicode Private Use Area as shown in the table below:

Converter      Shift_JIS range   Unicode range
Windows-31J    F040~F9FC         E000~E757
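
As a rough illustration of how the two ranges line up, the sketch below computes the Private Use Area code point from a user-defined double-byte code. It assumes the area is laid out consecutively with 188 trail bytes per lead byte (0x40~0x7E and 0x80~0xFC); that layout is an assumption that happens to match both endpoints of the table, not something the reference states. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;

class UserDefinedAreaDemo {
    // Hypothetical helper: arithmetic mapping from a user-defined code
    // (lead byte F0~F9) to U+E000~U+E757, assuming a consecutive layout
    // with 188 valid trail bytes per lead byte.
    static int expectedPrivateUse(int lead, int trail) {
        if (lead < 0xF0 || lead > 0xF9
                || trail < 0x40 || trail > 0xFC || trail == 0x7F) {
            throw new IllegalArgumentException("not in F040~F9FC");
        }
        // Trail byte 0x7F is skipped, so bytes above it shift down by one.
        int trailIndex = trail - 0x40 - (trail > 0x7F ? 1 : 0);
        return 0xE000 + (lead - 0xF0) * 188 + trailIndex;
    }

    // What the JDK's windows-31j decoder returns for the same two bytes.
    static int decodedByJdk(int lead, int trail) {
        byte[] raw = {(byte) lead, (byte) trail};
        return new String(raw, Charset.forName("windows-31j")).codePointAt(0);
    }

    public static void main(String[] args) {
        int[][] samples = {{0xF0, 0x40}, {0xF9, 0xFC}};
        for (int[] s : samples) {
            System.out.printf("%02X%02X  computed U+%04X  JDK U+%04X%n",
                    s[0], s[1],
                    expectedPrivateUse(s[0], s[1]),
                    decodedByJdk(s[0], s[1]));
        }
    }
}
```

Comparing the computed value with the decoder's result is a quick way to see whether a given JDK follows this layout.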

・Mapping from Unicode to Shift_JIS/Windows-31J

Unicode character              Unicode code   Shift_JIS   Windows-31J
‖ (DOUBLE VERTICAL LINE)       U+2016         8161        ×
− (MINUS SIGN)                 U+2212         817C        ×
〜 (WAVE DASH)                  U+301C         8160        ×
- (FULLWIDTH HYPHEN-MINUS)    U+FF0D         ×           817C
~ (FULLWIDTH TILDE)           U+FF5E         ×           8160
¢ (FULLWIDTH CENT SIGN)       U+FFE0         ×           8191
£ (FULLWIDTH POUND SIGN)      U+FFE1         ×           8192
¬ (FULLWIDTH NOT SIGN)        U+FFE2         ×           81CA
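
The "×" entries can be detected before writing any data. The sketch below asks each charset's encoder whether it can encode a given code point, and shows the bytes that getBytes would produce. The class and helper names are hypothetical, and it again assumes both charsets are available in the running JDK.

```java
import java.nio.charset.Charset;

class EncodeCheckDemo {
    // Hypothetical helper: true if the charset has a mapping for the code point.
    static boolean canEncode(String charsetName, int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return Charset.forName(charsetName).newEncoder().canEncode(s);
    }

    // Hypothetical helper: hex dump of the encoded bytes. String.getBytes
    // substitutes the charset's replacement bytes for unmappable characters
    // instead of failing, which is how the damage usually goes unnoticed.
    static String encodeHex(String charsetName, int codePoint) {
        String s = new String(Character.toChars(codePoint));
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(Charset.forName(charsetName))) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int[] codePoints = {0x2016, 0x2212, 0x301C, 0xFF0D, 0xFF5E, 0xFFE0, 0xFFE1, 0xFFE2};
        for (int cp : codePoints) {
            System.out.printf("U+%04X  Shift_JIS: %-5b %-4s  windows-31j: %-5b %s%n",
                    cp,
                    canEncode("Shift_JIS", cp), encodeHex("Shift_JIS", cp),
                    canEncode("windows-31j", cp), encodeHex("windows-31j", cp));
        }
    }
}
```

Checking canEncode first makes it possible to reject or substitute a character explicitly rather than silently writing a replacement byte.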

To sum up, Shift_JIS and Windows-31J differ in the following ways:

 

  • Windows-31J can handle the additional characters from IBM and NEC.
  • Code points differ for some symbols when they are converted into Unicode.
  • In general, if you stick with Windows-31J, which is the larger character set, you shouldn't have any problems.
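
The first point in the list above can be seen with a single NEC extension character. The sketch below decodes the bytes 87 40 (CIRCLED DIGIT ONE in Windows-31J) in strict mode, so that an unsupported sequence is reported instead of being replaced. The choice of sample bytes, the class name, and the helper decodeStrict are illustrative assumptions.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

class StrictDecodeDemo {
    // Hypothetical helper: returns the decoded text, or null when the bytes
    // are malformed or unmappable in the given charset.
    static String decodeStrict(String charsetName, byte[] raw) {
        try {
            return Charset.forName(charsetName).newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(raw))
                    .toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // 0x8740 lies in the NEC extension area (row 13).
        byte[] circledOne = {(byte) 0x87, (byte) 0x40};
        System.out.println("windows-31j: " + decodeStrict("windows-31j", circledOne));
        System.out.println("Shift_JIS:   " + decodeStrict("Shift_JIS", circledOne));
    }
}
```

With the default lenient decoding (for example new String(bytes, charset)), the Shift_JIS case would not fail but would yield replacement characters, which is the garbling the summary warns about.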

     

------------------------------ JDK source code excerpt --------------------------------------
        charset("Shift_JIS", "SJIS",
                new String[] {
                    // IANA aliases
                    "sjis", // historical
                    "shift_jis",
                    "shift-jis",
                    "ms_kanji",
                    "x-sjis",
                    "csShiftJIS"
                });

        // The definition of this charset may be overridden by the init method,
        // below, if the sun.nio.cs.map property is defined.
        //
        charset("windows-31j", "MS932",
                new String[] {
                    "MS932", // JDK historical
                    "windows-932",
                    "csWindows31J"
                });

        charset("JIS_X0201", "JIS_X_0201",
                new String[] {
                    "JIS0201", // JDK historical
                    // IANA aliases
                    "JIS_X0201",
                    "X0201",
                    "csHalfWidthKatakana"
                });
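
The alias registrations in the excerpt above suggest that "MS932" and "windows-31j" name the same charset, while "sjis" resolves to the separate Shift_JIS charset. A small sketch to confirm this on a given JDK follows; the class name and helper are hypothetical, and the alias set printed may differ between JDK versions.

```java
import java.nio.charset.Charset;

class CharsetAliasDemo {
    // Hypothetical helper: resolve any registered name or alias
    // to the charset's canonical name.
    static String canonicalName(String nameOrAlias) {
        return Charset.forName(nameOrAlias).name();
    }

    public static void main(String[] args) {
        String[] names = {"MS932", "windows-31j", "sjis", "Shift_JIS"};
        for (String n : names) {
            Charset cs = Charset.forName(n);
            System.out.println(n + " -> " + cs.name() + " " + cs.aliases());
        }
    }
}
```

Because the two families resolve to different Charset objects, choosing "MS932" or "windows-31j" rather than "SJIS" or "Shift_JIS" is what actually selects the Windows-31J converter.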

------------------------------

public class CharToByteMS932 extends CharToByteMS932DB {
     CharToByteJIS0201 cbJIS0201 = new CharToByteJIS0201();
    ... ...
------------------------------
public class CharToByteSJIS extends CharToByteJIS0208 {
     CharToByteJIS0201 cbJIS0201 = new CharToByteJIS0201();
    ... ...
------------------------------
