Normalize strings before validating them
Original source: http://www.javaarch.net/jiagoushi/1068.htm
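Applications often validate strings against various rules. Java strings are Unicode text (Java 6 is based on Unicode 4.0, Java 7 on Unicode 6.0.0), and the same visible character can be represented by more than one code point sequence, so a string should be normalized before it is validated.
Unicode defines several normalization forms, and each one processes text slightly differently.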
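NFC
Unicode Normalization Form C: canonical decomposition, followed by canonical composition. Some APIs default to NFC when no normalization type is specified; Java's Normalizer.normalize has no default and always takes the form as an explicit argument.
NFD
Unicode Normalization Form D: canonical decomposition.
NFKC
Unicode Normalization Form KC: compatibility decomposition, followed by canonical composition.
NFKD
Unicode Normalization Form KD: compatibility decomposition.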
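To make the difference between the four forms concrete, here is a minimal sketch that normalizes one short string with each form and prints the resulting UTF-16 code units. The class name NormalizationFormsDemo, the escape helper, and the sample input are assumptions made for illustration; only Normalizer.normalize and Normalizer.Form come from the JDK.

```java
import java.text.Normalizer;
import java.text.Normalizer.Form;

final class NormalizationFormsDemo {

    // Hypothetical helper: renders every UTF-16 code unit as a four-digit
    // hex escape so that otherwise invisible differences show up in the output.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            sb.append(String.format("\\u%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample input (an assumption for this sketch):
        // U+00E9 is a precomposed "e with acute", U+FB01 is the "fi" ligature.
        String input = "\u00E9\uFB01";

        // Form.values() returns the constants in declaration order:
        // NFD, NFC, NFKD, NFKC.
        for (Form form : Form.values()) {
            String normalized = Normalizer.normalize(input, form);
            System.out.println(form + " -> " + escape(normalized));
        }
    }
}
```

Run as written, this prints:

NFD -> \u0065\u0301\uFB01
NFC -> \u00E9\uFB01
NFKD -> \u0065\u0301\u0066\u0069
NFKC -> \u00E9\u0066\u0069

The canonical forms (NFD, NFC) leave the ligature alone, while the compatibility forms (NFKD, NFKC) replace it with the plain letters "f" and "i". The same compatibility mapping is what turns \uFE64 and \uFE65 into < and >.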
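Normalizer.normalize converts Unicode text into an equivalent normalized form. If we validate the input string first and normalize it afterwards, the validation can be bypassed. In the following code the check with Pattern.compile("[<>]") finds nothing and lets the input through, because \uFE64 and \uFE65 only turn into < and > when the string is normalized after the check: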
    import java.text.Normalizer;
    import java.text.Normalizer.Form;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // String s may be user controllable
    // \uFE64 is normalized to < and \uFE65 is normalized to > using NFKC
    String s = "\uFE64" + "script" + "\uFE65";

    // Validate
    Pattern pattern = Pattern.compile("[<>]"); // Check for angle brackets
    Matcher matcher = pattern.matcher(s);
    if (matcher.find()) {
        // Found black listed tag
        throw new IllegalStateException();
    } else {
        // . . .
    }

    // Normalize (too late: the check above has already passed)
    s = Normalizer.normalize(s, Form.NFKC);
Normalizing first and validating afterwards closes the gap. The check now sees the angle brackets and rejects the input:

    String s = "\uFE64" + "script" + "\uFE65";

    // Normalize
    s = Normalizer.normalize(s, Form.NFKC);

    // Validate
    Pattern pattern = Pattern.compile("[<>]");
    Matcher matcher = pattern.matcher(s);
    if (matcher.find()) {
        // Found black listed tag
        throw new IllegalStateException();
    } else {
        // . . .
    }
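The two orderings can be compared side by side in a small runnable sketch. The class name NormalizeThenValidate and the sanitize helper are hypothetical names introduced for this illustration; the pattern, the NFKC form, and the IllegalStateException mirror the snippets above.

```java
import java.text.Normalizer;
import java.text.Normalizer.Form;
import java.util.regex.Pattern;

final class NormalizeThenValidate {

    // Same blacklist as in the snippets above: any ASCII angle bracket.
    static final Pattern ANGLE_BRACKETS = Pattern.compile("[<>]");

    // Hypothetical helper: normalizes to NFKC first, then validates the
    // normalized text, and returns it only if the check passes.
    static String sanitize(String input) {
        String normalized = Normalizer.normalize(input, Form.NFKC);
        if (ANGLE_BRACKETS.matcher(normalized).find()) {
            throw new IllegalStateException("angle brackets are not allowed");
        }
        return normalized;
    }

    public static void main(String[] args) {
        String s = "\uFE64" + "script" + "\uFE65";

        // Validating the raw input: the small-form brackets are not matched.
        System.out.println("raw input matches blacklist: "
                + ANGLE_BRACKETS.matcher(s).find());

        // Normalizing before validating: the brackets are detected.
        try {
            sanitize(s);
            System.out.println("accepted");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Keeping normalization and validation inside one method is a design choice in this sketch: callers cannot accidentally validate one string and then use a differently normalized one.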
For reference, this is the JDK source of java.text.Normalizer:

    public final class Normalizer {

        private Normalizer() {};

        /**
         * This enum provides constants of the four Unicode normalization forms
         * that are described in
         * <a href="http://www.unicode.org/unicode/reports/tr15/tr15-23.html">
         * Unicode Standard Annex #15 &mdash; Unicode Normalization Forms</a>
         * and two methods to access them.
         *
         * @since 1.6
         */
        public static enum Form {

            /**
             * Canonical decomposition.
             */
            NFD,

            /**
             * Canonical decomposition, followed by canonical composition.
             */
            NFC,

            /**
             * Compatibility decomposition.
             */
            NFKD,

            /**
             * Compatibility decomposition, followed by canonical composition.
             */
            NFKC
        }

        /**
         * Normalize a sequence of char values.
         * The sequence will be normalized according to the specified normalization
         * from.
         * @param src        The sequence of char values to normalize.
         * @param form       The normalization form; one of
         *                   {@link java.text.Normalizer.Form#NFC},
         *                   {@link java.text.Normalizer.Form#NFD},
         *                   {@link java.text.Normalizer.Form#NFKC},
         *                   {@link java.text.Normalizer.Form#NFKD}
         * @return The normalized String
         * @throws NullPointerException If <code>src</code> or <code>form</code>
         * is null.
         */
        public static String normalize(CharSequence src, Form form) {
            return NormalizerBase.normalize(src.toString(), form);
        }

        /**
         * Determines if the given sequence of char values is normalized.
         * @param src        The sequence of char values to be checked.
         * @param form       The normalization form; one of
         *                   {@link java.text.Normalizer.Form#NFC},
         *                   {@link java.text.Normalizer.Form#NFD},
         *                   {@link java.text.Normalizer.Form#NFKC},
         *                   {@link java.text.Normalizer.Form#NFKD}
         * @return true if the sequence of char values is normalized;
         * false otherwise.
         * @throws NullPointerException If <code>src</code> or <code>form</code>
         * is null.
         */
        public static boolean isNormalized(CharSequence src, Form form) {
            return NormalizerBase.isNormalized(src.toString(), form);
        }
    }
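The second public method, isNormalized, can be used to check whether a string is already in a given form. The sketch below is one possible way to use it; the class name IsNormalizedDemo and the toNfkc helper are hypothetical, and whether the extra check is worthwhile in practice is an assumption that depends on the workload.

```java
import java.text.Normalizer;
import java.text.Normalizer.Form;

final class IsNormalizedDemo {

    // Hypothetical helper: returns the input itself when it is already in
    // NFKC, and a normalized copy otherwise.
    static String toNfkc(String input) {
        if (Normalizer.isNormalized(input, Form.NFKC)) {
            return input;
        }
        return Normalizer.normalize(input, Form.NFKC);
    }

    public static void main(String[] args) {
        String plain = "script";
        String disguised = "\uFE64" + "script" + "\uFE65";

        // Plain ASCII text is already normalized in every form.
        System.out.println(Normalizer.isNormalized(plain, Form.NFKC));

        // The small-form brackets have compatibility mappings, so this
        // string is not in NFKC.
        System.out.println(Normalizer.isNormalized(disguised, Form.NFKC));

        // After normalization the ASCII brackets appear.
        System.out.println(toNfkc(disguised));
    }
}
```

A string that fails the isNormalized check for NFKC is not necessarily malicious, but it does mean that validating it as-is may give a different answer than validating its normalized form.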