【The Java™ Tutorials】【Regular Expressions】3. String Literals

String Literals

The most basic form of pattern matching supported by this API is the match of a string literal. For example:

[Image: string literals example]

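The index range is half-open: the start index is inclusive and the end index is exclusive, i.e. [0,3). The cells and indices are laid out as follows: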
[Figure: The string literal foo, with numbered cells and index values.]

Each cell holds one character (a Java char), so Chinese text is indexed correctly as well.
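The index convention can be checked with a short program. The following is a minimal sketch, not part of the original tutorial: the class name LiteralMatchDemo and the helper findSpans are hypothetical names chosen for illustration. It records each match as a half-open span and uses Unicode escapes for the Chinese text so the source file stays ASCII-only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only Pattern and Matcher come from the JDK.
class LiteralMatchDemo {

    // Returns one "[start,end)" entry for every match of regex in input.
    static List<String> findSpans(String regex, String input) {
        List<String> spans = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            // start() is inclusive, end() is exclusive.
            spans.add("[" + matcher.start() + "," + matcher.end() + ")");
        }
        return spans;
    }

    public static void main(String[] args) {
        System.out.println(findSpans("foo", "foo"));        // [[0,3)]
        System.out.println(findSpans("foo", "foofoofoo"));  // [[0,3), [3,6), [6,9)]
        // Regex: two Chinese characters (U+4E2D U+6587).
        // Input: four Chinese characters, the last two being the regex text.
        System.out.println(findSpans("\u4e2d\u6587", "\u652f\u6301\u4e2d\u6587")); // [[2,4)]
    }
}
```

In the second call each match starts where the previous one ended, which is what the half-open convention makes easy to read off.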

Metacharacter

Before explaining what a metacharacter is, let's look at an example:

[Image: metacharacter example]

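"cats" does not contain ".", so why does the match succeed? Because "." is a metacharacter: a character with special meaning interpreted by the matcher. To the matcher, "." stands for any single character. It must consume exactly one character, so it never matches the empty string, and by default it does not match line terminators.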
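The behavior of "." can be tried out directly. The sketch below assumes the pattern in the missing example was cat. (the text only tells us the input was "cats"); DotMetacharDemo and found are hypothetical names used for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class showing how the matcher treats ".".
class DotMetacharDemo {

    // True if regex matches somewhere in input (find semantics).
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("cat.", "cats"));   // true: "." consumes the 's'
        System.out.println(found("cat.", "cat"));    // false: no character left for "."
        System.out.println(found("cat.", "cat\n"));  // false: "." skips line terminators by default
        // With the DOTALL flag, "." also matches line terminators.
        System.out.println(
                Pattern.compile("cat.", Pattern.DOTALL).matcher("cat\n").find()); // true
    }
}
```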

The metacharacters supported by this API are: <([{\^-=$!|]})?*+.>

Note: In certain situations the special characters listed above will not be treated as metacharacters. You'll encounter this as you learn more about how regular expressions are constructed. You can, however, use this list to check whether or not a specific character will ever be considered a metacharacter. For example, the characters @ and # never carry a special meaning.

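So what if we want a metacharacter to be treated as an ordinary character? There are two ways:

  1. Precede the metacharacter with a backslash
  2. Enclose it within \Q (which starts the quote) and \E (which ends it)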

For example, to check whether the text contains a "\", the regular expression should be written as "\\" or "\Q\\E", as shown below:


[Image: escaping with a backslash]

[Image: escaping with \Q and \E]
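Both escaping styles can be compared on the "." metacharacter. This is a small sketch under stated assumptions: EscapeMetacharDemo and found are hypothetical names, and the pattern cat. is chosen only for illustration. Each backslash is doubled because the patterns are written as Java string literals. Pattern.quote is a JDK method that wraps its argument in \Q and \E.

```java
import java.util.regex.Pattern;

// Hypothetical demo class comparing the two ways to escape a metacharacter.
class EscapeMetacharDemo {

    // True if regex matches somewhere in input (find semantics).
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // Unescaped: "." is a metacharacter and matches the 's'.
        System.out.println(found("cat.", "cats"));        // true

        // Way 1: backslash escape. The Java literal "cat\\." is the regex cat\.
        System.out.println(found("cat\\.", "cats"));      // false
        System.out.println(found("cat\\.", "cat."));      // true

        // Way 2: quote the text between \Q and \E.
        System.out.println(found("\\Qcat.\\E", "cats"));  // false
        System.out.println(found("\\Qcat.\\E", "cat."));  // true

        // Pattern.quote builds the \Q...\E form for us.
        System.out.println(Pattern.quote("cat."));        // \Qcat.\E
    }
}
```

Pattern.quote is convenient when the text to match comes from user input and may contain any of the metacharacters.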

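One more point to note: in a Java string literal, "\" itself has a special meaning, so a regular expression written in program source needs one more level of escaping, as shown below: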

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class RegexTestHarness {

    public static void main(String[] args){
        
        String regex = "\\\\"; // note how this regex differs from the one typed at the console
        
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher("he\\he");
        
        boolean found = false;
        while (matcher.find()) {
            System.out.format("I found the text" +
                    " \"%s\" starting at " +
                    "index %d and ending at index %d.%n",
                    matcher.group(),
                    matcher.start(),
                    matcher.end());
            found = true;
        }
        if (!found) {
            System.out.format("No match found.%n");
        }
    }
}
// output: I found the text "\" starting at index 2 and ending at index 3.

If the code above had regex = "\\"; instead, Pattern.compile(regex) would throw the following exception:

Exception in thread "main" java.util.regex.PatternSyntaxException: Unexpected internal error near index 1
\
 ^
    at java.util.regex.Pattern.error(Pattern.java:1955)
    at java.util.regex.Pattern.compile(Pattern.java:1702)
    at java.util.regex.Pattern.<init>(Pattern.java:1351)
    at java.util.regex.Pattern.compile(Pattern.java:1028)
    at RegexTestHarness.main(RegexTestHarness.java:10)

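The reason is that the Java compiler turns the literal "\\" into a one-character string, "\", so Pattern sees a lone backslash with nothing after it to escape. Then why is typing "\\" enough when the string is read from the console? Because console input is taken as plain characters: "\" is not special there, so the two typed characters reach Pattern unchanged, which is equivalent to writing "\\\\" in Java source.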
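The two escaping layers are easy to see by checking string lengths at run time. The sketch below is illustrative only; BackslashLayersDemo and compiles are hypothetical names. It catches PatternSyntaxException rather than printing its message, since the message wording can differ between JDK versions.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class separating Java-literal escaping from regex escaping.
class BackslashLayersDemo {

    // True if Pattern accepts the regex, false if it throws PatternSyntaxException.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String one = "\\";    // Java literal with 2 backslashes: 1 character at run time
        String two = "\\\\";  // Java literal with 4 backslashes: 2 characters at run time

        System.out.println(one.length());   // 1
        System.out.println(two.length());   // 2

        System.out.println(compiles(one));  // false: a lone backslash escapes nothing
        System.out.println(compiles(two));  // true: the regex for one literal backslash

        // The input "he\\he" holds a single backslash, which the regex replaces.
        System.out.println("he\\he".replaceAll(two, "/")); // he/he
    }
}
```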
