Regular expression classes in Java

The java.util.regex package

The package consists mainly of the following three classes:

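  1. Pattern: a Pattern object is the compiled representation of a regular expression.
  2. Matcher: a Matcher object is the engine that interprets a Pattern and performs match operations on an input character sequence.
  3. PatternSyntaxException: an unchecked exception that signals a syntax error in a regular-expression pattern.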
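Because PatternSyntaxException is unchecked, the compiler does not force callers of Pattern.compile to catch it. The sketch below catches it anyway to report an invalid pattern; the class name SyntaxErrorDemo and the helper checkRegex are illustrative only, not part of the API.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class SyntaxErrorDemo {
    // Returns null if regex compiles, otherwise a short error description.
    static String checkRegex(String regex) {
        try {
            Pattern.compile(regex);
            return null; // syntactically valid
        } catch (PatternSyntaxException e) {
            // getDescription() explains the error, getIndex() gives its
            // approximate position in the pattern (-1 if unknown).
            return e.getDescription() + " at index " + e.getIndex();
        }
    }

    public static void main(String[] args) {
        System.out.println(checkRegex("[a-z]+")); // valid pattern
        System.out.println(checkRegex("[a-z"));   // unclosed character class
    }
}
```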

The Pattern class

Pattern is declared final, so it cannot be subclassed.

Main fields

    /**
     * The original regular-expression pattern string.
     * This is the matching rule, i.e. the regular expression itself.
     * @serial
     */
    private String pattern;

    /**
     * The original pattern flags.
     * A bit mask of match flags such as
     * public static final int CASE_INSENSITIVE = 0x02;
     * With flags = CASE_INSENSITIVE the pattern ignores case (US-ASCII only by default).
     * UNICODE_CASE (0x40) on its own does not make matching case-insensitive;
     * combined with CASE_INSENSITIVE it makes case folding Unicode-aware.
     * @serial
     */
    private int flags;

    /**
     * The starting point of state machine for the find operation.  This allows
     * a match to start anywhere in the input.
     */
    transient Node root;

Creating a Pattern: compile(String regex) and compile(String regex, int flags)

public static Pattern compile(String regex) {
    return new Pattern(regex, 0);
}

public static Pattern compile(String regex, int flags) {
    return new Pattern(regex, flags);
}
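The two overloads differ only in the flags argument, and flags are bit masks that can be combined with |. A small sketch (the class CompileFlagsDemo and helper fullMatch are illustrative names) shows how CASE_INSENSITIVE and UNICODE_CASE interact. The non-ASCII letters are written as Unicode escapes: \u00e4 is a lowercase a-umlaut, \u00c4 the uppercase one.

```java
import java.util.regex.Pattern;

class CompileFlagsDemo {
    // Compiles regex with the given flags and tests the whole input.
    // flags = 0 is what the one-argument compile(regex) uses.
    static boolean fullMatch(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        // No flags: case-sensitive.
        System.out.println(fullMatch("java", 0, "JAVA"));                        // false
        // CASE_INSENSITIVE: US-ASCII letters match regardless of case.
        System.out.println(fullMatch("java", Pattern.CASE_INSENSITIVE, "JAVA")); // true
        // CASE_INSENSITIVE alone does not fold non-ASCII letters.
        System.out.println(fullMatch("\u00e4", Pattern.CASE_INSENSITIVE, "\u00c4")); // false
        // UNICODE_CASE alone does not ignore case at all.
        System.out.println(fullMatch("\u00e4", Pattern.UNICODE_CASE, "\u00c4"));     // false
        // Combined, they give Unicode-aware case-insensitive matching.
        System.out.println(fullMatch("\u00e4",
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE, "\u00c4"));         // true
    }
}
```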

Creating a Matcher: matcher(CharSequence input)

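public Matcher matcher(CharSequence input) {
        // Compile lazily if this Pattern is not compiled yet; the flag is
        // re-checked inside the lock (double-checked locking) so only one thread compiles.
        if (!compiled) {
            synchronized(this) {
                if (!compiled)
                    compile();
            }
        }
        Matcher m = new Matcher(this, input);
        return m;
    }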
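Each call to matcher() returns a new Matcher that holds the per-input state, while the Pattern itself is immutable (the Javadoc states that Pattern instances are safe for use by multiple threads and Matcher instances are not). A common arrangement, sketched here with the illustrative class MatcherPerInputDemo, is therefore to compile once and create a fresh Matcher per input:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherPerInputDemo {
    // Compiled once and shared; Pattern is immutable.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // A new Matcher per call keeps the match state local to this input.
    static List<String> numbersIn(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = DIGITS.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(numbersIn("a1b22")); // [1, 22]
        System.out.println(numbersIn("x333y")); // [333]
    }
}
```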

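Whole-input matching: matches(String regex, CharSequence input). This static method compiles regex and reports whether the entire input matches it.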

 public static boolean matches(String regex, CharSequence input) {
        Pattern p = Pattern.compile(regex);
        Matcher m = p.matcher(input);
        return m.matches();
    }
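The key word is "entire": matches() fails if any input is left over, whereas find() only needs some subsequence to match. The sketch below (WholeMatchDemo, wholeInput and anywhere are illustrative names) contrasts the two. Note that Pattern.matches compiles the regex on every call, so for repeated use the Javadoc recommends compiling once and reusing the Pattern.

```java
import java.util.regex.Pattern;

class WholeMatchDemo {
    // True only if the ENTIRE input matches regex.
    static boolean wholeInput(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // True if SOME subsequence of the input matches regex.
    static boolean anywhere(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeInput("\\d+", "12345"));  // true
        System.out.println(wholeInput("\\d+", "123abc")); // false: "abc" is left over
        System.out.println(anywhere("\\d+", "123abc"));   // true: "123" is found
        // String.matches(regex) yields the same result as Pattern.matches(regex, str).
        System.out.println("123abc".matches("\\d+"));     // false
    }
}
```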

Splitting: split(CharSequence input, int limit) and split(CharSequence input)

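input is the character sequence to split, and limit caps the number of elements returned. If limit is positive, the pattern is applied at most limit - 1 times, so the array has at most limit elements; if the input can only be split into 3 pieces, limit = 4 still returns 3. If limit is zero, the pattern is applied as many times as possible and trailing empty strings are discarded; if limit is negative, trailing empty strings are kept.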
public String[] split(CharSequence input, int limit) {
        int index = 0;
        boolean matchLimited = limit > 0;
        ArrayList<String> matchList = new ArrayList<>();
        Matcher m = matcher(input);

        // Add segments before each match found
        while(m.find()) {
            if (!matchLimited || matchList.size() < limit - 1) {
                if (index == 0 && index == m.start() && m.start() == m.end()) {
                    // no empty leading substring included for zero-width match
                    // at the beginning of the input char sequence.
                    continue;
                }
                String match = input.subSequence(index, m.start()).toString();
                matchList.add(match);
                index = m.end();
            } else if (matchList.size() == limit - 1) { // last one
                String match = input.subSequence(index,
                                                 input.length()).toString();
                matchList.add(match);
                index = m.end();
            }
        }

        // If no match was found, return this
        if (index == 0)
            return new String[] {input.toString()};

        // Add remaining segment
        if (!matchLimited || matchList.size() < limit)
            matchList.add(input.subSequence(index, input.length()).toString());

        // Construct result
        int resultSize = matchList.size();
        if (limit == 0)
            while (resultSize > 0 && matchList.get(resultSize-1).equals(""))
                resultSize--;
        String[] result = new String[resultSize];
        return matchList.subList(0, resultSize).toArray(result);
    }

public String[] split(CharSequence input) {
        return split(input, 0);
    }
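The effect of limit described above can be checked with a comma pattern. SplitDemo and its split helper are illustrative wrappers around Pattern.split; the expected arrays follow from the code above (note that split(input) is split(input, 0)).

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitDemo {
    private static final Pattern COMMA = Pattern.compile(",");

    static String[] split(String input, int limit) {
        return COMMA.split(input, limit);
    }

    public static void main(String[] args) {
        // limit > 0: at most limit elements; the last keeps the rest of the input.
        System.out.println(Arrays.toString(split("a,b,c", 2)));  // [a, b,c]
        // limit larger than the number of pieces: still 3 elements.
        System.out.println(Arrays.toString(split("a,b,c", 4)));  // [a, b, c]
        // limit == 0: trailing empty strings are removed.
        System.out.println(Arrays.toString(split("a,b,,", 0)));  // [a, b]
        // limit < 0: trailing empty strings are kept.
        System.out.println(Arrays.toString(split("a,b,,", -1))); // [a, b, , ]
    }
}
```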

The Matcher class

Matcher is a final class that implements the MatchResult interface. MatchResult declares start(), start(int), end(), end(int) and groupCount(), which return int, and group() and group(int), which return String.
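To see the MatchResult methods on real data, the sketch below uses a pattern with two capturing groups; the class GroupDemo and helper firstEmail are illustrative. Matcher.toMatchResult() returns an immutable MatchResult snapshot that stays valid after the Matcher moves on.

```java
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Group 1: the user name; group 2: the domain name.
    private static final Pattern EMAIL = Pattern.compile("(\\w+)@(\\w+)\\.com");

    // Returns a snapshot of the first match, or null if there is none.
    static MatchResult firstEmail(String input) {
        Matcher m = EMAIL.matcher(input);
        return m.find() ? m.toMatchResult() : null;
    }

    public static void main(String[] args) {
        MatchResult r = firstEmail("mail: alice@example.com");
        System.out.println(r.groupCount());              // 2
        System.out.println(r.group());                   // alice@example.com (same as group(0))
        System.out.println(r.group(1));                  // alice
        System.out.println(r.start() + " " + r.end());   // 6 23
        System.out.println(r.start(2) + " " + r.end(2)); // 12 19
    }
}
```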

Fields

/**
 * The Pattern object that created this Matcher.
 */
Pattern parentPattern;
/**
 * The storage used by groups. They may contain invalid values if
 * a group was skipped during the matching.
 */
int[] groups;


/**
 * The range within the sequence that is to be matched. Anchors
 * will match at these "hard" boundaries. Changing the region
 * changes these values.
 */
int from, to;

/**
 * Lookbehind uses this value to ensure that the subexpression
 * match ends at the point where the lookbehind was encountered.
 */
int lookbehindTo;

/**
 * The original string being matched.
 */
CharSequence text;

/**
 * Matcher state used by the last node. NOANCHOR is used when a
 * match does not have to consume all of the input. ENDANCHOR is
 * the mode used for matching all the input.
 */
static final int ENDANCHOR = 1;
static final int NOANCHOR = 0;
int acceptMode = NOANCHOR;

/**
 * The range of string that last matched the pattern. If the last
 * match failed then first is -1; last initially holds 0 then it
 * holds the index of the end of the last match (which is where the
 * next search starts).
 */
int first = -1, last = 0;

/**
 * The end index of what matched in the last match operation.
 */
int oldLast = -1;

/**
 * The index of the last position appended in a substitution.
 */
int lastAppendPosition = 0;

/**
 * Storage used by nodes to tell what repetition they are on in
 * a pattern, and where groups begin. The nodes themselves are stateless,
 * so they rely on this field to hold state during a match.
 */
int[] locals;

/**
 * Boolean indicating whether or not more input could change
 * the results of the last match. 
 * 
 * If hitEnd is true, and a match was found, then more input
 * might cause a different match to be found.
 * If hitEnd is true and a match was not found, then more
 * input could cause a match to be found.
 * If hitEnd is false and a match was found, then more input
 * will not change the match.
 * If hitEnd is false and a match was not found, then more
 * input will not cause a match to be found.
 */
boolean hitEnd;

/**
 * Boolean indicating whether or not more input could change
 * a positive match into a negative one.
 *
 * If requireEnd is true, and a match was found, then more
 * input could cause the match to be lost.
 * If requireEnd is false and a match was found, then more
 * input might change the match but the match won't be lost.
 * If a match was not found, then requireEnd has no meaning.
 */
boolean requireEnd;

/**
 * If transparentBounds is true then the boundaries of this
 * matcher's region are transparent to lookahead, lookbehind,
 * and boundary matching constructs that try to see beyond them.
 */
boolean transparentBounds = false;

/**
 * If anchoringBounds is true then the boundaries of this 
 * matcher's region match anchors such as ^ and $.
 */
boolean anchoringBounds = true;
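The hitEnd field is exposed through the public method hitEnd(). A minimal sketch (HitEndDemo and matchAndHitEnd are illustrative names) shows the "no match, but more input could help" case next to a plain failure:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HitEndDemo {
    // Runs matches() and reports {matched, hitEnd}.
    static boolean[] matchAndHitEnd(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        boolean matched = m.matches();
        return new boolean[] {matched, m.hitEnd()};
    }

    public static void main(String[] args) {
        // "ab" is a prefix of "abc": no match, but the engine ran out of
        // input, so more input could still produce a match.
        boolean[] r1 = matchAndHitEnd("abc", "ab");
        System.out.println(r1[0] + " " + r1[1]); // false true
        // "xbc" fails on its first character: more input cannot help.
        boolean[] r2 = matchAndHitEnd("abc", "xbc");
        System.out.println(r2[0] + " " + r2[1]); // false false
    }
}
```

This is useful for incremental input (e.g. validating a field while the user types): hitEnd() tells whether it is worth waiting for more characters.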

How to obtain a Matcher

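Matcher's constructors have default (package-private) access, so code outside java.util.regex cannot create a Matcher directly. Obtain one from a Pattern instead:
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(str);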

In the other direction, pattern() returns the Pattern that created this Matcher:

public Pattern pattern() {
        return parentPattern;
    }
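Since m.pattern() hands back the very Pattern that created the Matcher, code that receives only a Matcher can still inspect the regex and its flags. The sketch below (ObtainMatcherDemo and describe are illustrative names) relies on Pattern.pattern() returning the source regex and Pattern.flags() returning the compile flags; CASE_INSENSITIVE has the value 2.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ObtainMatcherDemo {
    // Given only a Matcher, recover the regex and flags it was built from.
    static String describe(Matcher m) {
        Pattern p = m.pattern(); // the Pattern that created m
        return "regex=" + p.pattern() + ", flags=" + p.flags();
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("a+b", Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher("AAAB");        // Matcher comes from the Pattern
        System.out.println(m.pattern() == p); // true
        System.out.println(describe(m));      // regex=a+b, flags=2
        System.out.println(m.matches());      // true
    }
}
```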

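Resetting a matcher: reset() restores the matcher's state fields to their initial values, discarding the previous match and any explicitly set region, so the next find() starts again from the beginning of the input.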

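public Matcher reset() {
        // JDK 8 source; later JDKs reset some additional internal fields.
        first = -1;
        last = 0;
        oldLast = -1;
        for(int i=0; i<groups.length; i++)
            groups[i] = -1;
        for(int i=0; i<locals.length; i++)
            locals[i] = -1;
        lastAppendPosition = 0;
        from = 0;
        to = getTextLength();
        return this;
    }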
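Once find() has returned false the matcher is exhausted, and reset() makes it usable again; the overload reset(CharSequence) additionally swaps in a new input. A sketch with the illustrative class ResetDemo and helper drain:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ResetDemo {
    // Collects all remaining matches, continuing from the matcher's current state.
    static List<String> drain(Matcher m) {
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("\\d").matcher("a1b2");
        System.out.println(drain(m)); // [1, 2]
        System.out.println(drain(m)); // [] (the matcher is exhausted)
        m.reset();                    // back to the start of the same input
        System.out.println(drain(m)); // [1, 2]
        m.reset("x9");                // reset with a new input
        System.out.println(drain(m)); // [9]
    }
}
```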

Searching: public boolean find() reports whether the input contains another subsequence that matches the pattern. Example:

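import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindDemo {
	public static void main(String[] args) {
		getDecompressionStr("HG[3|B[2|CA]]FHG[3|B[2|sCA]]F");
	}

	// Prints every "[digit|text]" block whose text contains no brackets,
	// i.e. the innermost compressed segments.
	private static void getDecompressionStr(String str) {
		String regex = "[\\[][0-9][\\|][^\\[\\]]+[\\]]";
		Pattern p = Pattern.compile(regex);
		Matcher m = p.matcher(str);
		while (m.find()) {             // advance to the next match, if any
			String target = m.group(); // text of the current match
			System.out.println(target);
		}
	}
}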

Output:

[2|CA]
[2|sCA]

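find()
Searches the input for the next subsequence that matches the pattern. If it returns true, m.group() returns the matched text, and the Matcher's state fields (first, last, groups and so on) are updated to describe that match. In a loop, find() therefore works somewhat like an iterator's hasNext() and next() combined.
find(int start)
Resets the matcher, then searches for the next match starting at index start.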
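Because find(int start) resets the matcher first, it can start in the middle of a previous match. A sketch with the illustrative class FindFromDemo; per the Javadoc, a start below zero or greater than the input length throws IndexOutOfBoundsException, while start equal to the length is allowed and simply finds nothing.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindFromDemo {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Returns the first match at or after index start, or null.
    static String findFrom(String input, int start) {
        Matcher m = DIGITS.matcher(input);
        return m.find(start) ? m.group() : null;
    }

    public static void main(String[] args) {
        String s = "a1b22c333";
        System.out.println(findFrom(s, 0)); // 1
        System.out.println(findFrom(s, 3)); // 22
        System.out.println(findFrom(s, 4)); // 2 (starts inside "22")
        System.out.println(findFrom(s, 9)); // null: 9 == s.length()
    }
}
```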

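start()
Returns the index in the input of the first character of the current match.
end()
Returns the index just past the last character of the current match.
For example, if the input is HG[3|B[2|CA]]F and the match is [2|CA], the match occupies the half-open interval [start(), end()), here [6, 12).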
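The half-open interval can be checked with the same regex as in the find() example; StartEndDemo and firstSpan are illustrative names. Because end() is exclusive, the pair plugs straight into String.substring.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StartEndDemo {
    private static final Pattern INNER =
            Pattern.compile("[\\[][0-9][\\|][^\\[\\]]+[\\]]");

    // Returns {start(), end()} of the first match, or null if none.
    static int[] firstSpan(String input) {
        Matcher m = INNER.matcher(input);
        return m.find() ? new int[] {m.start(), m.end()} : null;
    }

    public static void main(String[] args) {
        String s = "HG[3|B[2|CA]]F";
        int[] span = firstSpan(s);
        System.out.println(span[0] + " " + span[1]);       // 6 12
        System.out.println(s.substring(span[0], span[1])); // [2|CA]
    }
}
```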
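regionStart()
Returns the start index (inclusive) of this matcher's region:

 public int regionStart() {
        return from;
    }

regionEnd()
Returns the end index (exclusive) of this matcher's region.
region(int start, int end)
Sets the range [start, end) of the input that the matcher operates on; find(), matches() and lookingAt() see only that range. Calling region() also resets the matcher.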
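A sketch of region() together with the anchoringBounds field described earlier; RegionDemo and its helpers are illustrative names. With the default anchoring bounds, ^ matches at the region start even though that is not the start of the input; useAnchoringBounds(false) turns this off.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegionDemo {
    // Finds all single digits inside [start, end) only.
    static List<String> digitsInRegion(String input, int start, int end) {
        Matcher m = Pattern.compile("\\d").matcher(input);
        m.region(start, end); // also resets the matcher
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    // Does ^b match in "ab" when the region is [1, 2)?
    static boolean caretAtRegionStart(boolean anchoringBounds) {
        return Pattern.compile("^b").matcher("ab")
                .region(1, 2)
                .useAnchoringBounds(anchoringBounds)
                .find();
    }

    public static void main(String[] args) {
        System.out.println(digitsInRegion("1a2b3", 1, 4)); // [2]
        Matcher m = Pattern.compile("\\d").matcher("1a2b3").region(1, 4);
        System.out.println(m.regionStart() + " " + m.regionEnd()); // 1 4
        System.out.println(caretAtRegionStart(true));  // true
        System.out.println(caretAtRegionStart(false)); // false
    }
}
```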

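replaceAll(String replacement)
public String replaceAll(String replacement) replaces every subsequence of the input that matches the pattern with replacement. In the replacement string, $n refers to capture group n and a backslash escapes the next character.
replaceFirst(String replacement)
public String replaceFirst(String replacement) replaces only the first matching subsequence.
usePattern(Pattern newPattern)
Switches the matcher to a different Pattern. The current position in the input is kept, but group information from the last match is lost.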
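The three methods can be exercised together; ReplaceDemo and its helpers are illustrative names. Both replace methods reset the matcher before working, while usePattern keeps the position, so the second find() below continues after the number instead of returning the leading "abc".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceDemo {
    static String maskDigits(String input) {
        return Pattern.compile("\\d+").matcher(input).replaceAll("#");
    }

    static String maskFirstDigits(String input) {
        return Pattern.compile("\\d+").matcher(input).replaceFirst("#");
    }

    // $2 and $1 refer to capture groups 2 and 1.
    static String swapAround(String input) {
        return Pattern.compile("(\\w+)@(\\w+)").matcher(input).replaceAll("$2:$1");
    }

    // Finds a number, then switches to a word pattern from that position.
    static String nextWordAfterNumber(String input) {
        Matcher m = Pattern.compile("\\d+").matcher(input);
        if (!m.find()) {
            return null;
        }
        m.usePattern(Pattern.compile("[a-z]+")); // new regex, same position
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(maskDigits("a1b22c333"));             // a#b#c#
        System.out.println(maskFirstDigits("a1b22c333"));        // a#b22c333
        System.out.println(swapAround("user@host"));             // host:user
        System.out.println(nextWordAfterNumber("abc 123 def"));  // def
    }
}
```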
