Quantifiers allow you to specify the number of occurrences to match against.
Greedy | Reluctant | Possessive | Meaning |
---|---|---|---|
X? | X?? | X?+ | X, once or not at all |
X* | X*? | X*+ | X, zero or more times |
X+ | X+? | X++ | X, one or more times |
X{n} | X{n}? | X{n}+ | X, exactly n times |
X{n,} | X{n,}? | X{n,}+ | X, at least n times |
X{n,m} | X{n,m}? | X{n,m}+ | X, at least n but not more than m times |
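Here X can be a single character, a character class, or a group; groups are covered in the next chapter.
Quantifiers come in three kinds: greedy, reluctant, and possessive. Note that X?, X?? and X?+ all mean "X, once or not at all", yet they differ subtly in how they match. Consider the following example: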
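A minimal sketch of such a comparison using Java's `java.util.regex`, assuming the input `xfooxxxxxxfoo` and the patterns `.*foo`, `.*?foo` and `.*+foo` (the string and regex that the Stack Overflow explanation below walks through). `QuantifierDemo` and `findAll` are illustrative names, not part of any library:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierDemo {
    // Collects every match of regex in input, formatted as "text[start,end)"
    static List<String> findAll(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.group() + "[" + m.start() + "," + m.end() + ")");
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        System.out.println("greedy     .*foo  -> " + findAll(".*foo", input));
        System.out.println("reluctant  .*?foo -> " + findAll(".*?foo", input));
        System.out.println("possessive .*+foo -> " + findAll(".*+foo", input));
    }
}
```

Running it should report one match for the greedy pattern (`xfooxxxxxxfoo` at [0,13)), two for the reluctant pattern (`xfoo` at [0,4) and `xxxxxxfoo` at [4,13)), and none for the possessive pattern.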
Although the three regular expressions mean the same thing, they produce different matches against the same string. Here is an explanation I found on Stack Overflow:
A greedy quantifier first matches as much as possible. So the .* matches the entire string. Then the matcher tries to match the f following, but there are no characters left. So it "backtracks", making the greedy quantifier match one less thing (leaving the "o" at the end of the string unmatched). That still doesn't match the f in the regex, so it "backtracks" one more step, making the greedy quantifier match one less thing again (leaving the "oo" at the end of the string unmatched). That still doesn't match the f in the regex, so it backtracks one more step (leaving the "foo" at the end of the string unmatched). Now, the matcher finally matches the f in the regex, and the o and the next o are matched too. Success!
A reluctant or "non-greedy" quantifier first matches as little as possible. So the .* matches nothing at first, leaving the entire string unmatched. Then the matcher tries to match the f following, but the unmatched portion of the string starts with "x" so that doesn't work. So the matcher backtracks, making the non-greedy quantifier match one more thing (now it matches the "x", leaving "fooxxxxxxfoo" unmatched). Then it tries to match the f, which succeeds, and the o and the next o in the regex match too. Success!
In your example, it then starts the process over with the remaining unmatched portion of the string, following the same process.
A possessive quantifier is just like the greedy quantifier, but it doesn't backtrack. So it starts out with .* matching the entire string, leaving nothing unmatched. Then there is nothing left for it to match with the f in the regex. Since the possessive quantifier doesn't backtrack, the match fails there.
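The same three behaviours apply to the `?` family mentioned earlier (`X?`, `X??`, `X?+`). A minimal Java sketch, using the made-up patterns `a?a`, `a??a` and `a?+a` purely for illustration, shows where they diverge; `OptionalQuantifiers` and `findAll` are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OptionalQuantifiers {
    // Collects every match of regex in input, formatted as "text[start,end)"
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group() + "[" + m.start() + "," + m.end() + ")");
        }
        return out;
    }

    public static void main(String[] args) {
        for (String input : new String[] {"a", "aa"}) {
            System.out.println("input \"" + input + "\"");
            System.out.println("  a?a  (greedy)     " + findAll("a?a", input));
            System.out.println("  a??a (reluctant)  " + findAll("a??a", input));
            System.out.println("  a?+a (possessive) " + findAll("a?+a", input));
        }
    }
}
```

On input `a`, both `a?a` and `a??a` should match `a` at [0,1), while `a?+a` finds nothing: the possessive `a?+` takes the only `a` and never gives it back to the trailing `a`. On input `aa`, the greedy and possessive forms match `aa` at [0,2), whereas the reluctant form matches `a` twice.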
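Below is how the official documentation describes greedy, reluctant, and possessive quantifiers; read together with the answer above, it makes the differences between them clearer: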
Greedy quantifiers are considered "greedy" because they force the matcher to read in, or eat, the entire input string prior to attempting the first match. If the first match attempt (the entire input string) fails, the matcher backs off the input string by one character and tries again, repeating the process until a match is found or there are no more characters left to back off from. Depending on the quantifier used in the expression, the last thing it will try matching against is 1 or 0 characters.
(Greedy: it always tries the longest match first.)
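The last sentence of the greedy description can be checked directly: a greedy `.*` can back off all the way to 0 characters, while a greedy `.+` can only back off to 1. A small Java sketch; `GreedyBackOff` and `found` are illustrative names:

```java
import java.util.regex.Pattern;

public class GreedyBackOff {
    // True if regex matches somewhere in input (Matcher.find semantics)
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // ".*" may back off to 0 characters, leaving all of "foo" for the literal
        System.out.println(found(".*foo", "foo"));  // true
        // ".+" must keep at least 1 character, so a full "foo" is never left after it
        System.out.println(found(".+foo", "foo"));  // false
        // with one extra character in front, ".+" has something to keep
        System.out.println(found(".+foo", "xfoo")); // true
    }
}
```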
The reluctant quantifiers, however, take the opposite approach: They start at the beginning of the input string, then reluctantly eat one character at a time looking for a match. The last thing they try is the entire input string.
(Reluctant: it always tries the shortest match first.)
Finally, the possessive quantifiers always eat the entire input string, trying once (and only once) for a match. Unlike the greedy quantifiers, possessive quantifiers never back off, even if doing so would allow the overall match to succeed.
(Possessive: it is greedy, grabbing the longest possible match, but it also never gives back any characters it has consumed.)
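To make the three strategies concrete, here is a deliberately simplified, hypothetical simulation (not how `java.util.regex` is implemented internally) for a pattern of the form `.*` followed by a literal, matched from index 0 only. It records which lengths the `.*` part tries, in the order each strategy tries them; `QuantifierTrace`, `match` and the strategy strings are made-up names:

```java
import java.util.ArrayList;
import java.util.List;

public class QuantifierTrace {
    // Hypothetical, simplified matcher for the pattern ".*" + literal, anchored at index 0.
    // The strategy decides the order in which ".*" tries to keep k characters;
    // every attempted k is appended to `tried`.
    // Returns the number of characters ".*" kept on success, or -1 on failure.
    static int match(String input, String literal, String strategy, List<Integer> tried) {
        int n = input.length();
        List<Integer> order = new ArrayList<>();
        switch (strategy) {
            case "greedy":     // eat the whole input first, then back off one character at a time
                for (int k = n; k >= 0; k--) order.add(k);
                break;
            case "reluctant":  // eat nothing first, then one more character at a time
                for (int k = 0; k <= n; k++) order.add(k);
                break;
            case "possessive": // eat the whole input, try once, never back off
                order.add(n);
                break;
            default:
                throw new IllegalArgumentException("unknown strategy: " + strategy);
        }
        for (int k : order) {
            tried.add(k);
            // the literal must start exactly where ".*" stopped
            if (input.startsWith(literal, k)) {
                return k;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        for (String strategy : new String[] {"greedy", "reluctant", "possessive"}) {
            List<Integer> tried = new ArrayList<>();
            int kept = match(input, "foo", strategy, tried);
            String outcome = kept < 0
                    ? "no match"
                    : ".* kept \"" + input.substring(0, kept) + "\"";
            System.out.println(strategy + ": tried " + tried + " -> " + outcome);
        }
    }
}
```

For `xfooxxxxxxfoo` and the literal `foo`, it should report that the greedy strategy tried lengths [13, 12, 11, 10] and let `.*` keep `xfooxxxxxx`, the reluctant strategy tried [0, 1] and kept `x`, and the possessive strategy tried only [13] and failed, which mirrors the step-by-step Stack Overflow account above.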