参考:http://blog.csdn.net/axwolfer/archive/2009/04/20/4094152.aspx
The three kinds of regular expression quantifiers are greedy, reluctant, and possessive.
A greedy quantifier starts by looking at the entire string for a match. If no match is found, it eliminates
the last character in the string and tries again. If a match is still not found, the last character is again
discarded and the process repeats until a match is found or the string is left with no characters. All the
quantifiers discussed to this point have been greedy.
A reluctant quantifier starts by looking at the first character in the string for a match. If that character
alone isn’t enough, it reads in the next character, forming a string of two characters. If still no match isfound, a reluctant quantifier continues to add characters from the string until either a match is found or
the entire string is checked without a match. Reluctant quantifiers work in reverse of greedy quantifiers.
A Possessive quantifier only tries to match against the entire string. If the entire string doesn’t produce a match, no further attempt is made. Possessive quantifiers are, in a manner of speaking, a one-shot deal.
What makes a quantifier greedy, reluctant, or possessive? It’s really all in the use of the asterisk, question
mark, and plus symbols. For example, the question mark alone (?) is greedy, but a question mark fol-
lowed by another question mark (??) is reluctant. To make the question mark possessive, append a plus
sign (?+). The following table shows all the greedy, reluctant, and possessive versions of the quantifiers
you’ve already learned.
Greedy | Reluctant | Possessive | Description |
? | ?? | ?+ | Zero or one occurrences |
* | *? | *+ | Zero or more occurrences |
+ | +? | ++ | One or more occurrences |
{n} | {n}? | {n}+ | Exactly n occurrences |
{n,m} | {n,m}? | {n,m}+ | At least n but no more than m occurrences |
{n,} | {n,}? | {n,}+ | At least n occurrences |
Notes: Browser support for possessive quantifiers leaves much to be desired. Internet
Explorer and Opera don’t support possessive quantifiers and throw an error when
you try to use one. Mozilla won’t throw an error, but it treats possessive quantifiers
as greedy.
匹配次数中的贪婪与非贪婪
在使用修饰匹配次数的特殊符号时,有几种表示方法可以使同一个表达式能够匹配不同的次数,比如:"{m,n}", "{m,}", "?", "*", "+",具体匹配的次数随被匹配的字符串而定。这种重复匹配不定次数的表达式在匹配过程中,总是尽可能多的匹配。比如,针对文本 "dxxxdxxxd",举例如下:
表达式 |
匹配结果 |
"\w+" 将匹配第一个 "d" 之后的所有字符 "xxxdxxxd" |
|
"\w+" 将匹配第一个 "d" 和最后一个 "d" 之间的所有字符 "xxxdxxx"。虽然 "\w+" 也能够匹配上最后一个 "d",但是为了使整个表达式匹配成功,"\w+" 可以 "让出" 它本来能够匹配的最后一个 "d" |
由此可见,"\w+" 在匹配的时候,总是尽可能多的匹配符合它规则的字符。虽然第二个举例中,它没有匹配最后一个 "d",但那也是为了让整个表达式能够匹配成功。同理,带 "*" 和 "{m,n}" 的表达式都是尽可能地多匹配,带 "?" 的表达式在可匹配可不匹配的时候,也是尽可能的 "要匹配"。这 种匹配原则就叫作 "贪婪" 模式 。
非贪婪模式:
在修饰匹配次数的特殊符号后再加上一个 "?" 号,则可以使匹配次数不定的表达式尽可能少的匹配,使可匹配可不匹配的表达式,尽可能的 "不匹配"。这种匹配原则叫作 "非贪婪" 模式,也叫作 "勉强" 模式。如果少匹配就会导致整个表达式匹配失败的时候,与贪婪模式类似,非贪婪模式会最小限度的再匹配一些,以使整个表达式匹配成功。举例如下,针对文本 "dxxxdxxxd" 举例:
表达式 |
匹配结果 |
"\w+?" 将尽可能少的匹配第一个 "d" 之后的字符,结果是:"\w+?" 只匹配了一个 "x" |
|
为了让整个表达式匹配成功,"\w+?" 不得不匹配 "xxx" 才可以让后边的 "d" 匹配,从而使整个表达式匹配成功。因此,结果是:"\w+?" 匹配 "xxx" |
更多的情况,举例如下:
举 例1:表达式 "<td>(.*)</td>" 与字符串 "<td><p>aa</p></td> <td><p>bb</p></td>" 匹配时 ,匹配的结果是:成功;匹配到的内容是 "<td><p>aa</p></td> <td><p>bb</p></td>" 整个字符串, 表达式中的 "</td>" 将与字符串中最后一个 "</td>" 匹配。
举例2:相比之下,表达式 "<td>(.*?)</td>" 匹配举例1中同样的字符串时 ,将只得到 "<td><p>aa</p></td>", 再次匹配下一个时,可以得到第二个 "<td><p>bb</p></td>"。