Regular Expressions (1) ---- What Are Regular Expressions?
1. What is a regular expression
2. The origin of regular expressions
3. Using regular expressions in detail
3.1 Basic syntax
3.1.1 Ordinary characters
3.1.2 Non-printing characters
3.1.3 Special characters
3.1.4 Character sets
3.1.5 Using metacharacters in character sets
3.1.6 Predefined character sets
3.1.7 Quantifiers
3.1.8 Anchors
3.1.9 The "." metacharacter
3.1.10 Alternation with "|"
3.1.11 Grouping with "()"
3.1.12 More about "?"
3.1.13 Adding comments to a regular expression
3.1.14 Operator precedence
3.2 Advanced topics
3.2.1 Backreferences
3.2.2 Specifying mode options in a regular expression
3.2.3 Lookaround assertions
4. Regular expression basic syntax reference
5. Regular expression advanced syntax reference
6. References
7. Recommended tools
· Basic Regular Expressions (BRE)
· Extended Regular Expressions (ERE)
· Text-directed engine
· Regex-directed engine
Jeffrey Friedl calls these the DFA and NFA engines, respectively.
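1. All expressions used as examples in this article are based on an NFA engine.
2. A regular expression, that is, the matching pattern, is abbreviated as Regex.
3. The text a Regex is matched against, that is, the target string, is abbreviated as String.
4. Match results are highlighted with a yellow background.
5. Text set off like 1\+1=2 indicates a regex.
6. Examples use the following format:
Regex | Target String | Description
test | This is a test | Matches test, testcase, and so on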
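The "ancestors" of regular expressions can be traced back to early research on how the human nervous system works. Warren McCulloch and Walter Pitts, two neurophysiologists, developed a mathematical way of describing these neural networks.
In 1956, the American mathematician Stephen Kleene, building on the earlier work of McCulloch and Pitts, published a paper titled "Representation of Events in Nerve Nets" that introduced the concept of regular expressions. Regular expressions were the expressions used to describe what he called "the algebra of regular sets", hence the term "regular expression".
Subsequently, this work was found to be applicable to some early research on computational search algorithms by Ken Thompson, the principal inventor of Unix. The first practical application of regular expressions was the qed editor on Unix. From then until now, regular expressions have been an important part of text-based editors and search tools. With a complete syntax, regular expressions were used for pattern matching on characters and were later applied across the wider field of information technology. Since then, regular expressions have gone through several stages of development, and the current standard has been approved by ISO (the International Organization for Standardization) and adopted by the Open Group.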
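The simplest regular expression is one that everyone already knows and uses regularly: a literal string. A particular string can be described by its own text; a Regex such as test matches the input string "test" exactly, but it also matches inside "this is a testcase", which is not the result we want.
Of course, using a regular expression to match exactly the string equal to itself is of little value and does not show what regular expressions are really for. But suppose that instead of test you want to find all words that begin with the letter t, or all four-letter words. That is beyond what a literal string can reasonably express, which is why regular expressions are worth studying in depth.
A regular expression is a text pattern made up of ordinary characters (for example, the characters a through z) and special characters (called metacharacters). The pattern describes one or more strings to match when searching a body of text. The regular expression acts as a template that matches a character pattern against the string being searched.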
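The difference between a literal pattern and a pattern with metacharacters can be sketched in a few lines. This is only an illustration, assuming Python's standard re module (the article itself is not tied to any language); the sample sentence and variable names are made up for the example.

```python
import re

text = "This is a test, not a testcase; take four more."

# A literal pattern matches wherever that exact text occurs,
# even inside a longer word such as "testcase".
literal_hits = re.findall(r"test", text)

# Words that start with a lowercase t: a word boundary, t, then word characters.
t_words = re.findall(r"\bt\w*", text)

# Words made of exactly four word characters.
four_letter_words = re.findall(r"\b\w{4}\b", text)

print(literal_hits)       # ['test', 'test']
print(t_words)            # ['test', 'testcase', 'take']
print(four_letter_words)  # ['This', 'test', 'take', 'four', 'more']
```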
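Symbol | Description
\cx | Matches the control character indicated by x. For example, \cM matches a Control-M, or carriage return. The value of x must be in the range A-Z or a-z; otherwise, c is treated as a literal 'c' character.
\f | Matches a form feed. Equivalent to \x0c and \cL.
\n | Matches a newline (line feed). Equivalent to \x0a and \cJ.
\r | Matches a carriage return. Equivalent to \x0d and \cM.
\s | Matches any whitespace character, including space, tab, form feed, and so on. Equivalent to [ \f\n\r\t\v].
\S | Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\t | Matches a tab. Equivalent to \x09 and \cI.
\v | Matches a vertical tab. Equivalent to \x0b and \cK.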
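Non-printing characters can be used in a Regex. \t matches a tab character (ASCII 0x09), \r matches a carriage return (0x0D), and \n matches a line feed (0x0A). Note that Windows uses \r\n to end a line, whereas UNIX uses \n.
Likewise, a Regex can use hexadecimal ASCII or ANSI character codes. In the Latin-1 code page the code for the copyright sign is 0xA9, so \xA9 also matches the copyright sign. Another way to match a tab is \x09. Note that the leading "0" of the 0x notation must be dropped.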
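A short sketch of these escapes, assuming Python's re module, which accepts \t, \r, \n and \xhh inside a pattern (it does not support the \cx form from the table above). The sample strings are invented for illustration.

```python
import re

windows_text = "first line\r\nsecond\tline"

# \r\n is the Windows line ending; \n on its own is the UNIX one.
has_crlf = re.search(r"\r\n", windows_text) is not None

# \t and \x09 are two spellings of the same tab character.
tab_positions = [m.start() for m in re.finditer(r"\t", windows_text)]
hex_tab_positions = [m.start() for m in re.finditer(r"\x09", windows_text)]

# \xA9 is the copyright sign in Latin-1 (the same code point in Unicode).
copyright_match = re.search(r"\xA9", "Copyright \u00a9 2007")

print(has_crlf)                            # True
print(tab_positions == hex_tab_positions)  # True
print(copyright_match.start())             # 10
```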
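Symbol | Description
$ | Matches the position at the end of the input string. If the RegExp object's Multiline property is set, $ also matches the position before '\n' or '\r'. To match the $ character itself, use \$.
( ) | Marks the start and end of a subexpression. Subexpressions can be captured for later use. To match these characters, use \( and \).
* | Matches the preceding subexpression zero or more times. To match the * character, use \*.
+ | Matches the preceding subexpression one or more times. To match the + character, use \+.
. | Matches any single character except the newline \n. To match ., use \. instead.
[ | Marks the start of a bracket expression. To match [, use \[.
? | Matches the preceding subexpression zero or one time, or marks a non-greedy quantifier. To match the ? character, use \?.
\ | Marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, 'n' matches the character 'n', '\n' matches a newline, the sequence '\\' matches "\", and '\(' matches "(".
^ | Matches the position at the start of the input string, except when used in a bracket expression, where it negates the character set. To match the ^ character itself, use \^.
{ | Marks the start of a quantifier expression. To match {, use \{.
| (pipe) | Indicates a choice between two items. To match |, use \|.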
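Putting the escape character \ in front of a metacharacter lets you use the special character as an ordinary character.
For example, to match 1+1=2, the correct regular expression is 1\+1=2. Otherwise, the + is treated as a special character.
Apart from the special characters, no other character should be preceded by \, because \ is itself a special character. A \ combined with an ordinary character can also create a special meaning; for example, \d matches any digit.
As programmers, we may be surprised that single and double quotes are not special characters, but that is correct. When we program, the language knows which characters between the quotes have a special meaning, and the compiler processes the string literal before the result is passed to the regex engine. For example, in C#, to use the regex 1\+1=2 we must write "1\\+1=2" in the program; the C# compiler turns "\\" into a single "\". Similarly, to match C:\Temp, the regular expression must first be written as C:\\Temp, and then in the program we write "C:\\\\Temp".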
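The same two layers of escaping can be seen in other languages. The sketch below uses Python rather than C# purely as an assumption for illustration; Python's raw string literals (r"...") remove the string-literal layer so that only the regex-level escaping remains.

```python
import re

# Unescaped, + is a quantifier: 1+1=2 means "one or more 1s, then 1=2".
unescaped_on_sum = re.search(r"1+1=2", "1+1=2")    # no match
unescaped_on_ones = re.search(r"1+1=2", "111=2")   # matches "111=2"

# Escaped, \+ is a literal plus sign.
escaped_on_sum = re.search(r"1\+1=2", "1+1=2")     # matches "1+1=2"

# The regex C:\\Temp matches the text C:\Temp. In an ordinary string
# literal every backslash must be doubled again; a raw string avoids that.
plain_literal = "C:\\\\Temp"
raw_literal = r"C:\\Temp"
path_match = re.search(raw_literal, "C:\\Temp")

# re.escape adds the regex-level backslashes automatically.
auto_escaped = re.escape("1+1=2")

print(unescaped_on_sum, escaped_on_sum.group(0))
print(plain_literal == raw_literal)  # True
```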
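A character set is written by enclosing characters in [ ].
Note that q[^u] cannot match the q at the end of Iraq, because a character set must always match a character. It does, however, match in Iraq is a country, because the space after the q is "a character that is not u".
Inside a character set, the only metacharacters are ']', '\', '^' and '-'.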
Regex | String | Description
[x^] | A string with x and ^. | Matches x or "^"
[]x] | A string with x and ] | Matches x or "]"
[^]x] | A string with x and ] | Matches any character other than x and "]"
[\\x] | A string with x and \ | Matches x or "\"
[-x] | A string with x and - | Matches x or "-"
[x-] | A string with x and - | Matches x or "-"
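The placement rules in the table can be checked with a small sketch. It assumes Python's re module, which follows the same conventions for these cases (a "]" placed first is literal, a "^" that is not first is literal, and a "-" at either end is literal); other flavors may differ in details. The sample string is invented.

```python
import re

sample = r"x ^ ] \ -"   # contains x, ^, ], a backslash and a hyphen

caret_or_x = re.findall(r"[x^]", sample)         # ^ is literal when not first
bracket_or_x = re.findall(r"[]x]", sample)       # ] is literal when first
neither = re.findall(r"[^]x]", "x]y")            # negated set: not ] and not x
backslash_or_x = re.findall(r"[\\x]", sample)    # \\ is a literal backslash
hyphen_first = re.findall(r"[-x]", sample)       # - is literal at the start
hyphen_last = re.findall(r"[x-]", sample)        # - is literal at the end

print(caret_or_x, bracket_or_x, neither)
print(backslash_or_x, hyphen_first, hyphen_last)
```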
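Regex | Meaning | Description
\d | [0-9] | Any digit
\w | [a-zA-Z0-9_] | Any word character (letters, digits and the underscore); culture-dependent
\s | [ \t\r\n] | Whitespace: space, tab, carriage return and line feed (some flavors also include form feed and vertical tab); culture-dependent

Regex | String | Description
\s\d | 1 | Matches a whitespace character immediately followed by a digit
[\s\d] | 1 | Matches a single character that is either a digit or a whitespace character

Regex | Meaning | Description
\D | [^\d] | Any non-digit
\W | [^\w] | Any non-word character; culture-dependent
\S | [^\s] | Any non-whitespace character; culture-dependent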
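The difference between writing two shorthands in sequence and putting them inside one character set is easy to miss. A minimal sketch, assuming Python's re module and made-up sample strings:

```python
import re

text = "room 1, floor 22"

# \s\d is a sequence: one whitespace character followed by one digit.
space_then_digit = re.findall(r"\s\d", text)

# [\s\d] is a set: one character that is either whitespace or a digit.
space_or_digit = re.findall(r"[\s\d]", text)

# \D is the negation of \d; \w also covers digits and the underscore.
non_digit_runs = re.findall(r"\D+", "a1b22c")
word_runs = re.findall(r"\w+", "a_1 b-2")

print(space_then_digit)  # [' 1', ' 2']
print(space_or_digit)    # [' ', '1', ' ', ' ', '2', '2']
print(non_digit_runs)    # ['a', 'b', 'c']
print(word_runs)         # ['a_1', 'b', '2']
```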
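Symbol | Description | Note
? | Zero or one time |
* | Zero or more times |
+ | One or more times |
{min,max} | At least min times and at most max times | max must be greater than or equal to min
{min,} | At least min times, with no upper limit |
{min} | Exactly min times |
Placing "?", "*" or "+" after a character set repeats it. The quantifier repeats the whole character set, not the particular character that was matched.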
Regex | String | Meaning
[0-9]+ | 846,111 | Matches runs of digits
([0-9])\1+ | 846,111 | Matches runs of one repeated digit
([0-9])\1+ matches only 111 and does not match 846.
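The two rows of the table can be reproduced as follows; this is a sketch assuming Python's re module, with variable names chosen only for the example.

```python
import re

text = "846,111"

# The quantifier repeats the whole set, so any mix of digits is accepted.
digit_runs = re.findall(r"[0-9]+", text)

# To repeat the character that was actually matched, capture it and
# repeat the backreference \1 instead of the set.
same_digit_runs = [m.group(0) for m in re.finditer(r"([0-9])\1+", text)]

print(digit_runs)       # ['846', '111']
print(same_digit_runs)  # ['111']
```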
If the target string is 811116, then 1111 is matched. If that is not the desired result, you need to use …
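Regex | Function | Note
^ | The position before the first character | Also matches at line breaks in multi-line mode
$ | The position after the last character | Also matches at line breaks in multi-line mode
\A | Always matches the first position of the string | Never matches at line breaks
\Z | Always matches the last position of the string | Never matches at line breaks

Regex | String | Meaning
^ | Abc | Matches the position before A
$ | Abc | Matches the position after c
^A | Abc | Matches A
^b | Abc | No match
c$ | Abc | Matches c
A$ | Abc | No match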
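The rows above can be verified with a short sketch. It assumes Python's re module; match_text is a hypothetical helper written for this example, and re.MULTILINE stands in for the "multi-line mode" mentioned in the first table.

```python
import re

def match_text(pattern, string, flags=0):
    """Return the matched text, or None when the pattern does not match."""
    m = re.search(pattern, string, flags)
    return m.group(0) if m else None

starts_with_a = match_text(r"^A", "Abc")   # 'A'
starts_with_b = match_text(r"^b", "Abc")   # None: b is not at the start
ends_with_c = match_text(r"c$", "Abc")     # 'c'
ends_with_a = match_text(r"A$", "Abc")     # None: A is not at the end

# ^ and $ on their own match positions, not characters.
start_span = re.search(r"^", "Abc").span()   # (0, 0)
end_span = re.search(r"$", "Abc").span()     # (3, 3)

# In multi-line mode ^ also matches after a line break; \A never does.
two_lines = "abc\ndef"
line_starts = re.findall(r"^.", two_lines, re.MULTILINE)     # ['a', 'd']
string_start = re.findall(r"\A.", two_lines, re.MULTILINE)   # ['a']
```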
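A word is made up of "word characters", that is, characters that can form part of a word; non-printing characters and line breaks (carriage return and line feed) are not word characters.
All word characters can be represented by \w.
All non-word characters can be represented by \W.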
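Suppose we want to match a date in mm/dd/yy format while letting the user choose the date separator. A simple Regex is \d\d.\d\d.\d\d, which looks as if it would work: it matches 04/09/07 nicely. The problem is that 04409407 is matched as well, because the 4 in the third position and the 4 in the sixth position are each matched by the ".". That is not the result we want.
Restricting the separator with a character set avoids that, but the approach is still not perfect, because it matches 99/99/99. [0-1]\d[-/.][0-3]\d[-/.]\d\d may be somewhat better, although it still matches 19/39/99. A method only has to be good enough; there is no need to chase perfection. If the regex is used to validate user input, it probably needs further improvement; if it is only used to analyze a piece of code, it may well be sufficient.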
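The progression can be tried out directly. The sketch below assumes Python's re module; the middle pattern, \d\d[-/.]\d\d[-/.]\d\d, is an assumed intermediate step (a character set for the separator only) and is not quoted from the text.

```python
import re

loose = re.compile(r"\d\d.\d\d.\d\d")                    # dot as separator
separators = re.compile(r"\d\d[-/.]\d\d[-/.]\d\d")       # assumed intermediate step
stricter = re.compile(r"[0-1]\d[-/.][0-3]\d[-/.]\d\d")   # limits leading digits

loose_date = loose.fullmatch("04/09/07") is not None            # True
loose_digits = loose.fullmatch("04409407") is not None          # True: dot matched the 4s
sep_digits = separators.fullmatch("04409407") is not None       # False
sep_nonsense = separators.fullmatch("99/99/99") is not None     # True
strict_nonsense = stricter.fullmatch("99/99/99") is not None    # False
strict_leftover = stricter.fullmatch("19/39/99") is not None    # True: still accepted
```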
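Suppose we want to match a string enclosed in double quotes. That sounds easy: put any number of any characters between two double quotes. The Regex might be written as ".*". Applied to put a "string" between double quotes, it matches "string", which is correct. But applied to "string one" and "string two", the result is the whole of "string one" and "string two", which is not what we want.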
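The over-long match comes from the greedy star. Two common ways around it are a lazy star or a negated character set; the sketch below shows both, assuming Python's re module.

```python
import re

text = '"string one" and "string two"'

# Greedy: .* runs to the last double quote on the line.
greedy = re.findall(r'".*"', text)

# Lazy: .*? stops at the first closing quote it can reach.
lazy = re.findall(r'".*?"', text)

# Negated set: anything except a double quote between the quotes.
negated = re.findall(r'"[^"]*"', text)

print(greedy)   # ['"string one" and "string two"']
print(lazy)     # ['"string one"', '"string two"']
print(negated)  # ['"string one"', '"string two"']
```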
To match cat or dog, write cat|dog; more alternatives can be added, as in cat|dog|mouse|fish.
The effect is that when Feb 23(rd)? is matched against Today is Feb 23rd, 2004, the result is always Feb 23rd and never Feb 23.
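Both points can be sketched briefly, assuming Python's re module. The lazy form ?? is included only for contrast with the greedy ? described above; the sample sentence about pets is invented.

```python
import re

# Alternation: any one of the listed words.
pets = re.findall(r"cat|dog|mouse|fish", "a dog chased a cat and a fish")

sentence = "Today is Feb 23rd, 2004"

# Greedy ?: the optional group is taken whenever it can be.
greedy_optional = re.search(r"Feb 23(rd)?", sentence).group(0)

# Lazy ??: the optional group is skipped whenever the match can still succeed.
lazy_optional = re.search(r"Feb 23(rd)??", sentence).group(0)

print(pets)             # ['dog', 'cat', 'fish']
print(greedy_optional)  # Feb 23rd
print(lazy_optional)    # Feb 23
```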
(?#comment here)
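As an illustration of the comment syntax, the sketch below annotates a date pattern in two ways. It assumes Python's re module, which accepts (?#...) and also offers a verbose mode (re.VERBOSE) where whitespace is ignored and # starts a comment; the verbose mode is a Python feature, not something described in this article.

```python
import re

# Inline comments: the engine ignores everything between (?# and ).
inline_date = re.compile(r"\d\d(?#month)[-/.]\d\d(?#day)[-/.]\d\d(?#year)")

# Verbose mode: the same pattern spread over several commented lines.
verbose_date = re.compile(
    r"""
    \d\d    # month
    [-/.]   # separator
    \d\d    # day
    [-/.]   # separator
    \d\d    # two-digit year
    """,
    re.VERBOSE,
)

inline_ok = inline_date.fullmatch("04/09/07") is not None    # True
verbose_ok = verbose_date.fullmatch("04-09-07") is not None  # True
```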
Symbol | Function
\ | Escape
(), (?:), (?=), [] | Parentheses and brackets
*, +, ?, {n}, {n,}, {n,m} | Quantifiers
^, $, \anymetacharacter | Anchors and sequence
| (pipe) | Alternation
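Two consequences of this ordering are worth seeing in practice: alternation binds most loosely, and a quantifier applies only to the single item before it unless parentheses say otherwise. A sketch assuming Python's re module:

```python
import re

def full(pattern, string):
    """True when the whole string matches the pattern."""
    return re.fullmatch(pattern, string) is not None

# | binds last: ab|cd means (ab) or (cd), not a, then b or c, then d.
alt_ab = full(r"ab|cd", "ab")          # True
alt_acd = full(r"ab|cd", "acd")        # False
grouped_acd = full(r"a(b|c)d", "acd")  # True

# A quantifier binds tighter than sequence: ab+ means a followed by b+.
plus_abbb = full(r"ab+", "abbb")       # True
plus_abab = full(r"ab+", "abab")       # False
grouped_abab = full(r"(ab)+", "abab")  # True
```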
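Besides grouping part of a regex, () also creates a backreference. Putting parentheses around a regular expression pattern or part of a pattern causes the corresponding match to be stored in a temporary buffer. Each captured submatch is stored in the order in which it is encountered from left to right in the pattern. Buffer numbers start at 1 and continue up to a maximum of 99 subexpressions. Each buffer can be accessed with '\n', where n is a one- or two-digit decimal number identifying a particular buffer.
The non-capturing metacharacters '?:', '?=' and '?!' can be used to suppress saving the match.
One of the most important features of regular expressions is the ability to store part of a successfully matched pattern for later use. Recall that putting parentheses around a pattern or part of a pattern causes that part of the expression to be stored in a temporary buffer. The non-capturing metacharacters '?:', '?=' and '?!' can be used to suppress saving that part of the regular expression.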
Is is the cost of of gasoline going up up?
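\b([a-z]+) \1\b
In this example, the subexpression is everything between the parentheses. The captured expression consists of one or more alphabetic characters, as specified by [a-z]+. The second part of the regular expression is a reference to the previously captured submatch, that is, the second occurrence of the word. '\1' specifies the first submatch. The word boundary metacharacters ensure that only whole words are detected. Without them, phrases such as "is issued" or "this is" would be incorrectly identified by this expression.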
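The doubled-word example can be run as follows. This sketch assumes Python's re module and adds the IGNORECASE flag so that "Is is" counts as a repeat; the flag is an assumption of the example, since the pattern as printed only lists lowercase letters.

```python
import re

sentence = "Is is the cost of of gasoline going up up?"

doubled = re.compile(r"\b([a-z]+) \1\b", re.IGNORECASE)

# Group 1 holds the first occurrence of each repeated word.
repeated_words = [m.group(1) for m in doubled.finditer(sentence)]

# Replacing each match with \1 keeps a single copy of the word.
deduplicated = doubled.sub(r"\1", sentence)

# Without the word boundaries, "this is" is wrongly reported as "is is".
false_positive = re.search(r"([a-z]+) \1", "this is").group(0)
with_boundaries = re.search(r"\b([a-z]+) \1\b", "this is")

print(repeated_words)  # ['Is', 'of', 'up']
print(deduplicated)    # Is the cost of gasoline going up?
```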
Symbol | Function | Note
i | Case-insensitive matching | A leading "-" turns the option off
s | Single-line mode |
m | Multi-line mode |
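How these options are switched on depends on the tool. As one illustration, the sketch below assumes Python's re module, where the options are passed as flags (re.IGNORECASE, re.DOTALL, re.MULTILINE) or scoped to a group with (?i:...); recent Python versions only accept an unscoped inline option such as (?i) at the very start of the pattern.

```python
import re

# i: ignore case for the whole pattern.
ignore_case = re.search(r"test", "This is a TEST", re.IGNORECASE).group(0)

# Scoped option: only "te" is matched case-insensitively.
scoped_mixed = re.fullmatch(r"(?i:te)st", "TEst") is not None   # True
scoped_upper = re.fullmatch(r"(?i:te)st", "TEST") is not None   # False

# s: single-line mode lets the dot match a line break.
dot_default = re.search(r"a.b", "a\nb") is not None             # False
dot_all = re.search(r"a.b", "a\nb", re.DOTALL) is not None      # True

# m: multi-line mode lets ^ and $ match at every line.
first_words = re.findall(r"^\w+", "one\ntwo", re.MULTILINE)     # ['one', 'two']
```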
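Our earlier example q[^u] means "a q followed by a character that is anything other than u". Suppose, however, that what we want is "a q that is not followed by a u", which is not the same as "a q followed by a character that is not u": there may be nothing at all after the q, whereas a character set must always match a character. In that case we have to use a negative lookahead assertion, written q(?!u). It matches a q that is not followed by a u.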
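The difference between the two patterns shows up at the end of a string and in what the match contains. A sketch assuming Python's re module; matched is a hypothetical helper for this example.

```python
import re

def matched(pattern, string):
    """Return the matched text, or None when there is no match."""
    m = re.search(pattern, string)
    return m.group(0) if m else None

# The set must consume a character, so it fails when q is the last character.
set_at_end = matched(r"q[^u]", "Iraq")                     # None
lookahead_at_end = matched(r"q(?!u)", "Iraq")              # 'q'

# When a character follows, the set includes it in the match;
# the lookahead only inspects it.
set_in_text = matched(r"q[^u]", "Iraq is a country")       # 'q '
lookahead_in_text = matched(r"q(?!u)", "Iraq is a country")  # 'q'

# A q followed by u is rejected by both.
lookahead_qu = matched(r"q(?!u)", "question")              # None
```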
Regular Expression Basic Syntax Reference
Characters
Character | Description | Example
Any character except [\^$.|?*+() | All characters except the listed special characters match a single instance of themselves. | a matches a
\ (backslash) followed by any of [\^$.|?*+() | A backslash escapes special characters to suppress their special meaning. | \+ matches +
\xFF where FF are 2 hexadecimal digits | Matches the character with the specified ASCII/ANSI value, which depends on the code page used. Can be used in character classes. | \xA9 matches © when using the Latin-1 code page.
\n, \r and \t | Match an LF character, CR character and a tab character respectively. Can be used in character classes. | \r\n matches a DOS/Windows CRLF line break.
Character Classes or Character Sets [abc]
Character | Description | Example
[ (opening square bracket) | Starts a character class. A character class matches a single character out of all the possibilities offered by the character class. Inside a character class, different rules apply. The rules in this section are only valid inside character classes. The rules outside this section are not valid in character classes, except \n, \r, \t and \xFF. |
Any character except ^-]\ | All characters except the listed special characters add that character to the possible matches for the character class. | [abc] matches a, b or c
\ (backslash) followed by any of ^-]\ | A backslash escapes special characters to suppress their special meaning. | [\^\]] matches ^ or ]
- (hyphen) except immediately after the opening [ | Specifies a range of characters. (Specifies a hyphen if placed immediately after the opening [.) | [a-zA-Z0-9] matches any letter or digit
^ (caret) immediately after the opening [ | Negates the character class, causing it to match a single character not listed in the character class. (Specifies a caret if placed anywhere except after the opening [.) | [^a-d] matches x (any character except a, b, c or d)
\d, \w and \s | Shorthand character classes matching digits 0-9, word characters (letters and digits) and whitespace respectively. Can be used inside and outside character classes. | [\d\s] matches a character that is a digit or whitespace
\D, \W and \S | Negated versions of the above. Should be used only outside character classes. (Can be used inside, but that is confusing.) | \D matches a character that is not a digit
Dot
Character | Description | Example
. (dot) | Matches any single character except line break characters \r and \n. Most regex flavors have an option to make the dot match line break characters too. | . matches x or (almost) any other character
Anchors
Character | Description | Example
^ (caret) | Matches at the start of the string the regex pattern is applied to. Matches a position rather than a character. Most regex flavors have an option to make the caret match after line breaks (i.e. at the start of a line in a file) as well. | ^. matches a in abc\ndef. Also matches d in "multi-line" mode.
$ (dollar) | Matches at the end of the string the regex pattern is applied to. Matches a position rather than a character. Most regex flavors have an option to make the dollar match before line breaks (i.e. at the end of a line in a file) as well. Also matches before the very last line break if the string ends with a line break. | .$ matches f in abc\ndef. Also matches c in "multi-line" mode.
\A | Matches at the start of the string the regex pattern is applied to. Matches a position rather than a character. Never matches after line breaks. | \A. matches a in abc
\Z | Matches at the end of the string the regex pattern is applied to. Matches a position rather than a character. Never matches before line breaks, except for the very last line break if the string ends with a line break. | .\Z matches f in abc\ndef
\z | Matches at the end of the string the regex pattern is applied to. Matches a position rather than a character. Never matches before line breaks. | .\z matches f in abc\ndef
Word Boundaries
Character | Description | Example
\b | Matches at the position between a word character (anything matched by \w) and a non-word character (anything matched by [^\w] or \W) as well as at the start and/or end of the string if the first and/or last characters in the string are word characters. | .\b matches c in abc
\B | Matches at the position between two word characters (i.e. the position between \w\w) as well as at the position between two non-word characters (i.e. \W\W). | \B.\B matches b in abc
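The two examples in this table can be confirmed with a few lines, assuming Python's re module (whose \b and \B behave as described here):

```python
import re

# .\b needs a boundary right after the matched character: only c qualifies.
before_boundary = re.search(r".\b", "abc").group(0)

# \B.\B needs a non-boundary on both sides: only b qualifies.
inside_word = re.search(r"\B.\B", "abc").group(0)

# \b on both sides selects whole words only.
whole_words = re.findall(r"\bis\b", "this is his")

print(before_boundary)  # c
print(inside_word)      # b
print(whole_words)      # ['is']
```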
Alternation
Character | Description | Example
| (pipe) | Causes the regex engine to match either the part on the left side, or the part on the right side. Can be strung together into a series of options. | abc|def|xyz matches abc, def or xyz
| (pipe) | The pipe has the lowest precedence of all operators. Use grouping to alternate only part of the regular expression. | abc(def|xyz) matches abcdef or abcxyz
Quantifiers
Character | Description | Example
? (question mark) | Makes the preceding item optional. Greedy, so the optional item is included in the match if possible. | abc? matches ab or abc
?? | Makes the preceding item optional. Lazy, so the optional item is excluded in the match if possible. This construct is often excluded from documentation because of its limited use. | abc?? matches ab or abc
* (star) | Repeats the previous item zero or more times. Greedy, so as many items as possible will be matched before trying permutations with fewer matches of the preceding item, up to the point where the preceding item is not matched at all. | ".*" matches "def" "ghi" in abc "def" "ghi" jkl
*? (lazy star) | Repeats the previous item zero or more times. Lazy, so the engine first attempts to skip the previous item, before trying permutations with ever increasing matches of the preceding item. | ".*?" matches "def" in abc "def" "ghi" jkl
+ (plus) | Repeats the previous item once or more. Greedy, so as many items as possible will be matched before trying permutations with fewer matches of the preceding item, up to the point where the preceding item is matched only once. | ".+" matches "def" "ghi" in abc "def" "ghi" jkl
+? (lazy plus) | Repeats the previous item once or more. Lazy, so the engine first matches the previous item only once, before trying permutations with ever increasing matches of the preceding item. | ".+?" matches "def" in abc "def" "ghi" jkl
{n} where n is an integer >= 1 | Repeats the previous item exactly n times. | a{3} matches aaa
{n,m} where n >= 1 and m >= n | Repeats the previous item between n and m times. Greedy, so repeating m times is tried before reducing the repetition to n times. | a{2,4} matches aaaa, aaa or aa
{n,m}? where n >= 1 and m >= n | Repeats the previous item between n and m times. Lazy, so repeating n times is tried before increasing the repetition to m times. | a{2,4}? matches aa, aaa or aaaa
{n,} where n >= 1 | Repeats the previous item at least n times. Greedy, so as many items as possible will be matched before trying permutations with fewer matches of the preceding item, up to the point where the preceding item is matched only n times. | a{2,} matches aaaaa in aaaaa
{n,}? where n >= 1 | Repeats the previous item at least n times. Lazy, so the engine first matches the previous item n times, before trying permutations with ever increasing matches of the preceding item. | a{2,}? matches aa in aaaaa
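The greedy and lazy forms of the counted quantifiers differ only in which repetition count is tried first. A sketch on the string aaaaa, assuming Python's re module:

```python
import re

text = "aaaaa"

greedy_range = re.search(r"a{2,4}", text).group(0)   # tries 4 first
lazy_range = re.search(r"a{2,4}?", text).group(0)    # tries 2 first
greedy_open = re.search(r"a{2,}", text).group(0)     # takes all five
lazy_open = re.search(r"a{2,}?", text).group(0)      # stops at two

print(greedy_range, lazy_range, greedy_open, lazy_open)
# aaaa aa aaaaa aa
```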
Regular Expression Advanced Syntax Reference
Grouping and Backreferences
Syntax | Description | Example
(regex) | Round brackets group the regex between them. They capture the text matched by the regex inside them that can be reused in a backreference, and they allow you to apply regex operators to the entire grouped regex. | (abc){3} matches abcabcabc. First group matches abc.
(?:regex) | Non-capturing parentheses group the regex so you can apply regex operators, but do not capture anything and do not create backreferences. | (?:abc){3} matches abcabcabc. No groups.
\1 through \9 | Substituted with the text matched between the 1st through 9th pair of capturing parentheses. Some regex flavors allow more than 9 backreferences. | (abc|def)=\1 matches abc=abc or def=def, but not abc=def or def=abc.
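The three rows can be checked as follows; a sketch assuming Python's re module, where the groups attribute of a compiled pattern reports the number of capturing groups.

```python
import re

# Capturing group: the last repetition is stored as group 1.
captured = re.fullmatch(r"(abc){3}", "abcabcabc").group(1)

# Non-capturing group: same match, but no group is created.
capturing_count = re.compile(r"(abc){3}").groups
non_capturing_count = re.compile(r"(?:abc){3}").groups

# A backreference must repeat the exact text that was captured.
same_both_sides = re.fullmatch(r"(abc|def)=\1", "def=def") is not None
different_sides = re.fullmatch(r"(abc|def)=\1", "abc=def") is not None

print(captured)                              # abc
print(capturing_count, non_capturing_count)  # 1 0
print(same_both_sides, different_sides)      # True False
```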
Modifiers
Syntax | Description | Example
(?i) | Turn on case insensitivity for the remainder of the regular expression. (Older regex flavors may turn it on for the entire regex.) | te(?i)st matches teST but not TEST.
(?-i) | Turn off case insensitivity for the remainder of the regular expression. | (?i)te(?-i)st matches TEst but not TEST.
(?s) | Turn on "dot matches newline" for the remainder of the regular expression. (Older regex flavors may turn it on for the entire regex.) |
(?-s) | Turn off "dot matches newline" for the remainder of the regular expression. |
(?m) | Caret and dollar match after and before newlines for the remainder of the regular expression. (Older regex flavors may apply this to the entire regex.) |
(?-m) | Caret and dollar only match at the start and end of the string for the remainder of the regular expression. |
(?i-sm) | Turns on the option "i" and turns off the options "s" and "m" for the remainder of the regular expression. (Older regex flavors may apply this to the entire regex.) |
(?i-sm:regex) | Matches the regex inside the span with the option "i" turned on and the options "s" and "m" turned off. | (?i:te)st matches TEst but not TEST.
Atomic Grouping and Possessive Quantifiers
Syntax | Description | Example
(?>regex) | Atomic groups prevent the regex engine from backtracking back into the group (forcing the group to discard part of its match) after a match has been found for the group. Backtracking can occur inside the group before it has matched completely, and the engine can backtrack past the entire group, discarding its match entirely. Eliminating needless backtracking provides a speed increase. Atomic grouping is often indispensable when nesting quantifiers to prevent a catastrophic amount of backtracking as the engine needlessly tries pointless permutations of the nested quantifiers. | x(?>\w+)x is more efficient than x\w+x if the second x cannot be matched.
?+, *+, ++ and {m,n}+ | Possessive quantifiers are a limited yet syntactically cleaner alternative to atomic grouping. Only available in a few regex flavors. They behave as normal greedy quantifiers, except that they will not give up part of their match for backtracking. | x++ is identical to (?>x+)
Lookaround
Syntax | Description | Example
(?=regex) | Zero-width positive lookahead. Matches at a position where the pattern inside the lookahead can be matched. Matches only the position. It does not consume any characters or expand the match. In a pattern like one(?=two)three, both two and three have to match at the position where the match of one ends. | t(?=s) matches the second t in streets.
(?!regex) | Zero-width negative lookahead. Identical to positive lookahead, except that the overall match will only succeed if the regex inside the lookahead fails to match. | t(?!s) matches the first t in streets.
(?<=text) | Zero-width positive lookbehind. Matches at a position to the left of which text appears. Since regular expressions cannot be applied backwards, the test inside the lookbehind can only be plain text. Some regex flavors allow alternation of plain text options in the lookbehind. | (?<=s)t matches the first t in streets.
(?<!text) | Zero-width negative lookbehind. Matches at a position if the text does not appear to the left of that position. | (?<!s)t matches the second t in streets.
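All four lookaround examples use the word streets, so they can be compared side by side by recording where each t is found. A sketch assuming Python's re module; positions is a hypothetical helper, and indexes are zero-based (the first t is at index 1, the second at index 5).

```python
import re

def positions(pattern, string):
    """Start index of every match of pattern in string."""
    return [m.start() for m in re.finditer(pattern, string)]

word = "streets"

followed_by_s = positions(r"t(?=s)", word)       # [5]  second t
not_followed_by_s = positions(r"t(?!s)", word)   # [1]  first t
preceded_by_s = positions(r"(?<=s)t", word)      # [1]  first t
not_preceded_by_s = positions(r"(?<!s)t", word)  # [5]  second t

# Lookarounds are zero-width: the match itself is still just "t".
matched_text = re.search(r"t(?=s)", word).group(0)   # 't'
```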
Continuing from The Previous Match
Syntax | Description | Example
\G | Matches at the position where the previous match ended, or the position where the current match attempt started (depending on the tool or regex flavor). Matches at the start of the string during the first match attempt. | \G[a-z] first matches a, then matches b and then fails to match in ab_cd.
Conditionals
Syntax | Description | Example
(?(?=regex)then|else) | If the lookahead succeeds, the "then" part must match for the overall regex to match. If the lookahead fails, the "else" part must match for the overall regex to match. Not just positive lookahead, but all four lookarounds can be used. Note that the lookahead is zero-width, so the "then" and "else" parts need to match and consume the part of the text matched by the lookahead as well. | (?(?<=a)b|c) matches the second b and the first c in babxcac
Comments
Syntax | Description | Example
(?#comment) | Everything between (?# and ) is ignored by the regex engine. | a(?#foobar)b matches ab
Regular Expressions Blog
Mastering Regular Expressions (O'Reilly), by Jeffrey Friedl
.NET regular expression reference
JScript regular expression syntax
All examples in this article were verified in EditPad Pro.
Another tool is The Regulator.