Regular Expression Syntax Summary: Unix-like Systems, UltraEdit, MS VC++ 6.0, and VS.NET
 
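Regular expressions are a powerful text pattern-matching language and are used very widely. Besides the standard regular expressions used on Unix-like systems, you will also meet them in UltraEdit, the MS VC++ 6.0 editor, the VS.NET editor, and other tools. Their syntaxes differ, however, so the syntax of each flavor is listed below for reference when needed.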
 
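1. Standard Regular Expressions
"Standard" regular expressions here means the regular expressions used on Unix-like systems. Their syntax is as follows: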
Regular Expressions (Unix Syntax):
 

Symbol
Function
\
Indicates the next character has a special meaning. "n" on its own matches the character "n". "\n" matches a linefeed or newline character. See examples below (\d, \f, \n, etc.).
^
Matches/anchors the beginning of line.
$
Matches/anchors the end of line.
*
Matches the preceding character zero or more times.
+
Matches the preceding character one or more times. Does not match repeated newlines.
.
Matches any single character except a newline character. Does not match repeated newlines.
(expression)
Brackets or tags an expression to use in the replace command. A regular expression may have up to 9 tagged expressions, numbered according to their order in the regular expression.
 
The corresponding replacement expression is \x, for x in the range 1-9. Example: If (h.*o) (f.*s) matches "hello folks", \2 \1 would replace it with "folks hello".
[xyz]
A character set. Matches any characters between brackets.
[^xyz]
A negative character set. Matches any characters NOT between brackets.
\d
Matches a digit character. Equivalent to [0-9].
\D
Matches a nondigit character. Equivalent to [^0-9].
\f
Matches a form-feed character.
\n
Matches a linefeed character.
\r
Matches a carriage return character.
\s
Matches any whitespace including space, tab, form-feed, etc., but not newline.
\S
Matches any non-whitespace character but not newline.
\t
Matches a tab character.
\v
Matches a vertical tab character.
\w
Matches any word character including underscore.
\W
Matches any nonword character.
\p
Matches CR/LF (same as \r\n) to match a DOS line terminator.
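
Most of the shorthand classes above also exist in Python's `re` module, which makes it a convenient place to try them out. The sketch below is only an illustration in Python, not the Unix-syntax engine described in the table. The names `swap_words`, `count_dos_line_ends` and `DOS_EOL` are invented for this example. Two differences are assumed and worked around: `\p` is not a Python escape, so the DOS line terminator is spelled `\r\n`, and Python's `\s` also matches newlines, unlike the `\s` listed above.

```python
import re

# \p is not a Python escape, so spell the DOS line terminator out.
DOS_EOL = r"\r\n"


def swap_words(text):
    # Two tagged expressions, referenced as \1 and \2 in the replacement,
    # mirroring the (h.*o) (f.*s) example from the table.
    return re.sub(r"(h.*o) (f.*s)", r"\2 \1", text)


def count_dos_line_ends(text):
    # Count CR/LF pairs only; a bare LF is not counted.
    return len(re.findall(DOS_EOL, text))


print(swap_words("hello folks"))
print(re.findall(r"\d+", "room 101, floor 7"))
print(re.findall(r"\w+", "snake_case and CamelCase"))
print(count_dos_line_ends("a\r\nb\r\nc\n"))
```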

 
2. UltraEdit-Style Regular Expressions
Regular Expressions (UltraEdit Syntax):
 

Symbol
Function
%
Matches the start of line - Indicates the search string must be at the beginning of a line but does not include any line terminator characters in the resulting string selected.
$
Matches the end of line - Indicates the search string must be at the end of line but does not include any line terminator characters in the resulting string selected.
?
Matches any single character except newline.
*
Matches any number of occurrences of any character except newline.
+
Matches one or more of the preceding character/expression.  At least one occurrence of the character must be found.  Does not match repeated newlines.
++
Matches the preceding character/expression zero or more times.  Does not match repeated newlines.
^b
Matches a page break.
^p
Matches a newline (CR/LF) (paragraph) (DOS Files)
^r
Matches a newline (CR Only) (paragraph) (MAC Files)
^n
Matches a newline (LF Only) (paragraph) (UNIX Files)
^t
Matches a tab character
[ ]
Matches any single character or range in the brackets
^{A^}^{B^}
Matches expression A OR B
^
Overrides the following regular expression character
^(sub-regex^)
Brackets or tags an expression to use in the replace command.  A regular expression may have up to 9 tagged expressions, numbered according to their order in the regular expression.
 
The corresponding replacement expression is ^x, for x in the range 1-9.  Example: If ^(h*o^) ^(f*s^) matches "hello folks", ^2 ^1 would replace it with "folks hello".
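
To make the differences from the Unix syntax concrete, here is a rough sketch of a translator from a subset of the UltraEdit syntax to a Python `re` pattern. It is an assumption-laden illustration, not how UltraEdit works internally: the function name `ultraedit_to_python` and both mapping tables are invented here, `^b` is assumed to mean a form feed, `*` is assumed to be greedy, and character sets in brackets are copied through unchanged.

```python
import re

# Assumed mapping from two-character UltraEdit tokens to Python syntax.
TWO_CHAR = {
    "^p": r"\r\n", "^r": r"\r", "^n": r"\n", "^t": r"\t", "^b": r"\f",
    "^(": "(", "^)": ")", "^{": "(?:", "^}": ")", "++": "*",
}
# Single characters that act as operators in UltraEdit syntax.
ONE_CHAR = {"%": "^", "$": "$", "?": ".", "*": ".*", "+": "+"}


def ultraedit_to_python(pattern):
    """Translate a subset of UltraEdit syntax into a Python `re` pattern."""
    out = []
    i = 0
    while i < len(pattern):
        pair = pattern[i:i + 2]
        if pattern.startswith("^}^{", i):
            out.append("|")  # boundary between alternatives A and B
            i += 4
        elif pair in TWO_CHAR:
            out.append(TWO_CHAR[pair])
            i += 2
        elif pattern[i] == "^" and i + 1 < len(pattern):
            out.append(re.escape(pattern[i + 1]))  # ^ overrides the next character
            i += 2
        elif pattern[i] == "[" and "]" in pattern[i + 1:]:
            end = pattern.index("]", i + 1)
            out.append(pattern[i:end + 1])  # copy character sets unchanged
            i = end + 1
        elif pattern[i] in ONE_CHAR:
            out.append(ONE_CHAR[pattern[i]])
            i += 1
        else:
            out.append(re.escape(pattern[i]))  # ordinary literal character
            i += 1
    return "".join(out)


regex = ultraedit_to_python("^(h*o^) ^(f*s^)")
print(re.sub(regex, r"\2 \1", "hello folks"))
```

The replacement string is written directly in Python form (`\2 \1`) rather than translated from `^2 ^1`, to keep the sketch short.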

 
 
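3. MS VC++ 6.0 Editor-Style Regular Expressions
When editing code in MS VC++ 6.0, we often use Find/Replace. Checking the "Regular expression" option there enables regular expressions in both searching and replacing. The syntax rules that apply are listed below: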

Regular Expression
Description
.
(Period.) Any single character.
[ ]
Any one of the characters contained in the brackets, or any of an ASCII range of characters separated by a hyphen (-). For example, b[aeiou]d matches bad, bed, bid, bod, and bud, and r[eo]+d matches red, rod, reed, and rood, but not reod or roed. x[0-9] matches x0, x1, x2, and so on. If the first character in the brackets is a caret (^), then the regular expression matches any characters except those in the brackets.
^
The beginning of a line.
$
The end of a line.
\( \)
Indicates a tagged expression to retain for replacement purposes. If the expression in the Find What text box is \(lpsz\)BigPointer, and the expression in the Replace With box is \1NewPointer, all selected occurrences of lpszBigPointer are replaced with lpszNewPointer. Each occurrence of a tagged expression is numbered according to its order in the Find What text box, and its replacement expression is \n, where 1 corresponds to the first tagged expression, 2 to the second, and so on. You can have up to nine tagged expressions.
\~
No match if the following character or characters occur. For example, b\~a+d matches bbd, bcd, bdd, and so on, but not bad.
You can use this expression to prefix a group of characters you want to exclude, which is useful for excluding matches of particular words. For example, foo\~\(lish\) matches "foo" in "food" and "afoot" but not in "foolish."
\{c\!c\}
Any one of the characters separated by the alternation symbol (\!). For example, \{j\!u\}+fruit finds jfruit, jjfruit, ufruit, ujfruit, uufruit, and so on.
*
None or more of the preceding characters or expressions. For example, ba*c matches bc, bac, baac, baaac, and so on.
+
At least one or more of the preceding characters or expressions. For example, ba+c matches bac, baac, baaac, but not bc.
\{\}
Any sequence of characters between the escaped braces. For example, \{ju\}+fruit finds jufruit, jujufruit, jujujufruit, and so on. Note that it will not find jfruit, ufruit, or ujfruit, because the sequence ju is not in any of those strings.
[^]
Any character except those following the caret (^) character in the brackets, or any of an ASCII range of characters separated by a hyphen (-). For example, x[^0-9] matches xa, xb, xc, and so on, but not x0, x1, x2, and so on.
\:a
Any single alphanumeric character [a-zA-Z0-9].
\:b
Any white-space character. The \:b finds tabs and spaces. There is no alternate syntax to express \:b.
\:c
Any single alphabetic character [a-zA-Z].
\:d
Any decimal digit [0-9].
\:n
Any unsigned number \{[0-9]+\.[0-9]*\![0-9]*\.[0-9]+\![0-9]+\}. For example, \:n should match 123, .45, and 123.45.
\:z
Any unsigned decimal integer [0-9]+.
\:h
Any hexadecimal number [0-9a-fA-F]+.
\:i
Any C/C++ identifier [a-zA-Z_$][a-zA-Z0-9_$]+.
\:w
Any alphabetic string [a-zA-Z]+. The string need not be bounded by white space or appear at the beginning or the end of a line.
\:q
Any quoted string \{"[^"]*"\!'[^']*'\}.
\
Removes the pattern match characteristic in the Find What text box from the special characters listed above. For example, 100$ matches 100 at the end of a line, but 100\$ matches the character string 100$ anywhere on a line.
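
For readers more used to Perl-style engines, the following sketch shows approximate Python `re` counterparts of a few VC++ 6.0 constructs from the table. The mapping `VC6_TO_PYTHON` and the helper `expand_vc6_shorthand` are hypothetical names made up for this illustration and cover only some of the `\:x` shorthands. Treating `\~` as a negative lookahead `(?!...)` and `\{...\}` as a non-capturing group `(?:...)` is an assumed approximation, not a documented equivalence.

```python
import re

# Assumed Python equivalents of some VC++ 6.0 shorthands (illustrative only).
VC6_TO_PYTHON = {
    r"\:a": "[a-zA-Z0-9]",
    r"\:b": r"[ \t]",
    r"\:c": "[a-zA-Z]",
    r"\:d": "[0-9]",
    r"\:z": "[0-9]+",
    r"\:h": "[0-9a-fA-F]+",
    r"\:w": "[a-zA-Z]+",
}


def expand_vc6_shorthand(pattern):
    """Replace each listed \\:x shorthand with its bracket expression."""
    for shorthand, expansion in VC6_TO_PYTHON.items():
        pattern = pattern.replace(shorthand, expansion)
    return pattern


# x\:d  ->  x[0-9]
print(re.findall(expand_vc6_shorthand(r"x\:d"), "x1 xa x9"))

# foo\~\(lish\)  ~  foo(?!lish): "foo" unless followed by "lish"
print(re.findall(r"foo(?!lish)", "food afoot foolish"))

# \{ju\}+fruit  ~  (?:ju)+fruit   and   \{j\!u\}+fruit  ~  (?:j|u)+fruit
print(bool(re.fullmatch(r"(?:ju)+fruit", "jujufruit")))
print(bool(re.fullmatch(r"(?:j|u)+fruit", "ujfruit")))

# \(lpsz\)BigPointer with replacement \1NewPointer
print(re.sub(r"(lpsz)BigPointer", r"\1NewPointer", "lpszBigPointer"))
```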

 
 
Reference: http://msdn2.microsoft.com/en-us/library/aa242808.aspx
 
 
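4. VS.NET 2005 Editor-Style Regular Expressions
The regular expressions used by the VS.NET 2005 editor cover a superset of the features offered by the MS VC++ 6.0 editor, although some constructs, such as tagged expressions and alternation, are written differently: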

Expression
Syntax
Description
Any character
.
Matches any one character except a line break.
Maximal — zero or more
*
Matches zero or more occurrences of the preceding expression.
Maximal — one or more
+
Matches at least one occurrence of the preceding expression.
Minimal — zero or more
@
Matches zero or more occurrences of the preceding expression, matching as few characters as possible.
Minimal — one or more
#
Matches one or more occurrences of the preceding expression, matching as few characters as possible.
Repeat n times
^n
Matches n occurrences of the preceding expression. For example, [0-9]^4 matches any 4-digit sequence.
Set of characters
[]
Matches any one of the characters within the []. To specify a range of characters, list the starting and ending character separated by a dash (-), as in [a-z].
Character not in set
[^...]
Matches any character not in the set of characters following the ^.
Beginning of line
^
Anchors the match to the beginning of a line.
End of line
$
Anchors the match to the end of a line.
Beginning of word
<
Matches only when a word begins at this point in the text.
End of word
>
Matches only when a word ends at this point in the text.
Grouping
()
Groups a subexpression.
Or
|
Matches the expression before or after the OR symbol (|). Mostly used within a group. For example, (sponge|mud) bath matches "sponge bath" and "mud bath."
Escape
\
Matches the character following the backslash (\). This allows you to find characters used in the regular expression notation, such as { and ^. For example, \^ searches for the ^ character.
Tagged expression
{}
Tags the text matched by the enclosed expression.
nth tagged text
\n
In a Find or Replace expression, indicates the text matched by the nth tagged expression, where n is a number from 1 to 9.
In a Replace expression, \0 inserts the entire matched text.
Right-justified field
\(w,n)
In a Replace expression, right-justifies the nth tagged expression in a field at least w characters wide.
Left-justified field
\(-w,n)
In a Replace expression, left-justifies the nth tagged expression in a field at least w characters wide.
Prevent match
~(X)
Prevents a match when X appears at this point in the expression. For example, real~(ity) matches the "real" in "realty" and "really," but not the "real" in "reality."
Alphanumeric character
:a
Matches the expression ([a-zA-Z0-9]).
Alphabetic character
:c
Matches the expression ([a-zA-Z]).
Decimal digit
:d
Matches the expression ([0-9]).
Hexadecimal digit
:h
Matches the expression ([0-9a-fA-F]+).
Identifier
:i
Matches the expression ([a-zA-Z_$][a-zA-Z0-9_$]*).
Rational number
:n
Matches the expression (([0-9]+.[0-9]*)|([0-9]*.[0-9]+)|([0-9]+)).
Quoted string
:q
Matches the expression (("[^"]*")|('[^']*')).
Alphabetic string
:w
Matches the expression ([a-zA-Z]+).
Decimal integer
:z
Matches the expression ([0-9]+).
Escape
\e
Unicode U+001B.
Bell
\g
Unicode U+0007.
Backspace
\h
Unicode U+0008.
Line break
\n
Matches a platform-independent line break. In a Replace expression, inserts a line break.
Tab
\t
Matches a tab character, Unicode U+0009.
Unicode character
\x#### or \u####
Matches a character given by Unicode value where #### is hexadecimal digits. You can specify a character outside the Basic Multilingual Plane (that is, a surrogate) with the ISO 10646 code point or with two Unicode code points giving the values of the surrogate pair.
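
Several VS.NET operators in the table above have close counterparts in Perl-style engines. The sketch below shows the assumed Python `re` spellings: `@` and `#` as the lazy quantifiers `*?` and `+?`, `^n` as `{n}`, the word anchors as `\b`, and `~(X)` as the negative lookahead `(?!X)`. The helpers `right_justify_group` and `left_justify_group` are hypothetical names that imitate the justified-field replacements with a replacement function. None of this is the Visual Studio engine itself.

```python
import re


def right_justify_group(pattern, text, width, group=1):
    # Imitates a right-justified field: pad the tagged text on the left.
    return re.sub(pattern, lambda m: m.group(group).rjust(width), text)


def left_justify_group(pattern, text, width, group=1):
    # Imitates a left-justified field: pad the tagged text on the right.
    return re.sub(pattern, lambda m: m.group(group).ljust(width), text)


quoted = '"a" "b"'
print(re.match(r'".*"', quoted).group())   # maximal, like ".*"
print(re.match(r'".*?"', quoted).group())  # minimal, like ".@"

# [0-9]^4  ~  [0-9]{4}
print(re.findall(r"[0-9]{4}", "in 1999 and 2024"))

# real~(ity)  ~  real(?!ity)
print(re.findall(r"real(?!ity)", "realty really reality"))

# Word anchors around "the"  ~  \bthe\b
print(re.findall(r"\bthe\b", "the other then the"))

print(repr(right_justify_group(r"([0-9]+)", "7 42", 4)))
print(repr(left_justify_group(r"([0-9]+)", "7 42", 4)))
```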

The following table lists the syntax for matching by standard Unicode character properties. The two-letter abbreviation is the same as listed in the Unicode character properties database. These may be specified as part of a character set. For example, the expression [:Nd:Nl:No] matches any kind of digit.

Expression
Syntax
Description
Uppercase letter
:Lu
Matches any one capital letter. For example, :Luhe matches "The" but not "the".
Lowercase letter
:Ll
Matches any one lower case letter. For example, :Llhe matches "the" but not "The".
Title case letter
:Lt
Matches characters that combine an uppercase letter with a lowercase letter, such as Nj and Dz.
Modifier letter
:Lm
Matches letters or punctuation, such as commas, cross accents, and double prime, used to indicate modifications to the preceding letter.
Other letter
:Lo
Matches other letters, such as gothic letter ahsa.
Decimal digit
:Nd
Matches decimal digits such as 0-9 and their full-width equivalents.
Letter digit
:Nl
Matches letter digits such as roman numerals and ideographic number zero.
Other digit
:No
Matches other digits such as old italic number one.
Open punctuation
:Ps
Matches opening punctuation such as open brackets and braces.
Close punctuation
:Pe
Matches closing punctuation such as closing brackets and braces.
Initial quote punctuation
:Pi
Matches initial double quotation marks.
Final quote punctuation
:Pf
Matches single quotation marks and ending double quotation marks.
Dash punctuation
:Pd
Matches the dash mark.
Connector punctuation
:Pc
Matches the underscore or underline mark.
Other punctuation
:Po
Matches commas (,), ?, ", !, @, #, %, &, *, \, colons (:), semi-colons (;), ', and /.
Space separator
:Zs
Matches blanks.
Line separator
:Zl
Matches the Unicode character U+2028.
Paragraph separator
:Zp
Matches the Unicode character U+2029.
Non-spacing mark
:Mn
Matches non-spacing marks.
Combining mark
:Mc
Matches combining marks.
Enclosing mark
:Me
Matches enclosing marks.
Math symbol
:Sm
Matches +, =, ~, |, <, and >.
Currency symbol
:Sc
Matches $ and other currency symbols.
Modifier symbol
:Sk
Matches modifier symbols such as circumflex accent, grave accent, and macron.
Other symbol
:So
Matches other symbols, such as the copyright sign, pilcrow sign, and the degree sign.
Other control
:Cc
Matches end of line.
Other format
:Cf
Formatting control character such as the bidirectional control characters.
Surrogate
:Cs
Matches one half of a surrogate pair.
Other private-use
:Co
Matches any character from the private-use area.
Other not assigned
:Cn
Matches characters that do not map to a Unicode character.
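
The two-letter abbreviations in this table are the Unicode general categories, which Python exposes through `unicodedata.category`. Python's built-in `re` module has no property syntax like `:Lu`, so the sketch below filters characters by hand. `find_prefixed` and `chars_in` are names invented for this illustration and only approximate what the editor does.

```python
import unicodedata


def find_prefixed(text, category, suffix):
    """Find one character of `category` followed by `suffix`, like :Luhe."""
    hits = []
    size = len(suffix) + 1
    for i in range(len(text) - size + 1):
        chunk = text[i:i + size]
        if unicodedata.category(chunk[0]) == category and chunk[1:] == suffix:
            hits.append(chunk)
    return hits


def chars_in(text, categories):
    """Keep the characters whose general category is in `categories`."""
    return [ch for ch in text if unicodedata.category(ch) in categories]


print(find_prefixed("The theory", "Lu", "he"))  # like :Luhe
print(find_prefixed("The theory", "Ll", "he"))  # like :Llhe

# Like [:Nd:Nl:No]: a digit, ROMAN NUMERAL EIGHT, and SUPERSCRIPT TWO
print(chars_in("a1_\u2167\u00b2", {"Nd", "Nl", "No"}))
```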

In addition to the standard Unicode character properties, the following additional properties may be specified. These properties may be specified as part of a character set.

Expression
Syntax
Description
Alpha
:Al
Matches any one alphabetic character. For example, :Alhe matches within words such as "The", "then", and "reached".
Numeric
:Nu
Matches any one number or digit.
Punctuation
:Pu
Matches any one punctuation mark, such as ?, @, ', and so on.
White space
:Wh
Matches all types of white space, including publishing and ideographic spaces.
Bidi
:Bi
Matches characters from right-to-left scripts such as Arabic and Hebrew.
Hangul
:Ha
Matches Korean Hangul and combining Jamos.
Hiragana
:Hi
Matches hiragana characters.
Katakana
:Ka
Matches katakana characters.
Ideographic/Han/Kanji
:Id
Matches ideographic characters, such as Han and Kanji.
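
The script-based properties have no direct counterpart in Python's standard `re` module. One rough way to approximate a few of them is to look at the official Unicode character name, as in the sketch below. The table `SCRIPT_PREFIXES` and the function `chars_with_property` are assumptions made for this example, and matching on name prefixes is only a heuristic (for instance, it covers only CJK unified ideographs for `:Id`).

```python
import unicodedata

# Assumed name prefixes standing in for the editor's script properties.
SCRIPT_PREFIXES = {
    "Hi": "HIRAGANA",
    "Ka": "KATAKANA",
    "Ha": "HANGUL",
    "Id": "CJK UNIFIED IDEOGRAPH",
}


def chars_with_property(text, prop):
    """Return the characters whose Unicode name starts with the prefix for `prop`."""
    prefix = SCRIPT_PREFIXES[prop]
    return [ch for ch in text if unicodedata.name(ch, "").startswith(prefix)]


# Hiragana A, Katakana A, Hangul syllable HAN, the ideograph U+6F22, and "a"
sample = "\u3042\u30a2\ud55c\u6f22a"
for prop in ("Hi", "Ka", "Ha", "Id"):
    print(prop, chars_with_property(sample, prop))
```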

 
Reference: http://msdn2.microsoft.com/en-us/library/2k3te2cs(VS.71).aspx
