Regular Expression Special Characters
"."---Any single character (a "wildcard")
"["---Begin character class
"]"---End character class
"{"---Begin count
"}"---End count
"("---Begin grouping
")"---End grouping
"\"---Next character has a special meaning
"*"---Zero or more
"+"---One or more
"?"---Optional (zero or one)
"|"---Alternative (or)
"^"---Start of line; negation
"$"---End of line
Example:
case 1:
^A*B+C?$
explanation 1:
The line starts with zero or more A, followed by one or more B, then an optional C, and then ends.
A pattern can be made optional or repeated (the default is exactly once) by adding a suffix:
Repetition
{n}---Exactly n times
{n,}---n or more times
{n,m}---At least n and at most m times
*---Zero or more, that is, {0,}
+---One or more, that is, {1,}
?---Optional (zero or one), that is, {0,1}
Example:
case 1:
A{3}B{2,4}C*
explanation 1:
Matches, for example, AAABBC or AAABBB.
A suffix ? after any of the repetition notations makes the pattern matcher "lazy" or "non-greedy".
That is, when looking for a pattern, it will look for the shortest match rather than the longest.
By default, the pattern matcher always looks for the longest match (similar to C++'s Max Munch rule).
Consider:
ababab
The pattern (ab)+ matches all of "ababab". However, (ab)+? matches only the first "ab".
The most common character classifications have names:
Character Classes
alnum --- Any alphanumeric character
alpha --- Any alphabetic character
blank --- Any whitespace character that is not a line separator
cntrl --- Any control character
d --- Any decimal digit
digit --- Any decimal digit
graph --- Any graphical character
lower --- Any lowercase character
print --- Any printable character
punct --- Any punctuation character
s --- Any whitespace character
space --- Any whitespace character
upper --- Any uppercase character
w --- Any word character (alphanumeric characters plus the underscore)
xdigit --- Any hexadecimal digit character
Several character classes are supported by shorthand notation:
Character Class Abbreviations
\d --- A decimal digit --- [[:digit:]]
\s --- A space (space, tab, ...) --- [[:space:]]
\w --- A letter (a-z) or digit (0-9) or underscore (_) --- [_[:alnum:]]
\D --- Not \d --- [^[:digit:]]
\S --- Not \s --- [^[:space:]]
\W --- Not \w --- [^_[:alnum:]]
In addition, languages supporting regular expressions often provide:
Nonstandard (but Common) Character Class Abbreviations
\l --- A lowercase character --- [[:lower:]]
\u --- An uppercase character --- [[:upper:]]
\L --- Not \l --- [^[:lower:]]
\U --- Not \u --- [^[:upper:]]
Note that a backslash must be doubled to include it in an ordinary string literal; for example, the pattern \d is written "\\d".
As usual, backslashes can denote special characters:
Special Characters
\n --- Newline
\t --- Tab
\\ --- One backslash
\xhh --- Unicode character expressed using two hexadecimal digits
\uhhhh --- Unicode character expressed using four hexadecimal digits
To add to the opportunities for confusion, two further logically different uses of the backslash are provided:
Special Characters
\b --- The first or last character of a word (a "boundary character")
\B --- Not a \b
\i --- The ith sub_match in this pattern
Here are some examples of patterns:
Ax* // A, Ax, Axxxx
Ax+ // Ax, Axxx Not A
\d-?\d // 1-2, 12 Not 1--2
\w{2}-\d{4,5} // Ab-1234, XX-54321, 22-5432
(\d*:)?(\d+) // 12:3, 1:23, 123, :123 Not 123:
(bs|BS) // bs, BS Not bS
[aeiouy] // a, o, u An English vowel, not x
[^aeiouy] // x, k Not an English vowel, not e
[a^eiouy] // a, ^, o, u An English vowel or ^
Here is the test code:
#include <iostream>
#include <regex>
#include <string>
using namespace std;

int main()
{
    // Zero or more A, one or more B, an optional C, anchored to the whole line.
    const char* reg_esp = "^A*B+C?$";
    regex rgx(reg_esp);
    cmatch match;
    const char* target = "AAAAAAAAABBBBBBBBC";

    if (regex_search(target, match, rgx)) {
        // match[0] is the whole match; any further entries are sub-matches.
        for (size_t a = 0; a < match.size(); a++)
            cout << string(match[a].first, match[a].second) << endl;
    } else {
        cout << "No Match Case !" << endl;
    }
    return 0;
}