I recently took another look at regular expressions; these are my notes.
Reference: Classic Shell Scripting, pp. 33-47
POSIX BRE and ERE metacharacters
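Character   BRE/ERE   Meaning in a pattern
\           Both      Usually turns off the special meaning of the following character; sometimes turns a special meaning on, as with \( \) and \{ \}
.           Both      Matches any single character except NUL
*           Both      Matches zero or more occurrences of the item immediately before it (a single character or a group)
^           Both      Anchors the match to the beginning of the line or string; in a BRE it is special only at the start of the pattern
$           Both      Anchors the match to the end of the line or string; in a BRE it is special only at the end of the pattern
[...]       Both      Bracket expression: matches any one of the enclosed characters; a leading ^ negates the set
\{n,m\}     BRE       Interval expression: matches n to m occurrences of the preceding item
\( \)       BRE       Saves the enclosed subpattern for later use by a backreference
\n          BRE       Backreference: matches the text captured by the nth \( \) group (n = 1 to 9)
+           ERE       Matches one or more occurrences of the preceding regular expression
?           ERE       Matches zero or one occurrence of the preceding regular expression
|           ERE       Alternation: matches the regular expression before it or the one after it
( )         ERE       Grouping: lets operators such as *, + and | apply to the enclosed expression
{n,m}       ERE       Interval expression, the same as \{n,m\} in a BRE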
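The BRE/ERE split shows up directly in the POSIX C API: `regcomp` compiles a BRE by default and an ERE when given `REG_EXTENDED`. A minimal sketch, assuming a POSIX system with `<regex.h>`; `whole_match` and `bre_vs_ere` are hypothetical names used only for illustration. Call `bre_vs_ere()` from a `main` of your own; each `assert` states the expected result.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if `pattern` matches all of `s`.
   cflags = 0 compiles a BRE, REG_EXTENDED compiles an ERE.
   A pattern that fails to compile is treated as "no match". */
static bool whole_match(const char *pattern, int cflags, const char *s)
{
    regex_t re;
    regmatch_t m;
    if (regcomp(&re, pattern, cflags) != 0)
        return false;
    int rc = regexec(&re, s, 1, &m, 0);
    regfree(&re);
    /* POSIX reports the leftmost-longest match, so a whole-string match
       exists exactly when the reported match spans all of s. */
    return rc == 0 && m.rm_so == 0 && (size_t)m.rm_eo == strlen(s);
}

/* The same ideas in both dialects (C string literals double each backslash). */
static void bre_vs_ere(void)
{
    /* Interval: \{2\} in a BRE, {2} in an ERE. */
    assert(whole_match("a\\{2\\}", 0, "aa"));
    assert(whole_match("a{2}", REG_EXTENDED, "aa"));

    /* Group plus interval: \(ab\)\{2\} versus (ab){2}. */
    assert(whole_match("\\(ab\\)\\{2\\}", 0, "abab"));
    assert(whole_match("(ab){2}", REG_EXTENDED, "abab"));

    /* A bare + is an ordinary character in a BRE, "one or more" in an ERE. */
    assert(whole_match("a+", 0, "a+"));
    assert(!whole_match("a+", 0, "aaa"));
    assert(whole_match("a+", REG_EXTENDED, "aaa"));
}
```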
POSIX bracket expressions
Character classes
Class        Matching characters
[:alnum:]    Alphanumeric characters
[:alpha:]    Alphabetic characters
[:blank:]    Space and tab characters
[:cntrl:]    Control characters
[:digit:]    Numeric characters
[:graph:]    Printable characters other than space
[:lower:]    Lowercase characters
[:print:]    Printable characters, including space
[:punct:]    Punctuation characters
[:space:]    Whitespace characters (space, tab, newline, vertical tab, form feed, carriage return)
[:upper:]    Uppercase characters
[:xdigit:]   Hexadecimal digits
Collating symbols
A collating symbol is a multicharacter sequence that should be treated as a unit. It consists of the characters bracketed by [. and .]. Collating symbols are specific to the locale in which they are used.
Equivalence classes
An equivalence class lists a set of characters that should be considered equivalent, such as e and è. It consists of a named element from the locale, bracketed by [= and =].
All three of these constructs must appear inside the square brackets of a bracket expression. For example, [[:alpha:]!] matches any single alphabetic character or the exclamation mark, and [[.ch.]] matches the collating element ch, but does not match just the letter c or the letter h. In a French locale, [[=e=]] might match any of e, è, ë, ê, or é.
Basic Regular Expressions
Matching single characters
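• Ordinary characters, which match themselves
• Escaped metacharacters: a backslash makes a metacharacter literal (\. matches a period, \* an asterisk)
• The . (dot) character, which matches any single character
• Bracket expressions: a plain list or range such as [012345], [0-5], or the negated [^0-5]; a character class such as [[:digit:]]; an equivalence class such as [[=e=]]; or a collating symbol such as [[.ch.]]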
Within bracket expressions, all other metacharacters lose their special meanings. Thus, [*\.] matches a literal asterisk, a literal backslash, or a literal period. To get a ] into the set, place it first in the list: []*\.] adds the ] to the list. To get a minus character into the set, place it first in the list: [-*\.]. If you need both a right bracket and a minus, make the right bracket the first character, and make the minus the last one in the list: []*\.-].
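The bracket rules above can be checked with the POSIX `<regex.h>` API, whose BRE and ERE modes both treat a backslash inside brackets as an ordinary character. A minimal sketch under that assumption; `bre_whole` and `bracket_literal_examples` are hypothetical names, and C string literals double each backslash, so "[*\\.]" is the regex [*\.]. Call `bracket_literal_examples()` from your own `main`.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if the BRE `pattern` matches all of `s`. */
static bool bre_whole(const char *pattern, const char *s)
{
    regex_t re;
    regmatch_t m;
    if (regcomp(&re, pattern, 0) != 0)
        return false;
    int rc = regexec(&re, s, 1, &m, 0);
    regfree(&re);
    return rc == 0 && m.rm_so == 0 && (size_t)m.rm_eo == strlen(s);
}

static void bracket_literal_examples(void)
{
    /* [*\.] : asterisk, backslash, or period, all literal. */
    assert(bre_whole("[*\\.]", "*"));
    assert(bre_whole("[*\\.]", "\\"));
    assert(bre_whole("[*\\.]", "."));
    assert(!bre_whole("[*\\.]", "a"));

    /* []*\.] : a leading ] is a member, not the end of the list. */
    assert(bre_whole("[]*\\.]", "]"));

    /* [-*\.] : a leading - is a literal minus, not a range. */
    assert(bre_whole("[-*\\.]", "-"));

    /* []*\.-] : ] first and - last puts both in the set. */
    assert(bre_whole("[]*\\.-]", "]"));
    assert(bre_whole("[]*\\.-]", "-"));
}
```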
Backreferences
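Pattern                                 Matches
\(ab\)\(cd\)[def]*\2\1                  abcd, then any run of d, e, or f, then cdab: for example abcdcdab or abcdeeecdab
\(why\).*\1                             A line containing two occurrences of why
\([[:alpha:]_][[:alnum:]_]*\) = \1;     A simple assignment whose left and right sides are the same identifier, such as x = x;
\(["']\).*\1                            Text that opens and closes with the same kind of quote, such as 'foo' or "bar"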
Matching multiple characters with one expression
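*         Zero or more of the preceding item
\{N\}     Exactly N of the preceding item
\{N,\}    At least N of the preceding item
\{N,M\}   Between N and M (inclusive) of the preceding item
\{,M\}    At most M of the preceding item (accepted by GNU tools; not one of the POSIX forms)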
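The repetition operators can be checked against the POSIX `<regex.h>` API. A minimal sketch, assuming a POSIX system; `bre_whole` and `interval_examples` are hypothetical names, and the helper requires the pattern to cover the whole string so that "too many" repetitions show up as a failure. The GNU-only \{,M\} form is deliberately not used. Call `interval_examples()` from your own `main`.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if the BRE `pattern` matches all of `s`. */
static bool bre_whole(const char *pattern, const char *s)
{
    regex_t re;
    regmatch_t m;
    if (regcomp(&re, pattern, 0) != 0)
        return false;
    int rc = regexec(&re, s, 1, &m, 0);
    regfree(&re);
    return rc == 0 && m.rm_so == 0 && (size_t)m.rm_eo == strlen(s);
}

static void interval_examples(void)
{
    assert(bre_whole("ab*c", "ac"));           /* * allows zero b's */
    assert(bre_whole("ab*c", "abbbc"));
    assert(bre_whole("a\\{3\\}", "aaa"));       /* exactly 3 */
    assert(!bre_whole("a\\{3\\}", "aaaa"));
    assert(bre_whole("a\\{2,\\}", "aaaaa"));    /* at least 2 */
    assert(!bre_whole("a\\{2,\\}", "a"));
    assert(bre_whole("a\\{2,3\\}", "aa"));      /* 2 to 3 */
    assert(!bre_whole("a\\{2,3\\}", "aaaa"));
}
```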
Anchoring text matches
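^   At the start of a pattern, anchors the match to the beginning of the line; elsewhere in a BRE it is an ordinary character
$   At the end of a pattern, anchors the match to the end of the line; elsewhere in a BRE it is an ordinary character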
BRE operator precedence
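Operator (highest to lowest)   Meaning
[. .] [= =] [: :]              Bracket symbols for character collation
\metacharacter                 Escaped metacharacters
[ ]                            Bracket expressions
\( \) \digit                   Subexpressions and backreferences
* \{ \}                        Repetition of the preceding item
(no symbol)                    Concatenation
^ $                            Anchors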
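Because repetition binds tighter than concatenation, ab* means a followed by b*, not (ab)*. A minimal sketch of that difference using the POSIX `<regex.h>` API; `bre_whole` and `precedence_examples` are hypothetical names. Call `precedence_examples()` from your own `main`.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if the BRE `pattern` matches all of `s`. */
static bool bre_whole(const char *pattern, const char *s)
{
    regex_t re;
    regmatch_t m;
    if (regcomp(&re, pattern, 0) != 0)
        return false;
    int rc = regexec(&re, s, 1, &m, 0);
    regfree(&re);
    return rc == 0 && m.rm_so == 0 && (size_t)m.rm_eo == strlen(s);
}

static void precedence_examples(void)
{
    /* * applies only to b: ab* is a(b*). */
    assert(bre_whole("ab*", "abbb"));
    assert(!bre_whole("ab*", "abab"));

    /* \( \) groups ab so that * repeats the whole pair. */
    assert(bre_whole("\\(ab\\)*", "abab"));
    assert(!bre_whole("\\(ab\\)*", "abbb"));
}
```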
Extended Regular Expressions
Matching single characters
Same as for BREs, with one notable exception: in awk, \ is special inside bracket expressions. Thus, to match a left bracket, dash, right bracket, or backslash, you could use [\[\-\]\\].
Backreferences don’t exist in POSIX EREs (GNU grep -E accepts them as an extension).
Matching multiple regular expressions with one expression
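*        Zero or more of the preceding regular expression
+        One or more of the preceding regular expression
?        Zero or one of the preceding regular expression
{N}      Exactly N
{N,}     At least N
{N,M}    Between N and M (inclusive)
{,M}     At most M (accepted by GNU tools; not one of the POSIX forms)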
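In an ERE the repetition operators also apply to a parenthesized group. A minimal sketch with the POSIX `<regex.h>` API and the `REG_EXTENDED` flag; `ere_whole` and `ere_repetition_examples` are hypothetical names. Call `ere_repetition_examples()` from your own `main`.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if the ERE `pattern` matches all of `s`. */
static bool ere_whole(const char *pattern, const char *s)
{
    regex_t re;
    regmatch_t m;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return false;
    int rc = regexec(&re, s, 1, &m, 0);
    regfree(&re);
    return rc == 0 && m.rm_so == 0 && (size_t)m.rm_eo == strlen(s);
}

static void ere_repetition_examples(void)
{
    assert(ere_whole("ab+c", "abbc"));
    assert(!ere_whole("ab+c", "ac"));         /* + needs at least one b */
    assert(ere_whole("colou?r", "color"));    /* ? makes the u optional */
    assert(ere_whole("colou?r", "colour"));

    /* Operators applied to a whole group. */
    assert(ere_whole("(ab)+", "ababab"));
    assert(ere_whole("(ab){2,3}", "abab"));
    assert(!ere_whole("(ab){2,3}", "ab"));
}
```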
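Alternation
|    Matches either the regular expression before it or the one after it
Grouping
( )  Groups a regular expression so that a following *, +, ? or interval applies to the whole group, and so that | alternation stays inside the group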
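A minimal sketch of alternation and grouping with the POSIX `<regex.h>` API in ERE mode (`REG_EXTENDED`); `ere_whole` and `alternation_examples` are hypothetical names. Call `alternation_examples()` from your own `main`.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if the ERE `pattern` matches all of `s`. */
static bool ere_whole(const char *pattern, const char *s)
{
    regex_t re;
    regmatch_t m;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return false;
    int rc = regexec(&re, s, 1, &m, 0);
    regfree(&re);
    return rc == 0 && m.rm_so == 0 && (size_t)m.rm_eo == strlen(s);
}

static void alternation_examples(void)
{
    /* | picks either side. */
    assert(ere_whole("cat|dog", "cat"));
    assert(ere_whole("cat|dog", "dog"));

    /* Parentheses confine the alternation: gr(a|e)y is "gray" or "grey". */
    assert(ere_whole("gr(a|e)y", "gray"));
    assert(!ere_whole("gr(a|e)y", "gry"));

    /* A group is repeated as a unit: one or more "ab" or "cd" pieces. */
    assert(ere_whole("(ab|cd)+", "abcdab"));
    assert(!ere_whole("(ab|cd)+", "abc"));
}
```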
Anchoring text matches
Same as for BREs, with one significant difference: in EREs, ^ and $ are always metacharacters. Thus, regular expressions such as ab^cd and ef$gh are valid but cannot match anything, since no text can precede the beginning of a line or follow its end.
ERE operator precedence
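Operator (highest to lowest)   Meaning
[. .] [= =] [: :]              Bracket symbols for character collation
\metacharacter                 Escaped metacharacters
[ ]                            Bracket expressions
( )                            Grouping
* + ? { }                      Repetition of the preceding regular expression
(no symbol)                    Concatenation
^ $                            Anchors
|                              Alternation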
Additional GNU regular expression operators
Operator   Meaning
\w         Matches any word-constituent character. Equivalent to [[:alnum:]_].
\W         Matches any nonword-constituent character. Equivalent to [^[:alnum:]_].
\< \>      Match the beginning and the end of a word, respectively.
\b         Matches the null string found at either the beginning or the end of a word. This is a generalization of the \< and \> operators. Note: because awk uses \b to represent the backspace character, GNU awk (gawk) uses \y.
\B         Matches the null string between two word-constituent characters.
\` \'      Match the beginning and the end of an emacs buffer, respectively. GNU programs (besides emacs) generally treat these as equivalent to ^ and $.
Finally, although POSIX explicitly states that the NUL character need not be matchable, GNU programs have no such restriction. If a NUL character occurs in input data, it can be matched by the . metacharacter or a bracket expression.
Unix programs and their regular expression type
Type   grep  sed  ed  ex/vi  more  egrep  awk  lex
BRE    •     •    •   •      •
ERE                                 •      •    •
\< \>  •     •    •   •      •