Regular expressions are used to search and manipulate the text, based on the patterns. Most of the Linux commands and programming languages use regular expression.
Grep command is used to search for a specific string in a file. Please refer our earlier article for 15 practical grep command examples.
You can also use regular expressions with grep command when you want to search for a text containing a particular pattern. Regular expressions search for the patterns on each line of the file. It simplifies our search operation.
This articles is part of a 2 article series.
This part 1 article covers grep examples for simple regular expressions. The future part 2 article will cover advanced regular expression examples in grep.
Let us take the file /var/log/messages file which will be used in our examples.
In grep command, caret Symbol ^ matches the expression at the start of a line. In the following example, it displays all the line which starts with the Nov 10. i.e All the messages logged on November 10.
$ grep "^Nov 10" messages.1 Nov 10 01:12:55 gs123 ntpd[2241]: time reset +0.177479 s Nov 10 01:17:17 gs123 ntpd[2241]: synchronized to LOCAL(0), stratum 10 Nov 10 01:18:49 gs123 ntpd[2241]: synchronized to 15.1.13.13, stratum 3 Nov 10 13:21:26 gs123 ntpd[2241]: time reset +0.146664 s Nov 10 13:25:46 gs123 ntpd[2241]: synchronized to LOCAL(0), stratum 10 Nov 10 13:26:27 gs123 ntpd[2241]: synchronized to 15.1.13.13, stratum 3
The ^ matches the expression in the beginning of a line, only if it is the first character in a regular expression. ^N matches line beginning with N.
Character $ matches the expression at the end of a line. The following command will help you to get all the lines which ends with the word “terminating”.
$ grep "terminating.$" messages Jul 12 17:01:09 cloneme kernel: Kernel log daemon terminating. Oct 28 06:29:54 cloneme kernel: Kernel log daemon terminating.
From the above output you can come to know when all the kernel log has got terminated. Just like ^ matches the beginning of the line only if it is the first character, $ matches the end of the line only if it is the last character in a regular expression.
Using ^ and $ character you can find out the empty lines available in a file. “^$” specifies empty line.
$ grep -c "^$" messages anaconda.log messages:0 anaconda.log:3
The above commands displays the count of the empty lines available in the messages and anaconda.log files.
The special meta-character “.” (dot) matches any character except the end of the line character. Let us take the input file which has the content as follows.
$ cat input 1. first line 2. hi hello 3. hi zello how are you 4. cello 5. aello 6. eello 7. last line
Now let us search for a word which has any single character followed by ello. i.e hello, cello etc.,
$ grep ".ello" input 2. hi hello 3. hi zello how are you 4. cello 5. aello 6. eello
In case if you want to search for a word which has only 4 character you can give grep -w “….” where single dot represents any single character.
The special character “*” matches zero or more occurrence of the previous character. For example, the pattern ’1*’ matches zero or more ’1′.
The following example searches for a pattern “kernel: *” i.e kernel: and zero or more occurrence of space character.
$ grep "kernel: *." * messages.4:Jul 12 17:01:02 cloneme kernel: ACPI: PCI interrupt for device 0000:00:11.0 disabled messages.4:Oct 28 06:29:49 cloneme kernel: ACPI: PM-Timer IO Port: 0x1008 messages.4:Oct 28 06:31:06 btovm871 kernel: sda: sda1 sda2 sda3 messages.4:Oct 28 06:31:06 btovm871 kernel: sd 0:0:0:0: Attached scsi disk sda . .
In the above example it matches for kernel and colon symbol followed by any number of spaces/no space and “.” matches any single character.
The special character “\+” matches one or more occurrence of the previous character. ” \+” matches at least one or more space character.
If there is no space then it will not match. The character “+” comes under extended regular expression. So you have to escape when you want to use it with the grep command.
$ cat input hi hello hi hello how are you hihello $ grep "hi \+hello" input hi hello hi hello how are you
In the above example, the grep pattern matches for the pattern ‘hi’, followed by one or more space character, followed by “hello”.
If there is no space between hi and hello it wont match that. However, * character matches zero or more occurrence.
“hihello” will be matched by * as shown below.
$ grep "hi *hello" input hi hello hi hello how are you hihello $
The special character “?” matches zero or one occurrence of the previous character. “0?” matches single zero or nothing.
$ grep "hi \?hello" input hi hello hihello
“hi \?hello” matches hi and hello with single space (hi hello) and no space (hihello).
The line which has more than one space between hi and hello did not get matched in the above command.
If you want to search for special characters (for example: * , dot) in the content you have to escape the special character in the regular expression.
$ grep "127\.0\.0\.1" /var/log/messages.4 Oct 28 06:31:10 btovm871 ntpd[2241]: Listening on interface lo, 127.0.0.1#123 Enabled
The character class is nothing but list of characters mentioned with in the square bracket which is used to match only one out of several characters.
$ grep -B 1 "[0123456789]\+ times" /var/log/messages.4 Oct 28 06:38:35 btovm871 init: open(/dev/pts/0): No such file or directory Oct 28 06:38:35 btovm871 last message repeated 2 times Oct 28 06:38:38 btovm871 pcscd: winscard.c:304:SCardConnect() Reader E-Gate 0 0 Not Found Oct 28 06:38:38 btovm871 last message repeated 3 times
Repeated messages will be logged in messages logfile as “last message repeated n times”. The above example searches for the line which has any number (0to9) followed by the word “times”. If it matches it displays the line before the matched line and matched line also.
With in the square bracket, using hyphen you can specify the range of characters. Like [0123456789] can be represented by [0-9]. Alphabets range also can be specified such as [a-z],[A-Z] etc. So the above command can also be written as
$ grep -B 1 "[0-9]\+ times" /var/log/messages.4
If you want to search for all the characters except those in the square bracket, then use ^ (Caret) symbol as the first character after open square bracket. The following example searches for a line which does not start with the vowel letter from dictionary word file in linux.
$ grep -i "^[^aeiou]" /usr/share/dict/linux.words 1080 10-point 10th 11-point 12-point 16-point 18-point 1st 2
First caret symbol in regular expression represents beginning of the line. However, caret symbol inside the square bracket represents “except” — i.e match except everything in the square bracket.
http://www.thegeekstuff.com/2011/01/regular-expressions-in-grep-command/
字符 | 说明 |
---|---|
\ |
将下一字符标记为特殊字符、文本、反向引用或八进制转义符。例如,“n”匹配字符“n”。“\n”匹配换行符。序列“\\”匹配“\”,“\(”匹配“(”。 |
^ |
匹配输入字符串开始的位置。如果设置了 RegExp 对象的 Multiline 属性,^ 还会与“\n”或“\r”之后的位置匹配。 |
$ |
匹配输入字符串结尾的位置。如果设置了 RegExp 对象的 Multiline 属性,$ 还会与“\n”或“\r”之前的位置匹配。 |
* |
零次或多次匹配前面的字符或子表达式。例如,zo* 匹配“z”和“zoo”。* 等效于 {0,}。 |
+ |
一次或多次匹配前面的字符或子表达式。例如,“zo+”与“zo”和“zoo”匹配,但与“z”不匹配。+ 等效于 {1,}。 |
? |
零次或一次匹配前面的字符或子表达式。例如,“do(es)?”匹配“do”或“does”中的“do”。? 等效于 {0,1}。 |
{n} |
n 是非负整数。正好匹配 n 次。例如,“o{2}”与“Bob”中的“o”不匹配,但与“food”中的两个“o”匹配。 |
{n,} |
n 是非负整数。至少匹配 n 次。例如,“o{2,}”不匹配“Bob”中的“o”,而匹配“foooood”中的所有 o。“o{1,}”等效于“o+”。“o{0,}”等效于“o*”。 |
{n,m} |
M 和 n 是非负整数,其中 n <= m。匹配至少 n 次,至多 m 次。例如,“o{1,3}”匹配“fooooood”中的头三个 o。'o{0,1}' 等效于 'o?'。注意:您不能将空格插入逗号和数字之间。 |
? |
当此字符紧随任何其他限定符(*、+、?、{n}、{n,}、{n,m})之后时,匹配模式是“非贪心的”。“非贪心的”模式匹配搜索到的、尽可能短的字符串,而默认的“贪心的”模式匹配搜索到的、尽可能长的字符串。例如,在字符串“oooo”中,“o+?”只匹配单个“o”,而“o+”匹配所有“o”。 |
. |
匹配除“\n”之外的任何单个字符。若要匹配包括“\n”在内的任意字符,请使用诸如“[\s\S]”之类的模式。 |
(pattern) |
匹配 pattern 并捕获该匹配的子表达式。可以使用 $0…$9 属性从结果“匹配”集合中检索捕获的匹配。若要匹配括号字符 ( ),请使用“\(”或者“\)”。 |
(?:pattern) |
匹配 pattern 但不捕获该匹配的子表达式,即它是一个非捕获匹配,不存储供以后使用的匹配。这对于用“or”字符 (|) 组合模式部件的情况很有用。例如,'industr(?:y|ies) 是比 'industry|industries' 更经济的表达式。 |
(?=pattern) |
执行正向预测先行搜索的子表达式,该表达式匹配处于匹配 pattern 的字符串的起始点的字符串。它是一个非捕获匹配,即不能捕获供以后使用的匹配。例如,'Windows (?=95|98|NT|2000)' 匹配“Windows 2000”中的“Windows”,但不匹配“Windows 3.1”中的“Windows”。预测先行不占用字符,即发生匹配后,下一匹配的搜索紧随上一匹配之后,而不是在组成预测先行的字符后。 |
(?!pattern) |
执行反向预测先行搜索的子表达式,该表达式匹配不处于匹配 pattern 的字符串的起始点的搜索字符串。它是一个非捕获匹配,即不能捕获供以后使用的匹配。例如,'Windows (?!95|98|NT|2000)' 匹配“Windows 3.1”中的 “Windows”,但不匹配“Windows 2000”中的“Windows”。预测先行不占用字符,即发生匹配后,下一匹配的搜索紧随上一匹配之后,而不是在组成预测先行的字符后。 |
x|y |
匹配 x 或 y。例如,'z|food' 匹配“z”或“food”。'(z|f)ood' 匹配“zood”或“food”。 |
[xyz] |
字符集。匹配包含的任一字符。例如,“[abc]”匹配“plain”中的“a”。 |
[^xyz] |
反向字符集。匹配未包含的任何字符。例如,“[^abc]”匹配“plain”中的“p”。 |
[a-z] |
字符范围。匹配指定范围内的任何字符。例如,“[a-z]”匹配“a”到“z”范围内的任何小写字母。 |
[^a-z] |
反向范围字符。匹配不在指定的范围内的任何字符。例如,“[^a-z]”匹配任何不在“a”到“z”范围内的任何字符。 |
\b |
匹配一个字边界,即字与空格间的位置。例如,“er\b”匹配“never”中的“er”,但不匹配“verb”中的“er”。 |
\B |
非字边界匹配。“er\B”匹配“verb”中的“er”,但不匹配“never”中的“er”。 |
\cx |
匹配 x 指示的控制字符。例如,\cM 匹配 Control-M 或回车符。x 的值必须在 A-Z 或 a-z 之间。如果不是这样,则假定 c 就是“c”字符本身。 |
\d |
数字字符匹配。等效于 [0-9]。 |
\D |
非数字字符匹配。等效于 [^0-9]。 |
\f |
换页符匹配。等效于 \x0c 和 \cL。 |
\n |
换行符匹配。等效于 \x0a 和 \cJ。 |
\r |
匹配一个回车符。等效于 \x0d 和 \cM。 |
\s |
匹配任何空白字符,包括空格、制表符、换页符等。与 [ \f\n\r\t\v] 等效。 |
\S |
匹配任何非空白字符。与 [^ \f\n\r\t\v] 等效。 |
\t |
制表符匹配。与 \x09 和 \cI 等效。 |
\v |
垂直制表符匹配。与 \x0b 和 \cK 等效。 |
\w |
匹配任何字类字符,包括下划线。与“[A-Za-z0-9_]”等效。 |
\W |
与任何非单词字符匹配。与“[^A-Za-z0-9_]”等效。 |
\xn |
匹配 n,此处的 n 是一个十六进制转义码。十六进制转义码必须正好是两位数长。例如,“\x41”匹配“A”。“\x041”与“\x04”&“1”等效。允许在正则表达式中使用 ASCII 代码。 |
\num |
匹配 num,此处的 num 是一个正整数。到捕获匹配的反向引用。例如,“(.)\1”匹配两个连续的相同字符。 |
\n |
标识一个八进制转义码或反向引用。如果 \n 前面至少有 n 个捕获子表达式,那么 n 是反向引用。否则,如果 n 是八进制数 (0-7),那么 n 是八进制转义码。 |
\nm |
标识一个八进制转义码或反向引用。如果 \nm 前面至少有 nm 个捕获子表达式,那么 nm 是反向引用。如果 \nm 前面至少有 n 个捕获,则 n 是反向引用,后面跟有字符 m。如果两种前面的情况都不存在,则 \nm 匹配八进制值 nm,其中 n 和 m 是八进制数字 (0-7)。 |
\nml |
当 n 是八进制数 (0-3),m 和 l 是八进制数 (0-7) 时,匹配八进制转义码 nml。 |
\un |
匹配 n,其中 n 是以四位十六进制数表示的 Unicode 字符。例如,\u00A9 匹配版权符号 (©)。 |
http://msdn.microsoft.com/zh-cn/library/ae5bf541(v=vs.80).aspx