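Regular Expressions

Two points that are easy to confuse:

  • Regular expressions are used very widely and appear in many languages, such as PHP, Python and Java. Here, however, we cover the regular expressions used in day-to-day Linux system administration, i.e. Linux regular expressions. The commands that use them most are grep (egrep), sed and awk. In other words, the "three musketeers" of Linux text processing only work at their most efficient when combined with regular expressions.

  • Regular expressions are fundamentally different from the wildcard (glob) special characters we commonly use in the shell: a glob such as *.log is expanded by the shell to match file names, while a regular expression is interpreted by a tool such as grep to match text.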
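To make the difference concrete, here is a minimal sketch (assuming a POSIX shell, GNU coreutils and a throwaway directory from mktemp; the file names are made up). In the glob *.log the * means "any string", while in a regular expression * only repeats the preceding character, so the regex equivalent has to spell out \.log$.

```shell
export LC_ALL=C
dir=$(mktemp -d)            # hypothetical scratch directory
cd "$dir" || exit 1
touch a.log b.log notes.txt

# Glob: the shell itself expands *.log to every file name ending in ".log".
echo *.log                  # a.log b.log

# Regex: grep filters text; "\.log$" = a literal dot, then "log", at the end of the line.
ls | grep '\.log$'          # a.log and b.log, one per line

cd / && rm -rf "$dir"
```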

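Notes:

  • Linux regular expressions generally process text one line at a time.
  • alias grep='grep --color=auto' (highlights matches); the examples below all use grep.

  • Mind the character set: export LC_ALL=C (the C locale keeps character classes and ranges such as [a-z] byte-wise and predictable).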
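A small sketch of the line-by-line behaviour, assuming GNU grep and an inline sample fed through printf: a pattern can never span the newline between two lines.

```shell
export LC_ALL=C   # byte-wise, predictable matching, as recommended above

# "b.c" needs b, any one character, then c on the SAME line;
# "ab" and "cd" are separate lines, so nothing matches.
printf 'ab\ncd\n' | grep -c 'b.c' || true   # 0 (grep exits 1 when nothing matches)

# "b$" matches because "b" is the last character of line 1.
printf 'ab\ncd\n' | grep -c 'b$'            # 1
```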

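Basic regular expressions:

  • ^word matches content that begins with word; in vi/vim, ^ likewise denotes the start of a line
  • word$ matches content that ends with word; in vi/vim, $ denotes the end of a line
  • ^$ denotes an empty line
  • . matches exactly one arbitrary character
  • \ is the escape character: it strips a special character of its special meaning, e.g. \. matches only a literal dot
  • * repeats the preceding single character zero or more times, e.g. o* matches no o, one o, or many o's (oooo)
  • .* matches any sequence of characters, including none; by extension, ^.* means "starts with any number of characters" and .*$ means "ends with any number of characters"
  • [abc] matches any one character from the set; ranges are allowed, e.g. [a-zA-Z] or [0-9]
  • [^abc] matches any one character that is NOT among those after the ^. Note: inside brackets ^ negates the set, which is different from ^ outside brackets, where it means "starts with"
  • a\{n,m\} repeats the preceding character n to m times; with egrep or sed -r the backslashes can be dropped
  • a\{n,\} repeats the preceding character at least n times; with egrep or sed -r the backslashes can be dropped
  • a\{n\} repeats the preceding character exactly n times; with egrep or sed -r the backslashes can be dropped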
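The interval notation can be tried out as follows. This is a sketch assuming GNU grep and GNU sed (where -r enables extended syntax; other seds may only accept -E), with sample words invented for the demo. Note that \{n\} on its own does not forbid extra repetitions next to the match; anchoring the pattern does.

```shell
export LC_ALL=C

# Basic syntax (plain grep, plain sed): braces need backslashes.
printf 'go\ngoo\ngoood\n' | grep 'o\{2,3\}'     # goo, goood

# Extended syntax (grep -E / egrep, sed -r): plain braces.
printf 'go\ngoo\ngoood\n' | grep -E 'o{2,3}'    # goo, goood
printf 'goood\n' | sed -r 's/o{2,3}/X/'         # gXd (greedy: all three o's replaced)

# "o\{2\}" is satisfied by any two consecutive o's, so "gooo" matches too;
# anchor both ends to demand exactly two.
printf 'goo\ngooo\n' | grep 'o\{2\}'            # goo, gooo
printf 'goo\ngooo\n' | grep '^go\{2\}$'         # goo
```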

Test environment

[root@centos6 data]# cat >>oldboy.log<<EOF
> I am oldboy teacher!
> I teach Linux.
>
> I like badminton ball,billiard ball and chinese chess!
> my blog is http://oldboy.blog.51cto.com
> our site is http://www.etiantian.org
> my qq num is 49000448.
>
> not 4900000448.
> my god ,i am not oldbey,but OLDBOY!
> EOF
[root@centos6 data]#
[root@centos6 data]# cat oldboy.log
I am oldboy teacher!
I teach Linux.

I like badminton ball,billiard ball and chinese chess!
my blog is http://oldboy.blog.51cto.com
our site is http://www.etiantian.org
my qq num is 49000448.

not 4900000448.
my god ,i am not oldbey,but OLDBOY!
gd
good
goood

Filter lines that begin with m:

[root@centos6 data]# grep "^m" oldboy.log
my blog is http://oldboy.blog.51cto.com
my qq num is 49000448.
my god ,i am not oldbey,but OLDBOY!

Filter lines that end with m:

[root@centos6 data]# grep "m$" oldboy.log 
my blog is http://oldboy.blog.51cto.com
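A common pitfall with $, sketched under the assumption of GNU grep on Linux, which does not strip carriage returns: a file saved with Windows (CRLF) line endings has an invisible \r as the last character of every line, so word$ stops matching. The sample line reuses the blog URL from oldboy.log.

```shell
export LC_ALL=C

# The line really ends in "\r", not "m", so "m$" fails.
printf 'my blog is http://oldboy.blog.51cto.com\r\n' | grep -c 'm$' || true   # 0

# Deleting the carriage returns first restores the match.
printf 'my blog is http://oldboy.blog.51cto.com\r\n' | tr -d '\r' | grep -c 'm$'   # 1
```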

Filter empty lines (-n prefixes each match with its line number, which cat -n confirms):

[root@centos6 data]# grep -n "^$" oldboy.log
3:
8:
[root@centos6 data]# cat -n oldboy.log 
     1  I am oldboy teacher!
     2  I teach Linux.
     3
     4  I like badminton ball,billiard ball and chinese chess!
     5  my blog is http://oldboy.blog.51cto.com
     6  our site is http://www.etiantian.org
     7  my qq num is 49000448.
     8
     9  not 4900000448.
    10  my god ,i am not oldbey,but OLDBOY!

Filter out empty lines (-v inverts the match):

[root@centos6 data]# grep -vn "^$" oldboy.log
1:I am oldboy teacher!
2:I teach Linux.
4:I like badminton ball,billiard ball and chinese chess!
5:my blog is http://oldboy.blog.51cto.com
6:our site is http://www.etiantian.org
7:my qq num is 49000448.
9:not 4900000448.
10:my god ,i am not oldbey,but OLDBOY!
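Lines that only contain spaces or tabs are not empty, so grep -v "^$" keeps them. A minimal sketch, assuming GNU grep and a made-up four-line input, of widening the pattern with the POSIX class [[:space:]]:

```shell
export LC_ALL=C

# Input lines: "a", "" (empty), "   " (three spaces), "<tab>b".
printf 'a\n\n   \n\tb\n' | grep -vc '^$'               # 3: only the truly empty line is dropped

# Allow any run of whitespace between ^ and $ to drop blank-looking lines too.
printf 'a\n\n   \n\tb\n' | grep -v '^[[:space:]]*$'    # a, then <tab>b
```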

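Because . matches exactly one arbitrary character, grep "." returns every line that contains at least one character; empty lines contain none, so they are left out: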

[root@centos6 data]# grep -n "." oldboy.log 
1:I am oldboy teacher!
2:I teach Linux.
4:I like badminton ball,billiard ball and chinese chess!
5:my blog is http://oldboy.blog.51cto.com
6:our site is http://www.etiantian.org
7:my qq num is 49000448.
9:not 4900000448.
10:my god ,i am not oldbey,but OLDBOY!
11:gd
12:good
13:glad

Match all lines, including empty ones (.* can also match zero characters):

[root@centos6 data]# grep -n ".*" oldboy.log 
1:I am oldboy teacher!
2:I teach Linux.
3:
4:I like badminton ball,billiard ball and chinese chess!
5:my blog is http://oldboy.blog.51cto.com
6:our site is http://www.etiantian.org
7:my qq num is 49000448.
8:
9:not 4900000448.
10:my god ,i am not oldbey,but OLDBOY!
11:gd
12:good
13:glad
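The difference between ".", ".*" and "^$" shows up clearly in line counts; a minimal sketch with a hypothetical three-line input:

```shell
export LC_ALL=C

# Input lines: "a", "" (empty), "b".
printf 'a\n\nb\n' | grep -c '.'     # 2: "." needs at least one character
printf 'a\n\nb\n' | grep -c '.*'    # 3: ".*" may match zero characters, so the empty line counts
printf 'a\n\nb\n' | grep -c '^$'    # 1: only the empty line
```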

Filter "oldb.y", where . matches any one character:

[root@centos6 data]# grep -n "oldb.y" oldboy.log 
1:I am oldboy teacher!
5:my blog is http://oldboy.blog.51cto.com
10:my god ,i am not oldbey,but OLDBOY!

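Filter "oldb.y" again with -i, which ignores case. The output is unchanged here because line 10 already matches through "oldbey"; with -i, "OLDBOY" would match on its own as well: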

[root@centos6 data]# grep -in "oldb.y" oldboy.log   
1:I am oldboy teacher!
5:my blog is http://oldboy.blog.51cto.com
10:my god ,i am not oldbey,but OLDBOY!

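Match the end of a line: ".$" matches any character at the end (so every non-empty line), while "\.$" matches only a literal dot at the end: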

[root@centos6 data]# grep -n ".$" oldboy.log 
1:I am oldboy teacher!
2:I teach Linux.
4:I like badminton ball,billiard ball and chinese chess!
5:my blog is http://oldboy.blog.51cto.com
6:our site is http://www.etiantian.org
7:my qq num is 49000448.
9:not 4900000448.
10:my god ,i am not oldbey,but OLDBOY!
11:gd
12:good
13:glad
[root@centos6 data]# grep -n "\.$" oldboy.log 
2:I teach Linux.
7:my qq num is 49000448.
9:not 4900000448.
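A sketch contrasting . with \. on invented input. Single quotes are used so the shell passes the backslash to grep untouched; inside double quotes "\." also survives, because the shell only treats \ as special there before $, `, ", \ and newline.

```shell
export LC_ALL=C

printf 'a.b\naxb\n' | grep 'a.b'     # a.b and axb: "." is any one character
printf 'a.b\naxb\n' | grep 'a\.b'    # a.b only: "\." is a literal dot
printf 'a.b\naxb\n' | grep "a\.b"    # a.b only: the backslash reaches grep here too
```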

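Repeat the preceding character zero or more times: "0*". Because zero 0s also satisfy the pattern, every line matches, including the empty ones: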

[root@centos6 data]# grep -n "0*" oldboy.log 
1:I am oldboy teacher!
2:I teach Linux.
3:
4:I like badminton ball,billiard ball and chinese chess!
5:my blog is http://oldboy.blog.51cto.com
6:our site is http://www.etiantian.org
7:my qq num is 49000448.
8:
9:not 4900000448.
10:my god ,i am not oldbey,but OLDBOY!
11:gd
12:good
13:glad
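Since "0*" is satisfied by zero 0s, it cannot be used to find lines that contain a 0. A sketch (assuming GNU grep; the input is made up) of the usual fix "00*", i.e. one 0 followed by zero or more 0s; -o shows what actually matched:

```shell
export LC_ALL=C

printf 'abc\n4900448\n' | grep -c '0*'    # 2: even "abc" matches (zero 0s)
printf 'abc\n4900448\n' | grep -c '00*'   # 1: at least one 0 is required
printf 'abc\n4900448\n' | grep -o '00*'   # 00
```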

Filter lines containing any one of the characters a, b or c:

[root@centos6 data]# grep -n "[abc]" oldboy.log       
1:I am oldboy teacher!
2:I teach Linux.
4:I like badminton ball,billiard ball and chinese chess!
5:my blog is http://oldboy.blog.51cto.com
6:our site is http://www.etiantian.org
10:my god ,i am not oldbey,but OLDBOY!
13:glad

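Filter with "[^abc]": this matches lines containing at least one character other than a, b or c, not lines that are free of a, b and c. That is why every non-empty line is returned; to exclude lines containing any of them, use grep -v "[abc]":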

[root@centos6 data]# grep -n "[^abc]" oldboy.log 
1:I am oldboy teacher!
2:I teach Linux.
4:I like badminton ball,billiard ball and chinese chess!
5:my blog is http://oldboy.blog.51cto.com
6:our site is http://www.etiantian.org
7:my qq num is 49000448.
9:not 4900000448.
10:my god ,i am not oldbey,but OLDBOY!
11:gd
12:good
13:glad
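A sketch of the [^abc] versus grep -v distinction, using a hypothetical four-line input that includes an empty line:

```shell
export LC_ALL=C

# Input lines: "abc", "ab1", "xyz", "" (empty).
printf 'abc\nab1\nxyz\n\n' | grep '[^abc]'          # ab1, xyz: each has a character outside the set
printf 'abc\nab1\nxyz\n\n' | grep -v '[abc]'        # xyz and the empty line: no a, b or c at all
printf 'abc\nab1\nxyz\n\n' | grep '^[abc]\{1,\}$'   # abc: made up only of a, b and c
```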

Filter lines containing digits:

[root@centos6 data]# grep -n "[0-9]" oldboy.log 
5:my blog is http://oldboy.blog.51cto.com
7:my qq num is 49000448.
9:not 4900000448.
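Combined with -o and an interval, the digit class can pull the numbers out of the two QQ lines from oldboy.log. A sketch assuming GNU grep; the second command uses anchors to keep only all-digit lines (invented input):

```shell
export LC_ALL=C

# -o prints each match on its own line; "[0-9]\{1,\}" = one or more digits.
printf 'my qq num is 49000448.\nnot 4900000448.\n' | grep -o '[0-9]\{1,\}'
# 49000448
# 4900000448

printf '123\n12a\n' | grep '^[0-9]\{1,\}$'   # 123
```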

Filter by how many times 0 repeats (note that "0\{,3\}" allows zero repetitions, so it matches every line):

[root@centos6 data]# grep -n "0\{3\}" oldboy.log 
7:my qq num is 49000448.
9:not 4900000448.
[root@centos6 data]# grep -n "0\{5\}" oldboy.log  
9:not 4900000448.
[root@centos6 data]# grep -n "0\{3,\}" oldboy.log  
7:my qq num is 49000448.
9:not 4900000448.
[root@centos6 data]# grep -n "0\{3,5\}" oldboy.log 
7:my qq num is 49000448.
9:not 4900000448.
[root@centos6 data]# grep -n "0\{,3\}" oldboy.log   
1:I am oldboy teacher!
2:I teach Linux.
3:
4:I like badminton ball,billiard ball and chinese chess!
5:my blog is http://oldboy.blog.51cto.com
6:our site is http://www.etiantian.org
7:my qq num is 49000448.
8:
9:not 4900000448.
10:my god ,i am not oldbey,but OLDBOY!
11:gd
12:good
13:glad
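-o makes the interval behaviour visible on line 9 of oldboy.log, which contains a run of five 0s. A sketch assuming GNU grep (\{,m\}, meaning 0 to m repetitions, is accepted by GNU grep as shown in the transcript above):

```shell
export LC_ALL=C

printf 'not 4900000448.\n' | grep -o '0\{3\}'     # 000: the leftover "00" is too short for a second match
printf 'not 4900000448.\n' | grep -oE '0{3,5}'    # 00000: greedy, takes as many as allowed
printf 'abc\n\n' | grep -c '0\{,3\}'              # 2: zero 0s is allowed, so every line matches
```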

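Summary of grep, one of the three musketeers:

Commonly used grep options:

  • -a: search a binary file as if it were a text file
  • -c: count the lines that match the search string
  • -o: print only the part of each line that matches the regexp (useful for counting how often it occurs in the text)
  • -i: ignore case, treating upper and lower case as the same *****
  • -n: prefix each matching line with its line number *****
  • -v: invert the selection, printing the lines that do not contain the search string *****
  • -E: extended grep, i.e. egrep *****
  • --color=auto: highlight the matched keywords in color ***
  • -A num: "after", print the matching line plus the num lines after it
  • -B num: "before", print the matching line plus the num lines before it
  • -C num: "context", print the matching line plus num lines before and after it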
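A quick tour of several of these options on one throwaway file, sketched assuming GNU grep and mktemp; the file contents are invented. Note that -c counts matching lines, so counting every occurrence needs -o piped into wc -l.

```shell
export LC_ALL=C
f=$(mktemp)   # hypothetical temp file
printf 'oldboy oldboy\nOLDBOY\nline3\nline4\n' > "$f"

grep -c oldboy "$f"               # 1: one matching LINE
grep -o oldboy "$f" | wc -l       # 2: two occurrences
grep -ic oldboy "$f"              # 2: -i lets OLDBOY count too
grep -vin oldboy "$f"             # 3:line3 and 4:line4
grep -A1 OLDBOY "$f"              # OLDBOY, line3
grep -B1 line4 "$f"               # line3, line4
grep -C1 OLDBOY "$f"              # oldboy oldboy, OLDBOY, line3
grep -E 'old(boy|bey)' "$f"       # oldboy oldboy: ERE grouping without backslashes

rm -f "$f"
```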