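Regular Expressions
Two points that are easy to confuse:
- Regular expressions are used very widely and exist in many languages, for example PHP, Python, and Java. These notes, however, cover regular expressions as used in Linux system operations work, i.e. Linux regular expressions. The commands that use them most are grep (egrep), sed, and awk; in other words, the Linux "three swordsmen" cannot work at full efficiency without regular expressions.
- Regular expressions are fundamentally different from the wildcard special characters we commonly use in the shell.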
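The difference between wildcards and regular expressions shows up with the same `*` character. The following is a minimal sketch, assuming a POSIX shell with `mktemp` and grep available; the file names and variable names are made up for illustration.

```shell
# Throwaway directory so the wildcard only sees the files created here.
tmpdir=$(mktemp -d)
touch "$tmpdir/gd.txt" "$tmpdir/good.txt" "$tmpdir/glad.txt"

# Shell wildcard: * stands for any string and is expanded by the shell
# against file names before the command runs.
glob_result=$(cd "$tmpdir" && echo g*d.txt)

# Regular expression: * repeats the preceding character (here "o") zero or
# more times and is interpreted by grep against the text of each line.
regex_result=$(printf 'gd\ngood\nglad\n' | grep '^go*d$')

echo "wildcard g*d.txt matched: $glob_result"
echo "regex ^go*d\$ matched:"
echo "$regex_result"

rm -rf "$tmpdir"
```

The wildcard picks up glad.txt because `*` can stand for "la", while the regular expression rejects glad because `o*` can only stand for a run of o characters.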
Notes:
- Linux regular expressions generally process text one line at a time.
- alias grep='grep --color=auto' is assumed; the examples use grep.
- Mind the character set: export LC_ALL=C
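The two notes about line-based processing and the character set can be checked with a short sketch. It assumes grep is on the PATH; the sample text and variable names are hypothetical.

```shell
# Pin the locale so character ranges such as [a-z] follow plain ASCII order.
export LC_ALL=C

# grep tests every line on its own: $ anchors at the end of each line,
# not at the end of the whole input, so the first line "a" matches.
lines_ending_in_a=$(printf 'a\nb\n' | grep -c 'a$')

# In the C locale [a-z] covers only the lowercase ASCII letters,
# so the uppercase-only line is not selected.
lowercase_lines=$(printf 'ABC\nabc\n' | grep '[a-z]')

echo "lines ending in a: $lines_ending_in_a"
echo "lines with a lowercase letter: $lowercase_lines"
```

In other locales a range like `[a-z]` may follow that locale's collation order rather than ASCII order, which is one reason to set LC_ALL=C before experimenting.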
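Basic regular expressions:
^word      Matches lines beginning with word. In vi/vim, ^ stands for the start of a line.
word$      Matches lines ending with word. In vi/vim, $ stands for the end of a line.
^$         Matches an empty line.
.          Matches exactly one character, whatever it is.
\          Escape character: strips the special meaning from a special character. For example, \. matches only a literal dot.
*          Matches zero or more repetitions of the preceding character. For example, o* matches no o, one o, or several (oooo).
.*         Matches any sequence of characters, including none.
^.*        Begins with any number of characters.
.*$        Ends with any number of characters.
[abc]      Matches any one character in the set; ranges such as [a-zA-Z] and [0-9] are allowed.
[^abc]     Matches any one character that is not listed after ^. Note: inside brackets ^ negates the set; outside brackets it anchors the match to the start of the line.
a\{n,m\}   The preceding character repeated n to m times. With egrep or sed -r the backslashes can be dropped.
a\{n,\}    The preceding character repeated at least n times. With egrep or sed -r the backslashes can be dropped.
a\{n\}     The preceding character repeated exactly n times. With egrep or sed -r the backslashes can be dropped.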
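The note about dropping the backslashes can be illustrated by running the same interval in both syntaxes. This is a small sketch, assuming a grep that supports `-E`; the three sample words are illustrative only.

```shell
sample=$(printf 'gd\ngood\ngoood\n')

# Basic regular expressions (plain grep): the braces must be escaped.
bre=$(printf '%s\n' "$sample" | grep 'go\{2,3\}d')

# Extended regular expressions (grep -E, i.e. egrep): no backslashes needed.
ere=$(printf '%s\n' "$sample" | grep -E 'go{2,3}d')

echo "basic:"
echo "$bre"
echo "extended:"
echo "$ere"
```

Both commands select good and goood and skip gd, since gd has no run of two to three o characters.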
Test environment
[root@centos6 data]# cat >>oldboy.log<<EOF
> I am oldboy teacher!
> I teach Linux.
>
> I like badminton ball,billiard ball and chinese chess!
> my blog is http://oldboy.blog.51cto.com
> our site is http://www.etiantian.org
> my qq num is 49000448.
>
> not 4900000448.
> my god ,i am not oldbey,but OLDBOY!
> EOF
[root@centos6 data]#
[root@centos6 data]# cat oldboy.log
I am oldboy teacher!
I teach Linux.

I like badminton ball,billiard ball and chinese chess!
my blog is http://oldboy.blog.51cto.com
our site is http://www.etiantian.org
my qq num is 49000448.

not 4900000448.
my god ,i am not oldbey,but OLDBOY!
gd
good
goood
Filter lines that begin with m:
[root@centos6 data]# grep "^m" oldboy.log
my blog is http://oldboy.blog.51cto.com
my qq num is 49000448.
my god ,i am not oldbey,but OLDBOY!
Filter lines that end with m:
[root@centos6 data]# grep "m$" oldboy.log
my blog is http://oldboy.blog.51cto.com
Match empty lines:
[root@centos6 data]# grep -n "^$" oldboy.log
3:
8:
[root@centos6 data]# cat -n oldboy.log
1 I am oldboy teacher!
2 I teach Linux.
3
4 I like badminton ball,billiard ball and chinese chess!
5 my blog is http://oldboy.blog.51cto.com
6 our site is http://www.etiantian.org
7 my qq num is 49000448.
8
9 not 4900000448.
10 my god ,i am not oldbey,but OLDBOY!
Exclude empty lines (-v inverts the match):
[root@centos6 data]# grep -vn "^$" oldboy.log
1:I am oldboy teacher!
2:I teach Linux.
4:I like badminton ball,billiard ball and chinese chess!
5:my blog is http://oldboy.blog.51cto.com
6:our site is http://www.etiantian.org
7:my qq num is 49000448.
9:not 4900000448.
10:my god ,i am not oldbey,but OLDBOY!
Because . stands for any single character, filtering on . selects every line that has content, but not the empty lines:
[root@centos6 data]# grep -n "." oldboy.log
1:I am oldboy teacher!
2:I teach Linux.
4:I like badminton ball,billiard ball and chinese chess!
5:my blog is http://oldboy.blog.51cto.com
6:our site is http://www.etiantian.org
7:my qq num is 49000448.
9:not 4900000448.
10:my god ,i am not oldbey,but OLDBOY!
11:gd
12:good
13:glad
Match every line, including the empty lines:
[root@centos6 data]# grep -n ".*" oldboy.log
1:I am oldboy teacher!
2:I teach Linux.
3:
4:I like badminton ball,billiard ball and chinese chess!
5:my blog is http://oldboy.blog.51cto.com
6:our site is http://www.etiantian.org
7:my qq num is 49000448.
8:
9:not 4900000448.
10:my god ,i am not oldbey,but OLDBOY!
11:gd
12:good
13:glad
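The three patterns "^$", "." and ".*" can be compared side by side by counting lines with `-c`. A minimal sketch, assuming `mktemp` is available; the three-line sample file is hypothetical.

```shell
f=$(mktemp)
printf 'I teach Linux.\n\nmy qq num is 49000448.\n' > "$f"

empty=$(grep -c '^$' "$f")     # only the empty line
nonempty=$(grep -c '.' "$f")   # lines with at least one character
all=$(grep -c '.*' "$f")       # every line, because .* may match nothing

echo "empty=$empty nonempty=$nonempty all=$all"
rm -f "$f"
```

The counts add up: the lines matched by "^$" plus the lines matched by "." equal the lines matched by ".*".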
Filter "oldb.y" (. stands for any single character):
[root@centos6 data]# grep -n "oldb.y" oldboy.log
1:I am oldboy teacher!
5:my blog is http://oldboy.blog.51cto.com
10:my god ,i am not oldbey,but OLDBOY!
Filter "oldb.y" (. stands for any single character; the -i option makes the match case-insensitive):
[root@centos6 data]# grep -in "oldb.y" oldboy.log
1:I am oldboy teacher!
5:my blog is http://oldboy.blog.51cto.com
10:my god ,i am not oldbey,but OLDBOY!
Match at the end of the line: ".$" (any final character) versus "\.$" (a literal dot at the end):
[root@centos6 data]# grep -n ".$" oldboy.log
1:I am oldboy teacher!
2:I teach Linux.
4:I like badminton ball,billiard ball and chinese chess!
5:my blog is http://oldboy.blog.51cto.com
6:our site is http://www.etiantian.org
7:my qq num is 49000448.
9:not 4900000448.
10:my god ,i am not oldbey,but OLDBOY!
11:gd
12:good
13:glad
[root@centos6 data]# grep -n "\.$" oldboy.log
2:I teach Linux.
7:my qq num is 49000448.
9:not 4900000448.
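The effect of the backslash can be reproduced on two lines taken from the test file. This is a sketch under the assumption that grep supports `-F` (fixed-string matching), shown here as another way to match a literal dot; variable names are illustrative.

```shell
lines=$(printf 'I teach Linux.\nI am oldboy teacher!\n')

# Unescaped: "." is any character, so both lines end with "some character".
any_end=$(printf '%s\n' "$lines" | grep -c '.$')

# Escaped: "\." is a literal period, so only the first line qualifies.
dot_end=$(printf '%s\n' "$lines" | grep '\.$')

# -F disables regular expressions entirely; "." is then an ordinary dot.
fixed=$(printf '%s\n' "$lines" | grep -cF '.')

echo "any_end=$any_end fixed=$fixed"
echo "$dot_end"
```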
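Zero or more repetitions of the preceding character, "0*". Because zero repetitions also count as a match, every line is selected, including the empty lines: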
[root@centos6 data]# grep -n "0*" oldboy.log
1:I am oldboy teacher!
2:I teach Linux.
3:
4:I like badminton ball,billiard ball and chinese chess!
5:my blog is http://oldboy.blog.51cto.com
6:our site is http://www.etiantian.org
7:my qq num is 49000448.
8:
9:not 4900000448.
10:my god ,i am not oldbey,but OLDBOY!
11:gd
12:good
13:glad
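Since "0*" also matches where there is no 0 at all, a pattern for "one or more zeros" has to name the first 0 explicitly. A minimal sketch, assuming a grep with the `-o` option; the two sample lines come from the test file.

```shell
lines=$(printf 'my qq num is 49000448.\nI teach Linux.\n')

# 0* accepts zero occurrences, so both lines are counted.
zero_or_more=$(printf '%s\n' "$lines" | grep -c '0*')

# 00* is one literal 0 followed by zero or more 0s: at least one 0.
one_or_more=$(printf '%s\n' "$lines" | grep -c '00*')

# -o prints only the matched text, which makes the run of zeros visible.
runs=$(printf '%s\n' "$lines" | grep -o '00*')

echo "zero_or_more=$zero_or_more one_or_more=$one_or_more runs=$runs"
```

The interval form `0\{1,\}` expresses the same "at least one" requirement.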
Filter lines that contain any one of the characters a, b, or c:
[root@centos6 data]# grep -n "[abc]" oldboy.log
1:I am oldboy teacher!
2:I teach Linux.
4:I like badminton ball,billiard ball and chinese chess!
5:my blog is http://oldboy.blog.51cto.com
6:our site is http://www.etiantian.org
10:my god ,i am not oldbey,but OLDBOY!
13:glad
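Filter lines that contain at least one character other than a, b, or c. Note that "[^abc]" does not exclude lines containing a, b, or c; it only leaves out empty lines and lines made up entirely of those three characters: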
[root@centos6 data]# grep -n "[^abc]" oldboy.log
1:I am oldboy teacher!
2:I teach Linux.
4:I like badminton ball,billiard ball and chinese chess!
5:my blog is http://oldboy.blog.51cto.com
6:our site is http://www.etiantian.org
7:my qq num is 49000448.
9:not 4900000448.
10:my god ,i am not oldbey,but OLDBOY!
11:gd
12:good
13:glad
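A negated bracket expression and the -v option answer different questions, which a three-word sample makes visible. This is a sketch only; the sample words and variable names are hypothetical.

```shell
lines=$(printf 'glad\ngood\nabc\n')

# [^abc] needs one character outside the set somewhere on the line.
has_other=$(printf '%s\n' "$lines" | grep '[^abc]')

# -v "[abc]" drops every line that contains a, b, or c anywhere.
has_none=$(printf '%s\n' "$lines" | grep -v '[abc]')

# Anchoring both ends keeps lines made up only of a, b, and c.
only_abc=$(printf '%s\n' "$lines" | grep '^[abc][abc]*$')

echo "has_other: $(printf '%s' "$has_other" | tr '\n' ' ')"
echo "has_none:  $has_none"
echo "only_abc:  $only_abc"
```

glad is selected by "[^abc]" even though it contains an a, because g, l, and d are outside the set; only `-v "[abc]"` removes it.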
Filter lines that contain a digit:
[root@centos6 data]# grep -n "[0-9]" oldboy.log
5:my blog is http://oldboy.blog.51cto.com
7:my qq num is 49000448.
9:not 4900000448.
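Filter by the number of times 0 is repeated (with GNU grep, "\{,3\}" means zero to three times, so it selects every line):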
[root@centos6 data]# grep -n "0\{3\}" oldboy.log
7:my qq num is 49000448.
9:not 4900000448.
[root@centos6 data]# grep -n "0\{5\}" oldboy.log
9:not 4900000448.
[root@centos6 data]# grep -n "0\{3,\}" oldboy.log
7:my qq num is 49000448.
9:not 4900000448.
[root@centos6 data]# grep -n "0\{3,5\}" oldboy.log
7:my qq num is 49000448.
9:not 4900000448.
[root@centos6 data]# grep -n "0\{,3\}" oldboy.log
1:I am oldboy teacher!
2:I teach Linux.
3:
4:I like badminton ball,billiard ball and chinese chess!
5:my blog is http://oldboy.blog.51cto.com
6:our site is http://www.etiantian.org
7:my qq num is 49000448.
8:
9:not 4900000448.
10:my god ,i am not oldbey,but OLDBOY!
11:gd
12:good
13:glad
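The interval results above are easier to read when grep prints only the matched text. The sketch below assumes a grep with `-o`; the `[^0]...[^0]` form is one possible way to require a run of exactly three zeros, added here for illustration and not taken from the original notes.

```shell
both=$(printf 'my qq num is 49000448.\nnot 4900000448.\n')

# 0\{3\} also matches inside a longer run, so both lines are selected.
lines_with_3=$(printf '%s\n' "$both" | grep -c '0\{3\}')

# 0\{3,\} takes the whole run: three zeros on line 1, five on line 2.
runs=$(printf '%s\n' "$both" | grep -o '0\{3,\}')

# Requiring a non-zero character on each side pins the run to exactly three.
strict=$(printf '%s\n' "$both" | grep '[^0]0\{3\}[^0]')

echo "lines_with_3=$lines_with_3"
echo "runs: $(printf '%s' "$runs" | tr '\n' ' ')"
echo "strict: $strict"
```

The strict form needs a character on both sides of the zeros, so it would miss a run that sits at the very start or end of a line.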
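Summary of grep, one of the "three swordsmen":
Commonly used grep options:
-a  Search a binary file as if it were a text file.
-c  Count the lines that match the search string.
-o  Print only the part of each line that matches the regexp (useful for counting occurrences in a text).
-i  Ignore case, treating upper and lower case as the same. *****
-n  Prefix each matching line with its line number. *****
-v  Invert the selection, i.e. print the lines that do not match the search string. *****
-E  Extended grep, i.e. egrep. *****
--color=auto  Highlight the matched text in color. ***
-A n  "After": print each matching line plus the n lines that follow it.
-B n  "Before": print each matching line plus the n lines that precede it.
-C num  "Context": print each matching line plus num lines before and after it.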
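Several of the options above can be tried together on a small file. A minimal sketch, assuming GNU grep (for `-o` and `-A`) plus `mktemp`, `wc`, and `tr`; the five-line sample file is hypothetical.

```shell
f=$(mktemp)
printf 'one\ntwo oldboy\nthree\nfour OLDBOY oldboy oldboy\nfive\n' > "$f"

# -c counts matching lines, not individual occurrences.
count_lines=$(grep -c 'oldboy' "$f")

# -o prints one match per output line, so wc -l counts occurrences.
count_hits=$(grep -o 'oldboy' "$f" | wc -l | tr -d ' ')

# Adding -i also counts the uppercase OLDBOY.
count_hits_i=$(grep -io 'oldboy' "$f" | wc -l | tr -d ' ')

# -v with -c counts the lines that do not match.
count_other=$(grep -vc 'oldboy' "$f")

# -n with -A1: the matching line is marked with ":", the context line with "-".
context=$(grep -n -A1 'two' "$f")

echo "lines=$count_lines hits=$count_hits hits_i=$count_hits_i other=$count_other"
echo "$context"
rm -f "$f"
```

The gap between `-c` (2 lines) and `-o | wc -l` (3 occurrences) comes from the fourth line, which contains the lowercase word twice.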