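Regular expressions are often abbreviated in code as regex, regexp, or RE. A regular expression uses a single string to describe and match a set of strings that follow a certain syntactic rule. Put simply, it is a way of matching strings: with a handful of special symbols you can quickly find, delete, or replace a particular string.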
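A regular expression is a text pattern made up of ordinary characters and metacharacters. The pattern describes one or more strings to match when searching text, and the regular expression serves as a template that is compared against the text being searched. Ordinary characters include upper- and lowercase letters, digits, punctuation marks, and some other symbols. Metacharacters are characters with a special meaning inside a regular expression. Some of them, such as *, specify how the character in front of them (the "preceding character") may occur in the target text.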
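You can see the difference between an ordinary character and a metacharacter by searching for the same pattern with and without a backslash. This sketch assumes a POSIX shell and grep. The two input lines are made up for illustration.

```shell
# Made-up input: a line with a literal dot and a line without one.
# '.' is a metacharacter (any single character), so both lines match.
printf 'a.c\nabc\n' | grep 'a.c'

# Escaping it as '\.' turns it back into an ordinary character,
# so only the literal string "a.c" matches.
printf 'a.c\nabc\n' | grep 'a\.c'
```

Both lines match a.c because . stands for any one character. Only the first line matches a\.c.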
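Regular expressions are mostly used in scripting and in text editors. Many text-processing tools and programming languages support them. Examples include the common Linux text processors grep, egrep, sed, and awk, and the widely used Python language. Their powerful matching lets you search and process large volumes of text quickly and efficiently.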
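Depending on their strictness and features, regular expressions come in two flavors: basic regular expressions (BRE) and extended regular expressions (ERE). Basic regular expressions are the foundation of everyday regex use. Among the common Linux text-processing tools, grep and sed use basic regular expressions by default, while egrep and awk use extended regular expressions. To use basic regular expressions, you first need to know what their metacharacters mean. The examples below introduce them one by one with the grep command, using the sample file grep.txt: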
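The practical difference between the two flavors is which characters are special. The sketch below writes the same "exactly two consecutive o's" interval in BRE and in ERE syntax. It assumes GNU grep in a POSIX shell with made-up input, and treats egrep as equivalent to grep -E.

```shell
# Made-up input: a word with two o's and a line containing a literal "o{2}".

# BRE (plain grep): the interval must be written \{2\} ...
printf 'good\no{2}\n' | grep 'o\{2\}'      # matches: good

# ... because an unescaped { is an ordinary character in BRE,
# so this pattern looks for the literal text "o{2}".
printf 'good\no{2}\n' | grep 'o{2}'        # matches: o{2}

# ERE (grep -E, like egrep): braces are special without backslashes.
printf 'good\no{2}\n' | grep -E 'o{2}'     # matches: good
```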
[root@localhost ~]# cat grep.txt
he was short and fat.
He was wearing a blue polo shirt with black pants. The home of Football on BBC Sport online.
the tongue is boneless but it breaks bones.12! google is the best tools for search keyword.
The year ahead will test our political establishment to the limit.
PI=3.141592653589793238462643383249901429
a wood cross!
Actions speak louder than words
#woood #
#woooooood #
AxyzxyzxyzxyzC
I bet this place is really spooky late at night! Misfortunes never come alone/single.
MisfortunI shouldn’t have lett so tast.
es never come alone/single.
11
wd
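To follow along, you can recreate the sample file in the current directory with a here-document, as sketched below. The content is copied from the listing above, and the typos are part of the sample data. The apostrophe in line 12 is typed as a plain ASCII ' so that it matches the grep output shown later.

```shell
# Recreate grep.txt; the quoted 'EOF' stops the shell from expanding anything.
cat > grep.txt <<'EOF'
he was short and fat.
He was wearing a blue polo shirt with black pants. The home of Football on BBC Sport online.
the tongue is boneless but it breaks bones.12! google is the best tools for search keyword.
The year ahead will test our political establishment to the limit.
PI=3.141592653589793238462643383249901429
a wood cross!
Actions speak louder than words
#woood #
#woooooood #
AxyzxyzxyzxyzC
I bet this place is really spooky late at night! Misfortunes never come alone/single.
MisfortunI shouldn't have lett so tast.
es never come alone/single.
11
wd
EOF

# The file should contain 15 lines.
wc -l grep.txt
```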
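● \ : escape character; removes the special meaning of the metacharacter that follows it
e.g. \. \* \$
● ^ : matches the start of a line
e.g. ^a ^the ^#
● $ : matches the end of a line
e.g. the$ a$
● . : matches any single character except a newline
e.g. go.d g..d
● * : matches the preceding character (or subexpression) zero or more times
e.g. goo*d go*d
● [list] : matches any one character listed inside [ ]
e.g. go[abcdef] matches goa, gob, goc, god, ...
● [^list] : matches any one character NOT listed inside [ ]
e.g. go[^abcdef] matches gog, gox, goz, gow, goo, ...
● \{n,m\} : matches the preceding character n to m times. There are three forms: \{n\} (exactly n times), \{n,\} (n or more times), and \{n,m\} (n to m times). In basic regular expressions the braces must be escaped with \
e.g. go\{2\}d go\{2,3\}d go\{2,\}d (covered in the last few examples below)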
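● -n : show line numbers
● -i : ignore case
● -v : invert the match (print only the lines that do NOT match)
● -o : print only the matched part of each line instead of the whole line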
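The grep.txt examples that follow do not use the anchors ^ and $, the dot, or the -o option. Here is a small sketch on three made-up lines that covers them. It assumes GNU grep, because -o is a GNU/BSD extension rather than POSIX.

```shell
# Three made-up lines.
sample='the end
to the
#tag'

# ^the: "the" at the start of a line -> only line 1.
printf '%s\n' "$sample" | grep -n '^the'

# the$: "the" at the end of a line -> only line 2.
printf '%s\n' "$sample" | grep -n 'the$'

# ^#: lines that begin with '#' -> only line 3.
printf '%s\n' "$sample" | grep -n '^#'

# -o with 'e.d': print only the matched text ("end"), not the whole line;
# the dot stands for the single character n.
printf '%s\n' "$sample" | grep -o 'e.d'
```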
# For readability, .... (or ...) in the output below marks omitted text
# Lines containing "the", with line numbers
[root@localhost opt]# grep -n 'the' grep.txt
3:the tongue is boneless but it breaks bones.12! google is the best tools for search keyword.
4:The year ahead will test our political establishment to the limit.
# Lines containing "The", with line numbers
[root@localhost opt]# grep -n 'The' grep.txt
2:He was wearing a blue polo shirt with black pants. The home of Football on BBC Sport online.
4:The year ahead will test our political establishment to the limit.
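grep matches the pattern anywhere inside a line, not only as a whole word, and it is case-sensitive by default. Here is a sketch with made-up input:

```shell
# 'the' is found inside "another" as well; "The dog" is skipped
# because the capital T does not match a lowercase t.
printf 'the cat\nanother day\nThe dog\n' | grep -n 'the'
```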
# Lines that do NOT contain "the", with line numbers
[root@localhost opt]# grep -vn 'the' grep.txt
1:he was short and fat.
2:He was wearing a blue polo shirt with black pants. The ...
5:PI=3.141592653589793238462643383249901429
6:a wood cross!
7:Actions speak louder than words
8:#woood #
9:#woooooood #
10:AxyzxyzxyzxyzC
11:I bet this place is really spooky late at night! Misfortunes...
12:MisfortunI shouldn't have lett so tast.
13:es never come alone/single.
14:11
15:wd
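On three made-up lines, -v returns exactly the lines that the plain search would skip. With -n, grep still reports each line's original number, as this sketch shows:

```shell
# Made-up input; -v keeps the lines WITHOUT a lowercase "the".
# "another" contains "the", so only line 3 survives.
printf 'the cat\nanother day\nThe dog\n' | grep -vn 'the'
```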
# Lines containing "the", ignoring case, with line numbers
[root@localhost opt]# grep -in 'the' grep.txt
2:He was wearing a blue polo shirt with black pants. The ....
3:the tongue is boneless but it breaks bones.12! google is ....
4:The year ahead will test our political establishment to the limit.
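-i makes every letter of the pattern case-insensitive. A bracket expression such as [Tt]he only relaxes the first letter. The sketch below shows the difference, using made-up input and any POSIX grep:

```shell
# Made-up input in three different cases.
# -i: every letter is case-insensitive, so all three lines match.
printf 'the cat\nThe dog\nTHE END\n' | grep -in 'the'

# [Tt]he: only the first letter may vary, so THE END is not matched.
printf 'the cat\nThe dog\nTHE END\n' | grep -n '[Tt]he'
```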
# Lines containing the string "shi" or "sho"
[root@localhost opt]# grep -n 'sh[io]' grep.txt
1:he was short and fat.
2:He was wearing a blue polo shirt with black pants. The ....
12:MisfortunI shouldn't have lett so tast.
# Lines containing "shirt" or "short" (unlike the previous pattern, this excludes "shouldn't")
[root@localhost opt]# grep -n 'sh[io]rt' grep.txt
1:he was short and fat.
2:He was wearing a blue polo shirt with black pants. .....
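The bracket expression [io] stands for exactly one character, and -o shows which alternative matched on each line. Here is a sketch with made-up words, assuming GNU grep for -o:

```shell
# Made-up input: one word per line.
words='short
shirt
should
shut'

# -n -o prints the line number and only the matched text:
# sh[io] matches "sho" in short and should, "shi" in shirt, nothing in shut.
printf '%s\n' "$words" | grep -no 'sh[io]'

# Adding "rt" after the bracket narrows the match to short and shirt.
printf '%s\n' "$words" | grep -n 'sh[io]rt'
```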
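# Lines containing "oo" preceded by any character other than w. In "wood" (line 6)
# the only "oo" follows a w, so that line is excluded. In "woood" the first o is itself
# a non-w character followed by "oo", so lines 8 and 9 match, and so does "spooky" (line 11)
[root@localhost opt]# grep -n '[^w]oo' grep.txt
2:He was wearing a blue polo shirt with black pants. The home of Football on BBC Sport online.
3:the tongue is boneless but it breaks bones.12! google is the best tools for search keyword.
8:#woood #
9:#woooooood #
11:I bet this place is really spooky late at night! Misfortunes never come alone/single.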
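Printing the text that [^w]oo actually matched with -o shows why wood is excluded and woood is not. Here is a sketch with made-up input, assuming GNU grep:

```shell
# Made-up input: the only "oo" in wood follows a w; woood and spooky
# both have a non-w character right before an "oo".
printf 'wood\nwoood\nspooky\n' | grep -n '[^w]oo'

# -o shows what matched: "ooo" in woood (the first o plays the [^w] role)
# and "poo" in spooky.
printf 'wood\nwoood\nspooky\n' | grep -o '[^w]oo'
```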
# Lines containing two or more consecutive o's, with line numbers (o* means zero or more o's)
[root@localhost opt]# grep -n 'ooo*' grep.txt
2:He was wearing a blue polo shirt with black pants. The home of Football on BBC Sport online.
3:the tongue is boneless but it breaks bones.12! google is the best tools for search keyword.
6:a wood cross!
8:#woood #
9:#woooooood #
11:I bet this place is really spooky late at night! Misfortunes never come alone/single.
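Because * is greedy, ooo* grabs the whole run of o's, and -o makes this visible. Here is a sketch with made-up input, assuming GNU grep:

```shell
# Made-up input with 0, 1, 2 and 3 consecutive o's.
printf 'wd\nwod\nwood\nwoood\n' | grep -n 'ooo*'

# -o prints the whole run that matched: "oo" from wood, "ooo" from woood.
printf 'wd\nwod\nwood\nwoood\n' | grep -o 'ooo*'
```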
# Lines containing a w, then zero or more o's, then a d, with line numbers
[root@localhost opt]# grep -n 'wo*d' grep.txt
6:a wood cross!
8:#woood #
9:#woooooood #
15:wd
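Because o* may match zero characters, wo*d also matches wd. Any other character between w and d breaks the match, which is why keyword and words do not appear in the output above. Here is a sketch with made-up input:

```shell
# wd, wod and wood match (zero, one, two o's); word does not,
# because the r between "wo" and "d" is not an o.
printf 'wd\nwod\nwood\nword\n' | grep -n 'wo*d'
```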
# Lines containing a w followed somewhere later by a d (.* matches zero or more arbitrary characters), with line numbers
[root@localhost opt]# grep -n 'w.*d' grep.txt
1:he was short and fat.
3:the tongue is boneless but it breaks bones.12! google is the best tools for search keyword.
6:a wood cross!
7:Actions speak louder than words
8:#woood #
9:#woooooood #
15:wd
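.* is greedy as well: it runs to the last d on the line rather than the first. In the sketch below, the first input reuses line 1 of grep.txt and the second is made up. GNU grep is assumed for -o.

```shell
# Line 1 of grep.txt: the match runs from the w in "was" to the d in "and".
echo 'he was short and fat.' | grep -o 'w.*d'

# Made-up line with several d's: the match stops at the LAST d,
# giving "wood and word" rather than just "wood".
echo 'wood and words' | grep -o 'w.*d'
```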
# Lines containing two consecutive o's (a longer run of o's also contains "oo")
[root@localhost opt]# grep -n 'o\{2\}' grep.txt
2:He was wearing a blue polo shirt with black pants. The home of Football on BBC Sport online.
3:the tongue is boneless but it breaks bones.12! google is the best tools for search keyword.
6:a wood cross!
8:#woood #
9:#woooooood #
11:I bet this place is really spooky late at night! Misfortunes never come alone/single.
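o\{2\} matches exactly two o's each time. A longer run simply contains several such pairs, so any line with two or more consecutive o's qualifies. With -n and -o, GNU grep prints each non-overlapping pair on its own line. The input below is made up:

```shell
# Made-up input with runs of 2, 3 and 4 o's.
# -n -o prints one "oo" per non-overlapping pair: one pair each in
# wood and woood (the third o is left over), two pairs in wooood.
printf 'wood\nwoood\nwooood\n' | grep -no 'o\{2\}'
```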
# Lines containing w, then 5 to 8 o's, then d
[root@localhost opt]# grep -n 'wo\{5,8\}d' grep.txt
9:#woooooood #
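To check the boundaries of \{5,8\}, the sketch below generates lines with 1 to 9 o's between w and d, so each line number equals the number of o's. The helper function runs is hypothetical and written only for this illustration. A POSIX shell is assumed.

```shell
# Hypothetical helper: print "w", then i letters o, then "d",
# for i = 1..$1, one per line, so line i contains exactly i o's.
runs() {
  o=""
  i=1
  while [ "$i" -le "$1" ]; do
    o="${o}o"
    echo "w${o}d"
    i=$((i + 1))
  done
}

# Only lines 5 through 8 (5 to 8 o's) match.
runs 9 | grep -n 'wo\{5,8\}d'
```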
# Lines containing w, then 3 or more o's, then d
[root@localhost opt]# grep -n 'wo\{3,\}d' grep.txt
8:#woood #
9:#woooooood #
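\{3,\} means "at least 3", so it is equivalent to spelling out three o's followed by o* (zero or more further o's). The sketch below compares both forms. The helper runs is hypothetical and generates lines with 1 to n o's between w and d.

```shell
# Hypothetical helper: line i is "w", then i letters o, then "d".
runs() {
  o=""
  i=1
  while [ "$i" -le "$1" ]; do
    o="${o}o"
    echo "w${o}d"
    i=$((i + 1))
  done
}

# Both patterns require at least three o's, so both report lines 3 to 6.
runs 6 | grep -n 'wo\{3,\}d'
runs 6 | grep -n 'woooo*d'
```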