Contents
I. Definition of regular expressions
II. Extended regular expression metacharacters
III. Text processors

I. Definition of regular expressions

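A regular expression, often abbreviated in code as regex, regexp, or RE, uses a single string to describe and match a set of strings that follow a given syntactic rule. Put simply, it is a method for matching strings: with a handful of special symbols you can quickly find, delete, or replace a particular string.
A regular expression is a text pattern made up of ordinary characters and metacharacters. The pattern describes one or more strings to match when searching text, and acts as a template that is compared against the string being searched. Ordinary characters include uppercase and lowercase letters, digits, punctuation, and some other symbols. Metacharacters are characters with a special meaning inside a regular expression; many of them specify how the preceding character (the character just before the metacharacter) may occur in the target text.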

1. Basic regular expressions

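Regular expression syntax comes in two flavors that differ in strictness and features: basic regular expressions (BRE) and extended regular expressions (ERE). Basic regular expressions are the most fundamental part of commonly used regular expressions. Among the common file-processing tools on Linux, grep and sed support basic regular expressions, while egrep and awk support extended regular expressions.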
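
The difference between the two flavors is easiest to see in how interval braces are written. The following is a minimal sketch, assuming GNU grep; the sample strings are made up for illustration and are not part of test.txt.

```shell
# Hypothetical sample input, not taken from test.txt.
sample=$(printf 'wod\nwood\nwoood')

# BRE (plain grep): the interval braces must be escaped as \{ \}.
bre=$(printf '%s\n' "$sample" | grep 'wo\{2\}d')

# ERE (grep -E, the same dialect egrep uses): braces are written unescaped.
ere=$(printf '%s\n' "$sample" | grep -E 'wo{2}d')

echo "BRE match: $bre"   # wood
echo "ERE match: $ere"   # wood
```

Both commands select only the line with exactly two "o" characters between "w" and "d"; only the escaping differs.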

Prepare a test file named test.txt in advance with the following content:

[root@centos01 ~]# vim test.txt
he was short and fat.
He was wearing a blue polo shirt with black pants.
The home of Football on BBC Sport online.
the tongue is boneless but it breaks bones.12!
google is the best tools for search keyword.
The year ahead will test our political establishment to the limit.
PI=3.14148223023840-2382924893980--2383892948
a wood cross!
Actions speak louder than words

#wooood #
#woooood #
AxyzxyzxyzxyzxyzC
I bet this place is really spooky late at night!
Misfortunes never come alone/single.
I shouldn't have lett so tast.

1) Basic regular expression examples:

[root@centos01 ~]# grep -n 'the' test.txt         
4:the tongue is boneless but it breaks bones.12!
5:google is the best tools for search keyword.
6:The year ahead will test our political establishment to the limit.
[root@centos01 ~]# grep -in 'the' test.txt    
3:The home of Football on BBC Sport online.
4:the tongue is boneless but it breaks bones.12!
5:google is the best tools for search keyword.
6:The year ahead will test our political establishment to the limit.
[root@centos01 ~]# grep -vn 'the' test.txt    
1:he was short and fat.
2:He was wearing a blue polo shirt with black pants.
3:The home of Football on BBC Sport online.
7:PI=3.14148223023840-2382924893980--2383892948
8:a wood cross!
9:Actions speak louder than words
10:
11:
12:#wooood #
13:#woooood #
14:AxyzxyzxyzxyzxyzC
15:I bet this place is really spooky late at night!
16:Misfortunes never come alone/single.
17:I shouldn't have lett so tast.
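
The three options used above can be compared on a smaller input. This is a sketch with a made-up three-line sample (an assumption, not test.txt): -n prefixes each match with its line number, -i ignores case, and -v inverts the selection.

```shell
# Hypothetical three-line sample.
sample=$(printf 'the cat\nThe dog\na bird')

# -n: show line numbers; the match is case-sensitive by default.
plain=$(printf '%s\n' "$sample" | grep -n 'the')

# -i: ignore case, so "The" matches as well.
nocase=$(printf '%s\n' "$sample" | grep -in 'the')

# -v: print the lines that do NOT contain lowercase "the".
invert=$(printf '%s\n' "$sample" | grep -vn 'the')

printf 'plain:\n%s\nnocase:\n%s\ninvert:\n%s\n' "$plain" "$nocase" "$invert"
```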

2) Using square brackets "[]" with grep to match a character set

[root@centos01 ~]# grep -n 'sh[io]rt' test.txt      
1:he was short and fat.
2:He was wearing a blue polo shirt with black pants.
[root@centos01 ~]# grep -n 'oo' test.txt     
3:The home of Football on BBC Sport online.
5:google is the best tools for search keyword.
8:a wood cross!
12:#wooood #
13:#woooood #
15:I bet this place is really spooky late at night!
[root@centos01 ~]# grep -n '[^w]oo' test.txt   
3:The home of Football on BBC Sport online.
5:google is the best tools for search keyword.
12:#wooood #
13:#woooood #
15:I bet this place is really spooky late at night!
[root@centos01 ~]# grep -n '[^a-z]oo' test.txt        
3:The home of Football on BBC Sport online.
[root@centos01 ~]# grep -n '[0-9]' test.txt        
4:the tongue is boneless but it breaks bones.12!
7:PI=3.14148223023840-2382924893980--2383892948
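
A bracket expression always matches exactly one character, and a leading "^" inside the brackets negates the set. The sketch below uses a made-up sample and sets LC_ALL=C so that ranges such as [a-z] mean plain ASCII ranges; both choices are assumptions for illustration.

```shell
# Force the C locale so [a-z] covers only lowercase ASCII letters.
export LC_ALL=C

# Hypothetical sample input.
sample=$(printf 'short\nshirt\nshart\nwood\nwoood\nfood\nFood\nx9')

# [io] matches one character that is either "i" or "o".
set_match=$(printf '%s\n' "$sample" | grep 'sh[io]rt')

# [^w] matches one character that is not "w". "woood" still matches,
# because its second "o" is followed by "oo".
not_w=$(printf '%s\n' "$sample" | grep '[^w]oo')

# [^a-z] matches one character that is not a lowercase letter.
not_lower=$(printf '%s\n' "$sample" | grep '[^a-z]oo')

# [0-9] matches any single digit.
digits=$(printf '%s\n' "$sample" | grep '[0-9]')

printf '%s\n---\n%s\n---\n%s\n---\n%s\n' "$set_match" "$not_w" "$not_lower" "$digits"
```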

3) Matching the start of a line "^" and the end of a line "$" with grep

[root@centos01 ~]# grep -n '^the' test.txt      
4:the tongue is boneless but it breaks bones.12!
[root@centos01 ~]# grep -n '^[a-z]' test.txt      
1:he was short and fat.
4:the tongue is boneless but it breaks bones.12!
5:google is the best tools for search keyword.
8:a wood cross!
[root@centos01 ~]# grep -n '^[A-Z]' test.txt       
2:He was wearing a blue polo shirt with black pants.
3:The home of Football on BBC Sport online.
6:The year ahead will test our political establishment to the limit.
7:PI=3.14148223023840-2382924893980--2383892948
9:Actions speak louder than words
14:AxyzxyzxyzxyzxyzC
15:I bet this place is really spooky late at night!
16:Misfortunes never come alone/single.
17:I shouldn't have lett so tast.
[root@centos01 ~]# grep -n '^[^a-zA-Z]' test.txt    
12:#wooood #
13:#woooood #
[root@centos01 ~]# grep -n 'w..d' test.txt      
5:google is the best tools for search keyword.
8:a wood cross!
9:Actions speak louder than words
[root@centos01 ~]# grep -n 'ooo*' test.txt     
3:The home of Football on BBC Sport online.
5:google is the best tools for search keyword.
8:a wood cross!
11:#woood #
13:#woooooood #
19:I bet this place is really spooky late at night!
[root@centos01 ~]# grep -n 'woo*d' test.txt      
8:a wood cross!
11:#woood #
13:#woooooood #
[root@centos01 ~]# grep -n '[0-9][0-9]*' test.txt   
4:the tongue is boneless but it breaks bones.12!
7:PI=3.141592653589793238462643383249901429
[root@centos01 ~]# grep -n 'o\{2\}' test.txt       
3:The home of Football on BBC Sport online.
5:google is the best tools for search keyword.
8:a wood cross!
11:#woood #
13:#woooooood #
19:I bet this place is really spooky late at night!
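
The anchors, the dot, the star, and the interval used above can be tried side by side on a small input. This is a sketch with a made-up sample (an assumption, not test.txt), using basic regular expression syntax throughout.

```shell
# Hypothetical sample input; the second line is intentionally empty.
sample=$(printf 'the end\n\nat the\nwd\nwod\nwood\nword')

# ^ anchors the match to the start of the line, $ to the end.
starts=$(printf '%s\n' "$sample" | grep '^the')
ends=$(printf '%s\n' "$sample" | grep 'the$')

# ^$ matches empty lines; -c counts matching lines.
blank=$(printf '%s\n' "$sample" | grep -c '^$')

# . matches any single character, so w..d needs exactly two characters between w and d.
dots=$(printf '%s\n' "$sample" | grep 'w..d')

# * repeats the preceding character zero or more times: "wo" plus any number of extra "o".
star=$(printf '%s\n' "$sample" | grep 'woo*d')

# \{2\} requires the preceding character exactly twice in a row.
twice=$(printf '%s\n' "$sample" | grep 'o\{2\}')

printf '%s | %s | %s\n' "$starts" "$ends" "$blank"
printf '%s\n---\n%s\n---\n%s\n' "$dots" "$star" "$twice"
```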

2. Summary of metacharacters

[Figure 1: summary table of basic regular expression metacharacters (image not available)]

II. Extended regular expression metacharacters

[Figure 2: table of extended regular expression metacharacters (image not available)]
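
As a rough illustration of what extended regular expressions add, the sketch below exercises "+", "?", "|", and grouping with "()" through grep -E. It assumes GNU grep and a made-up sample; it shows commonly used ERE metacharacters and is not a reconstruction of the original table.

```shell
# Hypothetical sample input.
sample=$(printf 'wd\nwod\nwood\ncolor\ncolour\nxyzxyz\ncat\ndog')

# + : one or more of the preceding character.
plus=$(printf '%s\n' "$sample" | grep -E 'wo+d')

# ? : zero or one of the preceding character.
optional=$(printf '%s\n' "$sample" | grep -E 'colou?r')

# | : alternation, match either side.
either=$(printf '%s\n' "$sample" | grep -E 'cat|dog')

# () : group a sequence so a quantifier applies to the whole group.
group=$(printf '%s\n' "$sample" | grep -E '(xyz){2}')

printf '%s\n---\n%s\n---\n%s\n---\n%s\n' "$plus" "$optional" "$either" "$group"
```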

III. Text processors

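Linux/UNIX systems ship with many text processors and text editors, including the Vim editor and grep. Of these, grep, sed, and awk are the text-processing tools used most often in shell programming, and are sometimes called the "three swordsmen" of shell programming.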

1. The sed tool

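sed (Stream EDitor) is a powerful yet simple tool for parsing and transforming text. It reads text, edits it according to the given conditions (deleting, replacing, adding, moving, and so on), and finally outputs either all lines or only the lines it processed. sed can carry out fairly complex text processing without any interaction, so it is widely used in shell scripts to automate all kinds of processing tasks.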

The sed workflow consists of three phases: read, execute, and display:

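  • Read: sed reads one line from the input stream (a file, a pipe, or standard input) and stores it in a temporary buffer, also called the pattern space.
  • Execute: by default, every sed command is executed in order on the pattern space. Unless a line address is given, the commands are applied to every line in turn.
  • Display: the modified content is sent to the output stream. After the data has been sent, the pattern space is cleared. This cycle repeats until all of the input has been processed.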
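
The read, execute, display cycle can be observed directly. The sketch below uses a made-up three-line input (an assumption): because sed prints the pattern space automatically at the end of each cycle, an explicit p doubles every line, while -n turns that automatic printing off.

```shell
# Hypothetical three-line input.
sample=$(printf 'one\ntwo\nthree')

# Explicit p plus the automatic print at the end of each cycle: every line appears twice.
doubled=$(printf '%s\n' "$sample" | sed 'p')

# -n disables the automatic print, so only the addressed line (line 2) is shown.
only_second=$(printf '%s\n' "$sample" | sed -n '2p')

# No address: the command runs on every line in turn.
prefixed=$(printf '%s\n' "$sample" | sed 's/^/> /')

printf '%s\n---\n%s\n---\n%s\n' "$doubled" "$only_second" "$prefixed"
```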

2. Common usage of the sed command

sed [options] 'operation' arguments
sed [options] -f scriptfile arguments

The most common sed options are:

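  • -e or --expression=: process the input text with the given command or script.
  • -f or --file=: process the input text with the given script file.
  • -h or --help: display help.
  • -n, --quiet, or --silent: suppress automatic printing, so that only lines printed explicitly (for example with p) are shown.
  • -i: edit the text file in place.

The "operation" specifies the action to perform on the file, that is, the sed command itself. It usually takes the form "[n1[,n2]]operation". n1 and n2 are optional and select the lines to operate on; for example, to act on lines 5 through 20, write "5,20operation". Common operations include:

  • a: append; add a line with the given content below the current line.
  • c: change; replace the selected lines with the given content.
  • d: delete; delete the selected lines.
  • i: insert; insert a line with the given content above the selected line.
  • p: print; with an address, print the selected lines; without one, print all content. It is usually used together with the "-n" option.
  • s: substitute; replace the specified characters.
  • y: transliterate; convert characters one for one.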
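
The operations that the examples in this article do not cover (a, i, c, y, and chaining with -e) can be sketched as follows. This assumes GNU sed, which accepts the one-line forms "2a text", "2i text", and "2c text"; the sample input is made up for illustration.

```shell
# Hypothetical three-line input.
sample=$(printf 'alpha\nbeta\ngamma')

# a: append a new line below line 2.
appended=$(printf '%s\n' "$sample" | sed '2a ADDED')

# i: insert a new line above line 2.
inserted=$(printf '%s\n' "$sample" | sed '2i INSERTED')

# c: replace line 2 with new content.
changed=$(printf '%s\n' "$sample" | sed '2c CHANGED')

# y: map characters one for one (a->A, b->B, c->C).
mapped=$(printf '%s\n' "$sample" | sed 'y/abc/ABC/')

# -e: chain several commands; delete line 1, then replace the first "a" on each remaining line.
chained=$(printf '%s\n' "$sample" | sed -e '1d' -e 's/a/A/')

printf '%s\n---\n%s\n---\n%s\n---\n%s\n---\n%s\n' \
  "$appended" "$inserted" "$changed" "$mapped" "$chained"
```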

3. Usage examples

1) Printing text that matches a condition (p prints the line)

[root@centos01 ~]# sed -n '3p' test.txt        
The home of Football on BBC Sport online.
[root@centos01 ~]# sed -n '3,5p' test.txt  
The home of Football on BBC Sport online.
the tongue is boneless but it breaks bones.12!
google is the best tools for search keyword.
[root@centos01 ~]# sed -n 'p;n' test.txt       
he was short and fat.
The home of Football on BBC Sport online.
google is the best tools for search keyword.
PI=3.141592653589793238462643383249901429
Actions speak louder than words
#woood #
#woooooood #

I bet this place is really spooky late at night!
I shouldn't have lett so tast.
[root@centos01 ~]# sed -n '1,5{p;n}' test.txt 
he was short and fat.
The home of Football on BBC Sport online.
google is the best tools for search keyword.

[root@centos01 ~]# sed -n '10,${n;p}' test.txt       
#woood #
#woooooood #

I bet this place is really spooky late at night!
I shouldn't have lett so tast.
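
The n command loads the next input line into the pattern space, which is what makes 'p;n' and 'n;p' select alternating lines. A small sketch, assuming GNU sed and a made-up six-line input where each line holds its own line number:

```shell
# Hypothetical input: six lines, each containing its own line number.
sample=$(printf '%s\n' 1 2 3 4 5 6)

# p;n: print the current line, then skip over the next one (odd lines).
odd=$(printf '%s\n' "$sample" | sed -n 'p;n')

# n;p: skip the current line, then print the next one (even lines).
even=$(printf '%s\n' "$sample" | sed -n 'n;p')

# 2,4p: print an inclusive range of lines.
range=$(printf '%s\n' "$sample" | sed -n '2,4p')

# 3,${n;p}: from line 3 to the last line ($), print every second line.
tail_even=$(printf '%s\n' "$sample" | sed -n '3,${n;p}')

echo "odd: $(echo $odd)"
echo "even: $(echo $even)"
echo "range: $(echo $range)"
echo "tail_even: $(echo $tail_even)"
```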

2) Combining sed with regular expressions

[root@centos01 ~]# sed -n '/the/p' test.txt 
the tongue is boneless but it breaks bones.12!
google is the best tools for search keyword.
The year ahead will test our political establishment to the limit.
[root@centos01 ~]# sed -n '4,/the/p' test.txt
the tongue is boneless but it breaks bones.12!
google is the best tools for search keyword.
[root@centos01 ~]# sed -n '/the/=' test.txt       
4
5
6
[root@centos01 ~]# sed -n '/^PI/p' test.txt 
PI=3.141592653589793238462643383249901429
[root@centos01 ~]# sed -n '/\<wood\>/p' test.txt  
a wood cross!
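
A regular expression between slashes can stand wherever a line number can, including as one end of a range. The sketch below assumes GNU sed (for the \< and \> word boundaries) and a made-up four-line input.

```shell
# Hypothetical four-line input.
sample=$(printf 'the cat\nwood\nwooden\nthe end')

# /the/p: print every line that contains "the".
matched=$(printf '%s\n' "$sample" | sed -n '/the/p')

# /the/=: print the line numbers of the matching lines instead of the lines.
numbers=$(printf '%s\n' "$sample" | sed -n '/the/=')

# \< and \> anchor to word boundaries, so "wooden" is not selected.
word=$(printf '%s\n' "$sample" | sed -n '/\<wood\>/p')

# 2,/the/: from line 2 up to and including the next line that contains "the".
range=$(printf '%s\n' "$sample" | sed -n '2,/the/p')

printf '%s\n---\n%s\n---\n%s\n---\n%s\n' "$matched" "$numbers" "$word" "$range"
```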

3) Deleting text that matches a condition (d)

[root@centos01 ~]# nl test.txt | sed '3d'    
     1  he was short and fat.
     2  He was wearing a blue polo shirt with black pants.
     4  the tongue is boneless but it breaks bones.12!
     5  google is the best tools for search keyword.
     6  The year ahead will test our political establishment to the limit.
     7  PI=3.141592653589793238462643383249901429
     8  a wood cross!
     9  Actions speak louder than words
    10  
    11  #woood #
    12  
    13  #woooooood #
    14  
    15  
    16  AxyzxyzxyzxyzC
    17  
    18  
    19  I bet this place is really spooky late at night!
    20  Misfortunes never come alone/single.
    21  I shouldn't have lett so tast.
[root@centos01 ~]# nl test.txt | sed '3,5d'   
     1  he was short and fat.
     2  He was wearing a blue polo shirt with black pants.
     6  The year ahead will test our political establishment to the limit.
     7  PI=3.141592653589793238462643383249901429
     8  a wood cross!
     9  Actions speak louder than words
    10  
    11  #woood #
    12  
    13  #woooooood #
    14  
    15  
    16  AxyzxyzxyzxyzC
    17  
    18  
    19  I bet this place is really spooky late at night!
    20  Misfortunes never come alone/single.
    21  I shouldn't have lett so tast.
[root@centos01 ~]# sed '/^[a-z]/d' test.txt 
He was wearing a blue polo shirt with black pants.
The home of Football on BBC Sport online.
The year ahead will test our political establishment to the limit.
PI=3.141592653589793238462643383249901429
Actions speak louder than words

#woood #

#woooooood #

AxyzxyzxyzxyzC

I bet this place is really spooky late at night!
Misfortunes never come alone/single.
I shouldn't have lett so tast.
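
Deletion works with the same kinds of address as printing: a line number, a range, or a regular expression. A sketch on a made-up four-line input whose second line is empty (an assumption); LC_ALL=C keeps [a-z] limited to lowercase ASCII letters.

```shell
export LC_ALL=C

# Hypothetical input; the second line is empty.
sample=$(printf 'alpha\n\nBeta\ngamma')

# /^$/d: delete empty lines.
no_blank=$(printf '%s\n' "$sample" | sed '/^$/d')

# 2,3d: delete lines 2 through 3.
no_middle=$(printf '%s\n' "$sample" | sed '2,3d')

# /^[a-z]/d: delete lines that start with a lowercase letter.
# The empty line and "Beta" are left.
no_lower=$(printf '%s\n' "$sample" | sed '/^[a-z]/d')

printf '%s\n---\n%s\n---\n%s\n' "$no_blank" "$no_middle" "$no_lower"
```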

4) Replacing text that matches a condition

[root@centos01 ~]# sed 's/the/THE/' test.txt 
he was short and fat.
He was wearing a blue polo shirt with black pants.
The home of Football on BBC Sport online.
THE tongue is boneless but it breaks bones.12!
google is THE best tools for search keyword.
The year ahead will test our political establishment to THE limit.
PI=3.141592653589793238462643383249901429
a wood cross!
Actions speak louder than words

#woood #

#woooooood #

AxyzxyzxyzxyzC

I bet this place is really spooky late at night!
Misfortunes never come alone/single.
I shouldn't have lett so tast.
[root@centos01 ~]# sed 's/l/L/2' test.txt 
he was short and fat.
He was wearing a blue poLo shirt with black pants.
The home of FootbalL on BBC Sport online.
the tongue is boneless but it breaks bones.12!
google is the best tooLs for search keyword.
The year ahead wilL test our political establishment to the limit.
PI=3.141592653589793238462643383249901429
a wood cross!
Actions speak louder than words

#woood #

#woooooood #

AxyzxyzxyzxyzC

I bet this place is reaLly spooky late at night!
Misfortunes never come alone/singLe.
I shouldn't have Lett so tast.
[root@centos01 ~]# sed 's/^/#/' test.txt  
#he was short and fat.
#He was wearing a blue polo shirt with black pants.
#The home of Football on BBC Sport online.
#the tongue is boneless but it breaks bones.12!
#google is the best tools for search keyword.
#The year ahead will test our political establishment to the limit.
#PI=3.141592653589793238462643383249901429
#a wood cross!
#Actions speak louder than words
#
##woood #
#
##woooooood #
#
#
#AxyzxyzxyzxyzC
#
#
#I bet this place is really spooky late at night!
#Misfortunes never come alone/single.
#I shouldn't have lett so tast.
[root@centos01 ~]# sed '/the/s/o/0/g' test.txt  
he was short and fat.
He was wearing a blue polo shirt with black pants.
The home of Football on BBC Sport online.
the t0ngue is b0neless but it breaks b0nes.12!
g00gle is the best t00ls f0r search keyw0rd.
The year ahead will test 0ur p0litical establishment t0 the limit.
PI=3.141592653589793238462643383249901429
a wood cross!
Actions speak louder than words

#woood #

#woooooood #

AxyzxyzxyzxyzC

I bet this place is really spooky late at night!
Misfortunes never come alone/single.
I shouldn't have lett so tast.
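
The flags after the final slash decide which occurrences on a line are replaced: none means the first, a number N means the Nth, and g means all. A sketch on a made-up two-line input (an assumption):

```shell
# Hypothetical two-line input.
sample=$(printf 'hello world hello\nthe old tool')

# No flag: only the first match on each line is replaced.
first=$(printf '%s\n' "$sample" | sed 's/hello/HI/')

# g: every match on each line is replaced.
all=$(printf '%s\n' "$sample" | sed 's/hello/HI/g')

# 2: only the second match on each line is replaced.
second=$(printf '%s\n' "$sample" | sed 's/l/L/2')

# /the/ as an address: substitute only on lines that contain "the".
addressed=$(printf '%s\n' "$sample" | sed '/the/s/o/0/g')

printf '%s\n---\n%s\n---\n%s\n---\n%s\n' "$first" "$all" "$second" "$addressed"
```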

4. The awk tool

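On Linux/UNIX systems, awk is a powerful text-processing tool. It reads the input text line by line, searches it according to the given pattern, and formats or filters the content that matches. It can carry out fairly complex text operations without any interaction, and is widely used in shell scripts to automate configuration tasks.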

1) Common awk usage

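awk is usually invoked in the form shown below. The single quotes together with the braces "{}" enclose the action to perform on the data. awk can process the target files directly, or read a script with "-f" and apply it to the target files.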

awk options 'pattern or condition {action}' file1 file2 ...
awk -f scriptfile file1 file2 ...

awk provides several special built-in variables that can be used directly:

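  • NF: the number of fields in the current line.
  • FS: the field separator for each line of text; the default is whitespace (spaces or tabs).
  • NR: the line number (ordinal) of the current line.
  • $0: the entire content of the current line.
  • FILENAME: the name of the file being processed.
  • RS: the record separator; the default is \n, so each line is one record.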
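
The built-in variables can be inspected by printing them. The sketch below uses made-up inputs (assumptions, not test.txt or /etc/passwd) and sets FS through the -F option and RS in a BEGIN block; FILENAME is left out because the input arrives on a pipe rather than from a named file.

```shell
# Hypothetical two-line input with 3 and 2 fields.
sample=$(printf 'a b c\nd e')

# NR is the record (line) number, NF the field count, $0 the whole line.
summary=$(printf '%s\n' "$sample" | awk '{print NR, NF, $0}')

# $NF is the last field, because NF holds the number of fields.
last=$(printf '%s\n' "$sample" | awk '{print $NF}')

# -F sets FS, the field separator; here fields are split on ":".
fields=$(printf 'root:x:0:0\n' | awk -F: '{print $1, $3}')

# RS sets the record separator; here each ";"-separated piece is one record.
records=$(printf 'a;b;c' | awk 'BEGIN{RS=";"}{print NR": "$0}')

printf '%s\n---\n%s\n---\n%s\n---\n%s\n' "$summary" "$last" "$fields" "$records"
```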

2) Usage examples

[root@centos01 ~]# awk '{print}' test.txt  
he was short and fat.
He was wearing a blue polo shirt with black pants.
The home of Football on BBC Sport online.
the tongue is boneless but it breaks bones.12!
google is the best tools for search keyword.
The year ahead will test our political establishment to the limit.
PI=3.141592653589793238462643383249901429
a wood cross!
Actions speak louder than words

#woood #

#woooooood #

AxyzxyzxyzxyzC

I bet this place is really spooky late at night!
Misfortunes never come alone/single.
I shouldn't have lett so tast.
[root@centos01 ~]# awk 'NR==1,NR==3{print}' test.txt 
he was short and fat.
He was wearing a blue polo shirt with black pants.
The home of Football on BBC Sport online.
[root@centos01 ~]# awk '(NR%2)==1{print}' test.txt   
he was short and fat.
The home of Football on BBC Sport online.
google is the best tools for search keyword.
PI=3.141592653589793238462643383249901429
Actions speak louder than words
#woood #
#woooooood #

I bet this place is really spooky late at night!
I shouldn't have lett so tast.
[root@centos01 ~]# awk '(NR%2)==0{print}' test.txt   
He was wearing a blue polo shirt with black pants.
the tongue is boneless but it breaks bones.12!
The year ahead will test our political establishment to the limit.
a wood cross!

AxyzxyzxyzxyzC

Misfortunes never come alone/single.
[root@centos01 ~]# awk '/^root/{print}' /etc/passwd  
root:x:0:0:root:/root:/bin/bash
[root@centos01 ~]# awk '{print $1 $3}' test.txt 
heshort
Hewearing
Theof
theis
googlethe
Theahead
PI=3.141592653589793238462643383249901429
across!
Actionslouder

#woood

#woooooood

AxyzxyzxyzxyzC

Ithis
Misfortunescome
Ihave
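
In the last command above the fields run together because print $1 $3 concatenates them; separating them with a comma inserts the output field separator, a space by default. A sketch of that difference along with the pattern forms used above, on a made-up three-line input (an assumption):

```shell
# Hypothetical three-line input.
sample=$(printf 'one two three\nfour five six\nseven eight nine')

# NR==1,NR==2 is a range pattern: from record 1 through record 2.
range=$(printf '%s\n' "$sample" | awk 'NR==1,NR==2{print}')

# (NR%2)==1 selects odd-numbered lines; $1 is the first field.
odd=$(printf '%s\n' "$sample" | awk '(NR%2)==1{print $1}')

# Adjacent expressions are concatenated with nothing in between.
glued=$(printf '%s\n' "$sample" | awk '{print $1 $3}')

# A comma separates the values with the output field separator (a space by default).
spaced=$(printf '%s\n' "$sample" | awk '{print $1, $3}')

# A regular expression as the pattern selects the matching lines.
regex=$(printf '%s\n' "$sample" | awk '/^f/{print $2}')

printf '%s\n---\n%s\n---\n%s\n---\n%s\n---\n%s\n' \
  "$range" "$odd" "$glued" "$spaced" "$regex"
```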

End of article. Thanks for reading.