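A pattern is a template that describes a particular sequence of characters to look for in a string; a given string either matches the template or does not. In Perl a pattern is written between forward slashes. A **regular expression** is a formula that uses such a pattern to match a whole class of strings.
To write more complex patterns you need metacharacters and quantifiers.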
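Character | Meaning |
---|---|
. | Matches any single character except a newline (\n) |
\ | Escape character: placed before a metacharacter, it removes the special meaning so that the character is matched literally |
+ | Matches the preceding item one or more times |
* | Matches the preceding item any number of times (including zero) |
? | Matches the preceding item zero or one time |
.* | Matches any character (except a newline) any number of times |
{count} | Matches the preceding item exactly count times |
{min,} | Matches the preceding item at least min times |
{min,max} | Matches the preceding item at least min times but no more than max times |
*? | Non-greedy form of *: zero or more times, but as few as possible |
+? | Non-greedy form of +: one or more times, but as few as possible |
{min,}? | Non-greedy form of {min,}: at least min times, but as few as possible |
{min,max}? | Non-greedy form of {min,max}: between min and max times, but as few as possible |
() | Grouping. For example, /(perl)+/ matches "perlperl". 1) A group can be referred to later in the same pattern with a back reference such as \1 or \2, where the number is the number of the group. 2) Perl 5.10 and later also accept the form \g{n}; n may be negative, in which case it counts groups backwards from the current position. For example: $_ = "abba"; print "Matched" if (/(.)\1/); |
\| | Alternation ("or"): the match succeeds if either the left side or the right side matches. For example: /perl\|Perl\|PERL/ |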
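The difference between the greedy and non-greedy quantifiers in the table is easiest to see side by side. The following is a minimal sketch; the helper first_capture and the sample strings are made up for illustration and are not part of the original notes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the first capture of $pattern in $text,
# or undef if the pattern does not match.
sub first_capture {
    my ($text, $pattern) = @_;
    return $text =~ $pattern ? $1 : undef;
}

my $html = "<b>bold</b> and <i>italic</i>";

# Greedy: .* takes as much as it can, so the match runs to the last '>'.
my $greedy = first_capture($html, qr/<(.*)>/);

# Non-greedy: .*? stops at the first '>' that lets the match succeed.
my $lazy = first_capture($html, qr/<(.*?)>/);

# Back reference: \2 must repeat whatever the inner group (.) matched.
my $doubled = first_capture("abba", qr/((.)\2)/);

print "greedy : $greedy\n";
print "lazy   : $lazy\n";
print "doubled: $doubled\n";
```

The greedy capture spans everything between the first "<" and the last ">", while the non-greedy one stops at the first closing bracket.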
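A character class is a set of characters that may appear at one position, written inside square brackets "[ ]". It matches exactly one character, and that character may be any one of those listed in the brackets.
Character class shortcuts: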
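Shortcut | Equivalent class | Meaning |
---|---|---|
\d | [0-9] | Matches any digit |
\w | [A-Za-z0-9_] | Matches any letter, digit or underscore |
\s | [ \f\t\n\r] | Matches whitespace (space, \f form feed, \t tab, \n newline, \r carriage return) |
\h | [ \t] | Matches horizontal whitespace |
\v | [\f\n\r] | Matches vertical whitespace |
\R | (none) | Matches any kind of line break |
\D | [^\d] | Negation of \d: matches any character that is not a digit |
\W | [^\w] | Negation of \w: matches any character that is not a word character |
\S | [^\s] | Negation of \s: matches any character that is not whitespace |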
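Some character classes that are used below:
[\dA-Fa-f]: matches one hexadecimal digit;
[\d\D]: matches any character at all, since every character is either a digit or a non-digit (unlike ".", this includes the newline);
[^\d\D]: matches nothing, since no character is neither a digit nor a non-digit.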
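The three classes above can be tried out with a short script. This is only a sketch: the sub is_hex and the test string are assumptions chosen for the demonstration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true if $s consists only of hexadecimal digits.
sub is_hex {
    my ($s) = @_;
    return $s =~ /^[\dA-Fa-f]+$/ ? 1 : 0;
}

my $text = "a1\nb2";                        # five characters, one of them a newline

my @dot   = $text =~ /(.)/g;                # "." skips the newline
my @any   = $text =~ /([\d\D])/g;           # [\d\D] also matches the newline
my $never = $text =~ /[^\d\D]/ ? 1 : 0;     # [^\d\D] can never match

print "is_hex(1aF) = ", is_hex("1aF"), "\n";
print "is_hex(xyz) = ", is_hex("xyz"), "\n";
print "characters matched by .      : ", scalar(@dot), "\n";
print "characters matched by [\\d\\D] : ", scalar(@any), "\n";
print "did [^\\d\\D] match?           : $never\n";
```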
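Matching with a pair of slashes (//) is shorthand for the m// (pattern match) operator.
The pattern match operator accepts any pair of delimiters:
m(fred), m<fred>, m{fred}, m[fred]
Choosing a different delimiter avoids having to escape slashes inside the pattern:
/http:\/\//
m%http://%
When the slash is used as the delimiter, the leading m may be omitted.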
/-?\d+\.?\d*/
/-? \d+ \.? \d* /x
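The two patterns above are equivalent: with the /x modifier, whitespace inside the pattern is ignored, which also leaves room for comments. A small sketch follows; the ^ and $ anchors and the list of candidate strings are additions made for this demonstration, not part of the original pattern.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The number pattern from above, spread over several lines with /x.
# The ^ and $ anchors are an assumption added so that the whole string
# has to be a number.
my $number = qr/
    ^
    -?      # optional minus sign
    \d+     # one or more digits
    \.?     # optional decimal point
    \d*     # optional digits after the decimal point
    $
/x;

for my $candidate ("42", "-3.14", "7.", "abc") {
    my $verdict = $candidate =~ $number ? "matches" : "does not match";
    printf "%-6s %s\n", $candidate, $verdict;
}
```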
#!/usr/bin/perl -w
use strict;
#-----------------------------------------------------
# This example shows how to use the option modifiers
# (/i, /s) in pattern matching
#
print "\n Would you like to play a game?\n\n";
chomp ($_ = <STDIN>);
if (/yes/i) {
print "In that case, I recommend that you go bowling.\n\n";
}
$_ = "I saw Tom\ndown at the bowling alley\nwith Fred\nlast night.\n";
if (/Tom.*Fred/s) {
print "Matched successfully when using the \"/s\" option \n\n";
}
if (/tom.*fred/si) {
print "That string mentions Fred after Tom! \n\n";
}
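By default, matching starts at the beginning of the string; if the pattern does not match there, the engine keeps moving further along the string to see whether it matches at some other position. Adding anchors lets you tie the pattern to a particular place in the string.
Anchor | Meaning |
---|---|
^ | Marks the beginning of the string |
$ | Marks the end of the string |
\b | Word-boundary anchor: matches at the start or the end of any word (\w+) |
\B | Non-word-boundary anchor: matches at every position where \b does not match |
Example: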
#!/usr/bin/perl -w
use strict;
# ----------------------------------------------
# This example shows the usage of anchors
# in Perl
$_ = "it is just a test for the usage of anchors in Perl.just a test";
if (/^just/){
print "\n Match the \"just\"in the head successfully!\n\n";
}
else {
print "\n Failed to match the \"just\"in the head!\n\n";
}
if (/^it.*test$/){
print "\n Matched a string starts with \"it\"and end with\"test\"\n\n";
}
$_ = "called fred and so ...";
if (/\bfred\b/) {
print "\n[1] Matched \"fred\" word successfully!\n\n";
}
$_ = "frederick";
if (/\bfred\b/) {
print "\n[2] Matched \"fred\" word successfully!\n\n";
}
else {
print "\n[2] Failed to match \"fred\" word!\n\n";
}
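By default, a pattern is matched against $_. The binding operator =~ tells Perl to match the pattern on its right against the string on its left, instead of against the string in $_.
Example: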
#!/usr/bin/perl -w
use strict;
#----------------------------------------------
# This example shows how to use the =~ operator
# in Perl for pattern matching
$_ = "just for test";
print "\n The value for the default variable \"\$_\" is \"$_\" \n";
if (/^just.*TEST$/i) {
print "\n[1] Matched a string starts with \"just\" and end with \"test\"\n\n";
}
else {
print "\n[1] Failed to match a string starts with \"just\" and end with \"test\"\n\n";
}
my $val = "just test it";
print "the specified variable val = \"$val\" \n";
if ($val =~ /^just.*it$/) {
print "\n[2] Matched a string that starts with \"just\" and ends with \"it\"\n\n";
}
else {
print "\n[2] Failed to match a string that starts with \"just\" and ends with \"it\"\n\n";
}
Variables are interpolated into Perl regular expressions in the same way as in double-quoted strings.
Example:
#!/usr/bin/perl -w
use strict;
my $val = "just";
my $val2 = "This is just a test example";
if ($val2 =~ /($val)/) {
print "\n \"$val2\" matched with \"$val\"!\n\n";
}
else {
print "\n \"$val2\" failed to match with \"$val\"!\n\n";
}
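Parentheses in a pattern switch on the capture feature of the regular expression engine (they also group parts of the pattern). Capturing means temporarily remembering the part of the string that was matched by the pattern inside the parentheses. If there are several pairs of parentheses, there are several captures. Each capture holds the original text from the string, not the pattern.
Capture variables
Capture variables are scalar variables named $1, $2, ..., $n. There are as many of them as there are pairs of capturing parentheses in the pattern.
Lifetime of capture variables
- Capture variables normally survive until the next successful match: a failed match leaves the values captured by the last successful match untouched, while a successful match resets them.
- Capture variables should only be used after a successful match; otherwise they still hold the values from an earlier match.
- If a captured value is needed more than a few lines later, copy it into an ordinary variable first.
Example: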
#!/usr/bin/perl -w
use strict;
#--------------------------------------------
# The example used to show how to use the
# "match the variables"
#
my $val = "hello there, neighbor";
my $val2;
if ($val =~ /(\w+)\s*(\w+),\s(\w+)/) {
print "\n v1 = $1, v2 = $2 , v3 = $3 \n\n";
$val2 = $2;
}
$val = "I fear that I will be extinct after 1000 years.";
if ( $val =~ /(\d+)\s*(\w+)/ ) {
print "[2] v1 = $1 , v2 = $2 \n\n";
}
print "\n After some operations \n\n";
print "The obtained value is \"$val2\" \n";
# (?:...) groups without capturing, so (.*) below is still $1
$val = "This is just a simple example";
if ($val =~ /^this\s+is\s+(?:just)\s+(.*)$/i) {
print "\nThe matched value is \"$1\"\n\n";
}
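The lifetime rules for capture variables described above can be checked with a short script. This is a minimal sketch with made-up strings; all matches are written as plain statements in one scope so that only the success or failure of each match affects $1 and $2.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A successful match sets $1 and $2.
my $ok    = "hello there" =~ /(\w+) (\w+)/;
my $saved = $2;                  # copy the value right away

# A failed match leaves the old captures in place.
my $failed = "12345" =~ /([a-z]+)/;
my $stale  = $1;                 # still the value from the first match

# The next successful match resets every capture variable.
my $ok_again = "2024" =~ /(\d+)/;
my $fresh    = $1;
my $second   = defined $2 ? $2 : "(undef)";

print "saved  = $saved\n";
print "stale  = $stale\n";
print "fresh  = $fresh\n";
print "second = $second\n";
```

Because $stale is read after a failed match, it silently carries the value from the earlier match, which is why the notes recommend using capture variables only inside the branch taken on success.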
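Keeping track of numbered capture variables such as $1 and $2 is hard, especially in complex regular expressions.
So that you do not have to remember what each numbered variable stands for, Perl 5.10 introduced named captures.
With named captures, the results are stored in the special hash %+, whose keys are the labels used in the pattern and whose values are the captured substrings. To label a capture, write (?<LABEL>PATTERN); the captured text is then read from $+{LABEL}. A named capture can be used as a back reference in the form \g{LABEL} or \k<LABEL>.
Note: once captures are labelled, you can move them around and add more capturing parentheses freely; a change in the order of the parentheses no longer causes trouble.
Example: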
#!/usr/bin/perl -w
use strict;
#-----------------------------------------------------
# The example used to show the "Name Capture" in perl
#
use 5.010;
my $name = 'Fred or Barney';
if ($name =~ m/(?<name1>\w+) (?:and|or) (?<name2>\w+)/ ) {
print "\nI saw $+{name1} and $+{name2} \n\n";
}
$name = 'Fred Flintstone and Wilma Flintstone';
if ($name =~ m/(?<last_name>\w+) and \w+ \g{last_name}/ ) {
print "\nI saw $+{last_name} \n\n";
}
if ("Hello there, neighbor" =~ /\s(\w+),/ ) {
print "\nThat was ($`) ($&) ($') .\n\n";
}
1. Substitution with s///:
$_ = "He's out bowling with Barney tonight";
print "\n[1] Before replacement and val = \"$_\"\n";
s/Barney/Fred/;
print "\n[1] After replacement and val = \"$_\"\n";
s/TOnight/this afternoon/;
print "\n[2] After replacement and val = \"$_\"\n";
$_ = "home, sweet home!";
print "\n[3] Before replacement and val = \"$_\"\n";
s/home/cave/;
print "\n[3] After replacement and val = \"$_\"\n";
$_ = "home, sweet home!";
print "\n[4] Before replacement and val = \"$_\"\n";
s/home/cave/g;
print "\n[4] After replacement and val = \"$_\"\n";
$_ = " home, sweet home! ";
print "\n[5] Before replacement and val = \"$_\"\n";
s/^\s*|\s*$//g;
print "\n[5] After replacement and val = \"$_\"\n";
s#^https://#http://#;
s{fred}{barney}; s[fred](barney); s<fred>#barney#;
open FILE, $filename or die "Can't open '$filename': $!";
my $lines = join '', <FILE>;
$lines =~ s/^/$filename: /gm;
The binding operator
The binding operator =~ used for pattern matching works for substitution as well.
Case shifting:
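Escape | Meaning |
---|---|
\U | Converts all the characters that follow to upper case |
\L | Converts all the characters that follow to lower case |
\E | Ends the effect of \U or \L |
\u,\l | Also shift case, but affect only the next character |
Note: \U, \L, \u and \l work in any double-quoted string as well.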
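As the note says, the case-shifting escapes are not limited to s///. A minimal sketch with an ordinary double-quoted string follows; the variable names and the sample name are chosen for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $name = "fRED flintstone";

my $upper   = "\U$name";             # everything in upper case
my $lower   = "\L$name";             # everything in lower case
my $capital = "\u\L$name";           # lower case, then first letter upper case
my $partial = "\U$name\E is here";   # \E ends the effect of \U

print "$upper\n$lower\n$capital\n$partial\n";
```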
#!/usr/bin/perl -w
use strict;
#-----------------------------------------------------
# The example used to show the following items in
# perl:
# 1) Different delimiters for replacement
# 2) Option modifiers for replacement
# 3) The binding operator for replacement
# 4) Case shifting operation
#
# 1) Different delimiters for replacement
$_ = "today is 12th Jan. 2014";
print "\n[1] Before replacement and val = \"$_\"\n";
s/today/tomorrow/;
print "\n[1] After replacement and val = \"$_\"\n";
$_ = "today is 12th Jan. 2014";
print "\n[2] Before replacement and val = \"$_\"\n";
s#today#tomorrow#;
print "\n[2] After replacement and val = \"$_\"\n";
$_ = "today is 12th Jan. 2014";
print "\n[3] Before replacement and val = \"$_\"\n";
s[today]{tomorrow};
print "\n[3] After replacement and val = \"$_\"\n";
# 2) Option modifiers for replacement
$_ = "TODAY is 12th Jan. 2014";
print "\n[4] Before replacement and val = \"$_\"\n";
s/today/tomorrow/; # no match without /i, so $_ is unchanged
print "\n[4] After replacement and val = \"$_\"\n";
s/today/tomorrow/i;
print "\n[5] After replacement and val = \"$_\"\n";
$_ = "The following is the final result:\n- status:pass \n- __END__ \n\n Simulation finished\n";
print "\n[6] Before replacement and val = \"$_\"\n";
s/.*(__END__).*/$1/;
print "\n[6] After replacement and val = \"$_\"\n";
$_ = "The following is the final result:\n- status:pass \n- __END__ \n\n Simulation finished\n";
print "\n[7] Before replacement and val = \"$_\"\n";
s/.*(__END__).*/$1/s;
print "\n[7] After replacement and val = \"$_\"\n";
# 3) The binding operator for replacement
$_ = "today is 12th Jan. 2014";
my $val = "TODAY is 12th jan. 2014";
print "[8] before replacement: \n";
print " \$_ = \"$_\"\n";
print " \$val = \"$val\"\n";
$val =~ s/today/tomorrow/i;
print "[8] after replacement: \n";
print " \$_ = \"$_\"\n";
print " \$val = \"$val\"\n";
# 4) Case shifting operation
$_ = "I saw Barney with Fred.\n";
print "\n[9] Before replacement and val = \"$_\"\n";
s/(fred|barney)/\U$1/gi;
print "\n[10] After replacement and val = \"$_\"\n";
s/(fred|barney)/\L$1/gi;
print "\n[11] After replacement and val = \"$_\"\n";
s/(\w+) with (\w+)/\U$2\E with $1/i;
print "\n[12] After replacement and val = \"$_\"\n";
s/(fred|barney)/\u$1/ig;
print "\n[13] After replacement and val = \"$_\"\n";
s/(fred|barney)/\u\L$1/ig;
print "\n[14] After replacement and val = \"$_\"\n";
@list = split /separator/, $string; # or: @list = split(/separator/, $string);
#!/usr/bin/perl -w
use strict;
#-----------------------------------------------------
# This example shows the usage of the split operator
# in Perl
#
my @fileds = split /:/,"abc:def:g:h";
print "\n[1] obtained list = @fileds \n\n";
@fileds = split /:/,"abc:def::g:h";
print "\n[2] obtained list = @fileds \n\n";
@fileds = split /:/,":::a:b:c:::";
print "\n[3] obtained list = @fileds \n\n";
my $num = @fileds;
print "\n[3'] number of list = $num \n\n";
my $some_input = "This is a \t test.\n";
my @args = split(/\s+/,$some_input);
print "\n[4] obtained list = @args \n\n";
$_ = "This is a \t test.\n";
@fileds = split;
print "\n[5] obtained list = @fileds \n\n";
my $result = join $glue, @pieces; # or: my $result = join($glue, @pieces);
my $x = join ":",4,6,8,10,12;
print "\n[1] after join and the value = \"$x\"\n\n";
my @values = split /:/,$x;
print "\n[2] after split and the value = \"@values\"\n\n";
my $z = join "-",@values;
print "\n[3] after join and the value = \"$z\"\n\n";
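Normally the pattern match operator m// returns a Boolean value. When m// is used in list context, a successful match returns the list of all captured values and a failed match returns the empty list. The /g modifier can also be used with the m// operator.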
my $test = "Hello there, neighbor!";
my ($first,$second,$third) = ($test =~ /(\S+) (\S+), (\S+)/);
print "\nfirst = \"$first\" ,second = \"$second\" ,third = \"$third\"\n\n";
$test = "Fred dropped a 5 ton granite block on Mr.Slate";
my @words = ($test =~ /([a-z]+)/ig);
print "Result : \"@words\"\n";
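When the pattern contains more than one pair of parentheses, m//g in list context returns all the captures of every match in order, which makes it convenient for filling a hash. The following is a sketch; the data string and the hash name are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $data = "Barney Rubble Fred Flintstone Wilma Flintstone";

# Each match contributes two captures: a key (first name) and a value (last name).
my %last_name = ($data =~ /(\w+)\s+(\w+)/g);

for my $first (sort keys %last_name) {
    print "$first => $last_name{$first}\n";
}
```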
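For example:
perl -p -i.bak -w -e 's/Randall/Randal/g' fred*.dat
Option | Meaning |
---|---|
-p | Makes Perl wrap the code in a small program: while (<>) { print; }, with the code given by -e run on each line before it is printed |
-i.bak | Tells Perl to edit the files in place and to keep a backup of each original file with the suffix ".bak" |
-w | Turns warnings on |
-e | Tells Perl that program code follows |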
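Written out as a script, the one-liner above corresponds roughly to the loop below. This is a sketch, not the code Perl generates internally: the sub name fix_spelling is hypothetical, and wrapping the loop in a sub with local @ARGV and local $^I is a choice made here so that the in-place editing stays confined to the files passed in.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical wrapper around the loop that -p, -i.bak and -e describe.
sub fix_spelling {
    my @files = @_;
    local @ARGV = @files;     # the files that <> will read
    local $^I   = ".bak";     # same as -i.bak: edit in place, keep a backup
    while (<>) {              # same as -p: read every line ...
        s/Randall/Randal/g;   # the code given with -e
        print;                # ... and print it back into the file
    }
    return;
}

# Only edit files that were named on the command line.
fix_spelling(@ARGV) if @ARGV;
```

Run it as "perl script.pl fred*.dat"; each edited file keeps its original content in a file with the .bak suffix.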
#!/usr/bin/perl -w
use strict;
use POSIX;
use Getopt::Long;
# ----------------------------------------------------
# Filename : gen_ext4.18_logs.pl
#
# Description :
# this file is used to generate the log files for
# the example ext4.18.pl
#
# ---------------------------------------------------
my $file_num = 10; # how many log file will be generated
my $help = 0;
my $debug = 0;
my @companys = qw(vimcrc versilcon amd marvell amlogic mtk qualcomm spreadtrum CSR Parade);
my @names = qw(peter david tony allen kurt sam mark brance steven);
my @departs = qw(analog digital SoCI SoCII SoCIII video Audio Bluetooth WIFI PR CPU GPU PCIE);
my $name_num = @names + 0;
my $company_num = @companys + 0;
my $depart_num = @departs + 0;
my $tab = " " x 4;
GetOptions(
'file_num=s' => \$file_num,
'help!' => \$help,
'debug!' => \$debug,
);
&help_message() if $help;
if (defined $file_num && $file_num > 0 ) {
if (! -e "./log") {
system ("mkdir log");
}
#use the for loop to generate the specified number of log files
for (my $i = 0; $i < $file_num; $i++) {
my $cur_file = "./log/info_${i}.log";
my $str = "";
open (LOG,">",$cur_file) or die "Can not open $cur_file file for writing!\n";
my $name_index = int (rand($name_num));
my $company_index = int (rand($company_num));
my $depart_index = int (rand($depart_num));
my $num = int(rand(86400));
my $date = getTime(time() - 86400*$i - $num);
my $month = $date->{month};
my $day = $date->{day};
my $year = $date->{year};
my $time = "${month}/${day}/${year}";
$str .= "program name : program_${i}\n";
$str .= "Author : $names[$name_index]\n";
$str .= "Company : $companys[$company_index]\n";
$str .= "Department : $departs[$depart_index]\n";
$str .= "Phone : +86 32 1345 123${i}\n";
$str .= "Date : $time \n";
$str .= "Version : ${i}.1 \n";
$str .= "Size : ${i}k \n";
$str .= "Status : Final beta_${i} \n";
print LOG $str;
print "INFO -- the $cur_file has been generated \n\n" if $debug;
close (LOG);
}
print "\n -- All the requested log file have been generated and saved into ./log dir! -- \n\n";
}
sub getTime {
my $time = shift || time();
my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime ($time);
$year += 1900;
$mon ++;
$min = '0'.$min if length($min) <2;
$sec = '0'.$sec if length($sec) <2;
$mon = '0'.$mon if length($mon) <2;
$mday = '0'.$mday if length($mday) <2;
$hour = '0'.$hour if length($hour) <2;
my $weekday = ('Sun','Mon','Tue','Wed','Thu','Fri','Sat') [$wday];
return {
'second'=> $sec,
'minute'=> $min,
'hour' => $hour,
'day' => $mday,
'month' => $mon,
'year' => $year,
'weekNO'=> $wday,
'wday' => $weekday,
'date' => "$year-$mon-$mday"
};
}
sub help_message {
print "\n$0 is used to generate the log files for the example to deal with\n\n";
print "Usage : perl $0 -file_num file_num [-debug]\n";
print " or perl $0 -help/-h \n\n";
exit;
}
#!/usr/bin/perl -w
use strict;
# ----------------------------------------------------
# Filename : ext4.18.pl
#
# Description :
#
#
# ---------------------------------------------------
my $log_dir;
my @log_files;
if (@ARGV > 0) {
$log_dir = $ARGV[0];
}
else {
&help_message();
}
$log_dir =~ s#[\\/]$##; # remove a trailing "/" or "\"
@log_files = glob "$log_dir/*.log";
print "[INFO] -- obtain the log files done!\n\n";
print "log_files = @log_files \n\n";
if (defined $log_files[0] && $log_files[0] !~ /^\s*$/) {
print "[INFO] -- Start to process the obtained log files \n\n";
foreach my $file (@log_files) {
# -------------------------------------------------------------
# The following is the example code from the book.
# It does not produce the expected result in the
# Windows environment; the reason is not clear.
# So another way is used to realize the function.
# -------------------------------------------------------------
# chomp(my $date = `date`);
# $^I = ".bak";
# while (<$file>) {
# s/^Author:.*/Author: Randal L. Schwartz/;
# s/^Phone:.*\n//;
# s/^Date:.*/Date:$date/;
# print;
# }
#
# --------------------------------------------------------------
# the following is another common way to realize the function
system ("cp $file ${file}.bak");
my $tmp_file = "${file}.tmp";
my $str = "";
chomp (my $date = `date`);
open (IN_LINE,"<",$file) or die "Can not open $file for reading!\n";
while (<IN_LINE>) {
s/^Author\s*:.*/Author : Randal L. Schwartz/;
s/^Phone\s*:.*//;
s/^Date\s*:.*/Date : $date/;
$str .= $_ if ($_ !~ /^\s*$/); # skip lines that are now empty (the Phone line)
}
close(IN_LINE);
open (LOG,">",$tmp_file) or die "Can not open $tmp_file for writing!\n";
print LOG $str;
close (LOG); # flush and close the output before the file is renamed
system ("mv $tmp_file $file");
}
print "[INFO] -- Complete to process the obtained log files \n\n";
}
sub help_message () {
print "Usage : perl $0 log_dir\n\n";
exit;
}