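This chapter covers text processing with regular expressions. All examples in these notes are taken from the original English edition of Learning Perl, 7th edition.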
(I) Substitutions with s///
The general form is:
s/pattern/replacement/
For example:
$_ = "He's out bowling with Barney tonight.";
s/Barney/Fred/; # replace Barney with Fred in the string held in $_
print "$_\n";
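s/// also returns a value: in scalar context it gives the number of substitutions made, or a false value when nothing matched, which makes it usable directly in a condition. A minimal sketch reusing the sentence above (the variable names $changed and $again are illustrative):

```perl
use strict;
use warnings;

$_ = "He's out bowling with Barney tonight.";

# First attempt: Barney is present, so one substitution is made
my $changed = s/Barney/Fred/;

# Second attempt: no Barney remains, so s/// returns false
my $again = s/Barney/Fred/;

print "changed=$changed, again=", ($again ? "yes" : "no"), "\n";
print "$_\n";    # He's out bowling with Fred tonight.
```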
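(1) Global replacement with /g
By default, s/// makes at most one substitution, at the first match. The /g modifier tells s/// to replace every non-overlapping match: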
$_ = "home, sweet home!";
s/home/cave/g;
print "$_\n";
Result:
cave, sweet cave! # every home was replaced with cave
The pattern does not have to be literal text; it can also match whitespace, for instance:
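"Non-overlapping" means /g resumes scanning after the end of each match and never reuses characters already consumed. A small sketch with a made-up string; with /g, the scalar return value counts all replacements:

```perl
use strict;
use warnings;

# "aaaa" holds two non-overlapping "aa" matches (positions 0 and 2),
# not three overlapping ones
my $s = "aaaa";
my $n = ($s =~ s/aa/b/g);

print "$s ($n substitutions)\n";    # bb (2 substitutions)
```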
$_ = "Input data\t may have extra whitespace.";
s/\s+/ /g; # replace each run of whitespace (here the tab plus the space after it) with one space
The result should be:
Input data may have extra whitespace.
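A related cleanup, shown here as an illustration rather than as part of the original example, trims leading and trailing whitespace with two anchored substitutions before collapsing the inner runs. The padded input string is made up:

```perl
use strict;
use warnings;

my $line = "  \tInput data\t may have extra whitespace.  \n";

$line =~ s/\A\s+//;    # remove leading whitespace
$line =~ s/\s+\z//;    # remove trailing whitespace, including the newline
$line =~ s/\s+/ /g;    # collapse each inner run to a single space

print "[$line]\n";     # [Input data may have extra whitespace.]
```

No /g is needed on the two anchored substitutions, since \A and \z can each match at only one position.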
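(2) Different delimiters
Besides /, s/// can use other delimiters, such as # or brackets. With a non-paired character such as #, use three of them. With paired brackets, use two pairs, one around the pattern and one around the replacement; the delimiters around the replacement do not have to match those around the pattern and can even be non-paired. For example: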
s#\Ahttps://#http://#;
s{fred}{barney};
s[fred](barney);
s<fred>#barney#;
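The main reason to switch delimiters is to avoid escaping slashes when the pattern or replacement contains them, as with URLs or file paths. A short sketch comparing the forms on a made-up path:

```perl
use strict;
use warnings;

# With / as the delimiter, every slash in the path must be escaped
my $path1 = "/usr/local/bin/perl";
$path1 =~ s/\A\/usr\/local\//\/opt\//;

# With # as the delimiter, the slashes need no escaping
my $path2 = "/usr/local/bin/perl";
$path2 =~ s#\A/usr/local/#/opt/#;

# With paired braces, one pair around the pattern and one around the replacement
my $path3 = "/usr/local/bin/perl";
$path3 =~ s{\A/usr/local/}{/opt/};

print "$path1\n$path2\n$path3\n";    # /opt/bin/perl, three times
```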
(3) Case conversion
\U converts everything after it to uppercase:
$_ = "I saw Barney with Fred.";
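s/(fred|barney)/\U$1/gi; # gives "I saw BARNEY with FRED."; /i makes the match case-insensitive
\L converts everything after it to lowercase (applied to the result above):
s/(fred|barney)/\L$1/gi; # gives "I saw barney with fred."
To change the case of only part of the replacement, end the conversion with \E:
s/(\w+) with (\w+)/\U$2\E with $1/i; # gives "I saw FRED with barney."
# note: \E ends the \U, so only $2 (fred) is uppercased, while the two names swap places
All lowercase except the first letter, which is uppercased (\u affects only the next character):
s/(fred|barney)/\u\L$1/ig; # gives "I saw Fred with Barney."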
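\U, \L, \u and \E are double-quoted-string escapes, so they are not limited to the replacement part of s///; they work in any interpolating string. A small sketch with an illustrative, oddly cased name:

```perl
use strict;
use warnings;

my $name = "fRED";

# \L lowercases the whole name, then \u uppercases its first letter
my $greeting = "Hello, \u\L$name\E!";

# \U uppercases until \E, so the rest of the string is left alone
my $shout = "\U$name\E is here";

print "$greeting\n$shout\n";
```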
(II) The split operator
The general form is:
my @fields = split /separator/, $string;
Splitting on colons:
my @fields = split /:/, "abc:def:g:h"; # gives ("abc", "def", "g", "h")
my @fields = split /:/, "abc:def::g:h"; # gives ("abc", "def", "", "g", "h")
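Leading empty fields are kept, but trailing empty fields are discarded:
my @fields = split /:/, ":::a:b:c:::"; # gives ("", "", "", "a", "b", "c"); the empty fields after c are dropped
To keep the trailing empty fields, pass -1 as a third argument (the limit):
my @fields = split /:/, ":::a:b:c:::", -1; # gives ("", "", "", "a", "b", "c", "", "", "")
Splitting on /\s+/ treats any run of whitespace as a single separator: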
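The third argument is a general limit. A positive limit caps the number of fields, leaving the unsplit remainder in the last one; any negative limit behaves like -1 and keeps trailing empty fields. A sketch with a made-up, passwd-style record:

```perl
use strict;
use warnings;

# Positive limit: at most 2 fields; the second keeps everything after the first colon
my ($user, $rest) = split /:/, "fred:x:1000:1000:Fred Flintstone", 2;
print "$user | $rest\n";    # fred | x:1000:1000:Fred Flintstone

# Negative limit: trailing empty fields are preserved
my @all = split /:/, "a:b::", -1;
print scalar(@all), "\n";   # 4, i.e. ("a", "b", "", "")
```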
my $some_input = "This is a \t test.\n";
my @args = split /\s+/, $some_input; # ("This", "is", "a", "test.")
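One caveat with /\s+/: if the string starts with whitespace, the first field is empty. Perl's special pattern ' ' (a string containing a single space) splits on whitespace runs the same way but also skips leading whitespace. A sketch with a made-up padded string:

```perl
use strict;
use warnings;

my $padded = "   fred  barney\n";

# /\s+/ yields an empty leading field: ("", "fred", "barney")
my @with_regex = split /\s+/, $padded;

# ' ' skips the leading whitespace: ("fred", "barney")
my @with_space = split ' ', $padded;

print scalar(@with_regex), " vs ", scalar(@with_space), "\n";    # 3 vs 2
```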
(III) The join function
The general form is:
my $result = join $glue, @pieces;
For example:
my $x = join ":", 4, 6, 8, 10, 12; # $x is "4:6:8:10:12"
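join puts the glue only between pieces, so n pieces get n - 1 copies of it, and split with the same separator undoes a join. A short sketch of both points:

```perl
use strict;
use warnings;

my @empty = ();
my $one   = join "-", "solo";     # a single piece gets no glue: "solo"
my $none  = join "-", @empty;     # an empty list gives the empty string
my $x     = join ":", 4, 6, 8, 10, 12;

# Splitting $x on ":" and joining again reproduces the original string
my @pieces = split /:/, $x;
my $back   = join ":", @pieces;

print "$one|$none|$back\n";       # solo||4:6:8:10:12
```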
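(IV) Word boundaries
Earlier chapters introduced \b as the word-boundary anchor, but it does not recognize punctuation inside a word, such as the apostrophe in doesn't. With \b you get this: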
my $string = "This doesn't capitalize correctly.";
$string =~ s/\b(\w)/\U$1/g;
print "$string\n";
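The output is:
This Doesn'T Capitalize Correctly. # the t after the apostrophe was treated as the start of a new word
To avoid this, use \b{wb}, the Unicode word-boundary assertion (available since Perl 5.22), which does not break a word at an internal apostrophe: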
my $string = "this doesn't capitalize correctly.";
$string =~ s/\b{wb}(\w)/\U$1/g;
print "$string\n";
The result is:
This Doesn't Capitalize Correctly.
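To compare the two anchors side by side, the substitution can be wrapped in a small helper. capitalize_words and its $use_wb flag are hypothetical names chosen for this sketch; the two substitutions are the ones from the examples above:

```perl
use v5.22;    # \b{wb} requires Perl 5.22 or later
use strict;
use warnings;

# Hypothetical helper: uppercase the first letter of every word,
# using either plain \b or the Unicode-aware \b{wb}
sub capitalize_words {
    my ($text, $use_wb) = @_;
    if ($use_wb) {
        $text =~ s/\b{wb}(\w)/\U$1/g;
    }
    else {
        $text =~ s/\b(\w)/\U$1/g;
    }
    return $text;
}

my $input = "this doesn't capitalize correctly.";
my $plain = capitalize_words($input, 0);    # This Doesn'T Capitalize Correctly.
my $wb    = capitalize_words($input, 1);    # This Doesn't Capitalize Correctly.
print "$plain\n$wb\n";
```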