Capture groups:
One of the most useful features of Groovy regular expressions is the ability to capture pieces of data from a string that matches the expression. Consider the following example, where we want to extract the location data for Liverpool, England:
locationData = "Liverpool, England: 53° 25′ 0″ N 3° 0′ 0″"
We could use String's split() method to cut out the parts we need, such as Liverpool and England (the comma would have to be stripped off). Alternatively, we can use a regular expression; the syntax in the example below may look a little unfamiliar.
First, we define a regular expression and put everything we are interested in inside parentheses:
myRegularExpression = /([a-zA-Z]+), ([a-zA-Z]+): ([0-9]+). ([0-9]+). ([0-9]+). ([A-Z]) ([0-9]+). ([0-9]+). ([0-9]+)./
Next we define a matcher, which is done with the =~ operator.
matcher = ( locationData =~ myRegularExpression )
The variable matcher holds a java.util.regex.Matcher that Groovy has enhanced. You can access the data exactly as you would with a Matcher object on the Java platform, but a nicer way is to treat matcher as a two-dimensional array.
Let's look at the first dimension of the data, matcher[0]:
["Liverpool, England: 53° 25′ 0″ N 3° 0′ 0″", "Liverpool", "England", "53", "25", "0", "N", "3", "0", "0"]
The captured groups, together with the full original string, have been combined into one list.
This makes it easy to print the data we want:
if (matcher.matches()) {
    println(matcher.getCount() + " occurrence of the regular expression was found in the string.");
    println(matcher[0][1] + " is in the " + matcher[0][6] + " hemisphere. (According to: " + matcher[0][0] + ")")
    // matcher[0] is a list, so size() is a method call
    for (int i = 0; i < matcher[0].size(); i++) {
        println(matcher[0][i])
    }
}
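As a minimal sketch of the two access styles mentioned above, the snippet below reads the same groups once through Groovy's index syntax and once through the plain java.util.regex.Matcher methods. The variable names (data, pattern, m, byIndex, byGroup) are made up for this illustration, and the degree and minute marks are written as \u escapes only to sidestep encoding problems.

```groovy
// Sample string; \u00B0, \u2032 and \u2033 are the degree, prime and double-prime signs.
def data = "Liverpool, England: 53\u00B0 25\u2032 0\u2033 N 3\u00B0 0\u2032 0\u2033"
def pattern = /([a-zA-Z]+), ([a-zA-Z]+): ([0-9]+). ([0-9]+). ([0-9]+). ([A-Z]) ([0-9]+). ([0-9]+). ([0-9]+)./

def m = (data =~ pattern)

// Groovy style: m[0] is the first match as a list, element 0 being the whole match.
def byIndex = [m[0][1], m[0][2], m[0][6]]

// Java style: matches() has to succeed before group(n) may be called.
def byGroup = []
if (m.matches()) {
    byGroup = [m.group(1), m.group(2), m.group(6)]
}

println byIndex
println byGroup
```

Both lists should hold the same three values (city, country, hemisphere); the index form is simply shorter to write.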
Non-capturing groups:
Sometimes we need to define a non-capturing group to get at the data we want. In the following example, the goal is to filter out each person's middle names:
names = [
    "Graham James Edward Miller",
    "Andrew Gregory Macintyre"
]
printClosure = {
    matcher = (it =~ /(.*?)(?: .+)+ (.*)/);  // notice the non-capturing group in the middle
    if (matcher.matches())
        println(matcher[0][2] + ", " + matcher[0][1]);
}
names.each(printClosure);
Output:
Miller, Graham
Macintyre, Andrew
Non-capturing groups may not be obvious at first. Put simply, a non-capturing group lets the pattern match characters or symbols you do not want without adding them to the captured groups.
For example:
names = [
    "ZDW love beijing",
    "Angel love beijing",
    "Ghost hate beijing"
]
We only want the name at the start and the city at the end, filtering out the word in between (love or hate). This is where a non-capturing group comes in: write ?: at the start of the group, in front of the pattern you want to filter out.
nameClosure = {
    myMatcher = (it =~ /(.*?)(?: .+)+ (.*)/)
    if (myMatcher.matches()) {
        println(myMatcher[0][1] + " " + myMatcher[0][2])
    }
}
names.each(nameClosure);
Let's analyze this part:
(?: .+)
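Every group is enclosed in (), and ?: marks this one as a non-capturing group. Note the space after the colon: it corresponds to the space that separates the words in the original string. If the separator were a comma or some other symbol, you would use that instead.
.+ matches any number of characters (at least one).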
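To make the difference concrete, here is a small sketch that runs the same pattern twice, once with an ordinary capturing group around the middle names and once with the non-capturing form. The variable names (name, capturing, nonCapturing) are illustrative only; groupCount() and group() are the standard java.util.regex.Matcher methods.

```groovy
def name = "Graham James Edward Miller"

// Middle names inside a capturing group: the pattern has three groups.
def capturing = (name =~ /(.*?)( .+)+ (.*)/)
// Same pattern with (?: ...): the middle part is matched but not captured.
def nonCapturing = (name =~ /(.*?)(?: .+)+ (.*)/)

assert capturing.matches() && nonCapturing.matches()

println capturing.groupCount()      // number of capturing groups in the first pattern
println nonCapturing.groupCount()   // one fewer, because (?: ...) does not count
println nonCapturing.group(2) + ", " + nonCapturing.group(1)
```

With the non-capturing form the last name is group 2 rather than group 3, so the group numbers stay focused on the data you actually want.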
Replacement:
We may need to replace a given substring or symbol within a string with something else.
For example:
excerpt = "At school, Harry had no one. Everybody knew that Dudley's gang hated that odd Harry Potter " +
          "in his baggy old clothes and broken glasses, and nobody liked to disagree with Dudley's gang.";
matcher = (excerpt =~ /Harry Potter/);
excerpt = matcher.replaceAll("Tanya Grotter");
matcher = (excerpt =~ /Harry/);
excerpt = matcher.replaceAll("Tanya");
println("Publish it! " + excerpt);
In this example we did two things: first we replaced Harry Potter with Tanya Grotter, then we replaced every remaining Harry with Tanya.
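The order of the two replacements matters. The sketch below, which uses a shortened sample string and String.replaceAll instead of an explicit matcher (both choices are assumptions made for brevity), shows what happens when the general pattern runs first.

```groovy
def text = "Harry Potter and Harry"

// Specific pattern first, then the general one (the order used above).
def specificFirst = text.replaceAll(/Harry Potter/, "Tanya Grotter")
                        .replaceAll(/Harry/, "Tanya")

// General pattern first: "Harry Potter" no longer exists when the second call runs.
def generalFirst = text.replaceAll(/Harry/, "Tanya")
                       .replaceAll(/Harry Potter/, "Tanya Grotter")

println specificFirst
println generalFirst
```

In the second variant the surname is never replaced, because the first call has already turned every Harry into Tanya.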
Reluctant Operators
These are also commonly called lazy or non-greedy quantifiers.
The ?, * and + operators are greedy by default: they match as much text as they can, which sometimes pulls in parts we did not want. In that case we need reluctant operators.
popesArray = [
    "Pope Anastasius I 399-401",
    "Pope Innocent I 401-417",
    "Pope Zosimus 417-418",
    "Pope Boniface I 418-422",
    "Pope Celestine I 422-432",
    "Pope Sixtus III 432-440",
    "Pope Leo I the Great 440-461",
    "Pope Hilarius 461-468",
    "Pope Simplicius 468-483",
    "Pope Felix III 483-492",
    "Pope Gelasius I 492-496",
    "Pope Anastasius II 496-498",
    "Pope Symmachus 498-514"
]
We only want each pope's name and the years of his reign.
/Pope (.*)(?: .*)? ([0-9]+)-([0-9]+)/
The expression above uses ordinary greedy grouping. To make an operator reluctant, simply append a ? to the *, + or ?.
Try it yourself and see what it prints:
popesArray = [
    "Pope Anastasius I 399-401",
    "Pope Innocent I 401-417",
    "Pope Zosimus 417-418",
    "Pope Boniface I 418-422",
    "Pope Celestine I 422-432",
    "Pope Sixtus III 432-440",
    "Pope Leo I the Great 440-461",
    "Pope Hilarius 461-468",
    "Pope Simplicius 468-483",
    "Pope Felix III 483-492",
    "Pope Gelasius I 492-496",
    "Pope Anastasius II 496-498",
    "Pope Symmachus 498-514"
]
myClosure = {
    // (.*?) is reluctant, so it stops at the first space that lets the rest match
    myMatcher = (it =~ /Pope (.*?)(?: .*)? ([0-9]+)-([0-9]+)/);
    if (myMatcher.matches())
        println(myMatcher[0][1] + ": " + myMatcher[0][2] + " to " + myMatcher[0][3]);
}
popesArray.each(myClosure);
That basically meets our requirements.
Try leaving out the ? and see what goes wrong.
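For a side-by-side view, the following sketch applies the greedy and the reluctant form of the pattern to a single entry. The variable names (pope, greedy, reluctant) are illustrative, and group(1) is read through the standard Matcher API.

```groovy
def pope = "Pope Anastasius I 399-401"

// Greedy: (.*) grabs as much as it can, including the sequence number.
def greedy = (pope =~ /Pope (.*)(?: .*)? ([0-9]+)-([0-9]+)/)
// Reluctant: (.*?) takes as little as possible, leaving the rest to (?: .*)?.
def reluctant = (pope =~ /Pope (.*?)(?: .*)? ([0-9]+)-([0-9]+)/)

assert greedy.matches() && reluctant.matches()

println greedy.group(1)
println reluctant.group(1)
```

Both patterns match the whole string; they differ only in how much of it ends up in the first group.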