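A friend asked me a question about split today, and I suddenly realized that I had only ever used its default form; I had no idea that split can take two arguments.
So let's take this chance to study how split works properly.
The split method has two overloads: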
public String[] split(String regex)
public String[] split(String regex, int limit)
The first form can be viewed as the second form with its second argument defaulting to 0, and this default case is the one we use most often.
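To make that equivalence concrete, here is a small Java check that compares the one-argument call with an explicit limit of 0. The class name DefaultLimitDemo and the helper sameAsLimitZero are made up for this post; only String.split and Arrays are real JDK APIs.

```java
import java.util.Arrays;

public class DefaultLimitDemo {
    // Hypothetical helper: true when split(regex) and split(regex, 0)
    // produce the same array for the given input.
    static boolean sameAsLimitZero(String input, String regex) {
        return Arrays.equals(input.split(regex), input.split(regex, 0));
    }

    public static void main(String[] args) {
        System.out.println(sameAsLimitZero("a*b*c**", "\\*"));        // true
        System.out.println(Arrays.toString("a*b*c**".split("\\*"))); // [a, b, c]
    }
}
```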
Since we want to study split properly, let's start with the source code:
/**
 * Splits this string around matches of the given regular expression.
 *
 * The array returned by this method contains each substring of this
 * string that is terminated by another substring that matches the given
 * expression or is terminated by the end of the string. The substrings in
 * the array are in the order in which they occur in this string. If the
 * expression does not match any part of the input then the resulting array
 * has just one element, namely this string.
 *
 * When there is a positive-width match at the beginning of this
 * string then an empty leading substring is included at the beginning
 * of the resulting array. A zero-width match at the beginning however
 * never produces such empty leading substring.
 *
 * The {@code limit} parameter controls the number of times the
 * pattern is applied and therefore affects the length of the resulting
 * array. If the limit n is greater than zero then the pattern
 * will be applied at most n - 1 times, the array's
 * length will be no greater than n, and the array's last entry
 * will contain all input beyond the last matched delimiter. If n
 * is non-positive then the pattern will be applied as many times as
 * possible and the array can have any length. If n is zero then
 * the pattern will be applied as many times as possible, the array can
 * have any length, and trailing empty strings will be discarded.
 */
public String[] split(String regex, int limit) {
    /* fastpath if the regex is a
     (1)one-char String and this character is not one of the
        RegEx's meta characters ".$|()[{^?*+\\", or
     (2)two-char String and the first char is the backslash and
        the second is not the ascii digit or ascii letter.
     */
    char ch = 0;
    if (((regex.value.length == 1 &&
         ".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) ||
         (regex.length() == 2 &&
          regex.charAt(0) == '\\' &&
          (((ch = regex.charAt(1))-'0')|('9'-ch)) < 0 &&
          ((ch-'a')|('z'-ch)) < 0 &&
          ((ch-'A')|('Z'-ch)) < 0)) &&
        (ch < Character.MIN_HIGH_SURROGATE ||
         ch > Character.MAX_LOW_SURROGATE))
    {
        int off = 0;
        int next = 0;
        boolean limited = limit > 0;
        ArrayList<String> list = new ArrayList<>();
        while ((next = indexOf(ch, off)) != -1) {
            if (!limited || list.size() < limit - 1) {
                list.add(substring(off, next));
                off = next + 1;
            } else {    // last one
                //assert (list.size() == limit - 1);
                list.add(substring(off, value.length));
                off = value.length;
                break;
            }
        }
        // If no match was found, return this
        if (off == 0)
            return new String[]{this};

        // Add remaining segment
        if (!limited || list.size() < limit)
            list.add(substring(off, value.length));

        // Construct result
        int resultSize = list.size();
        if (limit == 0) {
            while (resultSize > 0 && list.get(resultSize - 1).length() == 0) {
                resultSize--;
            }
        }
        String[] result = new String[resultSize];
        return list.subList(0, resultSize).toArray(result);
    }
    return Pattern.compile(regex).split(this, limit);
}
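The fastpath condition at the top of the method is dense, so here is how I read it, rewritten as a standalone Java sketch. This is my own paraphrase, not JDK code: the class FastPathSketch and the helpers usesFastPath and isAsciiLetterOrDigit are hypothetical names, and the range comparisons stand in for the bit trick in the source, where ((ch-'0')|('9'-ch)) < 0 is true exactly when ch lies outside '0'..'9'.

```java
public class FastPathSketch {
    // Hypothetical helper: true when the regex would take the fastpath,
    // i.e. the delimiter is effectively a single literal character.
    static boolean usesFastPath(String regex) {
        char ch;
        if (regex.length() == 1) {
            ch = regex.charAt(0);
            // A lone regex metacharacter needs the real regex engine.
            if (".$|()[{^?*+\\".indexOf(ch) != -1) {
                return false;
            }
        } else if (regex.length() == 2 && regex.charAt(0) == '\\') {
            ch = regex.charAt(1);
            // \d, \s, \1 and friends have special meaning; \* or \. are plain escapes.
            if (isAsciiLetterOrDigit(ch)) {
                return false;
            }
        } else {
            return false;
        }
        // Surrogate halves are excluded, matching the final clause of the condition.
        return ch < Character.MIN_HIGH_SURROGATE || ch > Character.MAX_LOW_SURROGATE;
    }

    static boolean isAsciiLetterOrDigit(char ch) {
        return (ch >= '0' && ch <= '9')
                || (ch >= 'a' && ch <= 'z')
                || (ch >= 'A' && ch <= 'Z');
    }

    public static void main(String[] args) {
        System.out.println(usesFastPath(","));    // true
        System.out.println(usesFastPath("\\*"));  // true
        System.out.println(usesFastPath("*"));    // false
        System.out.println(usesFastPath("\\d"));  // false
    }
}
```

So a call such as split("\\*", limit) never compiles a Pattern; it just scans for '*' with indexOf, which is why the loop in the source works on a single char.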
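The comments in the source focus on the second parameter: different values of it lead to different behavior, so below I analyze split according to the value of that parameter.
When limit is greater than 0, regex is applied at most limit - 1 times, which means the string is split into at most limit substrings. In this case split keeps any empty strings it produces (two consecutive matches, or a match at the very start or end of the string, yield an empty string) until the match limit is reached.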
val str = "a*b*c"
val res = str.split("\\*",2)
println(res.toSeq)//Array(a, b*c)
val str = "a*b*c"
val res = str.split("\\*",4)
println(res.toSeq)//Array(a, b, c)
val str = "*a*b*c*"
val res = str.split("\\*",3)
println(res.toSeq)//Array(, a, b*c*)
val str = "*a*b*c**"
val res = str.split("\\*",6)
println(res.toSeq)//Array(, a, b, c, , )
val str = "*a*b*c**"
val res = str.split("\\*",5)
println(res.toSeq)//Array(, a, b, c, *)
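For readers who prefer plain Java, the same positive-limit cases can be reproduced as follows. This is only a sketch: the class PositiveLimitDemo and the helper show are names I made up, and the delimiter is fixed to a literal '*' to match the examples above.

```java
import java.util.Arrays;

public class PositiveLimitDemo {
    // Hypothetical helper: split on a literal '*' and render the result.
    static String show(String input, int limit) {
        return Arrays.toString(input.split("\\*", limit));
    }

    public static void main(String[] args) {
        System.out.println(show("a*b*c", 2));    // [a, b*c]
        System.out.println(show("a*b*c", 4));    // [a, b, c]
        System.out.println(show("*a*b*c**", 5)); // [, a, b, c, *]
        System.out.println(show("*a*b*c**", 6)); // [, a, b, c, , ]
    }
}
```

Note how with limit 5 the last element is the unsplit remainder "*", while with limit 6 all five delimiters are consumed and the two trailing empty strings are kept.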
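When limit equals 0, split applies regex as many times as possible but no longer keeps empty strings at the end of the result. One special case: when the string being split is itself an empty string, the result is still an array consisting of a single empty string.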
val str = "a*b*c"
val res = str.split("\\*",0)
println(res.toSeq)//Array(a, b, c)
val str = "*a*b*c**"
val res = str.split("\\*",0)
println(res.toSeq)//Array(, a, b, c)
val str = ""
val res = str.split("\\*",0)
println(res.toSeq)//Array() (prints as empty, but res.length is 1: the single element is "")
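The empty-string case is easy to misread from printed output, so here is a Java sketch that checks array lengths instead. EmptyInputDemo and countParts are hypothetical names. The contrast with "**" shows that the special case comes from the "no match found, return this" branch in the source, not from the trailing-empty-string cleanup.

```java
public class EmptyInputDemo {
    // Hypothetical helper: number of elements after splitting on '*' with limit 0.
    static int countParts(String input) {
        return input.split("\\*", 0).length;
    }

    public static void main(String[] args) {
        // No delimiter is found in "", so the original string is returned as-is.
        System.out.println(countParts(""));   // 1
        // Delimiters are found here, and every piece is a trailing empty string.
        System.out.println(countParts("**")); // 0
    }
}
```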
When limit is negative, split applies regex as many times as possible and keeps the empty strings at the end.
val str = "*a*b**c***"
val res = str.split("\\*",-1)
println(res.toSeq)//Array(, a, b, , c, , , )
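To close, a side-by-side Java sketch of the three cases on the same input may help. LimitComparisonDemo and its show helper are names invented for this post, and the delimiter is again assumed to be a literal '*'.

```java
import java.util.Arrays;

public class LimitComparisonDemo {
    // Hypothetical helper: split on a literal '*' and render the result.
    static String show(String input, int limit) {
        return Arrays.toString(input.split("\\*", limit));
    }

    public static void main(String[] args) {
        String input = "*a*b**c***";
        System.out.println(show(input, 3));  // [, a, b**c***]
        System.out.println(show(input, 0));  // [, a, b, , c]
        System.out.println(show(input, -1)); // [, a, b, , c, , , ]
    }
}
```

Only limit 0 drops the trailing empty strings; the leading one and the one in the middle survive in every case.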