How to use String.split

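Today a friend asked me a question about split, and I suddenly realized that I had only ever used its default form. I had no idea that split also has a two-parameter version.

So let's take this chance to study split properly.

split comes in two overloads: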

public String[] split(String regex)

public String[] split(String regex, int limit)

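The first form is equivalent to calling the second form with limit set to 0. This default case is also the one we use most often.

Since we want to study split properly, we might as well start with the source code: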
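The equivalence between the two overloads is easy to check. Below is a minimal Java sketch; the class name `DefaultLimitDemo`, the helper method names, and the sample string are only illustrative and not part of the JDK.

```java
import java.util.Arrays;

class DefaultLimitDemo {
    // Splits with the one-argument overload.
    static String[] splitDefault(String input, String regex) {
        return input.split(regex);
    }

    // Splits with an explicit limit of 0.
    static String[] splitLimitZero(String input, String regex) {
        return input.split(regex, 0);
    }

    public static void main(String[] args) {
        String input = "a*b*c**"; // sample input chosen for illustration
        String[] a = splitDefault(input, "\\*");
        String[] b = splitLimitZero(input, "\\*");
        System.out.println(Arrays.toString(a));  // [a, b, c]
        System.out.println(Arrays.equals(a, b)); // true
    }
}
```

Both calls drop the two trailing empty strings, which is exactly the limit = 0 behavior.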

    /**
     * Splits this string around matches of the given regular expression.
     *
     * The array returned by this method contains each substring of this
     * string that is terminated by another substring that matches the given
     * expression or is terminated by the end of the string. The substrings in
     * the array are in the order in which they occur in this string. If the
     * expression does not match any part of the input then the resulting array
     * has just one element, namely this string.
     *
     * When there is a positive-width match at the beginning of this
     * string then an empty leading substring is included at the beginning
     * of the resulting array. A zero-width match at the beginning however
     * never produces such empty leading substring.
     *
     * The {@code limit} parameter controls the number of times the
     * pattern is applied and therefore affects the length of the resulting
     * array. If the limit n is greater than zero then the pattern
     * will be applied at most n - 1 times, the array's
     * length will be no greater than n, and the array's last entry
     * will contain all input beyond the last matched delimiter. If n
     * is non-positive then the pattern will be applied as many times as
     * possible and the array can have any length. If n is zero then
     * the pattern will be applied as many times as possible, the array can
     * have any length, and trailing empty strings will be discarded.
     */
    public String[] split(String regex, int limit) {
        /* fastpath if the regex is a
         (1)one-char String and this character is not one of the
            RegEx's meta characters ".$|()[{^?*+\\", or
         (2)two-char String and the first char is the backslash and
            the second is not the ascii digit or ascii letter.
         */
        char ch = 0;
        if (((regex.value.length == 1 &&
             ".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) ||
             (regex.length() == 2 &&
              regex.charAt(0) == '\\' &&
              (((ch = regex.charAt(1))-'0')|('9'-ch)) < 0 &&
              ((ch-'a')|('z'-ch)) < 0 &&
              ((ch-'A')|('Z'-ch)) < 0)) &&
            (ch < Character.MIN_HIGH_SURROGATE ||
             ch > Character.MAX_LOW_SURROGATE))
        {
            int off = 0;
            int next = 0;
            boolean limited = limit > 0;
            ArrayList<String> list = new ArrayList<>();
            while ((next = indexOf(ch, off)) != -1) {
                if (!limited || list.size() < limit - 1) {
                    list.add(substring(off, next));
                    off = next + 1;
                } else {    // last one
                    //assert (list.size() == limit - 1);
                    list.add(substring(off, value.length));
                    off = value.length;
                    break;
                }
            }
            // If no match was found, return this
            if (off == 0)
                return new String[]{this};

            // Add remaining segment
            if (!limited || list.size() < limit)
                list.add(substring(off, value.length));

            // Construct result
            int resultSize = list.size();
            if (limit == 0) {
                while (resultSize > 0 && list.get(resultSize - 1).length() == 0) {
                    resultSize--;
                }
            }
            String[] result = new String[resultSize];
            return list.subList(0, resultSize).toArray(result);
        }
        return Pattern.compile(regex).split(this, limit);
    }
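To make the fast path easier to follow, here is a simplified standalone sketch of it in Java. It assumes the delimiter is already known to be a single literal character, so the regex eligibility check is left out. The class `FastPathSketch` and the method `splitOnChar` are hypothetical names of mine, not JDK APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FastPathSketch {
    // Simplified sketch of the fast path: splits on one literal character.
    static String[] splitOnChar(String s, char delimiter, int limit) {
        int off = 0;
        int next;
        boolean limited = limit > 0;
        List<String> list = new ArrayList<>();
        while ((next = s.indexOf(delimiter, off)) != -1) {
            if (!limited || list.size() < limit - 1) {
                list.add(s.substring(off, next));
                off = next + 1;
            } else {
                // limit - 1 pieces collected: the rest becomes the last piece
                list.add(s.substring(off));
                off = s.length();
                break;
            }
        }
        if (off == 0) {
            return new String[] {s}; // delimiter never found
        }
        if (!limited || list.size() < limit) {
            list.add(s.substring(off)); // remaining segment
        }
        int size = list.size();
        if (limit == 0) {
            // only limit == 0 drops trailing empty strings
            while (size > 0 && list.get(size - 1).isEmpty()) {
                size--;
            }
        }
        return list.subList(0, size).toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnChar("*a*b*c**", '*', 5))); // [, a, b, c, *]
    }
}
```

The three branches on limit (positive, zero, negative) all live in this one loop: a positive limit stops the loop early, and only limit == 0 trims trailing empty strings at the end.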

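The comments in the source focus on the second parameter: different values of limit lead to different behavior. Below I go through split case by case, according to the value of limit.

1. limit greater than 0

When limit is greater than 0, the regex is applied at most limit-1 times, so the string is split into at most limit substrings, and the last substring holds whatever is left of the input unsplit. In this case split keeps the empty strings it produces (an empty string appears when the regex matches twice in a row, or matches at the very start or end of the string) until the match limit is reached.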

    val str = "a*b*c"
    val res = str.split("\\*",2)
    println(res.toSeq)//Array(a, b*c)


    val str = "a*b*c"
    val res = str.split("\\*",4)
    println(res.toSeq)//Array(a, b, c)


    val str = "*a*b*c*"
    val res = str.split("\\*",3)
    println(res.toSeq)//Array(, a, b*c*)


    val str = "*a*b*c**"
    val res = str.split("\\*",6)
    println(res.toSeq)//Array(, a, b, c, , )


    val str = "*a*b*c**"
    val res = str.split("\\*",5)
    println(res.toSeq)//Array(, a, b, c, *)
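A positive limit is handy when only the first delimiter matters. The Java sketch below splits a "key=value" line at the first '=' so the value may contain '=' itself. The class name `KeyValueSplitDemo`, the method `splitKeyValue`, and the sample line are made up for illustration.

```java
import java.util.Arrays;

class KeyValueSplitDemo {
    // Splits at the first '=' only; everything after it stays in one piece.
    static String[] splitKeyValue(String line) {
        return line.split("=", 2);
    }

    public static void main(String[] args) {
        String[] parts = splitKeyValue("path=/tmp/a=b"); // hypothetical input
        System.out.println(Arrays.toString(parts)); // [path, /tmp/a=b]
    }
}
```

If the line contains no '=' at all, the result is a one-element array holding the whole line, so callers should check the length before reading the value.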

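2. limit equal to 0

When limit is 0, split applies the regex as many times as possible but no longer keeps the empty strings at the end of the result. One special case: when the string being split is itself an empty string, the result is still an array containing that one empty string.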

    val str = "a*b*c"
    val res = str.split("\\*",0)
    println(res.toSeq)//Array(a, b, c)


    val str = "*a*b*c**"
    val res = str.split("\\*",0)
    println(res.toSeq)//Array(, a, b, c)


    val str = ""
    val res = str.split("\\*",0)
    println(res.toSeq)//Array() : one element, the empty string
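The empty-string case is easy to misread from printed output, so it helps to look at the array length instead. A small Java sketch follows; `EmptyInputDemo` and `pieceCount` are illustrative names only.

```java
class EmptyInputDemo {
    // Returns how many pieces split produces with limit = 0.
    static int pieceCount(String input, String regex) {
        return input.split(regex, 0).length;
    }

    public static void main(String[] args) {
        // No match at all: the original (empty) string is returned as the only element.
        System.out.println(pieceCount("", ","));  // 1
        // One match: both pieces are empty, and both are discarded as trailing empties.
        System.out.println(pieceCount(",", ",")); // 0
    }
}
```

So an empty input gives an array of length 1, while an input made only of delimiters gives an array of length 0.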

3. limit less than 0

When limit is negative, split applies the regex as many times as possible and keeps the empty strings at the end of the result.

    val str = "*a*b**c***"
    val res = str.split("\\*",-1)
    println(res.toSeq)//Array(, a, b, , c, , , )
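A typical place where a negative limit matters is a delimited row whose last columns are empty. The Java sketch below assumes a simple comma-separated row with no quoting; `TrailingFieldsDemo` and `fields` are hypothetical names.

```java
import java.util.Arrays;

class TrailingFieldsDemo {
    // Keeps every column, including empty ones at the end of the row.
    static String[] fields(String row) {
        return row.split(",", -1);
    }

    public static void main(String[] args) {
        String row = "a,b,,"; // sample row: two filled columns, two empty ones
        System.out.println(Arrays.toString(fields(row))); // [a, b, , ]
        System.out.println(fields(row).length);           // 4
        System.out.println(row.split(",").length);        // 2
    }
}
```

With the default limit the column count depends on whether the last columns happen to be empty, which is why -1 is the safer choice when the number of fields matters.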

 
