Pitfalls to watch for when using String.split in Java

We often use String's split() method to break a string into pieces. Three points deserve attention:

 

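1. When the delimiter is a period ("."), it must be escaped:

String.split treats its argument as a regular expression, and in a regular expression a period matches any character.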

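// Wrong: "." is a regex wildcard here, so no useful tokens come back
// String[] words = tmp.split(".");

// Correct: escape the period so it is matched literally
String[] words = tmp.split("\\.");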

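So whenever the delimiter has a special meaning in regular expressions (for example | * + ? $ ^), take extra care: it must be escaped for the split to work as intended.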
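As an alternative to escaping by hand, the delimiter can be passed through Pattern.quote, which turns any string into a literal pattern. The following is a minimal sketch; the class name LiteralSplitDemo, the helper splitLiteral and the sample strings are made up for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class LiteralSplitDemo {
    // Hypothetical helper: splits on the delimiter taken literally,
    // whatever regex metacharacters it contains.
    static String[] splitLiteral(String input, String delimiter) {
        return input.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        String host = "www.example.com";
        // Unescaped "." matches every character, so all tokens are empty and are dropped.
        System.out.println(host.split(".").length);                      // 0
        System.out.println(Arrays.toString(splitLiteral(host, ".")));    // [www, example, com]
        System.out.println(Arrays.toString(splitLiteral("a|b|c", "|"))); // [a, b, c]
    }
}
```

Pattern.quote is convenient when the delimiter comes from configuration or user input and cannot be escaped in the source code.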

 

2. If the string ends with several consecutive delimiters and the empty fields between them must be kept, call split(String regex, int limit) with a negative limit:

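import java.util.Arrays;

String abc = "a,b,c,,,";

// Default (limit 0): trailing empty strings are discarded
String[] str = abc.split(",");
System.out.println(Arrays.toString(str) + " " + str.length);

// Negative limit: trailing empty strings are kept
String[] str2 = abc.split(",", -1);
System.out.println(Arrays.toString(str2) + " " + str2.length);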

Output:

[a, b, c] 3
[a, b, c, , , ] 6

Pay particular attention to this when working with CSV files, where trailing empty fields still count as columns.
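To make the effect of the limit argument concrete, here is a small sketch comparing a zero, a positive and a negative limit on the same line. The class SplitLimitDemo and the helper fields are hypothetical names, and the helper is only a naive splitter: it does not handle quoted CSV fields.

```java
import java.util.Arrays;

class SplitLimitDemo {
    // Hypothetical helper: keeps every field of a comma-separated line,
    // including trailing empty ones. Quoted fields are NOT handled.
    static String[] fields(String line) {
        return line.split(",", -1);
    }

    public static void main(String[] args) {
        String line = "a,b,c,,,";
        // limit 0: as many splits as possible, trailing empty strings removed
        System.out.println(Arrays.toString(line.split(",", 0))); // [a, b, c]
        // limit 2: at most 2 tokens, the rest stays in the last token
        System.out.println(Arrays.toString(line.split(",", 2))); // [a, b,c,,,]
        // negative limit: as many splits as possible, nothing removed
        System.out.println(fields(line).length);                 // 6
    }
}
```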

 

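3. If strings need to be split quickly, split() is not the most efficient choice. Inside split(), the work is delegated to Pattern (newer JDKs add a fast path for simple single-character delimiters, but other regular expressions still take this route):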

public String[] split(String regex, int limit) {
    return Pattern.compile(regex).split(this, limit);
}

Each such call to split() compiles a new Pattern object, so when it is called frequently it is cheaper to compile the Pattern once and reuse it:

// Create the Pattern object once, outside the loop
Pattern pattern = Pattern.compile(" ");
List<String[]> list = new ArrayList<String[]>();

for (int i = 0; i < 1000000; i++) {
    String[] split = pattern.split("Hello World", 0);
    list.add(split);
}

In addition, split() is often somewhat slower than splitting with the combination of indexOf() and substring(); see the linked post for details.
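For reference, the indexOf() plus substring() combination can be wrapped in a small helper along the following lines. This is only a sketch: the class IndexOfSplitter and its split method are hypothetical names, it supports a single-character delimiter only, and unlike the default split() it keeps empty tokens, including trailing ones.

```java
import java.util.ArrayList;
import java.util.List;

class IndexOfSplitter {
    // Hypothetical helper: splits on a single character without using regex.
    // Empty tokens, including trailing ones, are kept.
    static List<String> split(String input, char delimiter) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        int end;
        while ((end = input.indexOf(delimiter, pos)) >= 0) {
            tokens.add(input.substring(pos, end));
            pos = end + 1;
        }
        tokens.add(input.substring(pos)); // remainder after the last delimiter
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split("Hello World", ' ')); // [Hello, World]
        System.out.println(split("a,b,,", ','));       // [a, b, , ]
    }
}
```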

I ran a test on my own machine, and indexOf() plus substring() came out roughly twice as fast as split():

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;
import java.util.regex.Pattern;

public class SplitBenchmark {
    public static void main(String[] args) {
        // Build a sample of 60 six-digit numbers, each followed by a space
        StringBuilder sb = new StringBuilder();
        for (int i = 100000; i < 100000 + 60; i++)
            sb.append(i).append(' ');
        String sample = sb.toString();

        int runs = 100000;
        // Repeat the whole measurement 5 times so later rounds run on warmed-up code
        for (int i = 0; i < 5; i++) {
            {
                // Variant 1: StringTokenizer
                long start = System.nanoTime();
                for (int r = 0; r < runs; r++) {
                    StringTokenizer st = new StringTokenizer(sample);
                    List<String> list = new ArrayList<String>();
                    while (st.hasMoreTokens())
                        list.add(st.nextToken());
                }
                long time = System.nanoTime() - start;
                System.out.printf("StringTokenizer took an average of %.1f us%n", time / runs / 1000.0);
            }
            {
                // Variant 2: precompiled Pattern
                long start = System.nanoTime();
                Pattern spacePattern = Pattern.compile(" ");
                for (int r = 0; r < runs; r++) {
                    List<String> list = Arrays.asList(spacePattern.split(sample, 0));
                }
                long time = System.nanoTime() - start;
                System.out.printf("Pattern.split took an average of %.1f us%n", time / runs / 1000.0);
            }
            {
                // Variant 3: indexOf() + substring() loop
                long start = System.nanoTime();
                for (int r = 0; r < runs; r++) {
                    List<String> list = new ArrayList<String>();
                    int pos = 0, end;
                    while ((end = sample.indexOf(' ', pos)) >= 0) {
                        list.add(sample.substring(pos, end));
                        pos = end + 1;
                    }
                }
                long time = System.nanoTime() - start;
                System.out.printf("indexOf loop took an average of %.1f us%n", time / runs / 1000.0);
            }
        }
    }
}

Results on JDK 1.7:

StringTokenizer took an average of 7.2 us
Pattern.split took an average of 7.9 us
indexOf loop took an average of 3.5 us

------------------------------------------
StringTokenizer took an average of 6.8 us
Pattern.split took an average of 5.4 us
indexOf loop took an average of 3.1 us

------------------------------------------
StringTokenizer took an average of 6.0 us
Pattern.split took an average of 5.5 us
indexOf loop took an average of 3.1 us

------------------------------------------
StringTokenizer took an average of 5.9 us
Pattern.split took an average of 5.5 us
indexOf loop took an average of 3.1 us

------------------------------------------
StringTokenizer took an average of 6.4 us
Pattern.split took an average of 5.5 us
indexOf loop took an average of 3.2 us

 

End of article.
