A Simple Look at String.substring() (Continued)

The previous post (http://woyixiaorenne.iteye.com/blog/2305280) noted that the implementation of String.substring() changed between JDK 1.6 and JDK 1.7. So why did it change?

Note: most of this post is adapted or copied from the original author's article. The original is at http://www.importnew.com/7418.html

1. What happens internally when substring() is called?

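Because strings are immutable, you might assume that once x is assigned the result of x.substring(1,3), it points to a completely new string, as in the figure below: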



[Figure 1]
 
However, this figure is not entirely correct; more precisely, it does not fully show what actually happens in the Java heap.

2. substring() in JDK 6

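In Java, a string is backed by a character array. In JDK 6, the String class has three fields: char[] value, int offset, and int count. They hold the actual character array, the offset of the string's first character within that array, and the number of characters in the String, respectively.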

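When substring() is called, it creates a new String object, but that object's value still refers to the same array in the Java heap. The two strings differ only in their count and offset values. This matches what the previous post described: the new String object shares the value array with the original, pointing at the same memory (no new array is allocated), and uses offset and count alone to determine the content of the substring.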

[Figure 2]

In memory, the situation looks like this:


[Figure 3]
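
The sharing behaviour described in this section can be illustrated with a small model. This is only a sketch: SharedString and SharedStringDemo are hypothetical names invented for this example, and the class mimics the three-field layout (value, offset, count) rather than reproducing the real java.lang.String source.

```java
// Hypothetical model of the JDK 6 field layout; NOT the real java.lang.String source.
final class SharedString {
    final char[] value;  // backing array, possibly shared with other instances
    final int offset;    // index in value of this string's first character
    final int count;     // number of characters in this string

    SharedString(char[] value) {
        this(value, 0, value.length);
    }

    private SharedString(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    SharedString substring(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > count || beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException();
        }
        // No copy is made: the result reuses the same array with a different window.
        return new SharedString(value, offset + beginIndex, endIndex - beginIndex);
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }
}

class SharedStringDemo {
    public static void main(String[] args) {
        SharedString x = new SharedString("abcdef".toCharArray());
        SharedString y = x.substring(1, 3);
        System.out.println(y);                   // bc
        System.out.println(y.value == x.value);  // true: same backing array
        System.out.println(y.value.length);      // 6: the whole array stays reachable
    }
}
```

In this model, the two-character result still holds a reference to the full six-character array, which is the situation the figures in this section depict.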
 
3. The problem caused by substring() in JDK 6

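If you have a very long string but need only a small part of it, this causes a performance problem (translator's note: it can cause a memory leak, and the bug was reported long ago). You need only a small piece, yet the substring keeps the entire character array alive. In JDK 6 the fix is to use the following, which makes x refer to a genuinely separate substring: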

x = x.substring(beginIndex, endIndex) + "";
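
The bug report quoted at the end of this post mentions another way to get the same effect: forcing a copy via getChars(). The sketch below shows what that might look like. SubstringCopy and copyOfSubstring are hypothetical names chosen for illustration; only String.getChars(int, int, char[], int) and the String(char[]) constructor are real JDK calls.

```java
// Hypothetical helper; the class and method names are assumptions for illustration.
class SubstringCopy {

    // Copies the characters in [beginIndex, endIndex) into a fresh array,
    // so the result does not depend on the source string's backing array.
    static String copyOfSubstring(String s, int beginIndex, int endIndex) {
        char[] buf = new char[endIndex - beginIndex];
        s.getChars(beginIndex, endIndex, buf, 0);
        return new String(buf);
    }

    public static void main(String[] args) {
        String x = "abcdef";
        x = copyOfSubstring(x, 1, 3);
        System.out.println(x); // bc
    }
}
```

The trade-off is one extra allocation per call, in exchange for letting the large source array be collected once nothing else refers to it.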

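4. substring() in JDK 7

JDK 7 improves on this (the change arrived in update 6): substring() now actually creates a new array on the heap, so once the original character array is no longer referenced it can be reclaimed by the GC. This avoids the problem described above.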

[Figure 4]

In memory, the situation looks like this:


[Figure 5]
One more point: in JDK 1.7 the offset and count fields no longer exist.
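
For contrast with the JDK 6 behaviour, the copying approach can be modelled as follows. Again this is only a sketch: CopiedString and CopiedStringDemo are hypothetical names, and the class imitates the idea (a single value field, with substring() copying the requested range) rather than quoting the real JDK 7 source.

```java
import java.util.Arrays;

// Hypothetical model of the copying approach; NOT the real java.lang.String source.
final class CopiedString {
    final char[] value; // holds exactly this string's characters; no offset or count

    private CopiedString(char[] value) {
        this.value = value;
    }

    static CopiedString of(String s) {
        return new CopiedString(s.toCharArray());
    }

    CopiedString substring(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > value.length || beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException();
        }
        // A new, right-sized array is allocated for the result.
        return new CopiedString(Arrays.copyOfRange(value, beginIndex, endIndex));
    }

    @Override
    public String toString() {
        return new String(value);
    }
}

class CopiedStringDemo {
    public static void main(String[] args) {
        CopiedString x = CopiedString.of("abcdef");
        CopiedString y = x.substring(1, 3);
        System.out.println(y);                   // bc
        System.out.println(y.value == x.value);  // false: separate arrays
        System.out.println(y.value.length);      // 2: only the needed characters
    }
}
```

The cost is that each substring() call now copies its characters, but a short substring no longer keeps a large source array reachable.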

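Because my own knowledge is still limited, the previous post only skimmed the source code: I saw that the implementation differs between 1.6 and 1.7 but never asked why it changed. Reading this article made it click. The lesson: look things up more and think more.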

The bug report mentioned above is reproduced here:

Description
Name: rmT116609			Date: 02/13/2002


java version "1.4.0"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.0-b92)
Java HotSpot(TM) Client VM (build 1.4.0-b92, mixed mode)

and also

java version "1.3.0"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.3.0)
Classic VM (build 1.3.0, J2RE 1.3.0 IBM build co130-20010925 (JIT enabled: jitc))

DESCRIPTION OF THE PROBLEM :

String.substring() and StringBuffer.substring() attempt to improve performance (and reduce the memory footprint) by sharing the underlying char[] across various Strings.

However, a problem arises in the following scenario:

1) Create a huge temporary StringBuffer
2) Extract short strings using substring() and place them in long-term storage (in a table of some sort)
3) Delete the StringBuffer, assuming it will be garbage-collected

However, due to implementation issues the huge StringBuffer will not be garbage collected so long as the substrings extracted from it live. This poses a problem and can potentially lead to huge memory leaks.

I don't necessarily have a solution, but I *did* want to point out this problem. This led to a memory leak of 10MB/second on my sample app and I only managed to track down the problem by looking at the sources in String.java.

Upon forcing a String copy via getChars(), the memory leak disappeared and the huge buffer was being garbage collected properly.

The API should handle this problem better or at the very least provide extensive documentation on this topic.

This bug can be reproduced always.

Source:

class Test {

    public static void main(String a[]) {
        String[] buffer = new String[10000];
        for (int i = 0; i < 10000; i++) {
            buffer[i] = new String(new char[10000000]).substring(0, 1);
        }
    }
}

The above is really bad coding, but it demonstrates what is happening. The above will keep references to all char[10000000] instances for as long as buffer[] lives.

CUSTOMER WORKAROUND :
Copy substrings manually; however this makes the code JVM-specific.
(Review ID: 139509) 
======================================================================

                                    
 