Memory overflow caused by substring and split in Java, and how to avoid it

The following example shows an OutOfMemoryError caused by String's substring method:


```java
import java.util.ArrayList;

public class TestGC {
  private String large = new String(new char[100000]);

  public String getSubString() {
    return this.large.substring(0, 2);
  }

  public static void main(String[] args) {
    ArrayList<String> subStrings = new ArrayList<String>();
    for (int i = 0; i < 1000000; i++) {
      TestGC testGC = new TestGC();
      subStrings.add(testGC.getSubString());
    }
  }
}
```

Running this program produces:

    Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

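Why does this happen? The cause can be found in the source of the substring method of the JDK String class. (The source quoted in this article is from JDK 6. Since JDK 7u6, String no longer has offset and count fields and substring always copies, so on those versions the small strings no longer keep the large array alive.) The source is: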


```java
    public String substring(int beginIndex, int endIndex) {
        if (beginIndex < 0) {
            throw new StringIndexOutOfBoundsException(beginIndex);
        }
        if (endIndex > count) {
            throw new StringIndexOutOfBoundsException(endIndex);
        }
        if (beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException(endIndex - beginIndex);
        }
        return ((beginIndex == 0) && (endIndex == count)) ? this :
            new String(offset + beginIndex, endIndex - beginIndex, value);
    }
```

The last line of this method calls a package-private constructor of String:


```java
    // Package private constructor which shares value array for speed.
    String(int offset, int count, char value[]) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }
```

Its access level and its comment show that Sun wrote this constructor specifically to improve performance.
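To avoid copying memory, it does not create a new char array. Instead it reuses the char[] of the original String and identifies the new string's content only by a different offset and length. In other words, the small String returned by substring still points to the large char[] of the original String. In the example, each of the 1,000,000 two-character strings therefore keeps its own 100,000-element char[] reachable (about 200 KB each, since a Java char is 2 bytes, roughly 200 GB in total), and that is what causes the OutOfMemoryError.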
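To make the sharing concrete, here is a minimal sketch that models the JDK 6 layout described above. Jdk6LikeString and SharedSliceDemo are hypothetical classes written for illustration only (they are not JDK code); the model assumes the value/offset/count fields shown in the quoted source. A two-character slice of a 100,000-character string ends up holding the very same backing array:

```java
import java.util.Arrays;

// Hypothetical model of the JDK 6 String layout; NOT java.lang.String.
final class Jdk6LikeString {
    final char[] value;  // backing array, possibly shared with other instances
    final int offset;    // index of this string's first char in value
    final int count;     // number of chars in this string

    Jdk6LikeString(char[] chars) {
        // Public entry point: takes a private copy of the caller's array.
        this(0, chars.length, Arrays.copyOf(chars, chars.length));
    }

    // Mirrors the package-private JDK 6 constructor: shares the array, no copy.
    private Jdk6LikeString(int offset, int count, char[] value) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    Jdk6LikeString substring(int beginIndex, int endIndex) {
        if (beginIndex < 0) throw new StringIndexOutOfBoundsException(beginIndex);
        if (endIndex > count) throw new StringIndexOutOfBoundsException(endIndex);
        if (beginIndex > endIndex) throw new StringIndexOutOfBoundsException(endIndex - beginIndex);
        // Same logic as the quoted source: only offset and count change.
        return (beginIndex == 0 && endIndex == count) ? this
                : new Jdk6LikeString(offset + beginIndex, endIndex - beginIndex, value);
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }
}

public class SharedSliceDemo {
    public static void main(String[] args) {
        Jdk6LikeString large = new Jdk6LikeString(new char[100000]);
        Jdk6LikeString small = large.substring(0, 2);
        System.out.println("small.count = " + small.count);
        System.out.println("backing array length = " + small.value.length);
        System.out.println("shares array: " + (small.value == large.value));
    }
}
```

Running main prints small.count = 2, a backing array length of 100000, and shares array: true, which is exactly the situation that exhausts the heap in TestGC.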
Having found the problem, change the getSubString method in the code above as follows:


```java
    public String getSubString() {
        return new String(this.large.substring(0, 2));
    }
```

This wraps the result of substring in a new String. Running the program again no longer produces an OutOfMemoryError. Why? Because this time the public constructor of String is called, whose source is:


```java
    public String(String original) {
        int size = original.count;
        char[] originalValue = original.value;
        char[] v;
        if (originalValue.length > size) {
            // The array representing the String is bigger than the new
            // String itself.  Perhaps this constructor is being called
            // in order to trim the baggage, so make a copy of the array.
            int off = original.offset;
            v = Arrays.copyOfRange(originalValue, off, off+size);
        } else {
            // The array representing the String is the same
            // size as the String, so no point in making a copy.
            v = originalValue;
        }
        this.offset = 0;
        this.count = size;
        this.value = v;
    }
```

The code shows that when the length of the String's value array is greater than count, the constructor creates a new char[] and copies only the used range into it, so the new String no longer references the large array.
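The same kind of model can show why the fix works. The sketch below re-creates the trimming logic of the copy constructor on a hypothetical ModelString class (again not java.lang.String; its three-argument constructor stands in for the package-private sharing constructor, and TrimDemo is an illustrative name):

```java
import java.util.Arrays;

// Hypothetical model of the JDK 6 copy constructor String(String); NOT java.lang.String.
final class ModelString {
    final char[] value;
    final int offset;
    final int count;

    // Stand-in for the package-private constructor: shares 'value' without copying.
    ModelString(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    // Same logic as the quoted String(String original).
    ModelString(ModelString original) {
        int size = original.count;
        char[] originalValue = original.value;
        char[] v;
        if (originalValue.length > size) {
            // Backing array is bigger than the content: copy only the used range.
            int off = original.offset;
            v = Arrays.copyOfRange(originalValue, off, off + size);
        } else {
            // Array is exactly the right size: sharing it costs nothing extra.
            v = originalValue;
        }
        this.value = v;
        this.offset = 0;
        this.count = size;
    }
}

public class TrimDemo {
    public static void main(String[] args) {
        char[] big = new char[100000];
        big[0] = 'h';
        big[1] = 'i';
        ModelString slice = new ModelString(big, 0, 2); // like large.substring(0, 2) on JDK 6
        ModelString trimmed = new ModelString(slice);   // like new String(slice)
        System.out.println(slice.value.length);         // 100000: still pins the big array
        System.out.println(trimmed.value.length);       // 2: the big array is no longer referenced
        System.out.println(new String(trimmed.value, trimmed.offset, trimmed.count)); // hi
    }
}
```

Once the caller drops slice and keeps only trimmed, nothing refers to the 100,000-element array any more and the garbage collector can reclaim it.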


Besides substring, String's split method has the same problem. The source of split is:


```java
    public String[] split(String regex, int limit) {
        return Pattern.compile(regex).split(this, limit);
    }
```

As the code shows, String.split is implemented with Pattern.split, whose source is:


```java
    public String[] split(CharSequence input, int limit) {
        int index = 0;
        boolean matchLimited = limit > 0;
        ArrayList<String> matchList = new ArrayList<String>();
        Matcher m = matcher(input);

        // Add segments before each match found
        while(m.find()) {
            if (!matchLimited || matchList.size() < limit - 1) {
                String match = input.subSequence(index, m.start()).toString();
                matchList.add(match);
                index = m.end();
            } else if (matchList.size() == limit - 1) { // last one
                String match = input.subSequence(index,
                                                 input.length()).toString();
                matchList.add(match);
                index = m.end();
            }
        }

        // If no match was found, return this
        if (index == 0)
            return new String[] {input.toString()};

        // Add remaining segment
        if (!matchLimited || matchList.size() < limit)
            matchList.add(input.subSequence(index, input.length()).toString());

        // Construct result
        int resultSize = matchList.size();
        if (limit == 0)
            while (resultSize > 0 && matchList.get(resultSize-1).equals(""))
                resultSize--;
        String[] result = new String[resultSize];
        return matchList.subList(0, resultSize).toArray(result);
    }
```

The statement inside the while loop,

    String match = input.subSequence(index, m.start()).toString();

calls String's subSequence method, whose source is:


```java
    public CharSequence subSequence(int beginIndex, int endIndex) {
        return this.substring(beginIndex, endIndex);
    }
```

The code shows that it ultimately calls String.substring, so split has the same problem: the small strings it produces directly reuse the char[] of the original String.
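On a JDK with this shared-array behavior, the same countermeasure applies to split: copy each element. A minimal sketch, where splitDetached and SplitDetach are hypothetical names, not JDK methods:

```java
import java.util.Arrays;

public class SplitDetach {
    // Hypothetical helper: splits 'line' and copies each piece with new String(...),
    // so on JDK 6 no element keeps the (possibly large) original char[] reachable.
    // On JDK 7u6 and later split already returns independent strings,
    // so there the extra copy is redundant.
    static String[] splitDetached(String line, String regex) {
        String[] parts = line.split(regex);
        for (int i = 0; i < parts.length; i++) {
            parts[i] = new String(parts[i]);
        }
        return parts;
    }

    public static void main(String[] args) {
        String[] fields = splitDetached("id,name,score", ",");
        System.out.println(Arrays.toString(fields)); // [id, name, score]
    }
}
```

The helper is most useful when a long input line is split but only a few short fields are kept for a long time, which is the split counterpart of the TestGC scenario.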

 


The substring methods of StringBuilder and StringBuffer, on the other hand, do not have this problem. Their source is:


```java
    public String substring(int start, int end) {
        if (start < 0)
            throw new StringIndexOutOfBoundsException(start);
        if (end > count)
            throw new StringIndexOutOfBoundsException(end);
        if (start > end)
            throw new StringIndexOutOfBoundsException(end - start);
        return new String(value, start, end - start);
    }
```

The last line calls a public constructor of String, whose source is:


```java
    public String(char value[], int offset, int count) {
        if (offset < 0) {
            throw new StringIndexOutOfBoundsException(offset);
        }
        if (count < 0) {
            throw new StringIndexOutOfBoundsException(count);
        }
        // Note: offset or count might be near -1>>>1.
        if (offset > value.length - count) {
            throw new StringIndexOutOfBoundsException(offset + count);
        }
        this.offset = 0;
        this.count = count;
        this.value = Arrays.copyOfRange(value, offset, offset+count);
    }
```

This constructor does not reuse the caller's char[] (here, the builder's internal array); it copies the requested range into a new array.
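Because the public constructor copies, the resulting String is independent of the source array. The small program below (CopyRangeDemo is just an illustrative class name) checks this by modifying the source after the copy. The Javadoc of String(char[], int, int) states that the subarray is copied, so this should hold on any JDK:

```java
public class CopyRangeDemo {
    public static void main(String[] args) {
        char[] buffer = {'a', 'b', 'c', 'd'};
        // The public constructor copies the range starting at index 1, length 2.
        String s = new String(buffer, 1, 2);
        buffer[1] = 'X';         // changing the source array afterwards...
        System.out.println(s);   // ...does not affect s: prints "bc"

        StringBuilder sb = new StringBuilder("hello world");
        String word = sb.substring(0, 5); // StringBuilder.substring also returns a copy
        sb.setCharAt(0, 'J');             // modifying the builder afterwards...
        System.out.println(word);         // ...does not affect word: prints "hello"
    }
}
```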
