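This afternoon I was reading the section on strings on 林信良's personal site,
http://caterpillar.onlyfun.net/Gossip/, and noticed one of the reference sites he lists,
http://www.javablogging.com/string-and-memory-leaks/, which has an article about String.substring titled
“String and memory leaks”.
It in turn mentions
http://nflath.com/2009/07/the-dangers-of-stringsubstring/, a very good article on how substring can cause memory leaks.
Both get at what String.substring really does. Strings are a surprisingly complicated topic in Java, which optimizes them in several ways, the String Pool being one example. Java has its own approach to substring as well. For example: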
String oldStr = "hello,clark";
String newStr = oldStr.substring(0, 5); // "hello"
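Here oldStr is backed by a char[] array [h,e,l,l,o,,,c,l,a,r,k]. When substring is called, newStr does not copy the characters of "hello" out of that array into a new char[] of its own. Behind the scenes Java applies a string-reuse optimization: newStr reuses the original char array and only records which part of it it needs, so the heap does not fill up with small copied fragments of the original array. (This describes the JDKs of the time; since Java 7 update 6, substring copies the characters instead.) Quoting the example from
http://www.javablogging.com/string-and-memory-leaks/: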
public static void sendEmail(String emailUrl) {
    String email = emailUrl.substring(7); // 'mailto:' prefix has 7 letters
    String userName = email.substring(0, email.indexOf("@"));
    String domainName = email.substring(email.indexOf("@"));
}

public static void main(String[] args) {
    sendEmail("mailto:user_name@domain_name.com");
}
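This is usually a good thing, but it has a cost. According to
http://nflath.com/2009/07/the-dangers-of-stringsubstring/, because the substring does not create its own char[] array but references the original one, the char[] array of oldStr cannot be garbage collected for as long as newStr is alive, even after oldStr itself is no longer used: newStr still holds a reference to that array. This can easily lead to an OutOfMemoryError. The fix is to give newStr its own char[] array, that is, to force a copy when taking the substring, so that the large array does not stay reachable but unused. How: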
String sub = new String(oldStr.substring(0, 5));
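To make the sharing and the copy concrete, here is a minimal sketch of a string type that behaves the way the articles describe. SharedString, compact and retainedChars are hypothetical names invented for this illustration; this is not the source of java.lang.String, only a model of the idea under the assumption that a substring keeps a reference to the parent's array plus an offset and a count.

```java
// Hypothetical model, NOT java.lang.String: a string whose substring
// shares the parent's backing array instead of copying it.
final class SharedString {
    private final char[] value; // backing array, possibly shared with other instances
    private final int offset;   // index of the first character in use
    private final int count;    // number of characters in use

    SharedString(String s) {
        this(s.toCharArray(), 0, s.length());
    }

    private SharedString(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    // Shares the backing array: cheap, but keeps the whole array reachable.
    SharedString substring(int begin, int end) {
        if (begin < 0 || end > count || begin > end) {
            throw new StringIndexOutOfBoundsException();
        }
        return new SharedString(value, offset + begin, end - begin);
    }

    // Copies only the characters in use, which is the role new String(...) plays in the text.
    SharedString compact() {
        char[] copy = java.util.Arrays.copyOfRange(value, offset, offset + count);
        return new SharedString(copy, 0, count);
    }

    // Length of the array this instance keeps alive.
    int retainedChars() {
        return value.length;
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }
}

class SharedStringDemo {
    public static void main(String[] args) {
        SharedString oldStr = new SharedString("hello,clark");
        SharedString newStr = oldStr.substring(0, 5);
        System.out.println(newStr);                           // hello
        System.out.println(newStr.retainedChars());           // 11: still holds the whole array
        System.out.println(newStr.compact().retainedChars()); // 5: holds only its own copy
    }
}
```

In this model the five-character substring keeps all eleven characters of the parent alive until compact() is called. With a parent of 100,000 characters the same mechanism is what turns into the leak discussed here.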
The example given at http://www.javablogging.com/string-and-memory-leaks/ fits very well:
Quote:
public final static String START_TAG = "<title>";
public final static String END_TAG = "</title>";

public static String getPageTitle(String pageUrl) {
    // retrieve the HTML with a helper function:
    String htmlPage = getPageContent(pageUrl);
    // parse the page content to get the title
    int start = htmlPage.indexOf(START_TAG);
    start = start + START_TAG.length();
    int end = htmlPage.indexOf(END_TAG);
    String title = htmlPage.substring(start, end);
    return title;
}
I'll quote the author directly; the explanation is plain and sharp.
Quote:
Now, try to imagine that the htmlPage String is huge – more than 100.000 characters, but the title of this page has only 50 characters. Because of the optimization mentioned above the returned object will reuse the char array of the htmlPage instead of creating a new one… and this means that instead of returning a small string object you get back a huge String with 100.000 characters array!! If your code will invoke getPageTitle() method many times you may find out that you have stored only a thousand titles and already you are out of memory!! Scary, right?
Of course there is an easy solution for that – instead of returning the title in line 13, you can return new String(title). The String(String) constructor is always doing a subcopy of the underlying char array, so the created title will actually have only 50 characters. Now we are safe:)
So what is the lesson here? Always use new String(String)? No… In general the String optimizations are really helpful and it is worth to take advantage of them. You just have to be careful with what you are doing and be aware of what is going on ‘under the hood’ of your code. String class API is in some situations not intuitive, so beware! (or just read trough it at least once:D)
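The fix the quote describes can be sketched as follows. The article does not show getPageContent, so the version here is a hypothetical stand-in that builds a large page in memory; the class name PageTitles and the choice to return an empty string when no title is found are also assumptions made for this illustration. On the older JDKs discussed above, new String(title) is what trims the array; on Java 7 update 6 and later, substring already copies, so the extra constructor call is harmless but no longer needed.

```java
final class PageTitles {
    static final String START_TAG = "<title>";
    static final String END_TAG = "</title>";

    // Hypothetical stand-in for the article's helper: returns a large HTML page.
    static String getPageContent(String pageUrl) {
        StringBuilder html = new StringBuilder("<html><head><title>Example page</title></head><body>");
        for (int i = 0; i < 10_000; i++) {
            html.append("filler text ");
        }
        return html.append("</body></html>").toString();
    }

    static String getPageTitle(String pageUrl) {
        String htmlPage = getPageContent(pageUrl);
        int tagAt = htmlPage.indexOf(START_TAG);
        if (tagAt < 0) {
            return ""; // assumption: no title tag means an empty title
        }
        int start = tagAt + START_TAG.length();
        int end = htmlPage.indexOf(END_TAG, start);
        if (end < 0) {
            return ""; // assumption: an unterminated title is treated as missing
        }
        String title = htmlPage.substring(start, end);
        // Defensive copy so the returned title does not pin the whole page in memory.
        return new String(title);
    }

    public static void main(String[] args) {
        System.out.println(getPageTitle("http://example.com/")); // Example page
    }
}
```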