String.intern in Java 6, 7 and 8 – string pooling

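  • In Java 6 the string pool lives in PermGen. Because that area has a fixed size, heavy use of String.intern can easily cause an OutOfMemoryError, so avoid the method on Java 6.
    • PermGen has a fixed size that cannot be changed at runtime.
  • In Java 7 and 8 the string pool lives in the heap and is garbage collected.
    • A pooled string that is no longer reachable from the program's roots is eligible for collection.
  • String pool implementation:
    • Principle: the pool is a hash table with a fixed number of buckets; each bucket holds a list of the strings that hash to it.
    • Size:
      • The default is 1009 buckets; it was raised to 60013 in Java 7u40 (Java 8 uses the same value).
      • It is configurable with -XX:StringTableSize=N: in Java 6 from some update between 6u30 and 6u41, and in Java 7 from at least 7u02.
      • Choose a prime number for better performance.
      • Estimate the number of distinct strings you intend to intern and set the table size to a prime close to twice that number. String.intern then runs in near-constant time with a small memory cost per string (an explicit WeakHashMap needs 4-5 times more memory for the same task).
    • Manual implementation (a hand-written pool that plays the same role as the JVM string pool):
      • Avoid hand-written string pools in your programs; they cost more memory.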
private static final WeakHashMap<String, WeakReference<String>> s_manualCache =
    new WeakHashMap<String, WeakReference<String>>( 100000 );

private static String manualIntern( final String str )
{
    final WeakReference<String> cached = s_manualCache.get( str );
    if ( cached != null )
    {
        final String value = cached.get();
        if ( value != null )
            return value;
    }
    s_manualCache.put( str, new WeakReference<String>( str ) );
    return str;
}


String pooling

String pooling (also known as string canonicalisation) is the process of replacing several String objects that have equal value but different identity with a single shared String object. You can do this by keeping your own Map (with soft or weak references, depending on your requirements) and using the map values as the canonical instances, or you can use the String.intern() method provided by the JDK.
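The difference between equal value and equal identity can be shown with a minimal sketch. The class and method names below are made up for illustration; only String.intern() itself comes from the JDK.

```java
// Illustrative sketch: CanonicalisationDemo and sharedAfterIntern are hypothetical names.
class CanonicalisationDemo {
    /** Returns true if interning both strings yields one shared object. */
    static boolean sharedAfterIntern(String a, String b) {
        return a.intern() == b.intern();
    }

    public static void main(String[] args) {
        // Two String objects with equal value but different identity.
        String a = new String("pooling");
        String b = new String("pooling");

        System.out.println(a.equals(b));             // true: equal value
        System.out.println(a == b);                  // false: two separate objects
        System.out.println(sharedAfterIntern(a, b)); // true: one canonical object
    }
}
```

After interning, a program can keep only the canonical reference and let the duplicates be garbage collected.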


In the days of Java 6, many coding standards forbade String.intern() because of the high risk of an OutOfMemoryError if pooling got out of control. The Oracle Java 7 implementation of string pooling was changed considerably. You can look for details in and

String.intern() in Java 6

In those good old days all interned strings were stored in PermGen, the fixed-size part of the heap mainly used for loaded classes and the string pool. Besides explicitly interned strings, the PermGen string pool also contained every string literal already used by your program (the important word here is used: if a class or method was never loaded or called, the constants defined in it are not loaded).

The biggest issue with the string pool in Java 6 was its location: PermGen. PermGen has a fixed size and cannot be expanded at runtime. You can set it with the -XX:MaxPermSize=N option. As far as I know, the default PermGen size varies between 32M and 96M depending on the platform. You can increase it, but the size will still be fixed. This limitation required very careful use of String.intern: you had better not intern any uncontrolled user input with this method. That is why string pooling in the days of Java 6 was mostly implemented with manually managed maps.

String.intern() in Java 7

Oracle engineers made an extremely important change to the string pooling logic in Java 7: the string pool was relocated to the heap. This means that you are no longer limited by a separate fixed-size memory area. All strings now live in the heap, like most other ordinary objects, so you only have to manage the heap size while tuning your application. Technically, this alone could be a sufficient reason to reconsider using String.intern() in your Java 7 programs. But there are other reasons.


String pool values are garbage collected

Yes, all strings in the JVM string pool are eligible for garbage collection if there are no references to them from your program roots. It applies to all discussed versions of Java. It means that if your interned string went out of scope and there are no other references to it – it will be garbage collected from the JVM string pool.


Being eligible for garbage collection and residing in the heap, the JVM string pool seems to be the right place for all your strings, doesn't it? In theory this is true: unused strings will be garbage collected from the pool, and used strings will save you memory whenever you get an equal string from the input. Seems to be a perfect memory saving strategy? Nearly so. You must know how the string pool is implemented before making any decisions.
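The memory saving described above can be made visible by counting distinct String objects by identity. The following is only a sketch: the "records" are simulated with a loop, and all class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Illustrative sketch: InternSavingsDemo and its methods are hypothetical names.
class InternSavingsDemo {
    /** Counts distinct String objects by identity (==), not by equals(). */
    static int distinctObjects(List<String> values) {
        Set<String> identities =
            Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
        identities.addAll(values);
        return identities.size();
    }

    /** Simulates reading count records whose field takes only 'distinct' different values. */
    static List<String> readRecords(int count, int distinct, boolean intern) {
        List<String> result = new ArrayList<String>(count);
        for (int i = 0; i < count; i++) {
            // Runtime concatenation creates a new String object on every iteration.
            String value = "city-" + (i % distinct);
            result.add(intern ? value.intern() : value);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(distinctObjects(readRecords(1000, 10, false))); // 1000
        System.out.println(distinctObjects(readRecords(1000, 10, true)));  // 10
    }
}
```

Without interning the list keeps 1000 separate objects alive; with interning it keeps 10, and the other 990 become garbage as soon as the loop moves on.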


JVM string pool implementation in Java 6, 7 and 8

The string pool is implemented as a fixed capacity hash map with each bucket containing a list of strings with the same hash code. Some implementation details could be obtained from the following Java bug report:


The default pool size is 1009 buckets (the value appears in the source code quoted in the bug report mentioned above; it was increased in Java7u40). It was a constant in the early versions of Java 6 and became configurable somewhere between Java6u30 and Java6u41. In Java 7 it has been configurable from the beginning (at least it is configurable in Java7u02). Specify -XX:StringTableSize=N, where N is the string pool map size, and make it a prime number for better performance.
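To make the data structure concrete, here is a toy model of a fixed-capacity table with a list in every bucket. It is not HotSpot's code (the real table is native C++ and holds its entries weakly); the class name, the hash-to-bucket mapping and the use of LinkedList are assumptions made for illustration.

```java
import java.util.LinkedList;
import java.util.List;

/** Toy model of a fixed-capacity string table; all names are illustrative only. */
class ToyStringTable {
    private final List<String>[] buckets;

    @SuppressWarnings("unchecked")
    ToyStringTable(int tableSize) {
        buckets = new List[tableSize];
        for (int i = 0; i < tableSize; i++) {
            buckets[i] = new LinkedList<String>();
        }
    }

    /** Returns the canonical instance equal to s, adding s itself if none exists yet. */
    String intern(String s) {
        List<String> chain = buckets[bucketIndex(s)];
        for (String existing : chain) { // linear scan of a single bucket
            if (existing.equals(s)) {
                return existing;
            }
        }
        chain.add(s);
        return s;
    }

    int bucketIndex(String s) {
        // Mask the sign bit so that the index is never negative.
        return (s.hashCode() & 0x7fffffff) % buckets.length;
    }

    /** Length of the longest bucket list; the table never resizes, so this only grows. */
    int longestChain() {
        int max = 0;
        for (List<String> chain : buckets) {
            max = Math.max(max, chain.size());
        }
        return max;
    }

    public static void main(String[] args) {
        ToyStringTable small = new ToyStringTable(1009);
        ToyStringTable large = new ToyStringTable(100003);
        for (int i = 0; i < 100000; i++) {
            small.intern(Integer.toString(i));
            large.intern(Integer.toString(i));
        }
        System.out.println("1009 buckets, longest chain: " + small.longestChain());
        System.out.println("100003 buckets, longest chain: " + large.longestChain());
    }
}
```

Because the number of buckets never changes, every lookup has to walk one bucket list, and the lists get longer as more strings are added. This is why the table size matters so much for String.intern performance.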

This parameter will not help you much in Java 6, because you are still limited by the fixed-size PermGen. The further discussion will exclude Java 6.


Java7 (until Java7u40)

In Java 7, on the other hand, you are limited only by the much larger heap. This means that you can set the string pool size to a rather high value in advance (the value depends on your application requirements). As a rule, one starts worrying about memory consumption when the in-memory data set grows to at least several hundred megabytes. In this situation, allocating 8-16 MB for a string pool with one million entries seems to be a reasonable trade-off (do not use 1,000,000 as the -XX:StringTableSize value, because it is not prime; use 1,000,003 instead).

You may expect a uniform distribution of interned strings in the buckets – read my experiments in the hashCode method performance tuning article.


You must set a higher -XX:StringTableSize value (compared to the default 1009) if you intend to use String.intern() actively; otherwise the performance of this method will soon degrade to that of a linked list.
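Picking a prime table size, such as 1,000,003 instead of 1,000,000, is easy to automate. The helper below is a hypothetical sketch (StringTableSizing, nextPrime and suggestedTableSize are not part of any JDK API); suggestedTableSize applies the rule of thumb from the summary of this article, a prime close to twice the expected number of distinct strings.

```java
// Illustrative sketch: these helpers are hypothetical, not a JDK or HotSpot API.
class StringTableSizing {
    /** Trial-division primality test, fast enough for table-size magnitudes. */
    static boolean isPrime(int n) {
        if (n < 2) return false;
        if (n % 2 == 0) return n == 2;
        for (int d = 3; (long) d * d <= n; d += 2) {
            if (n % d == 0) return false;
        }
        return true;
    }

    /** Smallest prime greater than or equal to n. */
    static int nextPrime(int n) {
        int candidate = Math.max(n, 2);
        while (!isPrime(candidate)) {
            candidate++;
        }
        return candidate;
    }

    /**
     * Rule of thumb: a prime close to twice the expected number of distinct strings.
     * Assumes expectedDistinctStrings is small enough that doubling does not overflow int.
     */
    static int suggestedTableSize(int expectedDistinctStrings) {
        return nextPrime(2 * expectedDistinctStrings);
    }

    public static void main(String[] args) {
        System.out.println("-XX:StringTableSize=" + nextPrime(1000000)); // -XX:StringTableSize=1000003
    }
}
```

The printed value can be pasted directly into the JVM command line.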

I have not noticed any dependency of the interning time on the string length for strings under 100 characters (I feel that duplicates of even 50-character strings are rather unlikely in real-world data, so 100 characters seems to be a good test limit to me).


Here is an extract from the test application log with the default pool size: the time to intern 10,000 strings (second number) after a given number of strings was already interned (first number). The interned strings were Integer.toString( i ), where i is between 0 and 999,999:


0; time = 0.0 sec
50000; time = 0.03 sec
100000; time = 0.073 sec
150000; time = 0.13 sec
200000; time = 0.196 sec
250000; time = 0.279 sec
300000; time = 0.376 sec
350000; time = 0.471 sec
400000; time = 0.574 sec
450000; time = 0.666 sec
500000; time = 0.755 sec
550000; time = 0.854 sec
600000; time = 0.916 sec
650000; time = 1.006 sec
700000; time = 1.095 sec
750000; time = 1.273 sec
800000; time = 1.248 sec
850000; time = 1.446 sec
900000; time = 1.585 sec
950000; time = 1.635 sec
1000000; time = 1.913 sec

These test results were obtained on a laptop with a Core-series CPU. As you can see, the times grow linearly, and I was able to intern only approximately 5,000 strings per second when the JVM string pool contained one million strings. This is unacceptably slow for most applications that have to handle a large amount of data in memory.

Now the same test results with the -XX:StringTableSize=100003 option:

50000; time = 0.017 sec
100000; time = 0.009 sec
150000; time = 0.01 sec
200000; time = 0.009 sec
250000; time = 0.007 sec
300000; time = 0.008 sec
350000; time = 0.009 sec
400000; time = 0.009 sec
450000; time = 0.01 sec
500000; time = 0.013 sec
550000; time = 0.011 sec
600000; time = 0.012 sec
650000; time = 0.015 sec
700000; time = 0.015 sec
750000; time = 0.01 sec
800000; time = 0.01 sec
850000; time = 0.011 sec
900000; time = 0.011 sec
950000; time = 0.012 sec
1000000; time = 0.012 sec

As you can see, in this situation it takes nearly constant time to insert strings into the pool (there are no more than 10 strings per bucket on average). Here are the results with the same settings, but now we insert up to 10 million strings into the pool (which means 100 strings per bucket on average):


2000000; time = 0.024 sec
3000000; time = 0.028 sec
4000000; time = 0.053 sec
5000000; time = 0.051 sec
6000000; time = 0.034 sec
7000000; time = 0.041 sec
8000000; time = 0.089 sec
9000000; time = 0.111 sec
10000000; time = 0.123 sec

Now let’s increase the pool size to one million buckets (1,000,003 to be precise):


1000000; time = 0.005 sec
2000000; time = 0.005 sec
3000000; time = 0.005 sec
4000000; time = 0.004 sec
5000000; time = 0.004 sec
6000000; time = 0.009 sec
7000000; time = 0.01 sec
8000000; time = 0.009 sec
9000000; time = 0.009 sec
10000000; time = 0.009 sec

As you can see, the times are flat and do not look much different from the “zero to one million” table for the ten times smaller string pool. Even my slow laptop can add one million new strings per second to the JVM string pool, provided that the pool size is high enough.
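The per-bucket averages quoted in this section are simply the number of interned strings divided by the table size. A small sketch of that arithmetic for the four configurations tested above (the class and method names are hypothetical):

```java
// Illustrative sketch: BucketLoad is a hypothetical helper, not part of the test code.
class BucketLoad {
    /** Average number of strings per bucket for a table that never resizes. */
    static double averageChainLength(long internedStrings, int tableSize) {
        return (double) internedStrings / tableSize;
    }

    public static void main(String[] args) {
        long[] counts = { 1000000L, 1000000L, 10000000L, 10000000L };
        int[] sizes = { 1009, 100003, 100003, 1000003 };
        for (int i = 0; i < counts.length; i++) {
            System.out.printf("%d strings / %d buckets = %.1f per bucket%n",
                counts[i], sizes[i], averageChainLength(counts[i], sizes[i]));
        }
    }
}
```

With the default 1009 buckets, one million strings mean roughly 991 strings per bucket, which explains the linear slowdown in the first table; the other three configurations stay at about 10 or 100 strings per bucket.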


Shall we still use manual string pools?

Now we need to compare this JVM string pool with a hand-written pool based on a WeakHashMap of weak references, which is used instead of String.intern:

private static final WeakHashMap<String, WeakReference<String>> s_manualCache =
    new WeakHashMap<String, WeakReference<String>>( 100000 );

private static String manualIntern( final String str )
{
    final WeakReference<String> cached = s_manualCache.get( str );
    if ( cached != null )
    {
        final String value = cached.get();
        if ( value != null )
            return value;
    }
    s_manualCache.put( str, new WeakReference<String>( str ) );
    return str;
}

This is the output for the same test using this manual pool:


0; manual time = 0.001 sec
50000; manual time = 0.03 sec
100000; manual time = 0.034 sec
150000; manual time = 0.008 sec
200000; manual time = 0.019 sec
250000; manual time = 0.011 sec
300000; manual time = 0.011 sec
350000; manual time = 0.008 sec
400000; manual time = 0.027 sec
450000; manual time = 0.008 sec
500000; manual time = 0.009 sec
550000; manual time = 0.008 sec
600000; manual time = 0.008 sec
650000; manual time = 0.008 sec
700000; manual time = 0.008 sec
750000; manual time = 0.011 sec
800000; manual time = 0.007 sec
850000; manual time = 0.008 sec
900000; manual time = 0.008 sec
950000; manual time = 0.008 sec
1000000; manual time = 0.008 sec

The manually written pool provided comparable performance while the JVM had sufficient memory. Unfortunately, for my test case of very short strings (interning String.valueOf(N) for 0 < N < 1,000,000,000), it allowed me to keep only ~2.5M such strings with -Xmx1280M. The JVM string pool (size=1,000,003), on the other hand, provided the same flat performance characteristics until the JVM ran out of memory with 12.72M strings in the pool (5 times more). I think this is a valuable hint to get rid of manual string pooling in your programs.

String.intern() in Java 7u40+ and Java 8

The string pool size was increased to 60013 in Java7u40 (this was a major performance update). This value allows you to have approximately 30,000 distinct strings in the pool before you start experiencing collisions. Generally, this is sufficient for data that is actually worth interning. You can obtain this value with the -XX:+PrintFlagsFinal JVM parameter.
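Besides -XX:+PrintFlagsFinal, the effective value can also be read from inside a running program. The sketch below assumes a HotSpot-based JVM on Java 7 or later, where the com.sun.management.HotSpotDiagnosticMXBean interface is available; the class name StringTableSizeProbe is made up, and the method returns -1 when the JVM does not expose the flag.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Illustrative sketch: assumes a HotSpot-based JVM; StringTableSizeProbe is a hypothetical name.
class StringTableSizeProbe {
    /** Returns the effective StringTableSize, or -1 if the running JVM does not expose it. */
    static long stringTableSize() {
        try {
            HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return Long.parseLong(bean.getVMOption("StringTableSize").getValue());
        } catch (RuntimeException e) {
            // Bean not available, unknown flag, or non-numeric value.
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println("StringTableSize = " + stringTableSize());
    }
}
```

Running it with and without -XX:StringTableSize=N shows whether the setting was picked up.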

I have tried to run the same tests on the original release of Java 8. Java 8 still accepts the -XX:StringTableSize parameter and provides performance comparable to Java 7. The only important difference is that the default pool size is now 60013 (as in Java7u40):

50000; time = 0.019 sec
100000; time = 0.009 sec
150000; time = 0.009 sec
200000; time = 0.009 sec
250000; time = 0.009 sec
300000; time = 0.009 sec
350000; time = 0.011 sec
400000; time = 0.012 sec
450000; time = 0.01 sec
500000; time = 0.013 sec
550000; time = 0.013 sec
600000; time = 0.014 sec
650000; time = 0.018 sec
700000; time = 0.015 sec
750000; time = 0.029 sec
800000; time = 0.018 sec
850000; time = 0.02 sec
900000; time = 0.017 sec
950000; time = 0.018 sec
1000000; time = 0.021 sec

Test Code

The test code for this article is rather simple: a method creates and interns new strings in a loop. We also measure the time it took to intern the current 10,000 strings. It is worth running this program with the -verbose:gc JVM parameter to see when and which garbage collections happen. You may also want to specify the maximal heap size with the -Xmx parameter.

There are 2 tests. testStringPoolGarbageCollection will show you that the JVM string pool is actually garbage collected: check the garbage collection log messages as well as the time it took to intern the strings on the second pass. This test will fail with the default PermGen size on Java 6, so either increase the PermGen size, or reduce the test method argument, or use Java 7.

The second test will show you how many interned strings can be stored in memory. Run it on Java 6 with 2 different memory settings, for example -Xmx128M and -Xmx1280M (10 times more). Most likely you will see that the heap size does not affect the number of strings you can put in the pool. In Java 7, on the other hand, you will be able to fill the whole heap with your strings.

import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;
import java.util.WeakHashMap;

/**
 * Testing String.intern.
 * Run this class at least with -verbose:gc JVM parameter.
 */
public class InternTest {
    public static void main( String[] args ) {
        //run the two tests described above; comment out the one you do not need
        testStringPoolGarbageCollection();
        testLongLoop();
    }

    /**
     * Use this method to see where interned strings are stored
     * and how many of them can you fit for the given heap size.
     */
    private static void testLongLoop()
    {
        test( 1000 * 1000 * 1000 );
        //uncomment the following line to see the hand-written cache performance
        //testManual( 1000 * 1000 * 1000 );
    }

    /**
     * Use this method to check that not used interned strings are garbage collected.
     */
    private static void testStringPoolGarbageCollection()
    {
        //first method call - use it as a reference
        test( 1000 * 1000 );
        //we are going to clean the cache here (System.gc() is only a hint to the JVM).
        System.gc();
        //check the memory consumption and how long does it take to intern strings
        //in the second method call.
        test( 1000 * 1000 );
    }

    private static void test( final int cnt )
    {
        final List<String> lst = new ArrayList<String>( 100 );
        long start = System.currentTimeMillis();
        for ( int i = 0; i < cnt; ++i )
        {
            final String str = "Very long test string, which tells you about something " +
                "very-very important, definitely deserving to be interned #" + i;
            //uncomment the following line to test dependency from string length
            //final String str = Integer.toString( i );
            lst.add( str.intern() );
            if ( i % 10000 == 0 )
            {
                System.out.println( i + "; time = " + ( System.currentTimeMillis() - start ) / 1000.0 + " sec" );
                start = System.currentTimeMillis();
            }
        }
        System.out.println( "Total length = " + lst.size() );
    }

    private static final WeakHashMap<String, WeakReference<String>> s_manualCache =
        new WeakHashMap<String, WeakReference<String>>( 100000 );

    private static String manualIntern( final String str )
    {
        final WeakReference<String> cached = s_manualCache.get( str );
        if ( cached != null )
        {
            final String value = cached.get();
            if ( value != null )
                return value;
        }
        s_manualCache.put( str, new WeakReference<String>( str ) );
        return str;
    }

    private static void testManual( final int cnt )
    {
        final List<String> lst = new ArrayList<String>( 100 );
        long start = System.currentTimeMillis();
        for ( int i = 0; i < cnt; ++i )
        {
            final String str = "Very long test string, which tells you about something " +
                "very-very important, definitely deserving to be interned #" + i;
            lst.add( manualIntern( str ) );
            if ( i % 10000 == 0 )
            {
                System.out.println( i + "; manual time = " + ( System.currentTimeMillis() - start ) / 1000.0 + " sec" );
                start = System.currentTimeMillis();
            }
        }
        System.out.println( "Total length = " + lst.size() );
    }
}

Summary

  • Stay away from the String.intern() method on Java 6, because the JVM string pool is stored in a fixed-size memory area (PermGen).
  • Java 7 and 8 implement the string pool in the heap memory. This means that string pooling in Java 7 and 8 is limited only by the whole application memory.
  • Use the -XX:StringTableSize JVM parameter in Java 7 and 8 to set the string pool map size. The size is fixed, because the pool is implemented as a hash map with lists in the buckets. Estimate the number of distinct strings in your application (those you intend to intern) and set the pool size to a prime number close to this value multiplied by 2 (to reduce the likelihood of collisions). This allows String.intern to run in constant time and requires rather little memory per interned string (an explicitly used Java WeakHashMap will consume 4-5 times more memory for the same task).
  • The default value of the -XX:StringTableSize parameter is 1009 in Java 6 and in Java 7 until Java7u40. It was increased to 60013 in Java7u40 (the same value is used in Java 8).
  • If you are not sure about the string pool usage, try the -XX:+PrintStringTableStatistics JVM argument. It prints the string pool usage when your program terminates.
