Let's look at a simple piece of code and its bytecode, as disassembled by javap:
```java
public class test {
    public static void main(String[] args) {
        String test = "test";
    }
}
```
```
Constant pool:
   #1 = Class              #2             // test
   #2 = Utf8               test
   #3 = Class              #4             // java/lang/Object
   #4 = Utf8               java/lang/Object
   #5 = Utf8               <init>
   #6 = Utf8               ()V
   #7 = Utf8               Code
   #8 = Methodref          #3.#9          // java/lang/Object."<init>":()V
   #9 = NameAndType        #5:#6          // "<init>":()V
  #10 = Utf8               LineNumberTable
  #11 = Utf8               LocalVariableTable
  #12 = Utf8               this
  #13 = Utf8               Ltest;
  #14 = Utf8               main
  #15 = Utf8               ([Ljava/lang/String;)V
  #16 = String             #2             // test
  #17 = Utf8               args
  #18 = Utf8               [Ljava/lang/String;
  #19 = Utf8               Ljava/lang/String;
  #20 = Utf8               SourceFile
  #21 = Utf8               test.java
{
  public test();
    flags: ACC_PUBLIC
    Code:
      stack=1, locals=1, args_size=1
         0: aload_0
         1: invokespecial #8                  // Method java/lang/Object."<init>":()V
         4: return
      LineNumberTable:
        line 2: 0
      LocalVariableTable:
        Start  Length  Slot  Name   Signature
            0       5     0  this   Ltest;

  public static void main(java.lang.String[]);
    flags: ACC_PUBLIC, ACC_STATIC
    Code:
      stack=1, locals=2, args_size=1
         0: ldc           #16                 // String test
         2: astore_1
         3: return
      LineNumberTable:
        line 6: 0
        line 15: 3
      LocalVariableTable:
        Start  Length  Slot  Name   Signature
            0       4     0  args   [Ljava/lang/String;
            3       1     1  test   Ljava/lang/String;
}
```
```
#16 = String  #2
#2  = Utf8    test
```
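When the class is loaded (its class file is parsed), the content of this string constant, test, is stored as a Symbol, HotSpot's internal symbol type, which holds the UTF-8 encoded bytes as a C char array. The string constant lives at index #16, not #2: slot #16 is linked directly to that Symbol as soon as the class has been loaded, so it does not have to go through #2 later.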
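To make the on-disk form concrete, here is a small sketch that serializes "test" the way a class file lays out a CONSTANT_Utf8 entry such as #2: a tag byte (1), a two-byte length, then the bytes in modified UTF-8. It relies on DataOutputStream.writeUTF, which writes exactly that length-prefixed modified UTF-8 form. The class name Utf8EntryDemo is made up for illustration, and the Symbol's own in-memory layout is not modeled.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class Utf8EntryDemo {
    static final int CONSTANT_UTF8_TAG = 1; // tag value of CONSTANT_Utf8 in the class file format

    // Encodes a string as a CONSTANT_Utf8_info structure:
    // u1 tag, u2 length, then the bytes in modified UTF-8.
    static byte[] encodeUtf8Entry(String value) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeByte(CONSTANT_UTF8_TAG);
            out.writeUTF(value); // u2 length followed by modified UTF-8 bytes
        } catch (IOException e) {
            // Thrown only when the encoded value needs more than 65535 bytes.
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        StringBuilder hex = new StringBuilder();
        for (byte b : encodeUtf8Entry("test")) {
            hex.append(String.format("%02x ", b & 0xff));
        }
        System.out.println(hex.toString().trim()); // 01 00 04 74 65 73 74
    }
}
```

The four bytes after the length are exactly the characters t, e, s, t, which is the data a Symbol keeps for this constant.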
The instructions generated for String test = "test" are:
```
0: ldc           #16                 // String test
2: astore_1
```
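As you can see, the single statement is split into two instructions: ldc #16, which pushes the reference to the string constant onto the operand stack, and astore_1, which stores that reference into the local variable test (slot 1). So how is ldc executed? Its runtime implementation is in interpreterRuntime.cpp: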
```cpp
IRT_ENTRY(void, InterpreterRuntime::ldc(JavaThread* thread, bool wide))
  // access constant pool
  constantPoolOop pool = method(thread)->constants();
  int index = wide ? get_index_u2(thread, Bytecodes::_ldc_w) : get_index_u1(thread, Bytecodes::_ldc);
  constantTag tag = pool->tag_at(index);

  if (tag.is_unresolved_klass() || tag.is_klass()) {
    klassOop klass = pool->klass_at(index, CHECK);
    oop java_class = klass->java_mirror();
    thread->set_vm_result(java_class);
  } else {
#ifdef ASSERT
    // If we entered this runtime routine, we believed the tag contained
    // an unresolved string, an unresolved class or a resolved class.
    // However, another thread could have resolved the unresolved string
    // or class by the time we go there.
    assert(tag.is_unresolved_string()|| tag.is_string(), "expected string");
#endif
    oop s_oop = pool->string_at(index, CHECK);
    thread->set_vm_result(s_oop);
  }
IRT_END
```
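The first branch handles class constants: ldc on a CONSTANT_Class entry pushes the class's java_mirror, that is, its java.lang.Class object. With modern javac, a class literal such as String.class compiles to exactly this kind of ldc, so it yields the same Class object as getClass() on any String. A quick check (the class name ClassLiteralDemo is hypothetical):

```java
class ClassLiteralDemo {
    // String.class compiles to "ldc <Class java/lang/String>".
    static Class<?> viaLdc() {
        return String.class;
    }

    public static void main(String[] args) {
        Class<?> mirror = viaLdc();
        System.out.println(mirror == "any".getClass()); // true: one mirror per loaded class
        System.out.println(mirror.getName());           // java.lang.String
    }
}
```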
For a string constant, pool->string_at() ends up in constantPoolOopDesc::string_at_impl():

```cpp
oop constantPoolOopDesc::string_at_impl(constantPoolHandle this_oop, int which, TRAPS) {
  oop str = NULL;
  CPSlot entry = this_oop->slot_at(which);
  if (entry.is_metadata()) {
    ObjectLocker ol(this_oop, THREAD);
    if (this_oop->tag_at(which).is_unresolved_string()) {
      // Intern string
      Symbol* sym = this_oop->unresolved_string_at(which);
      str = StringTable::intern(sym, CHECK_(constantPoolOop(NULL)));
      this_oop->string_at_put(which, str);
    } else {
      // Another thread beat us and interned string, read string from constant pool
      str = this_oop->resolved_string_at(which);
    }
  } else {
    str = entry.get_oop();
  }
  assert(java_lang_String::is_instance(str), "must be string");
  return str;
}
```
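string_at_impl resolves lazily: the slot first holds the Symbol, and the first caller interns it and overwrites the slot with the String reference, holding a lock (ObjectLocker) and re-checking the tag in case another thread got there first. The Java model below imitates that flow under simplifying assumptions: ToyConstantPool, the byte[] standing in for a Symbol, and the HashMap standing in for StringTable are all hypothetical, not HotSpot types, and standard UTF-8 replaces modified UTF-8.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

class ToyConstantPool {
    // Hypothetical stand-in for StringTable, shared by all pools: content -> canonical reference.
    private static final Map<String, String> STRING_TABLE = new HashMap<>();

    // Each slot holds either a byte[] (unresolved "Symbol") or a String (resolved).
    private final Object[] slots;

    ToyConstantPool(int size) {
        slots = new Object[size];
    }

    // Like the unresolved string entry written while the class file is parsed.
    void putUnresolvedString(int index, String text) {
        slots[index] = text.getBytes(StandardCharsets.UTF_8);
    }

    // Same shape as string_at_impl: fast path for a resolved slot,
    // otherwise lock, re-check, intern, and write the reference back.
    String stringAt(int index) {
        Object entry = slots[index];
        if (entry instanceof String) {
            return (String) entry;               // already resolved
        }
        synchronized (this) {                    // plays the role of ObjectLocker
            entry = slots[index];
            if (entry instanceof byte[]) {       // still unresolved: intern it
                String decoded = new String((byte[]) entry, StandardCharsets.UTF_8);
                String interned;
                synchronized (STRING_TABLE) {
                    interned = STRING_TABLE.computeIfAbsent(decoded, k -> k);
                }
                slots[index] = interned;         // like string_at_put
                return interned;
            }
            return (String) entry;               // another thread resolved it first
        }
    }
}
```

Two pools that both contain "test" end up sharing one reference through the shared table, the toy counterpart of identical literals in different classes being ==.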
StringTable is not the constant pool
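StringTable is a cache table for strings: it holds references to interned String objects, so the VM avoids the cost of creating a new String for content it has already seen. Its data structure is a hash table much like the Hashtable we use in Java: the VM computes the string's hash code, uses it to select a bucket in the array, then walks that bucket's linked list comparing the strings character by character until it finds an equal one. When the table holds many strings, the chains grow long and lookups slow down, so HotSpot checks at safepoints whether a rehash is needed. In the JDK 7/8-era HotSpot this code comes from, a rehash recomputes bucket positions with an alternative hash function and a new seed to even out the long chains; the number of buckets stays fixed and is set with -XX:StringTableSize. Later JDKs reimplemented StringTable as a concurrent hash table that can also grow.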
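The lookup described above can be sketched as a chained hash table keyed by character content. This is a simplified, single-threaded model rather than HotSpot's code: ToyStringTable and its methods are hypothetical, the hash is assumed to be the same 31-based polynomial as String.hashCode(), the bucket count is fixed, and rehashing is not modeled. intern(char[]) follows the ldc path: if no equal string exists, it creates a new String from the characters and records it.

```java
class ToyStringTable {
    // One node in a bucket's singly linked chain.
    private static final class Entry {
        final int hash;
        final String value;
        final Entry next;

        Entry(int hash, String value, Entry next) {
            this.hash = hash;
            this.value = value;
            this.next = next;
        }
    }

    private final Entry[] buckets;

    ToyStringTable(int bucketCount) {
        buckets = new Entry[bucketCount];
    }

    // Same polynomial as String.hashCode(): h = 31 * h + c.
    static int hash(char[] chars) {
        int h = 0;
        for (char c : chars) {
            h = 31 * h + c;
        }
        return h;
    }

    private int indexFor(int hash) {
        return Math.floorMod(hash, buckets.length);
    }

    // Picks the bucket by hash, then walks its chain comparing character by character.
    String lookup(char[] chars) {
        int h = hash(chars);
        for (Entry e = buckets[indexFor(h)]; e != null; e = e.next) {
            if (e.hash == h && sameChars(e.value, chars)) {
                return e.value;
            }
        }
        return null;
    }

    // ldc-style intern: reuse an equal string, otherwise create and record a new one.
    String intern(char[] chars) {
        String found = lookup(chars);
        if (found != null) {
            return found;
        }
        String created = new String(chars);
        int h = hash(chars);
        int i = indexFor(h);
        buckets[i] = new Entry(h, created, buckets[i]); // push onto the head of the chain
        return created;
    }

    private static boolean sameChars(String s, char[] chars) {
        if (s.length() != chars.length) {
            return false;
        }
        for (int i = 0; i < chars.length; i++) {
            if (s.charAt(i) != chars[i]) {
                return false;
            }
        }
        return true;
    }
}
```

With only a handful of buckets, different strings share chains quickly, which is the slowdown that a larger table size or a rehash is meant to relieve.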
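When ldc is executed, the C++ char array held by the Symbol is converted into a new Java char array of Unicode (UTF-16) characters, a new String is created from it, and its reference is stored in the StringTable; the same reference is also written back into the constant pool slot. If the StringTable already holds an equal string, for example because another class loaded the same literal first, that existing reference is used instead and no new String is created.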
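The Symbol-to-String step can be imitated at the Java level, as an analogy only (the VM does this in C++): decode the length-prefixed modified UTF-8 bytes into a String with DataInputStream.readUTF, then intern it. The result is the same object the literal "test" resolves to, because both go through the same table. The class SymbolToStringDemo and its helpers encode and decodeAndIntern are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class SymbolToStringDemo {
    // Produces "u2 length + modified UTF-8 bytes", the body of a CONSTANT_Utf8 entry.
    static byte[] encode(String text) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Decodes the bytes into a String of UTF-16 chars, then looks it up in the intern table.
    static String decodeAndIntern(byte[] utf8WithLength) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(utf8WithLength))) {
            return in.readUTF().intern();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String s = decodeAndIntern(encode("test"));
        System.out.println(s == "test"); // true: both refer to the same StringTable entry
    }
}
```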
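String.intern() is a native method. It works by looking up the reference that StringTable keeps for the string; on the VM side it is implemented as follows: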
```cpp
JVM_ENTRY(jstring, JVM_InternString(JNIEnv *env, jstring str))
  JVMWrapper("JVM_InternString");
  JvmtiVMObjectAllocEventCollector oam;
  if (str == NULL) return NULL;
  oop string = JNIHandles::resolve_non_null(str);
  oop result = StringTable::intern(string, CHECK_NULL);
  return (jstring) JNIHandles::make_local(env, result);
JVM_END
```

Here we meet the familiar StringTable::intern method again. The difference from ldc is that the String object, and therefore its reference, already exists at this point: if StringTable does not yet contain an equal string, the VM stores this very reference in StringTable and returns it.
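This behavior can be modeled with a map from content to canonical reference: instead of creating a String from a Symbol as ldc does, intern() is handed an existing object, and when nothing equal is registered, that very object becomes the canonical one. ToyIntern is a hypothetical illustration built on ConcurrentHashMap.putIfAbsent, not the VM's table.

```java
import java.util.concurrent.ConcurrentHashMap;

class ToyIntern {
    private static final ConcurrentHashMap<String, String> TABLE = new ConcurrentHashMap<>();

    // Returns the registered reference for this content; registers s itself if absent.
    static String intern(String s) {
        String existing = TABLE.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    public static void main(String[] args) {
        String a = new String(new char[] {'k', 'v', 'i', 'l', 'l'});
        String b = new String(new char[] {'k', 'v', 'i', 'l', 'l'});
        System.out.println(intern(a) == a); // true: a was registered as-is
        System.out.println(intern(b) == a); // true: b maps to the earlier reference
    }
}
```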
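A while ago I came across a blog post that quoted this statement: "String.intern() saves a String into a global String table; if a Unicode string with the same value is already in the table, the method returns the address of the string already in the table; if there is no string with the same value in the table, it registers its own address in the table." The post argued that this explanation is wrong and gave the following example: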
```java
String s1 = new String("kvill");
System.out.println(s1 == s1.intern());
```
In fact, this example is flawed. Let's look at its bytecode:
```
0: new           #16                 // class java/lang/String
3: dup
4: ldc           #18                 // String kvill
6: invokespecial #20                 // Method java/lang/String."<init>":(Ljava/lang/String;)V
9: astore_1
```
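The ldc at offset 4 loads the interned literal "kvill", and invokespecial then constructs s1 as a new String from it. In other words, the code is equivalent to

```java
String test = "kvill";
String s1 = new String(test);
```

which makes it clear that these are two completely different String objects. The literal's reference is already in StringTable, so s1.intern() returns the literal rather than s1, and the program prints false. That result does not contradict the quoted statement.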
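Running the example with a few extra comparisons shows where the false comes from: the literal was interned by ldc before the constructor ran, so intern() hands back the literal instead of s1. The class name NewStringInternDemo is illustrative.

```java
class NewStringInternDemo {
    // Compiles to: new java/lang/String; dup; ldc "kvill"; invokespecial String.<init>(String)
    static String copyOfLiteral() {
        return new String("kvill");
    }

    public static void main(String[] args) {
        String s1 = copyOfLiteral();
        System.out.println(s1 == "kvill");          // false: s1 is a separate object
        System.out.println(s1.intern() == "kvill"); // true: the table already holds the literal
        System.out.println(s1 == s1.intern());      // false, the result of the original example
    }
}
```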
To test whether the original statement holds, just change the program to:
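```java
char[] test = {'k', 'v', 'i', 'l', 'l'};
String s1 = new String(test);
System.out.println(s1 == s1.intern());
```

Because this program contains no "kvill" literal, no equal string is in StringTable when intern() runs, and the program prints true. In other words, what StringTable stores is the reference s1 itself. (This holds on HotSpot from JDK 7 onward; on JDK 6, intern() copied the string into the permanent generation, so the same program printed false there.)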