A Discussion of String's intern Method

Introduction

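Today a classmate sent me a snippet involving String and asked me to predict its output. Here is the code; try to work out the result before reading on.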

 

public class InternTest {

	public static void main(String[] args) {
	    String s = new String("1");
	    s.intern();
	    String s2 = "1";
	    System.out.println(s == s2);
	    System.out.println(s.equals(s2));
	    System.out.println();
	    String s3 = new String("1") + new String("1");
	    s3.intern();
	    String s4 = "11";
	    System.out.println(s3 == s4);
	    System.out.println(s3.equals(s4));
	}

}

 

What Is a String

The Java Language Specification (Java SE 8 Edition) says the following about String:
1. Instances of class String represent sequences of Unicode code points.
2. A String object has a constant (unchanging) value.
3. String literals are references to instances of class String.
4. The string concatenation operator + implicitly creates a new String object when the result is not a constant expression.
With these definitions in hand, the constraints on using String become clearer. In practice, a String variable is usually defined in one of three ways:
1. With the new keyword, e.g. String str = new String("spring");
2. Directly from a literal, e.g. String str = "spring";
3. By concatenation, e.g. String str = "spr" + new String("ing");
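The three forms differ in object identity, which is what the rest of this article turns on. The following is a minimal sketch of those differences; the class name StringCreationDemo and its method names are illustrative and do not come from the original example.

```java
// Illustrative sketch: class and method names are assumptions, not from the original.
final class StringCreationDemo {

    // Two occurrences of the same literal refer to one String instance.
    static boolean literalsShareInstance() {
        String a = "spring";
        String b = "spring";
        return a == b;
    }

    // new always allocates a fresh object, distinct from the literal's instance.
    static boolean newCreatesDistinctObject() {
        String literal = "spring";
        String created = new String("spring");
        return literal != created && literal.equals(created);
    }

    // "spr" + "ing" is a constant expression, folded by the compiler into "spring".
    static boolean constantConcatIsLiteral() {
        String folded = "spr" + "ing";
        return folded == "spring";
    }

    // With a non-constant operand, + builds a new String object at run time.
    static boolean runtimeConcatIsNewObject() {
        String joined = "spr" + new String("ing");
        return joined != "spring" && joined.equals("spring");
    }

    public static void main(String[] args) {
        System.out.println(literalsShareInstance());     // true
        System.out.println(newCreatesDistinctObject());  // true
        System.out.println(constantConcatIsLiteral());   // true
        System.out.println(runtimeConcatIsNewObject());  // true
    }
}
```

Each method returns true, following points 3 and 4 of the specification quoted above.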

== & equals

In Java, objects can be compared with either == or equals. equals is first implemented in the Object class:

 

 public boolean equals(Object obj) {
        return (this == obj);  // here equals is equivalent to ==
    }
Usually, though, a class overrides Object's hashCode and equals methods. String is no exception: its overridden equals reports whether two strings consist of the same characters.

 

 

public boolean equals(Object anObject) {
        if (this == anObject) {
            return true;
        }
        if (anObject instanceof String) {
            String anotherString = (String)anObject;
            int n = value.length;
            if (n == anotherString.value.length) {
                char v1[] = value;
                char v2[] = anotherString.value;
                int i = 0;
                while (n-- != 0) {
                    if (v1[i] != v2[i])
                        return false;
                    i++;
                }
                return true;
            }
        }
        return false;
    }
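In short, == compares the values held in variables, which live on the JVM stack as either primitive values or references; for references it asks whether both point to the same object. equals, as overridden by String, asks whether the contents of the objects on the heap are equal.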
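To make the distinction concrete, here is a small sketch with a hypothetical value class, Point, that overrides equals and hashCode together. Neither Point nor EqualityDemo appears in the original; they are only for illustration.

```java
import java.util.Objects;

// Illustrative sketch: EqualityDemo and Point are assumed names, not from the original.
final class EqualityDemo {

    // A small value class that overrides equals and hashCode together.
    static final class Point {
        private final int x;
        private final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) {
                return true;              // same reference, trivially equal
            }
            if (!(other instanceof Point)) {
                return false;
            }
            Point p = (Point) other;
            return x == p.x && y == p.y;  // compare content, not identity
        }

        @Override
        public int hashCode() {
            return Objects.hash(x, y);
        }
    }

    // Reference comparison: true only when both variables hold the same reference.
    static boolean sameReference(Object a, Object b) {
        return a == b;
    }

    public static void main(String[] args) {
        Point p1 = new Point(1, 2);
        Point p2 = new Point(1, 2);
        System.out.println(sameReference(p1, p2)); // false: two separate heap objects
        System.out.println(p1.equals(p2));         // true: same content

        String s1 = new String("1");
        String s2 = new String("1");
        System.out.println(sameReference(s1, s2)); // false
        System.out.println(s1.equals(s2));         // true
    }
}
```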

 

 

String's intern() Method

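 public native String intern(); returns a canonical representation of the string object. A pool of strings, initially empty, is maintained privately by the class String. When intern() is called, if the pool already contains a string equal to this String object as determined by equals(Object), the string from the pool is returned. Otherwise, this String object is added to the pool and a reference to it is returned.
It follows that for any two strings s and t, s.intern() == t.intern() is true if and only if s.equals(t) is true.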
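The contract stated above can be checked directly. The sketch below builds its strings at run time so that the inputs are distinct objects; the class InternContractDemo and the helper fresh are hypothetical names introduced for this example.

```java
// Illustrative sketch: InternContractDemo and fresh() are assumed names.
final class InternContractDemo {

    // Builds a new String object at run time with the given content.
    static String fresh(String text) {
        return new String(text.toCharArray());
    }

    // Checks the documented property:
    // s.intern() == t.intern() holds exactly when s.equals(t) holds.
    static boolean internAgreesWithEquals(String s, String t) {
        return (s.intern() == t.intern()) == s.equals(t);
    }

    public static void main(String[] args) {
        String a = fresh("abc");
        String b = fresh("abc");
        String c = fresh("abd");

        System.out.println(a == b);                       // false: two separate objects
        System.out.println(a.intern() == b.intern());     // true: equal content, one canonical instance
        System.out.println(a.intern() == c.intern());     // false: different content
        System.out.println(internAgreesWithEquals(a, b)); // true
        System.out.println(internAgreesWithEquals(a, c)); // true
    }
}
```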
String's intern() is a native method: through JNI it calls into the JVM's underlying C++ code. The implementation source is as follows:

 

 

JVM_ENTRY(jstring, JVM_InternString(JNIEnv *env, jstring str))
  JVMWrapper("JVM_InternString");
  JvmtiVMObjectAllocEventCollector oam;
  if (str == NULL) return NULL;
  oop string = JNIHandles::resolve_non_null(str);
  oop result = StringTable::intern(string, CHECK_NULL);
  return (jstring) JNIHandles::make_local(env, result);
JVM_END
Next, look at StringTable::intern(string, CHECK_NULL):
oop StringTable::intern(oop string, TRAPS)
{
  if (string == NULL) return NULL;
  ResourceMark rm(THREAD);
  int length;
  Handle h_string (THREAD, string);
  jchar* chars = java_lang_String::as_unicode_string(string, length);
  oop result = intern(h_string, chars, length, CHECK_NULL);
  return result;
}

 

oop StringTable::intern(Handle string_or_null, jchar* name,
                        int len, TRAPS) {
  unsigned int hashValue = hash_string(name, len);
  int index = the_table()->hash_to_index(hashValue);
  oop found_string = the_table()->lookup(index, name, len, hashValue); // calls lookup()

  // Found
  if (found_string != NULL) return found_string;

  debug_only(StableMemoryChecker smc(name, len * sizeof(name[0])));
  assert(!Universe::heap()->is_in_reserved(name) || GC_locker::is_active(),
         "proposed name of symbol must be stable");

  Handle string;
  // try to reuse the string if possible
  if (!string_or_null.is_null() && (!JavaObjectsInPerm || string_or_null()->is_perm())) {
    string = string_or_null;
  } else {
    string = java_lang_String::create_tenured_from_unicode(name, len, CHECK_NULL);
  }

  // Grab the StringTable_lock before getting the_table() because it could
  // change at safepoint.
  MutexLocker ml(StringTable_lock, THREAD);

  // Otherwise, add to symbol to table
  return the_table()->basic_add(index, string, name, len,
                                hashValue, CHECK_NULL);
}
The lookup routine that walks the bucket chain is listed next. Note that this listing is SymbolTable::lookup, the symbol-table counterpart; StringTable::lookup follows the same pattern, checking the hash first and then comparing the character content.
Symbol* SymbolTable::lookup(int index, const char* name,
                              int len, unsigned int hash) {
  int count = 0;
  for (HashtableEntry* e = bucket(index); e != NULL; e = e->next()) {
    count++;  // count all entries in this bucket, not just ones with same hash
    if (e->hash() == hash) {
      Symbol* sym = e->literal();
      if (sym->equals(name, len)) {     // as noted above, compared by content (equals)
        // something is referencing this symbol now.
        sym->increment_refcount();
        return sym;
      }
    }
  }
  // If the bucket size is too deep check if this hash code is insufficient.
  if (count >= BasicHashtable::rehash_count && !needs_rehashing()) {
    _needs_rehashing = check_rehash_table(count);
  }
  return NULL;
}

 

Below is the StringTable data structure. Note that StringTable is not the constant pool.

 

class StringTable : public Hashtable {
  friend class VMStructs;
private:
  // The string table
  static StringTable* _the_table;
  // Set if one bucket is out of balance due to hash algorithm deficiency
  static bool _needs_rehashing;
  // Claimed high water mark for parallel chunked scanning
  static volatile int _parallel_claimed_idx;

  static oop intern(Handle string_or_null, jchar* chars, int length, TRAPS);
  oop basic_add(int index, Handle string_or_null, jchar* name, int len,
                unsigned int hashValue, TRAPS);

  oop lookup(int index, jchar* chars, int length, unsigned int hashValue);

  // Apply the give oop closure to the entries to the buckets
  // in the range [start_idx, end_idx).
  static void buckets_do(OopClosure* f, int start_idx, int end_idx);

  StringTable() : Hashtable((int)StringTableSize,
                              sizeof (HashtableEntry)) {} ....}

 

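StringTable is a chained hash table, much like the familiar Hashtable in Java. It first computes the string's hash code, uses it to select a bucket in the array, and then walks that bucket's linked list, comparing the string character by character until a match is found. As the number of entries grows, lookups slow down. When entering a safepoint, the VM checks whether a rehash is needed. In the JDK 7/8 sources shown here the bucket count stays fixed (it is set by -XX:StringTableSize), so the rehash rebuilds the table with a different hash function to shorten overlong chains rather than enlarging the array.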
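The lookup-then-add flow described above can be sketched in plain Java. This is a simplified model, not the HotSpot implementation: the class TinyStringTable, its Entry type, and the method names are assumptions chosen to mirror the C++ listing, and the sketch omits locking, rehashing, and garbage-collection concerns.

```java
// Simplified model of a chained string table. All names are illustrative assumptions;
// this is not thread-safe and does not rehash.
final class TinyStringTable {

    // One node in a bucket's linked list.
    private static final class Entry {
        final int hash;
        final String literal;
        final Entry next;

        Entry(int hash, String literal, Entry next) {
            this.hash = hash;
            this.literal = literal;
            this.next = next;
        }
    }

    private final Entry[] buckets;

    TinyStringTable(int bucketCount) {
        this.buckets = new Entry[bucketCount];
    }

    // Maps a hash code to a bucket index (clears the sign bit first).
    private int hashToIndex(int hash) {
        return (hash & 0x7fffffff) % buckets.length;
    }

    // Walks one bucket: compare the stored hash first, then the content.
    private String lookup(int index, String name, int hash) {
        for (Entry e = buckets[index]; e != null; e = e.next) {
            if (e.hash == hash && e.literal.equals(name)) {
                return e.literal;
            }
        }
        return null;
    }

    // Returns the pooled instance if one exists; otherwise pools the argument itself.
    String intern(String s) {
        int hash = s.hashCode();
        int index = hashToIndex(hash);
        String found = lookup(index, s, hash);
        if (found != null) {
            return found;
        }
        buckets[index] = new Entry(hash, s, buckets[index]);
        return s;
    }

    public static void main(String[] args) {
        TinyStringTable table = new TinyStringTable(4);
        String first = new String("11");
        String second = new String("11");
        System.out.println(table.intern(first) == first);  // true: first caller's object is pooled
        System.out.println(table.intern(second) == first); // true: later callers get the pooled object
    }
}
```

With only four buckets, chains grow quickly as entries are added, which is the slowdown the paragraph above describes.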

Detailed Analysis of the Opening Example

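1. On the command line, change to the directory containing the class and compile it: javac InternTest.java
2. Inspect the compiled bytecode: javap -verbose InternTest
First, the constant pool: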
public class InternTest
  SourceFile: "InternTest.java"
  minor version: 0
  major version: 52
  flags: ACC_PUBLIC, ACC_SUPER
Constant pool:
   #1 = Methodref          #16.#29        //  java/lang/Object."<init>":()V
   #2 = Class              #30            //  java/lang/String
   #3 = String             #31            //  1
   #4 = Methodref          #2.#32         //  java/lang/String."<init>":(Ljava/lang/String;)V
   #5 = Methodref          #2.#33         //  java/lang/String.intern:()Ljava/lang/String;
   #6 = Fieldref           #34.#35        //  java/lang/System.out:Ljava/io/PrintStream;
   #7 = Methodref          #36.#37        //  java/io/PrintStream.println:(Z)V
   #8 = Methodref          #2.#38         //  java/lang/String.equals:(Ljava/lang/Object;)Z
   #9 = Methodref          #36.#39        //  java/io/PrintStream.println:()V
  #10 = Class              #40            //  java/lang/StringBuilder
  #11 = Methodref          #10.#29        //  java/lang/StringBuilder."<init>":()V
  #12 = Methodref          #10.#41        //  java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
  #13 = Methodref          #10.#42        //  java/lang/StringBuilder.toString:()Ljava/lang/String;
  #14 = String             #43            //  11
  #15 = Class              #44            //  InternTest
  #16 = Class              #45            //  java/lang/Object
  #17 = Utf8               <init>
  #18 = Utf8               ()V
  #19 = Utf8               Code
  #20 = Utf8               LineNumberTable
  #21 = Utf8               main
  #22 = Utf8               ([Ljava/lang/String;)V
  #23 = Utf8               StackMapTable
  #24 = Class              #46            //  "[Ljava/lang/String;"
  #25 = Class              #30            //  java/lang/String
  #26 = Class              #47            //  java/io/PrintStream
  #27 = Utf8               SourceFile
  #28 = Utf8               InternTest.java
  #29 = NameAndType        #17:#18        //  "<init>":()V
  #30 = Utf8               java/lang/String
  #31 = Utf8               1
  #32 = NameAndType        #17:#48        //  "<init>":(Ljava/lang/String;)V
  #33 = NameAndType        #49:#50        //  intern:()Ljava/lang/String;
  #34 = Class              #51            //  java/lang/System
  #35 = NameAndType        #52:#53        //  out:Ljava/io/PrintStream;
  #36 = Class              #47            //  java/io/PrintStream
  #37 = NameAndType        #54:#55        //  println:(Z)V
  #38 = NameAndType        #56:#57        //  equals:(Ljava/lang/Object;)Z
  #39 = NameAndType        #54:#18        //  println:()V
  #40 = Utf8               java/lang/StringBuilder
  #41 = NameAndType        #58:#59        //  append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
  #42 = NameAndType        #60:#50        //  toString:()Ljava/lang/String;
  #43 = Utf8               11
  #44 = Utf8               InternTest
  #45 = Utf8               java/lang/Object
  #46 = Utf8               [Ljava/lang/String;
  #47 = Utf8               java/io/PrintStream
  #48 = Utf8               (Ljava/lang/String;)V
  #49 = Utf8               intern
  #50 = Utf8               ()Ljava/lang/String;
  #51 = Utf8               java/lang/System
  #52 = Utf8               out
  #53 = Utf8               Ljava/io/PrintStream;
  #54 = Utf8               println
  #55 = Utf8               (Z)V
  #56 = Utf8               equals
  #57 = Utf8               (Ljava/lang/Object;)Z
  #58 = Utf8               append
  #59 = Utf8               (Ljava/lang/String;)Ljava/lang/StringBuilder;
  #60 = Utf8               toString
Now for our main method.
It has an operand stack of depth 4, 5 local variable slots, and 1 argument:
         0: new           #2                  // class java/lang/String
         3: dup                               // duplicate the top stack value and push the copy
         4: ldc           #3                  // String 1
         6: invokespecial #4                  // Method java/lang/String."<init>":(Ljava/lang/String;)V   (initializes the String object for s)
         9: astore_1                          // store the reference to the new String in slot 1, i.e. variable s
        10: aload_1
        11: invokevirtual #5                  // Method java/lang/String.intern:()Ljava/lang/String;
        14: pop
        15: ldc           #3                  // String 1
        17: astore_2                          
        18: getstatic     #6                  // Field java/lang/System.out:Ljava/io/PrintStream;
        21: aload_1
        22: aload_2
        23: if_acmpne     30
        26: iconst_1
        27: goto          31
        30: iconst_0
        31: invokevirtual #7                  // Method java/io/PrintStream.println:(Z)V
        34: getstatic     #6                  // Field java/lang/System.out:Ljava/io/PrintStream;
        37: aload_1
        38: aload_2
        39: invokevirtual #8                  // Method java/lang/String.equals:(Ljava/lang/Object;)Z
        42: invokevirtual #7                  // Method java/io/PrintStream.println:(Z)V
        45: getstatic     #6                  // Field java/lang/System.out:Ljava/io/PrintStream;
        48: invokevirtual #9                  // Method java/io/PrintStream.println:()V
        51: new           #10                 // class java/lang/StringBuilder
        54: dup
        55: invokespecial #11                 // Method java/lang/StringBuilder."<init>":()V
        58: new           #2                  // class java/lang/String
        61: dup
        62: ldc           #3                  // String 1
        64: invokespecial #4                  // Method java/lang/String."<init>":(Ljava/lang/String;)V
        67: invokevirtual #12                 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
        70: new           #2                  // class java/lang/String
        73: dup
        74: ldc           #3                  // String 1
        76: invokespecial #4                  // Method java/lang/String."<init>":(Ljava/lang/String;)V
        79: invokevirtual #12                 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
        82: invokevirtual #13                 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
        85: astore_3
        86: aload_3
        87: invokevirtual #5                  // Method java/lang/String.intern:()Ljava/lang/String;
        90: pop
        91: ldc           #14                 // String 11
        93: astore        4
        95: getstatic     #6                  // Field java/lang/System.out:Ljava/io/PrintStream;
        98: aload_3
        99: aload         4
       101: if_acmpne     108
       104: iconst_1
       105: goto          109
       108: iconst_0
       109: invokevirtual #7                  // Method java/io/PrintStream.println:(Z)V
       112: getstatic     #6                  // Field java/lang/System.out:Ljava/io/PrintStream;
       115: aload_3
       116: aload         4
       118: invokevirtual #8                  // Method java/lang/String.equals:(Ljava/lang/Object;)Z
       121: invokevirtual #7                  // Method java/io/PrintStream.println:(Z)V
       124: return
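Here the statement String s2 = "1"; corresponds to the bytecode ldc #3, astore_2. ldc pushes an int, float, or String constant from the constant pool onto the operand stack. The execution of ldc can be seen in interpreterRuntime.cpp: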
IRT_ENTRY(void, InterpreterRuntime::ldc(JavaThread* thread, bool wide))
  // access constant pool
  constantPoolOop pool = method(thread)->constants();
  int index = wide ? get_index_u2(thread, Bytecodes::_ldc_w) : get_index_u1(thread, Bytecodes::_ldc);
  constantTag tag = pool->tag_at(index);

  if (tag.is_unresolved_klass() || tag.is_klass()) {
    klassOop klass = pool->klass_at(index, CHECK);
    oop java_class = klass->java_mirror();
    thread->set_vm_result(java_class);
  } else {
#ifdef ASSERT
    // If we entered this runtime routine, we believed the tag contained
    // an unresolved string, an unresolved class or a resolved class.
    // However, another thread could have resolved the unresolved string
    // or class by the time we go there.
    assert(tag.is_unresolved_string()|| tag.is_string(), "expected string");
#endif
    oop s_oop = pool->string_at(index, CHECK);
    thread->set_vm_result(s_oop);
  }
IRT_END
Because this entry is a string constant, the code calls pool->string_at(index, CHECK), which eventually calls string_at_impl:
 
oop constantPoolOopDesc::string_at_impl(constantPoolHandle this_oop, int which, TRAPS) {
  oop str = NULL;
  CPSlot entry = this_oop->slot_at(which);
  if (entry.is_metadata()) {
    ObjectLocker ol(this_oop, THREAD);
    if (this_oop->tag_at(which).is_unresolved_string()) {
      // Intern string
      Symbol* sym = this_oop->unresolved_string_at(which);
      str = StringTable::intern(sym, CHECK_(constantPoolOop(NULL)));
      this_oop->string_at_put(which, str);
    } else {
      // Another thread beat us and interned string, read string from constant pool
      str = this_oop->resolved_string_at(which);
    }
  } else {
    str = entry.get_oop();
  }
  assert(java_lang_String::is_instance(str), "must be string");
  return str;
}
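The code shows that before ldc executes, the string constant is represented by a Symbol. When ldc first executes, StringTable::intern produces the String reference, which is then stored in the constant pool. Later ldc instructions for the same entry fetch the String reference directly from the constant pool by index (this_oop->resolved_string_at(which)), which avoids another StringTable lookup.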
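The resolve-once behaviour of a constant pool entry can be modelled with a few lines of Java. This is only a sketch of the caching idea: LazyConstantSlot, its fields, and the method named ldc are hypothetical, the synchronized keyword stands in for the ObjectLocker in the C++ code, and String.intern() stands in for StringTable::intern.

```java
// Sketch of a constant pool slot that is resolved on first use and cached afterwards.
// All names here are illustrative assumptions, not HotSpot APIs.
final class LazyConstantSlot {

    private final String symbolText; // stands in for the unresolved Symbol
    private String resolved;         // null until the first ldc
    private int internCalls;         // counts how often the string table was consulted

    LazyConstantSlot(String symbolText) {
        this.symbolText = symbolText;
    }

    // Models ldc on this slot: intern on first use, then return the cached reference.
    synchronized String ldc() {
        if (resolved == null) {
            internCalls++;
            resolved = symbolText.intern();
        }
        return resolved;
    }

    synchronized int internCalls() {
        return internCalls;
    }

    public static void main(String[] args) {
        LazyConstantSlot slot = new LazyConstantSlot(new String("1"));
        String first = slot.ldc();
        String second = slot.ldc();
        System.out.println(first == second);    // true: same cached reference
        System.out.println(slot.internCalls()); // 1: the table was consulted only once
    }
}
```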
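With this in mind, here is the analysis. In the steps below, "string pool" means the pool of interned strings kept in the StringTable.
1. new String("1") first loads the literal "1" with ldc, which places a "1" String in the string pool. The constructor then creates a separate String object on the heap, and s holds the reference to that heap object.
2. s.intern() ends up in StringTable::intern, which tries to add s to the pool and finds that an equal "1" is already there. The return value is discarded, so s is unchanged.
3. s2 = "1" looks for "1" in the pool, finds it, and stores the pooled reference in s2.
4. s and s2 therefore refer to different objects: s == s2 is false, while s.equals(s2) is true.
5. s3 = new String("1") + new String("1"); again creates String objects on the heap, with "1" already in the pool. The compiler implements + with a StringBuilder, as the bytecode shows. After the statement completes, a String "11" exists on the heap, but the pool has no "11".
6. s3.intern() adds it to the pool. Since JDK 7, intern() no longer copies the string; the pool stores a reference to the same heap object that s3 refers to.
7. s4 = "11" executes ldc, which looks in the pool, finds the entry, and returns that reference. So s3 == s4 is true, and s3.equals(s4) is true as well.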
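The two cases in the analysis can be isolated in a short sketch. The first method relies only on the documented intern() contract. The second relies on the JDK 7+ HotSpot behaviour described in step 6, and on the assumption that the string passed in has never been interned before, which is why main builds it from System.nanoTime(). InternIdentityDemo and its method names are illustrative.

```java
// Illustrative sketch: InternIdentityDemo and its methods are assumed names.
final class InternIdentityDemo {

    // Case 1: the content is already pooled through a literal, so intern()
    // returns the pooled instance, not the object it was called on.
    static boolean internOfCopyReturnsLiteral() {
        String s = new String("1");
        return s.intern() == "1" && s.intern() != s;
    }

    // Case 2: the content is not pooled yet. On HotSpot since JDK 7 the pool
    // records the caller's own heap object, so intern() returns that object.
    // Assumes `unique` has never been interned before.
    static boolean firstInternKeepsHeapObject(String unique) {
        String built = new StringBuilder(unique).append(unique).toString();
        return built.intern() == built;
    }

    public static void main(String[] args) {
        System.out.println(internOfCopyReturnsLiteral());
        // Expected to print true on HotSpot JDK 7 and later; JDK 6 copied the string instead.
        System.out.println(firstInternKeepsHeapObject("intern-demo-" + System.nanoTime()));
    }
}
```

Case 1 corresponds to s and s2 in the example, case 2 to s3 and s4.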
 
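That is roughly the whole story. Searching afterwards, I found that my classmate had also come across this example in a blog post, 深入解析String#intern, which explains things in great detail and is well worth reading; this article draws on it. I also consulted Java (JDK7)中的String常量和String.intern的实现. Because intern() is backed by a hash table, a large number of interned strings produces many hash collisions, and long chains make lookups slow, which is a common source of performance problems. I will not go into that here; the posts above analyze it, and readers can explore further on their own.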

 
