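Prerequisite: the Android SDK and JDK version mapping shows that Android 5.0 and later require JDK 7 or above. This article therefore assumes the string constant pool lives in the JVM heap (as it does since JDK 7) and does not go further into the method area in the Permanent Generation (PermGen). In fact, since Java 8 the JVM no longer has a permanent generation at all; it has been replaced by Metaspace.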
The String class represents character strings. All string literals in Java programs, such as “abc”, are implemented as instances of this class.
Strings are constant; their values cannot be changed after they are created. String buffers support mutable strings. Because String objects are immutable they can be shared.
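As the documentation says, a String is a constant: it is immutable. Here is how that immutability is realized in the implementation.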
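Looking at the JDK source, String stores its characters in an internal array named value: a char[] in JDK 8 and a byte[] in JDK 11. In both versions the field is private final, but reflection can still modify the contents of the value array and thereby break String immutability (on JDK 16 and later this reflective access to java.lang is denied by default unless the package is opened with --add-opens).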
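A minimal sketch of that reflective mutation on a desktop JDK, assuming the OpenJDK field name value; the class and method names are made up for illustration. Because JDK 16+ refuses setAccessible on java.lang fields by default, the sketch reports failure instead of crashing:

```java
import java.lang.reflect.Field;

// Illustrative only: tries to mutate a String in place through reflection.
final class StringMutationDemo {

    // Tries to overwrite the first character of s through String's private "value" field.
    // Returns false if the runtime refuses access (e.g. JDK 16+ without --add-opens)
    // or if the field layout is not what this sketch expects.
    static boolean tryMutateFirstChar(String s, char replacement) {
        try {
            Field valueField = String.class.getDeclaredField("value");
            valueField.setAccessible(true);
            Object value = valueField.get(s);
            if (value instanceof char[]) {
                // JDK 8 layout: one char per array element
                ((char[]) value)[0] = replacement;
                return true;
            }
            if (value instanceof byte[]) {
                // JDK 9+ layout; assumes the LATIN1 coder (compact strings on, Latin-1 text)
                ((byte[]) value)[0] = (byte) replacement;
                return true;
            }
            return false;
        } catch (NoSuchFieldException | IllegalAccessException | RuntimeException e) {
            // RuntimeException also covers InaccessibleObjectException on JDK 9+
            return false;
        }
    }

    public static void main(String[] args) {
        // Build the string from a char array so the interned literal "abc" is never touched
        String s = new String(new char[] {'a', 'b', 'c'});
        boolean mutated = tryMutateFirstChar(s, 'x');
        System.out.println("mutated = " + mutated + ", s = " + s);
    }
}
```

If the mutation succeeds, the "immutable" string s now reads "xbc" even though no String method was called on it.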
The Android source, however, modifies String and comments out the value field:
// BEGIN Android-changed: The character data is managed by the runtime.
/*
We only keep track of the length here and compression here. This has several consequences
throughout this class:
- References to value[i] are replaced by charAt(i).
- References to value.length are replaced by calls to length().
- Sometimes the result of length() is assigned to a local variable to avoid repeated calls.
- We skip several attempts at optimization where the values field was assigned to a local
variable to avoid the getfield opcode.
These changes are not all marked individually.
If STRING_COMPRESSION_ENABLED, count stores the length shifted one bit to the left with the
lowest bit used to indicate whether or not the bytes are compressed (see GetFlaggedCount in
the native code).
/**
* The value is used for character storage.
*
* @implNote This field is trusted by the VM, and is a subject to
* constant folding if String instance is constant. Overwriting this
* field after construction will cause problems.
*
* Additionally, it is marked with {@link Stable} to trust the contents
* of the array. No other facility in JDK provides this functionality (yet).
* {@link Stable} is safe here, because value is never null.
*
@Stable
private final byte[] value;
*/
It also comments out the constructor bodies and makes them throw an exception instead:
// JDK 11
public String() {
this.value = "".value;
this.coder = "".coder;
}
// Android 11, API 30
public String() {
// Android-changed: Constructor unsupported as all calls are intercepted by the runtime.
throw new UnsupportedOperationException("Use StringFactory instead.");
}
// Android 13, API 33
public String() {
// BEGIN Android-changed: Implemented as compiler and runtime intrinsics.
/*
this.value = "".value;
this.coder = "".coder;
*/
throw new UnsupportedOperationException("Use StringFactory instead.");
// END Android-changed: Implemented as compiler and runtime intrinsics.
}
and it turns many internal methods into native calls, for example charAt:
// BEGIN Android-changed: Replace with implementation in runtime to access chars (see above).
/*
public char charAt(int index) {
if (isLatin1()) {
return StringLatin1.charAt(value, index);
} else {
return StringUTF16.charAt(value, index);
}
}
*/
@FastNative
public native char charAt(int index);
// END Android-changed: Replace with implementation in runtime to access chars (see above).
To check this on a real device, list String's declared fields via reflection:
public void testString() {
Field[] fields = String.class.getDeclaredFields();
for (Field field : fields) {
LogUtil.info(TAG, "field name = " + field.getName());
}
}
The value field is not printed, so at runtime the String class in use should be the one from the Android source:
com.example.mytestapplication I Solution: field name = CASE_INSENSITIVE_ORDER
com.example.mytestapplication I Solution: field name = count
com.example.mytestapplication I Solution: field name = hash
com.example.mytestapplication I Solution: field name = CASE_INSENSITIVE_ORDER
com.example.mytestapplication I Solution: field name = serialPersistentFields
com.example.mytestapplication I Solution: field name = serialVersionUID
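For contrast, the same listing on a desktop OpenJDK (not Android) is expected to include value. A small sketch with a made-up class name that collects the names so they can be checked:

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

final class StringFieldLister {

    // Returns the names of all fields declared directly by java.lang.String on the running VM.
    static List<String> declaredFieldNames() {
        List<String> names = new ArrayList<>();
        for (Field field : String.class.getDeclaredFields()) {
            names.add(field.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> names = declaredFieldNames();
        System.out.println(names);
        System.out.println("has value field: " + names.contains("value"));
    }
}
```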
Moreover, in the Android source String.charAt(int) is a native method, whereas in the JDK it is an ordinary member method. Check this with reflection as well:
public void testString() {
try {
Method charAtMethod = String.class.getDeclaredMethod("charAt", int.class);
boolean isNativeMethod = Modifier.isNative(charAtMethod.getModifiers());
LogUtil.info(TAG, "charAt isNativeMethod = " + isNativeMethod);
} catch (NoSuchMethodException e) {
throw new RuntimeException(e);
}
}
On a Pixel 3 XL running Android 11, the output shows that isNativeMethod is true:
com.example.mytestapplication I Solution: charAt isNativeMethod = true
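The same check can be run on a desktop JDK, where charAt is ordinary Java code and the result is expected to be false. A sketch with a made-up class name:

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

final class CharAtNativeCheck {

    // Reports whether String.charAt(int) is declared native on the running VM.
    static boolean isCharAtNative() {
        try {
            Method charAtMethod = String.class.getDeclaredMethod("charAt", int.class);
            return Modifier.isNative(charAtMethod.getModifiers());
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException("String.charAt(int) not found", e);
        }
    }

    public static void main(String[] args) {
        // Expected: false on OpenJDK, true on Android 11 (ART) per the log above
        System.out.println("charAt isNativeMethod = " + isCharAtNative());
    }
}
```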
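This further confirms that the runtime uses the Android version of String. Yet calling a String constructor directly does not throw. Judging from the comments ("Constructor unsupported as all calls are intercepted by the runtime"), Android's changes only exist so that the class compiles; constructor calls are intercepted and actually implemented inside the virtual machine (the exception message points to StringFactory).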
public void testString() {
String a = "123";
// If the Android String constructor body actually ran, this would throw, but it does not
String b = new String();
String c = b;
}
The analysis below assumes the JDK 7+ memory model.
There are generally two ways to create a String object:
String a1 = "aaa"; // stack: reference a1; heap string pool: the "aaa" literal object
String a11 = a1; // stack: reference a11, holding the address of the pooled "aaa" object
// stack: reference a2; heap: a new "aaa" object. The pool already contains "aaa" and reuses it; a2 holds the address of the new heap object
String a2 = new String("aaa");
LogUtil.info(TAG, "(a1 == a11) = " + (a1 == a11)); // same address, true
LogUtil.info(TAG, "(a1 == a2) = " + (a1 == a2)); // different addresses, false
LogUtil.info(TAG, "a1.equals(a2) = " + a1.equals(a2)); // same contents, true
Note that for the second form the IDE warns that 'new String()' is redundant, with the following details:
Initializes a newly created String object so that it represents the same sequence of characters as the argument;
in other words, the newly created string is a copy of the argument string.
Unless an explicit copy of original is needed, use of this constructor is unnecessary since Strings are immutable.
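Effective Java also strongly advises against creating strings by calling new String(...) as shown above: inside a loop it creates a large number of unnecessary objects and wastes a lot of memory.
Let's look at a few more examples: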
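To make the loop cost concrete, here is a small sketch (class and method names are made up) that counts how many distinct objects, by reference identity, each style produces:

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

final class NewStringInLoopDemo {

    // Counts how many distinct String objects (by reference identity) a loop produces.
    static int distinctInstances(boolean useConstructor, int iterations) {
        Set<String> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (int i = 0; i < iterations; i++) {
            // The literal always yields the same pooled object; the constructor allocates a new one every time
            String s = useConstructor ? new String("aaa") : "aaa";
            seen.add(s);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println("literal: " + distinctInstances(false, 1000));    // 1
        System.out.println("new String: " + distinctInstances(true, 1000));  // 1000
    }
}
```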
String b1 = "bbb"; // creates the "bbb" object in the heap string pool
String b2 = "b"; // creates the "b" object in the heap string pool
String b3 = "bb" + "b"; // both operands of + are string constants, so the compiler folds this into "bbb"
String b4 = "bb" + b2; // b2 is a non-final variable, so the result is unknown at compile time; javac 8 emits StringBuilder.append().toString() (JDK 9+ uses invokedynamic), creating a new "bbb" object on the heap
LogUtil.info(TAG, "b1 == b3 = " + (b1 == b3)); // the compiler folded the concatenation, so b3 is the pooled "bbb": true
LogUtil.info(TAG, "b1 == b4 = " + (b1 == b4)); // a concatenation involving a variable yields a new object at a new address: false
final String b5 = "b";
String b6 = "bb" + b5; // b5 is a final constant variable, so javac folds b6 into "bbb"
LogUtil.info(TAG, "b1 == b6 = " + (b1 == b6)); // true
final String b7 = getB7();
String b8 = "bb" + b7; // b7 is final, but its value comes from a method call and is unknown at compile time, so as with b4 a new "bbb" object is created on the heap
LogUtil.info(TAG, "b1 == b8 = " + (b1 == b8)); // false
LogUtil.info(TAG, "b4 == b8 = " + (b4 == b8)); // false
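The snippet above calls a getB7() that is not shown. A self-contained version follows, where getB7() is a hypothetical stand-in returning "b"; the comparisons are collected into an array so they can be checked. The folding of b3 and b6 follows from the JLS rules on constant expressions and constant variables:

```java
import java.util.Arrays;

final class ConstantFoldingDemo {

    // Hypothetical stand-in for the article's getB7(): a method result is never a compile-time constant.
    static String getB7() {
        return "b";
    }

    // Returns {b1 == b3, b1 == b4, b1 == b6, b1 == b8, b4 == b8}.
    static boolean[] compare() {
        String b1 = "bbb";
        String b2 = "b";
        String b3 = "bb" + "b";    // constant expression: folded and interned
        String b4 = "bb" + b2;     // b2 is not final: concatenated at run time
        final String b5 = "b";     // constant variable
        String b6 = "bb" + b5;     // still a constant expression: folded and interned
        final String b7 = getB7(); // final, but not a constant variable
        String b8 = "bb" + b7;     // concatenated at run time
        return new boolean[] { b1 == b3, b1 == b4, b1 == b6, b1 == b8, b4 == b8 };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(compare())); // [true, false, true, false, false]
    }
}
```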
Also worth noting is String's intern() method; "intern" literally means to detain or confine:
/**
* Returns a canonical representation for the string object.
*
* A pool of strings, initially empty, is maintained privately by the
* class {@code String}.
*
* When the intern method is invoked, if the pool already contains a
* string equal to this {@code String} object as determined by
* the {@link #equals(Object)} method, then the string from the pool is
* returned. Otherwise, this {@code String} object is added to the
* pool and a reference to this {@code String} object is returned.
*
* It follows that for any two strings {@code s} and {@code t},
* {@code s.intern() == t.intern()} is {@code true}
* if and only if {@code s.equals(t)} is {@code true}.
*
* All literal strings and string-valued constant expressions are
* interned. String literals are defined in section 3.10.5 of the
* The Java™ Language Specification.
*
* @return a string that has the same contents as this string, but is
* guaranteed to be from a pool of unique strings.
* @jls 3.10.5 String Literals
*/
// Android-added: Annotate native method as @FastNative.
@FastNative
public native String intern();
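This method is native in both the JDK and Android. According to its documentation, if the string pool already contains a string equal to this one, the pooled object is returned; otherwise this string is added to the pool and a reference to it is returned. Either way, the result is a reference to the pooled string object.
The documentation implies that a string object can exist in heap memory without being in the string pool. So in which scenarios does that happen? Consider the following example: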
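The documented contract can be modeled with an ordinary map. This is a hypothetical toy (ToyStringPool is made up; the real pool is implemented natively inside the VM), meant only to show the "return the existing equal string, else store this one" rule:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model of the pool behind String.intern(); not how the VM implements it.
final class ToyStringPool {
    private final Map<String, String> pool = new HashMap<>();

    // Same contract as the Javadoc: return the pooled string equal to s if one exists,
    // otherwise add s itself to the pool and return s.
    String intern(String s) {
        String existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    public static void main(String[] args) {
        ToyStringPool toyPool = new ToyStringPool();
        String first = new String("hello");
        String second = new String("hello");
        System.out.println(toyPool.intern(first) == first);  // true: first equal string is stored as-is
        System.out.println(toyPool.intern(second) == first); // true: later equal strings map to the first
    }
}
```

Note that the first string interned is stored itself rather than copied, which is the JDK 7+ behavior discussed later in this article.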
String s1 = new String("hello"); // s1 points to the "hello" object on the heap
String s2 = s1.intern(); // s2 points to the pooled "hello" object
String s3 = "hello"; // s3 points to the pooled "hello" object
LogUtil.info(TAG, "s1 == s2 = " + (s1 == s2)); // false
LogUtil.info(TAG, "s2 == s3 = " + (s2 == s3)); // true
This example is easy to follow. Next, change how s1 is created:
String s1 = new String("he") + new String("llo");
String s2 = s1.intern();
String s3 = "hello";
LogUtil.info(TAG, "s1 == s2 = " + (s1 == s2)); // true
LogUtil.info(TAG, "s2 == s3 = " + (s2 == s3)); // true
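A standalone version for a desktop JVM, hedged: it uses a deliberately unusual, made-up string so that no other code is likely to have pooled it already. On HotSpot JDK 7+ s1 == s2 is expected to be true, but that part depends on the VM; only s2 == s3 is guaranteed by the language:

```java
final class InternAfterConcatDemo {

    // Returns {s1 == s2, s2 == s3} for the "intern first, literal later" order.
    static boolean[] compare() {
        // Non-constant concatenation: the result is a new heap object, not a pool entry
        String s1 = new String("internDemo") + new String("Xq7");
        // Nothing equal is pooled yet, so HotSpot JDK 7+ records s1 itself in the pool
        String s2 = s1.intern();
        // The literal resolves to whatever the pool now holds for this value
        String s3 = "internDemoXq7";
        return new boolean[] { s1 == s2, s2 == s3 };
    }

    public static void main(String[] args) {
        boolean[] r = compare();
        System.out.println("s1 == s2 = " + r[0]); // VM-dependent; expected true on HotSpot JDK 7+
        System.out.println("s2 == s3 = " + r[1]); // guaranteed true
    }
}
```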
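Now s1 == s2 returns true. Following the earlier reasoning, creating s1 should have produced a "hello" object on the heap and another one in the pool, with s2 and s3 pointing to the pooled object, so s1 != s2. Why the difference? It again comes down to how string concatenation is compiled. Creating s1 this way does not put "hello" into the pool at all. When s1.intern() is then called, the pool (JDK 7+) records a reference to the existing heap object s1 instead of creating a copy, so s2 is the same object as s1; the literal "hello" on the next line resolves to that same pooled reference, giving s1 == s2 == s3. Decompiled (javac 8 output), the creation of s1 becomes: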
String s1 = (new StringBuilder()).append(new String("he")).append(new String("llo")).toString();
Look at StringBuilder's toString() method (JDK 11):
public String toString() {
// Create a copy, don't share the array
return isLatin1() ? StringLatin1.newString(value, 0, count)
: StringUTF16.newString(value, 0, count);
}
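Depending on whether every character is Latin-1, it returns a String in the corresponding encoding (this saves memory; see the article on the differences between ASCII, Unicode and UTF-8). It is enough to follow one branch, StringLatin1.newString: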
public static String newString(byte[] val, int index, int len) {
return new String(Arrays.copyOfRange(val, index, index + len),
LATIN1);
}
which turns out to call this String constructor:
String(byte[] value, byte coder) {
this.value = value;
this.coder = coder;
}
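This is where it differs from the usual new String("hello"): the constructor above is never given the literal "hello", so no object is created in the string pool.
Now change the order: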
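The package-private String(byte[], byte) constructor cannot be called from outside java.lang, so this sketch uses the public char[] constructor as a stand-in: the string is built at run time without a "hello" literal as its argument, and it only matches the pooled instance after intern(). The class name is made up:

```java
final class NonLiteralPoolDemo {

    // Builds "hello" at run time without passing the literal to the constructor.
    static String buildHello() {
        return new String(new char[] {'h', 'e', 'l', 'l', 'o'});
    }

    public static void main(String[] args) {
        String built = buildHello();
        System.out.println(built == "hello");          // false: built is a separate heap object
        System.out.println(built.equals("hello"));     // true: same contents
        System.out.println(built.intern() == "hello"); // true: intern() returns the pooled instance
    }
}
```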
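String s3 = "hello"; // s3 points to the pooled "hello" object
// In this case only a "hello" object on the heap is created, not one in the pool. The "he" and "llo" literals do go into the pool, but that is beside the point here
String s1 = new String("he") + new String("llo");
String s2 = s1.intern(); // the pool already contains "hello", so the pooled object's address is returned to s2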
LogUtil.info(TAG, "s1 == s2 = " + (s1 == s2)); // false
LogUtil.info(TAG, "s2 == s3 = " + (s2 == s3)); // true
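Unlike the previous order, this result does not depend on the VM: the literal is pooled first, so intern() must return that object, never the newly concatenated one. A standalone sketch with a made-up class name and string value:

```java
final class LiteralFirstDemo {

    // Returns {s1 == s2, s2 == s3} for the "literal first, intern later" order.
    static boolean[] compare() {
        String s3 = "literalFirstDemo";                               // pooled through the literal
        String s1 = new String("literal") + new String("FirstDemo"); // separate heap object
        String s2 = s1.intern();                                      // pool already has it: returns s3's object
        return new boolean[] { s1 == s2, s2 == s3 };
    }

    public static void main(String[] args) {
        boolean[] r = compare();
        System.out.println("s1 == s2 = " + r[0]); // false
        System.out.println("s2 == s3 = " + r[1]); // true
    }
}
```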
The examples above show that a new string produced by concatenating non-constant objects is not placed in the string pool:
String s1 = new String("he") + new String("llo");
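One more caveat: because of the implementation differences between JDK 6 and JDK 7 (see the prerequisite), intern() behaves differently. In JDK 6, intern() creates a copy of the string in the string pool in PermGen, whereas in JDK 7 the string pool in the heap simply stores a reference to the existing heap string object. This avoids creating a duplicate object and reduces memory consumption.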
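Programs use strings heavily. Because String is immutable, a string cannot be tampered with: any "modification" returns a new String object, which keeps data safe. Immutability also makes strings thread-safe. In addition, when the VM resolves a string literal (or when intern() is called), it looks up an equal string in the string pool and hands out the existing object's address instead of creating a new one, which avoids the memory waste of repeatedly creating identical strings.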
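A tiny sketch (made-up class name) of the "modification returns a new object" point: the transforming methods hand back new strings and leave the original untouched, which is why a String can be shared freely, including across threads:

```java
import java.util.Locale;

final class ImmutabilityDemo {

    // Returns {upper-cased copy, copy with 'l' replaced by 'L'}; s itself is never changed.
    static String[] transform(String s) {
        return new String[] { s.toUpperCase(Locale.ROOT), s.replace('l', 'L') };
    }

    public static void main(String[] args) {
        String original = "hello";
        String[] results = transform(original);
        System.out.println(original);   // hello
        System.out.println(results[0]); // HELLO
        System.out.println(results[1]); // heLLo
    }
}
```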