1. CharSequence接口定义了一个只读的char序列。String 实现 CharSequence , Serializable , Comaprable<String>
2. 由 char[] value, int offset, int count 和 int hash组成。
3. 构造函数 public String(String original) 基本上来说是没用的,因为String本身是immutable的,没必要copy一下。但有一个用处:节省空间。比如说,你用subString生成的一个String,它是重用原来的char[]的,而原来的那个String你已经不想用了(这个条件很重要), 你可以用这个构造函数重建一个String来释放空间。
public String(String original) { int size = original.count; char[] originalValue = original.value; char[] v; if (originalValue.length > size) { // The array representing the String is bigger than the new // String itself. Perhaps this constructor is being called // in order to trim the baggage, so make a copy of the array. int off = original.offset; v = Arrays.copyOfRange(originalValue, off, off+size); } else { // The array representing the String is the same // size as the String, so no point in making a copy. v = originalValue; } this.offset = 0; this.count = size; this.value = v; }
4. 提供了一组构造函数,按照给定的Charset来将一个byte 数组 decode成一个字符串。
5. 接收StringBuilder或StringBuffer的构造函数,会将StringBuilder或StringBuffer先toString() 然后将新生成的String中的value , offset 和count赋值给当前String,这时会将char array拷贝一份,所以之后StringBuilder或StringBuffer的修改不会影响到生成的String。但问题是这样不就多生成了一个String对象么?
6. length()返回的是count的值,也就是说是Code Unit的个数,而不是Code Point的个数, 对于Supplementary Character这个length()不是字符个数。如果想得到Code Point的个数可以使用codePointCount方法。
7. getBytes方法,当不传Charset时就用系统默认的Charset将字符串转成相应的编码。
8. public boolean contentEquals(StringBuffer sb) 对 sb作了同步,防止sb中途被修改:
public boolean contentEquals(StringBuffer sb) { synchronized(sb) { return contentEquals((CharSequence)sb); } }
9. public boolean contentEquals(CharSequence cs) 对StringBuilder和StringBuffer做了特殊处理,这样就避免在调用StringBuilder或StringBUffer的charAt方法时做的边界检查了:
public boolean contentEquals(CharSequence cs) { if (count != cs.length()) return false; // Argument is a StringBuffer, StringBuilder if (cs instanceof AbstractStringBuilder) { char v1[] = value; char v2[] = ((AbstractStringBuilder)cs).getValue(); int i = offset; int j = 0; int n = count; while (n-- != 0) { if (v1[i++] != v2[j++]) return false; } return true; } // Argument is a String if (cs.equals(this)) return true; // Argument is a generic CharSequence char v1[] = value; int i = offset; int j = 0; int n = count; while (n-- != 0) { if (v1[i++] != cs.charAt(j++)) return false; } return true; }
10. 当作Case Insensitve比较时代价有点高:
public boolean regionMatches(boolean ignoreCase, int toffset, String other, int ooffset, int len) { char ta[] = value; int to = offset + toffset; char pa[] = other.value; int po = other.offset + ooffset; // Note: toffset, ooffset, or len might be near -1>>>1. if ((ooffset < 0) || (toffset < 0) || (toffset > (long)count - len) || (ooffset > (long)other.count - len)) { return false; } while (len-- > 0) { char c1 = ta[to++]; char c2 = pa[po++]; if (c1 == c2) { continue; } if (ignoreCase) { // If characters don't match but case may be ignored, // try converting both characters to uppercase. // If the results match, then the comparison scan should // continue. char u1 = Character.toUpperCase(c1); char u2 = Character.toUpperCase(c2); if (u1 == u2) { continue; } // Unfortunately, conversion to uppercase does not work properly // for the Georgian alphabet, which has strange rules about case // conversion. So we need to make one last check before // exiting. if (Character.toLowerCase(u1) == Character.toLowerCase(u2)) { continue; } } return false; } return true; }
11. hashCode的计算:
public int hashCode() { int h = hash; if (h == 0 && count > 0) { int off = offset; char val[] = value; int len = count; for (int i = 0; i < len; i++) { h = 31*h + val[off++]; } hash = h; } return h; }
12. String, StringBuilder和StringBuffer 共享了一段indexOf的逻辑:
static int indexOf(char[] source, int sourceOffset, int sourceCount, char[] target, int targetOffset, int targetCount, int fromIndex) { if (fromIndex >= sourceCount) { return (targetCount == 0 ? sourceCount : -1); } if (fromIndex < 0) { fromIndex = 0; } if (targetCount == 0) { return fromIndex; } char first = target[targetOffset]; int max = sourceOffset + (sourceCount - targetCount); for (int i = sourceOffset + fromIndex; i <= max; i++) { /* Look for first character. */ if (source[i] != first) { while (++i <= max && source[i] != first); } /* Found first character, now look at the rest of v2 */ if (i <= max) { int j = i + 1; int end = j + targetCount - 1; for (int k = targetOffset + 1; j < end && source[j] == target[k]; j++, k++); if (j == end) { /* Found whole string. */ return i - sourceOffset; } } } return -1; }
由此可见,想让indexOf(String substr, int fromIndex) 返回String.length()的唯一方法是, fromIndex >= String.length(),并且子串为空串。
13. lastIndexOf写得有点奇怪:
static int lastIndexOf(char[] source, int sourceOffset, int sourceCount, char[] target, int targetOffset, int targetCount, int fromIndex) { /* * Check arguments; return immediately where possible. For * consistency, don't check for null str. */ int rightIndex = sourceCount - targetCount; if (fromIndex < 0) { return -1; } if (fromIndex > rightIndex) { fromIndex = rightIndex; } /* Empty string always matches. */ if (targetCount == 0) { return fromIndex; } int strLastIndex = targetOffset + targetCount - 1; char strLastChar = target[strLastIndex]; int min = sourceOffset + targetCount - 1; int i = min + fromIndex; startSearchForLastChar: while (true) { while (i >= min && source[i] != strLastChar) { i--; } if (i < min) { return -1; } int j = i - 1; int start = j - (targetCount - 1); int k = strLastIndex - 1; while (j > start) { if (source[j--] != target[k--]) { i--; continue startSearchForLastChar; } } return start - sourceOffset + 1; } }
不是很理解为什么一定要从最后一个字符开始比较,虽然是last index,但匹配字符串没必要从最后一个字符开始比较,而且还用了一个让人不是很舒服的label。我也试着写了一个,才发现,其实要写得逻辑很严密还是需要费一番周折的:
private static int myLastIndexOf(char[] source, int sourceOffset, int sourceCount, char[] target, int targetOffset, int targetCount, int fromIndex) { if (fromIndex < 0) { return -1; } int rightIndex = sourceCount - targetCount; if (fromIndex > rightIndex) { fromIndex = rightIndex; } if (targetCount == 0) { return fromIndex; } char firstChar = target[targetOffset]; int max = targetOffset + targetCount; for (int i = sourceOffset + fromIndex; i >= sourceOffset; i--) { if (source[i] != firstChar) { while (--i >= sourceOffset && source[i] != firstChar) ; } if (i >= sourceOffset) { int j = targetOffset + 1; for (int k = i + 1; j < max && source[k] == target[j]; k++, j++) ; if (j == max) { return i - sourceOffset; } } } return -1; }
14. subString重用了原来的char[]。
15. public String replace(CharSequence target, CharSequence replacement) 是纯字面的匹配。replace方法中除了replaceFirst以外都是找出所有匹配的进行替换。
16. trim把比空格小的unicode都当成空白了:
public String trim() { int len = count; int st = 0; int off = offset; /* avoid getfield opcode */ char[] val = value; /* avoid getfield opcode */ while ((st < len) && (val[off + st] <= ' ')) { st++; } while ((st < len) && (val[off + len - 1] <= ' ')) { len--; } return ((st > 0) || (len < count)) ? substring(st, len) : this; }
17. intern 方法的注释很精确了:
Returns a canonical representation for the string object.
A pool of strings, initially empty, is maintained privately by the class String
.
When the intern method is invoked, if the pool already contains a string equal to this String
object as determined by the equals(Object)
method, then the string from the pool is returned. Otherwise, this String
object is added to the pool and a reference to this String
object is returned.
It follows that for any two strings s
and t
, s.intern() == t.intern()
is true
if and only if s.equals(t)
is true
.
All literal strings and string-valued constant expressions are interned.
17. 最后写了两个test case :
@Test public void testString() { String str = new String("ABC"); String str1 = new String("ABC"); assertFalse(str == str1); assertTrue(str.intern() == str1.intern()); assertEquals(str.length(), str.indexOf("", str.length())); assertEquals("ss", "\\w\\w".replace((CharSequence)"\\w", "s")); }