Conceptually, Java strings are sequences of Unicode characters. For example, the string "Java\u2122" consists of the five Unicode characters J, a, v, a, and ™. Java does not have a built-in string type. Instead, the standard Java library contains a predefined class called, naturally enough, String. Each quoted string is an instance of the String class:
String e = ""; // an empty string
String greeting = "Hello";
Code Points and Code Units
Java strings are implemented as sequences of char values. As we discussed on page 41, the char data type is a code unit for representing Unicode code points in the UTF-16 encoding. The most commonly used Unicode characters can be represented with a single code unit. The supplementary characters require a pair of code units.
The length method yields the number of code units required for a given string in the UTF-16 encoding. For example:
String greeting = "Hello";
int n = greeting.length(); // n is 5
To get the true length, that is, the number of code points, call:
int cpCount = greeting.codePointCount(0, greeting.length());
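The two counts differ only when a string contains a supplementary character. The following is a minimal sketch of that difference; the class name CodeUnitCount, its helper methods, and the sample strings are illustrative assumptions, not part of the original text.

```java
// Illustrative sketch; the class and method names are hypothetical.
final class CodeUnitCount {
    // Number of UTF-16 code units, as reported by length().
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in the whole string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String greeting = "Hello";
        System.out.println(codeUnits(greeting));  // 5
        System.out.println(codePoints(greeting)); // 5

        // U+1D56B is a supplementary character, written here as a surrogate pair.
        String z = "\uD835\uDD6B";
        System.out.println(codeUnits(z));  // 2
        System.out.println(codePoints(z)); // 1
    }
}
```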
The call s.charAt(n) returns the code unit at position n, where n is between 0 and s.length() - 1. For example,
char first = greeting.charAt(0); // first is 'H'
char last = greeting.charAt(4); // last is 'o'
To get the ith code point, use the statements
int index = greeting.offsetByCodePoints(0, i);
int cp = greeting.codePointAt(index);
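Wrapped in a helper, the two statements look as follows. This is a sketch only; the class NthCodePoint, the method codePointAtPosition, and the test string are hypothetical names chosen for illustration.

```java
// Illustrative sketch; the class and method names are hypothetical.
final class NthCodePoint {
    // Returns the code point at code point position i (counting from 0).
    static int codePointAtPosition(String s, int i) {
        int index = s.offsetByCodePoints(0, i); // code unit index of the ith code point
        return s.codePointAt(index);
    }

    public static void main(String[] args) {
        // 'a', then U+1D56B (two code units), then 'b'
        String s = "a\uD835\uDD6Bb";
        System.out.println(Integer.toHexString(codePointAtPosition(s, 1))); // 1d56b
        System.out.println((char) codePointAtPosition(s, 2));               // b
    }
}
```

Note that s.charAt(2) in this example would return the second half of the surrogate pair rather than 'b'.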
NOTE
Java counts the code units in strings in a peculiar fashion: the first code unit in a string has position 0. This convention originated in C, where there was a technical reason for counting positions starting at 0. That reason has long gone away and only the nuisance remains. However, so many programmers are used to this convention that the Java designers decided to keep it.
Why are we making a fuss about code units? Consider the sentence
𝕫 is the set of integers
The 𝕫 character (U+1D56B) requires two code units in the UTF-16 encoding. Calling
char ch = sentence.charAt(1);
doesn't return a space but the second code unit of 𝕫. To avoid this problem, you should not use the char type. It is too low-level.
If your code traverses a string, and you want to look at each code point in turn, use these statements:
int cp = sentence.codePointAt(i);
if (Character.isSupplementaryCodePoint(cp)) i += 2;
else i++;
Moving backwards takes a little more care. The codePointAt method returns a complete supplementary character only when the index is at the first code unit of a surrogate pair; at the second code unit it returns just that code unit. So step back one code unit, step back once more if you landed on the second half of a pair, and only then read the code point:
i--;
if (i > 0 && Character.isLowSurrogate(sentence.charAt(i)) && Character.isHighSurrogate(sentence.charAt(i - 1))) i--;
int cp = sentence.codePointAt(i);
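A self-contained sketch of traversal in both directions follows. It is one possible formulation, not the only one: it advances with Character.charCount and steps backwards with String.codePointBefore, both standard library methods. The class name CodePointTraversal and its two methods are hypothetical.

```java
// Illustrative sketch; the class and method names are hypothetical.
final class CodePointTraversal {
    // Collects the code points of s from left to right.
    static int[] forward(String s) {
        int[] result = new int[s.codePointCount(0, s.length())];
        int n = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            result[n++] = cp;
            i += Character.charCount(cp); // 2 for a supplementary character, otherwise 1
        }
        return result;
    }

    // Collects the code points of s from right to left.
    static int[] backward(String s) {
        int[] result = new int[s.codePointCount(0, s.length())];
        int n = 0;
        for (int i = s.length(); i > 0; ) {
            int cp = s.codePointBefore(i); // code point ending just before index i
            result[n++] = cp;
            i -= Character.charCount(cp);
        }
        return result;
    }

    public static void main(String[] args) {
        String sentence = "\uD835\uDD6B is the set of integers";
        for (int cp : forward(sentence)) {
            System.out.print(Integer.toHexString(cp) + " ");
        }
        System.out.println();
    }
}
```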
Substrings
You extract a substring from a larger string with the substring method of the String class. For example,
String greeting = "Hello";
String s = greeting.substring(0, 3);
creates a string consisting of the characters "Hel".
The second parameter of substring is the first position that you do not want to copy. In our case, we want to copy the code units in positions 0, 1, and 2 (from position 0 to position 2 inclusive). As substring counts it, this means from position 0 inclusive to position 3 exclusive.
There is one advantage to the way substring works: computing the number of code units in the substring is easy. The string s.substring(a, b) always has b - a code units. For example, the substring "Hel" has 3 - 0 = 3 code units.
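The inclusive/exclusive convention can be checked with a short sketch. The class SubstringDemo and the helper between are hypothetical names used only for this illustration.

```java
// Illustrative sketch; the class and method names are hypothetical.
final class SubstringDemo {
    // Returns the code units at positions a (inclusive) through b (exclusive).
    static String between(String s, int a, int b) {
        return s.substring(a, b);
    }

    public static void main(String[] args) {
        String greeting = "Hello";
        String s = between(greeting, 0, 3);
        System.out.println(s);                     // Hel
        System.out.println(s.length());            // 3, that is, 3 - 0
        System.out.println(greeting.substring(3)); // lo (from position 3 to the end)
    }
}
```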
String Editing
The String class provides no methods that let you change a character in an existing string. If you want to turn greeting into "Help!", you cannot directly change the last positions of greeting into 'p' and '!'. If you are a C programmer, this will make you feel pretty helpless. How are you going to modify the string? In Java, it is quite easy: concatenate the substring that you want to keep with the characters that you want to replace.
greeting = greeting.substring(0, 3) + "p!";
This assignment changes the current value of the greeting variable to "Help!".
Because you cannot change the individual characters in a Java string, the documentation refers to the objects of the String class as being immutable. Just as the number 3 is always 3, the string "Hello" will always contain the code unit sequence describing the characters H, e, l, l, o. You cannot change these values. As you just saw, however, you can change the contents of the string variable greeting and make it refer to a different string, just as you can make a numeric variable currently holding the value 3 hold the value 4.
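The distinction between changing a string and changing a string variable can be made visible by keeping a second reference to the original string. This is a small sketch; the class ImmutabilityDemo, the method toHelp, and the variable original are illustrative names.

```java
// Illustrative sketch; the class and method names are hypothetical.
final class ImmutabilityDemo {
    // Builds a new string; the argument itself is never modified.
    static String toHelp(String greeting) {
        return greeting.substring(0, 3) + "p!";
    }

    public static void main(String[] args) {
        String greeting = "Hello";
        String original = greeting;   // second reference to the same string object
        greeting = toHelp(greeting);  // greeting now refers to a different string
        System.out.println(greeting); // Help!
        System.out.println(original); // Hello
    }
}
```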
Isn't that a lot less efficient? It would seem simpler to change the code units than to build up a whole new string from scratch. Well, yes and no. Indeed, it isn't efficient to generate a new string that holds the concatenation of "Hel" and "p!". But immutable strings have one great advantage: the compiler can arrange that strings are shared.
To understand how this works, think of the various strings as sitting in a common pool. String variables then point to locations in the pool. If you copy a string variable, both the original and the copy share the same characters. Overall, the designers of Java decided that the efficiency of sharing outweighs the inefficiency of string editing by extracting substrings and concatenating.
Look at your own programs; we suspect that most of the time, you don't change strings; you just compare them. Of course, in some cases, direct manipulation of strings is more efficient. (One example is assembling strings from individual characters that come from a file or the keyboard.) For these situations, Java provides a separate StringBuilder class that we describe in Chapter 12. If you are not concerned with the efficiency of string handling, you can ignore StringBuilder and just use String.
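As a brief preview of the assembling case, here is a sketch that builds a string from individual characters with StringBuilder. The class BuilderDemo and the method assemble are hypothetical; the char array stands in for characters arriving from a file or the keyboard.

```java
// Illustrative sketch; the class and method names are hypothetical.
final class BuilderDemo {
    // Appends each character to one mutable buffer instead of creating
    // a new String object for every intermediate result.
    static String assemble(char[] chars) {
        StringBuilder builder = new StringBuilder();
        for (char c : chars) {
            builder.append(c);
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        char[] input = {'H', 'e', 'l', 'p', '!'};
        System.out.println(assemble(input)); // Help!
    }
}
```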
C++ NOTE
C programmers generally are bewildered when they see Java strings for the first time because they think of strings as arrays of characters:
char greeting[] = "Hello";
That is the wrong analogy: a Java string is roughly analogous to a char* pointer,
char* greeting = "Hello";
When you replace greeting with another string, the Java code does roughly the following:
char* temp = malloc(6);
strncpy(temp, greeting, 3);
strncpy(temp + 3, "p!", 3);
greeting = temp;
Sure, now greeting points to the string "Help!". And even the most hardened C programmer must admit that the Java syntax is more pleasant than a sequence of strncpy calls. But what if we make another assignment to greeting?
greeting = "Howdy";
Don't we have a memory leak? After all, the original string was allocated on the heap. Fortunately, Java does automatic garbage collection. If a block of memory is no longer needed, it will eventually be recycled.
If you are a C++ programmer and use the string class defined by ANSI C++, you will be much more comfortable with the Java String type. C++ string objects also perform automatic allocation and deallocation of memory. The memory management is performed explicitly by constructors, assignment operators, and destructors. However, C++ strings are mutable: you can modify individual characters in a string.
Concatenation
Java, like most programming languages, allows you to use the + sign to join (concatenate) two strings.
String expletive = "Expletive";
String PG13 = "deleted";
String message = expletive + PG13;
The above code sets the variable message to the string "Expletivedeleted". (Note the lack of a space between the words: the + sign joins two strings in the order received, exactly as they are given.)
When you concatenate a string with a value that is not a string, the latter is converted to a string. (As you will see in Chapter 5, every Java object can be converted to a string.) For example:
int age = 13;
String rating = "PG" + age;
sets rating to the string "PG13".
This feature is commonly used in output statements. For example,
System.out.println("The answer is " + answer);
is perfectly acceptable and will print what one would want (and with the correct spacing because of the space after the word is).
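The conversion rules can be tried out in a short sketch. The class ConcatDemo and its methods are hypothetical names. The last two lines of main show a consequence of the conversion that is easy to overlook: + is evaluated from left to right, so it matters whether a string or a number comes first.

```java
// Illustrative sketch; the class and method names are hypothetical.
final class ConcatDemo {
    // The int is converted to a string before being appended.
    static String rating(int age) {
        return "PG" + age;
    }

    // Any object can be appended; it is converted to a string first.
    static String answerLine(Object answer) {
        return "The answer is " + answer;
    }

    public static void main(String[] args) {
        System.out.println(rating(13));     // PG13
        System.out.println(answerLine(42)); // The answer is 42
        System.out.println("" + 1 + 2);     // 12 (string concatenation throughout)
        System.out.println(1 + 2 + "");     // 3 (numeric addition happens first)
    }
}
```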
Testing Strings for Equality
To test whether two strings are equal, use the equals method. The expression
s.equals(t)
returns true if the strings s and t are equal, false otherwise. Note that s and t can be string variables or string constants. For example, the expression
"Hello".equals(greeting)
is perfectly legal. To test whether two strings are identical except for the upper/lowercase letter distinction, use the equalsIgnoreCase method.
"Hello".equalsIgnoreCase("hello")
Do not use the == operator to test whether two strings are equal! It only determines whether or not the strings are stored in the same location. Sure, if strings are in the same location, they must be equal. But it is entirely possible to store multiple copies of identical strings in different places.
String greeting = "Hello"; //initialize greeting to a string
if (greeting == "Hello") . . .
// probably true
if (greeting.substring(0, 3) == "Hel") . . .
// probably false
If the virtual machine always arranged for equal strings to be shared, then you could use the == operator for testing equality. But only string constants are shared, not strings that are the result of operations like + or substring. Therefore, never use == to compare strings lest you end up with a program with the worst kind of bug: an intermittent one that seems to occur randomly.
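The difference between the two tests can be demonstrated deterministically. The sketch below uses new String(...) to obtain a distinct object with the same contents, since that expression always creates a new object. The class EqualityDemo and its methods are hypothetical names.

```java
// Illustrative sketch; the class and method names are hypothetical.
final class EqualityDemo {
    // Compares the characters of the two strings.
    static boolean sameContents(String s, String t) {
        return s.equals(t);
    }

    // Compares locations only: true when both refer to the same object.
    static boolean sameObject(String s, String t) {
        return s == t;
    }

    public static void main(String[] args) {
        String greeting = "Hello";
        String copy = new String(greeting); // distinct object, same characters
        System.out.println(sameContents(copy, "Hello"));        // true
        System.out.println(sameObject(copy, "Hello"));          // false
        System.out.println(sameObject(copy.intern(), "Hello")); // true: the shared constant

        String missing = null;
        System.out.println("Hello".equals(missing)); // false, and no exception
    }
}
```

Putting the constant on the left, as in "Hello".equals(greeting), has the side benefit shown in the last line: the test is safe even when the variable is null.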
C++ NOTE
If you are used to the C++ string class, you have to be particularly careful about equality testing. The C++ string class does overload the == operator to test for equality of the string contents. It is perhaps unfortunate that Java goes out of its way to give strings the same "look and feel" as numeric values but then makes strings behave like pointers for equality testing. The language designers could have redefined == for strings, just as they made a special arrangement for +. Oh well, every language has its share of inconsistencies.
C programmers never use == to compare strings but use strcmp instead. The Java method compareTo is the exact analog to strcmp. You can use
if (greeting.compareTo("Hello") == 0) . . .
but it seems clearer to use equals instead.
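Where compareTo earns its keep is ordering rather than equality. The following sketch reduces its result to -1, 0, or 1; the class CompareDemo and the method sign are hypothetical names. Note that the comparison is by code unit value, so all uppercase ASCII letters sort before all lowercase ones.

```java
// Illustrative sketch; the class and method names are hypothetical.
final class CompareDemo {
    // Reduces the compareTo result to -1, 0, or 1.
    static int sign(String s, String t) {
        return Integer.signum(s.compareTo(t));
    }

    public static void main(String[] args) {
        System.out.println(sign("Hello", "Hello"));  // 0
        System.out.println(sign("Apple", "Banana")); // -1
        System.out.println(sign("b", "a"));          // 1
        System.out.println(sign("Zebra", "apple"));  // -1, because 'Z' precedes 'a'
    }
}
```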
The String class in Java contains more than 50 methods. A surprisingly large number of them are useful enough that we can imagine using them frequently. The following API note summarizes the ones we found most useful.
NOTE
You will find these API notes throughout the book to help you understand the Java Application Programming Interface (API). Each API note starts with the name of a class such as java.lang.String—the significance of the so-called package name java.lang is explained in Chapter 4. The class name is followed by the names, explanations, and parameter descriptions of one or more methods.
We typically do not list all methods of a particular class but instead select those that are most commonly used, and describe them in a concise form. For a full listing, consult the on-line documentation.
We also list the version number in which a particular class was introduced. If a method has been added later, it has a separate version number.
java.lang.String 1.0
- char charAt(int index)
returns the code unit at the specified location. You probably don't want to call this method unless you are interested in low-level code units.
- int codePointAt(int index) 5.0
returns the code point that starts at the specified location. If the location is the second code unit of a surrogate pair, only that code unit is returned.
- int offsetByCodePoints(int startIndex, int cpCount) 5.0
returns the index of the code point that is cpCount code points away from the code point at startIndex.
- int compareTo(String other)
returns a negative value if the string comes before other in dictionary order, a positive value if the string comes after other in dictionary order, or 0 if the strings are equal.
- boolean endsWith(String suffix)
returns true if the string ends with suffix.
- boolean equals(Object other)
returns true if the string equals other.
- boolean equalsIgnoreCase(String other)
returns true if the string equals other, except for upper/lowercase distinction.
- int indexOf(String str)
- int indexOf(String str, int fromIndex)
- int indexOf(int cp)
- int indexOf(int cp, int fromIndex)
return the start of the first substring equal to the string str or the code point cp, starting at index 0 or at fromIndex, or -1 if str does not occur in this string.
- int lastIndexOf(String str)
- int lastIndexOf(String str, int fromIndex)
- int lastIndexOf(int cp)
- int lastIndexOf(int cp, int fromIndex)
return the start of the last substring equal to the string str or the code point cp, starting at the end of the string or at fromIndex.
- int length()
returns the length of the string.
- int codePointCount(int startIndex, int endIndex) 5.0
returns the number of code points between startIndex and endIndex - 1. Unpaired surrogates are counted as code points.
- String replace(CharSequence oldString, CharSequence newString)
returns a new string that is obtained by replacing all substrings matching oldString in the string with the string newString. You can supply String or StringBuilder objects for the CharSequence parameters.
- boolean startsWith(String prefix)
returns true if the string begins with prefix.
- String substring(int beginIndex)
- String substring(int beginIndex, int endIndex)
return a new string consisting of all code units from beginIndex until the end of the string or until endIndex - 1.
- String toLowerCase()
returns a new string containing all characters in the original string, with uppercase characters converted to lower case.
- String toUpperCase()
returns a new string containing all characters in the original string, with lowercase characters converted to upper case.
- String trim()
returns a new string by eliminating all leading and trailing whitespace (characters with code U+0020 or below) in the original string.
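A few of the methods listed above are combined in the sketch below. The class StringApiTour and the helper countOccurrences are hypothetical; the helper is just one way to use the two-argument indexOf to scan a string for non-overlapping matches.

```java
// Illustrative sketch; the class and method names are hypothetical.
final class StringApiTour {
    // Counts non-overlapping occurrences of target in text.
    static int countOccurrences(String text, String target) {
        if (target.isEmpty()) {
            return 0; // an empty target would otherwise match forever
        }
        int count = 0;
        int index = text.indexOf(target);
        while (index >= 0) {
            count++;
            index = text.indexOf(target, index + target.length());
        }
        return count;
    }

    public static void main(String[] args) {
        String t = "  Hello, World  ".trim();
        System.out.println(t);                         // Hello, World
        System.out.println(t.startsWith("Hello"));     // true
        System.out.println(t.endsWith("World"));       // true
        System.out.println(t.indexOf("o"));            // 4
        System.out.println(t.lastIndexOf("o"));        // 8
        System.out.println(t.indexOf("xyz"));          // -1
        System.out.println(t.replace("l", "L"));       // HeLLo, WorLd
        System.out.println(countOccurrences(t, "o"));  // 2
    }
}
```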
Reading the On-Line API Documentation
As you just saw, the String class has lots of methods. Furthermore, there are thousands of classes in the standard libraries, with many more methods. It is plainly impossible to remember all useful classes and methods. Therefore, it is essential that you become familiar with the on-line API documentation that lets you look up all classes and methods in the standard library. The API documentation is part of the JDK. It is in HTML format. Point your web browser to the docs/api/index.html file of your JDK installation. You will see a screen like that in Figure 3-2.
Figure 3-2. The three panes of the API documentation
The screen is organized into three frames. A small frame on the top left shows all available packages. Below it, a larger frame lists all classes. Click on any class name, and the API documentation for the class is displayed in the large frame to the right (see Figure 3-3). For example, to get more information on the methods of the String class, scroll the second frame until you see the String link, then click on it.
Figure 3-3. Class description for the String class
Then scroll the frame on the right until you reach a summary of all methods, sorted in alphabetical order (see Figure 3-4). Click on any method name for a detailed description of that method (see Figure 3-5). For example, if you click on the compareToIgnoreCase link, you get the description of the compareToIgnoreCase method.
Figure 3-4. Method summary of the String class
Figure 3-5. Detailed description of a String method
TIP
Bookmark the docs/api/index.html page in your browser right now.
Source: http://x-spirit.spaces.live.com/Blog/cns!CC0B04AE126337C0!330.entry