分解 是指将字节或字符序列分割为像单词这样的逻辑块的过程。Java 提供StreamTokenizer 类, 像下面这样操作:
import java.io.*;
public class token1 {
public static void main(String args[]) {
if (args.length != 1) {
System.err.println("missing filename");
System.exit(1);
}
try {
FileReader fr = new FileReader(args[0]);
BufferedReader br = new BufferedReader(fr);
StreamTokenizer st = new StreamTokenizer(br);
st.resetSyntax();
st.wordChars('a', 'z');
int tok;
while ((tok = st.nextToken()) != StreamTokenizer.TT_EOF) {
if (tok == StreamTokenizer.TT_WORD)
;// st.sval has token
}
br.close();
} catch (IOException e) {
System.err.println(e);
}
}
}
这个例子分解小写单词 (字母a-z)。如果你自己实现同等地功能,它可能像这样:
import java.io.*;
public class token2 {
public static void main(String args[]) {
if (args.length != 1) {
System.err.println("missing filename");
System.exit(1);
}
try {
FileReader fr = new FileReader(args[0]);
BufferedReader br = new BufferedReader(fr);
int maxlen = 256;
int currlen = 0;
char wordbuf[] = new char[maxlen];
int c;
do {
c = br.read();
if (c >= 'a' && c <= 'z') {
if (currlen == maxlen) {
maxlen *= 1.5;
char xbuf[] = new char[maxlen];
System.arraycopy(wordbuf, 0, xbuf, 0, currlen);
wordbuf = xbuf;
}
wordbuf[currlen++] = (char) c;
} else if (currlen > 0) {
String s = new String(wordbuf, 0, currlen); // do something
// with s
currlen = 0;
}
} while (c != -1);
br.close();
} catch (IOException e) {
System.err.println(e);
}
}
}
第二个程序比前一个运行快大约 20%,代价是写一些微妙的底层代码。
StreamTokenizer 是一种混合类,它从字符流(例如 BufferedReader)读取, 但是同时以字节的形式操作,将所有的字符当作双字节(大于 0xff) ,即使它们是字母字符。