Regexes let you create powerful text-processing applications. One application you might find helpful extracts comments from a Java, C, or C++ source file, and records those comments in another file. Listing 2 presents that application's source code:
Listing 2. ExtCmnt.java
// ExtCmnt.java
import java.io.*;
import java.util.regex.*;

class ExtCmnt {
    public static void main (String [] args) {
        if (args.length != 2) {
            System.err.println ("usage: java ExtCmnt infile outfile");
            return;
        }
        Pattern p;
        try {
            // The following pattern lets this extract multiline comments that
            // appear on a single line (e.g., /* same line */) and single-line
            // comments (e.g., // some line). Furthermore, the comment may
            // appear anywhere on the line.
            p = Pattern.compile (".*/\\*.*\\*/|.*//.*$");
        } catch (PatternSyntaxException e) {
            System.err.println ("Regex syntax error: " + e.getMessage ());
            System.err.println ("Error description: " + e.getDescription ());
            System.err.println ("Error index: " + e.getIndex ());
            System.err.println ("Erroneous pattern: " + e.getPattern ());
            return;
        }
        BufferedReader br = null;
        BufferedWriter bw = null;
        try {
            FileReader fr = new FileReader (args [0]);
            br = new BufferedReader (fr);
            FileWriter fw = new FileWriter (args [1]);
            bw = new BufferedWriter (fw);
            Matcher m = p.matcher ("");
            String line;
            while ((line = br.readLine ()) != null) {
                m.reset (line);
                if (m.matches ()) { /* entire line must match */
                    bw.write (line);
                    bw.newLine ();
                }
            }
        } catch (IOException e) {
            System.err.println (e.getMessage ());
            return;
        } finally { // Close file.
            try {
                if (br != null)
                    br.close ();
                if (bw != null)
                    bw.close ();
            } catch (IOException e) {}
        }
    }
}
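Running ExtCmnt against its own source file, with out as the output file (for example, java ExtCmnt ExtCmnt.java out), writes output like the following to out:
// ExtCmnt.java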
// The following pattern lets this extract multiline comments that
// appear on a single line (e.g., /* same line */) and single-line
// comments (e.g., // some line). Furthermore, the comment may
// appear anywhere on the line.
p = Pattern.compile (".*/\\*.*\\*/|.*//.*$");
if (m.matches ()) /* entire line must match */
finally // Close file.
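This output shows that ExtCmnt is not perfect: p = Pattern.compile (".*/\\*.*\\*/|.*//.*$"); is not a comment. That line appears in out because ExtCmnt's matcher matches the // characters inside the pattern's string literal.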
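The false match is easy to reproduce in isolation. The following is a minimal sketch, not part of the original listing: the class name FalsePositiveDemo, the helper looksLikeComment, and the sample lines are all hypothetical, and only the pattern itself comes from Listing 2. It applies the same whole-line matches() test to a few lines of source text.

```java
import java.util.regex.Pattern;

class FalsePositiveDemo {
    // Same pattern as Listing 2.
    static final Pattern COMMENT = Pattern.compile(".*/\\*.*\\*/|.*//.*$");

    // Mirrors Listing 2's test: the entire line must match the pattern.
    static boolean looksLikeComment(String line) {
        return COMMENT.matcher(line).matches();
    }

    public static void main(String[] args) {
        String[] lines = {
            "int x = 1; // a real comment",          // true: ends with a // comment
            "int y = 2; /* same line */",            // true: ends with a /* */ comment
            "String url = \"http://example.com\";",  // true, yet // is inside a string literal
            "/* same line */ int z = 3;",            // false: code follows the closing */
            "int w = 4;"                             // false: no comment at all
        };
        for (String line : lines) {
            System.out.println(looksLikeComment(line) + "  " + line);
        }
    }
}
```

The third line is the same kind of false positive that ExtCmnt's own output shows. The fourth line hints at a second limitation of the whole-line test, which this sketch surfaces on its own account: because matches() requires the whole line to match, a /* */ comment followed by code is skipped.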
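The pattern ".*/\\*.*\\*/|.*//.*$" contains something interesting: the vertical bar metacharacter (|). According to the SDK documentation, the parentheses metacharacters of a capturing group and the vertical bar metacharacter are logical operators. The vertical bar tells a matcher to use the regex construct on the operator's left to locate a match in the matcher's text. If no match exists, the matcher uses the regex construct on the operator's right in another match attempt.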
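To see which side of the vertical bar produces a match, each alternative can be wrapped in its own capturing group. The sketch below is an illustration under that assumption: the parentheses, the class name AlternationDemo, and the helper whichSide are additions for the demonstration and do not appear in Listing 2. The added groups do not change which lines match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationDemo {
    // Listing 2's two alternatives, each in its own capturing group
    // (the groups are added here only to reveal which side matched).
    static final Pattern P = Pattern.compile("(.*/\\*.*\\*/)|(.*//.*$)");

    // Returns "left", "right", or "none" for a whole-line match attempt.
    static String whichSide(String line) {
        Matcher m = P.matcher(line);
        if (!m.matches()) {
            return "none";
        }
        // group(1) is non-null only when the left alternative matched.
        return m.group(1) != null ? "left" : "right";
    }

    public static void main(String[] args) {
        System.out.println(whichSide("x = 1; /* note */")); // left
        System.out.println(whichSide("x = 1; // note"));    // right: left side finds no /* ... */
        System.out.println(whichSide("// a /* b */"));      // left: tried first, and it succeeds
        System.out.println(whichSide("x = 1;"));            // none
    }
}
```

The third call is the telling one: the right-hand alternative could also match that line, but the matcher never needs it because the left-hand alternative has already succeeded.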
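Review
Although regexes simplify the coding of pattern matching in text-processing programs, you cannot use regexes effectively in your programs until you understand them. This article gave you a basic understanding of regexes by introducing you to regex terminology, the java.util.regex package, and programs that demonstrate regex constructs. Now that you have that basic understanding, build on it by reading additional articles (see Resources) and studying java.util.regex's SDK documentation, where you can learn about more regex constructs, such as POSIX (Portable Operating System Interface for Unix) character classes.
I encourage you to email me with any questions you might have about this or any previous article's material. (Please keep such questions relevant to material discussed in this column's articles.) Your questions and my answers will appear in the relevant study guides.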
After writing Java 101 articles for 28 consecutive months, I'm taking a two-month break. I'll return in May and introduce a series on data structures and algorithms.
About the author
Jeff Friesen has been involved with computers for the past 23 years. He holds a degree in computer science and has worked with many computer languages. Jeff has also taught introductory Java programming at the college level. In addition to writing for JavaWorld, he has written his own Java book for beginners—Java 2 by Example, Second Edition (Que Publishing, 2001; ISBN: 0789725932)—and helped write Using Java 2 Platform, Special Edition (Que Publishing, 2001; ISBN: 0789724685). Jeff goes by the nickname Java Jeff (or JavaJeff). To see what he's working on, check out his Website at http://www.javajeff.com.