Java Regular Expressions

In our current project, we need to parse log files (written by log4j) and extract specific information for follow-up actions:

 

  1. Display info in the GUI
  2. Analyze errors and send mail to the responsible person
  3. Back up key logs into a DB that can be shared with external systems
  4. ...
Oh, what's the topic I am going to talk about in this blog? ERP, CRM, or requirements for a DMS? OK, it's regular expressions in Java. Forgive me; sometimes engineers start talking about topic A but end up making a deal on topic B.
So regular expressions are one way to parse such log files.
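
To make the idea concrete, here is a minimal sketch of pulling fields out of one log line. The line layout (date, time, level, logger in brackets, then the message) is only an assumption for illustration, since the real layout depends on the log4j pattern configured in the project. The class name LogLineParser and the sample line are hypothetical too. The individual symbols used in the pattern are explained in the sections that follow.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the log layout below is an assumption, not the project's real format.
class LogLineParser {
    // Assumed layout: "<date> <time> <LEVEL> [<logger>] - <message>"
    private static final Pattern LINE = Pattern.compile(
            "^(\\d{4}-\\d{2}-\\d{2}) (\\d{2}:\\d{2}:\\d{2},\\d{3}) +(\\w+) +\\[([^\\]]+)\\] - (.*)$");

    /** Returns {date, time, level, logger, message}, or null if the line does not fit the assumed layout. */
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4), m.group(5) };
    }

    public static void main(String[] args) {
        String[] f = parse("2012-03-05 10:15:30,123 ERROR [com.example.OrderService] - Payment failed");
        // prints: ERROR from com.example.OrderService: Payment failed
        System.out.println(f[2] + " from " + f[3] + ": " + f[4]);
    }
}
```

Compiling the pattern once into a static field avoids recompiling it for every line, which matters when a log file has many lines.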

I will cover general regular expression semantics first, and then some regular expression usage in Java.

Regular Expression Semantics

Common Match Symbols

RE           Description
.            Matches any single character (by default, any except a line terminator)
^regex       regex must match at the beginning of the line (in Java, of the whole input unless Pattern.MULTILINE is set)
regex$       regex must match at the end of the line
[abc]        Set definition; matches the letter a or b or c
[abc[vz]]    Union of sets; matches a single a, b, c, v or z (same as [abcvz])
[^abc]       A "^" as the first character inside [] negates the set; matches any character except a, b or c
[a-d1-7]     Ranges; matches one letter from a to d or one digit from 1 to 7, but not the two-character string d1
X|Z          Finds X or Z
XZ           Finds X directly followed by Z
$            Checks whether a line end follows
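
The symbols above can be tried out with a few lines of Java. This is a small sketch, not a complete reference. The class name MatchSymbolsDemo and the sample strings are made up for illustration. Note that String.matches() requires the whole string to match, while Matcher.find() looks for a match anywhere in the input. In Java, a nested class such as [abc[vz]] is a union of both sets.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; sample inputs are invented for illustration.
class MatchSymbolsDemo {
    /** True if the regex is found anywhere in the input (Matcher.find semantics). */
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("^ERROR", "ERROR: disk full"));   // true
        System.out.println(found("^ERROR", "An ERROR occurred"));  // false
        System.out.println(found("full$", "ERROR: disk full"));    // true
        System.out.println("b".matches("[abc]"));                  // true
        System.out.println("z".matches("[abc[vz]]"));              // true: union of the two sets
        System.out.println("d".matches("[^abc]"));                 // true
        System.out.println("d1".matches("[a-d1-7]"));              // false: a class matches one character
        System.out.println("d1".matches("[a-d1-7]{2}"));           // true: two characters from the class
        System.out.println("Z".matches("X|Z"));                    // true
    }
}
```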

 

Special Metacharacters

RE     Description
\d     Any digit; short for [0-9]
\D     A non-digit; short for [^0-9]
\s     A whitespace character; short for [ \t\n\x0b\r\f]
\S     A non-whitespace character; short for [^\s]
\w     A word character; short for [a-zA-Z_0-9]
\W     A non-word character; short for [^\w]
\S+    One or more non-whitespace characters
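
One practical detail when using these metacharacters in Java: the backslash must itself be escaped inside a string literal, so \d is written as "\\d". The sketch below shows this with an invented sample line. The class name PredefinedClassesDemo and the helper findAll are hypothetical names, not part of any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the sample line is invented for illustration.
class PredefinedClassesDemo {
    /** Collects every non-overlapping match of regex in input, in order. */
    static List<String> findAll(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "user_42 logged in\tfrom 10.0.0.7";
        System.out.println(findAll("\\d+", line));      // [42, 10, 0, 0, 7]
        System.out.println(findAll("\\w+", line));      // [user_42, logged, in, from, 10, 0, 0, 7]
        System.out.println(findAll("\\S+", line));      // [user_42, logged, in, from, 10.0.0.7]
        System.out.println(line.split("\\s+").length);  // 5
    }
}
```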

 

Quantifier

RE       Description                                                              Examples
*        Occurs zero or more times; short for {0,}                                X* - zero or more letters X; .* - any character sequence
+        Occurs one or more times; short for {1,}                                 X+ - one or more letters X
?        Occurs zero or one time; short for {0,1}                                 X? - zero or exactly one letter X
{X}      Occurs exactly X times; {} gives the repeat count of the preceding element   \d{3} - three digits; .{10} - any character sequence of length 10
{X,Y}    Occurs between X and Y times                                             \d{1,4} - \d must occur at least once and at most four times
*?       A ? after a quantifier makes it a "reluctant quantifier": it tries to find the smallest match
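
The difference between a greedy and a reluctant quantifier is easiest to see side by side. The following is a small sketch with a made-up input string. QuantifierDemo and firstMatch are hypothetical names used only for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the sample input is invented for illustration.
class QuantifierDemo {
    /** Returns the first match of regex in input, or null if there is none. */
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String tags = "<b>bold</b> and <i>italic</i>";
        System.out.println(firstMatch("<.*>", tags));      // greedy: <b>bold</b> and <i>italic</i>
        System.out.println(firstMatch("<.*?>", tags));     // reluctant: <b>
        System.out.println("123".matches("\\d{3}"));       // true
        System.out.println("12345".matches("\\d{1,4}"));   // false: five digits exceed the maximum
        System.out.println("".matches("X*"));              // true: zero occurrences are allowed
        System.out.println("".matches("X+"));              // false: at least one X is required
        System.out.println("XX".matches("X?"));            // false: at most one X is allowed
    }
}
```

The greedy .* runs to the last ">" in the input, while the reluctant .*? stops at the first one, which is usually what is wanted when extracting delimited fields from a log line.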

 

 

Java Regular Expression Usage

To be updated.
