Learn how to write a Regular Expression:
---------------------------------------------------------------
http://geekswithblogs.net/brcraju/articles/235.aspx
What is a Regular Expression?
A regular expression is a pattern that describes a set of text strings. It is commonly used to validate input, and also to search and replace text.

Where and when to use Regular Expressions?
Regular expressions can be used in any programming language that has a built-in regular expression class or supports a third-party regular expression library.

Regular expressions can validate different types of data without bloating the code with if and case conditions. A whole series of if conditions can often be replaced by a single regular expression check.
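
As a rough illustration of that claim, the sketch below writes the same "digits only" rule twice, once by hand and once as a pattern. The class and method names are made up for this example and are not from the original article.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DigitCheckDemo
{
    // Hand-written validation: every character must be an ASCII digit.
    public static bool IsDigitsManual(string input)
    {
        foreach (char c in input)
        {
            if (c < '0' || c > '9')
            {
                return false;
            }
        }
        return true;
    }

    // The same rule as a single pattern. One edge case differs:
    // in .NET, $ also matches just before a final newline.
    public static bool IsDigitsRegex(string input)
    {
        return Regex.IsMatch(input, "^[0-9]*$");
    }
}
```

If the rule later changes (say, to allow a leading minus sign), only the pattern string in the second method needs editing.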

Benefits of Regular Expressions:
The following are some of the benefits (the list is not exhaustive) of using regular expressions.
a) Fewer lines of code.
b) Faster coding.
c) Easy maintenance (when the validation criteria change, you only need to change the regular expression string, not the surrounding code).
d) Easy to understand (you don’t need to follow the programmer’s logic through large if and case statements).

Elements of Regular Expressions:
Here are the basic elements (special characters and literals) from which larger regular expressions are built:

^ ----> Start of a string.
$ ----> End of a string.
. ----> Any character (except \n newline).
{...} ----> Explicit quantifier notation.
[...] ----> Explicit set of characters to match.
(...) ----> Logical grouping of part of an expression.
* ----> 0 or more of the previous expression.
+ ----> 1 or more of the previous expression.
? ----> 0 or 1 of the previous expression; placed after another quantifier, it forces minimal matching when an expression might match several strings within a search string.
\ ----> Preceding one of the above, it makes it a literal instead of a special character. Preceding a letter, it forms one of the special matching characters listed below.
\w ----> Matches any word character; for ASCII text, equivalent to [a-zA-Z0-9_].
\W ----> Matches any non-word character; for ASCII text, equivalent to [^a-zA-Z0-9_].
\s ----> Matches any white space character; for ASCII text, equivalent to [ \f\n\r\t\v].
\S ----> Matches any non-white space character; for ASCII text, equivalent to [^ \f\n\r\t\v].
\d ----> Matches any decimal digit; for ASCII text, equivalent to [0-9].
\D ----> Matches any non-digit character; for ASCII text, equivalent to [^0-9].

\a ----> Matches a bell (alarm) \u0007.
\b ----> Matches a backspace \u0008 if in a [] character class; otherwise it matches a word boundary (the position between a \w and a \W character).
\t ---->Matches a tab \u0009.
\r ---->Matches a carriage return \u000D.
\v ---->Matches a vertical tab \u000B.
\f ---->Matches a form feed \u000C.
\n ---->Matches a new line \u000A.
\e ---->Matches an escape \u001B

$number ----> Substitutes the last substring matched by group number number (decimal).
${name} ----> Substitutes the last substring matched by a (?<name> ) group.
$$ ----> Substitutes a single "$" literal.
$& ----> Substitutes a copy of the entire match itself.
$` ----> Substitutes all the text of the input string before the match.
$' ----> Substitutes all the text of the input string after the match.
$+ ----> Substitutes the last group captured.
$_ ----> Substitutes the entire input string.

(?(expression)yes|no) ----> Matches the yes part if expression matches at this point; otherwise matches the no part. The no part can be omitted.
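
The substitution elements ($number, ${name}, $& and so on) belong in the replacement string passed to Regex.Replace, not in the pattern itself. The following is a minimal sketch of three of them; the class name, method names and sample patterns are assumptions chosen for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SubstitutionDemo
{
    // Swaps "first last" into "last, first" using numbered groups.
    public static string SwapNumbered(string input)
    {
        return Regex.Replace(input, @"^(\w+) (\w+)$", "$2, $1");
    }

    // The same swap using named groups and ${name}.
    public static string SwapNamed(string input)
    {
        return Regex.Replace(input, @"^(?<first>\w+) (?<last>\w+)$", "${last}, ${first}");
    }

    // Wraps every run of digits in square brackets; $& is the whole match.
    public static string BracketNumbers(string input)
    {
        return Regex.Replace(input, @"\d+", "[$&]");
    }
}
```

For example, SwapNumbered("John Smith") gives "Smith, John", and BracketNumbers("a1b22") gives "a[1]b[22]".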


Simple Example:
Let us start with a small example: validating integer values.
An integer is made up of a fixed set of characters, the digits 0 to 9, and we will use that fact to build the regular expression in steps.

a) The regular expression starts with “^”.
b) As we are validating against a set of characters, we can use [].
c) So the expression becomes “^[1234567890]”.
d) As the series is continuous, we can use “-” to specify a range, which shortens the expression. It becomes “^[0-9]”.
e) This matches only one digit; to make it work for any number of digits, we can use “*”. The expression becomes “^[0-9]*”.
f) Just as the start of the expression is marked with “^”, the end is marked with “$”, so the final expression becomes “^[0-9]*$”.

Note: The double quotes are not part of the expression; I used them only to set the expression apart from the surrounding sentence.
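
One consequence of using “*” is worth seeing in code: because it means "0 or more", the finished pattern also accepts an empty string. The sketch below compares it with a “+” variant that requires at least one digit. The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class IntegerPatternDemo
{
    // The pattern built in the steps above: zero or more digits.
    public static bool MatchesStar(string input)
    {
        return Regex.IsMatch(input, "^[0-9]*$");
    }

    // A variant using +, which requires at least one digit.
    public static bool MatchesPlus(string input)
    {
        return Regex.IsMatch(input, "^[0-9]+$");
    }
}
```

Both methods accept "2003" and reject "20a3"; only MatchesStar accepts "". Which behaviour is right depends on whether an empty field should count as valid.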

Is this the only way to write it?
This is one way to write the regular expression. Depending on the requirements and your own expertise, a regular expression can often be written much more compactly. For example, the expression above could be shortened as follows.

a) The regular expression starts with “^”.
b) As we are checking for digits, we can use the special character for digits, “\d”.
c) As digits can follow digits, we use “*”.
d) As the expression ends with “$”, the final regular expression becomes
“^\d*$”

Digits can therefore be validated with several different regular expressions:

1) ^[1234567890]*$
2) ^[0-9]*$
3) ^\d*$

Which one to choose?
All of the above expressions work in the same way for the digits 0 to 9, so choose the one you are comfortable with. It is always advisable to prefer the shorter, more self-explanatory form, as this matters more and more when you write large regular expressions.
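
A small sketch for checking that the three patterns agree follows. It also shows one .NET-specific caveat: by default \d matches decimal digits from other scripts as well, and RegexOptions.ECMAScript restricts it to 0-9. The class and method names are invented for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DigitPatternDemo
{
    // The three digit patterns discussed in the article.
    public static readonly string[] Patterns =
    {
        "^[1234567890]*$",
        "^[0-9]*$",
        @"^\d*$"
    };

    // True when all three patterns give the same answer for the input.
    public static bool AllAgree(string input)
    {
        bool first = Regex.IsMatch(input, Patterns[0]);
        foreach (string pattern in Patterns)
        {
            if (Regex.IsMatch(input, pattern) != first)
            {
                return false;
            }
        }
        return true;
    }

    // \d limited to the ASCII digits 0-9.
    public static bool IsAsciiDigits(string input)
    {
        return Regex.IsMatch(input, @"^\d*$", RegexOptions.ECMAScript);
    }
}
```

AllAgree returns true for inputs such as "2003" and "20a3". For "\u0663" (an Arabic-Indic digit) the patterns disagree, because only \d accepts it; IsAsciiDigits rejects it.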

Example on exclude options:
Many situations require us to exclude only a certain portion or certain characters.
Eg: a) Accept all alphanumeric characters and special symbols except “&”.
b) Accept all digits except “7”.
In such cases we should not prepare a big list of everything that is allowed; instead we match any character and exclude the characters / symbols that must be rejected.
Eg: “^[^&]*$” accepts all alphanumeric characters and special symbols except “&”.
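
The two exclusion cases can be sketched as below. The first method uses a negated class on its own, [^&], which accepts any character except “&”. The second lists the allowed digits explicitly so that letters are rejected too. Class and method names are assumptions for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ExcludeDemo
{
    // Accepts any text that does not contain '&'.
    public static bool HasNoAmpersand(string input)
    {
        return Regex.IsMatch(input, "^[^&]*$");
    }

    // Accepts digits only, but never the digit 7 (0-6, 8 and 9 are allowed).
    public static bool IsDigitsWithoutSeven(string input)
    {
        return Regex.IsMatch(input, "^[0-689]*$");
    }
}
```

HasNoAmpersand accepts "Tom, Dick #1!" and rejects "R&D"; IsDigitsWithoutSeven accepts "12345" and rejects both "1723" and "12a".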

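Other Examples:
a) There should not be “1” as the first digit?
^[^1]\d*$ ----> this excludes 1 as the first character. Note that [^1] also accepts any non-digit in that position; to allow only digits, use ^[02-9]\d*$.

b) There should not be “1” at any place?
^[^1]*$ ----> this excludes 1 at any place in the sequence. To allow only digits as well, use ^[02-9]*$.

Note: Here the ^ operator is used not only to mark the start of the string but also, inside [], to negate the set of characters.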
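
A negated class such as [^1] matches any character other than 1, including letters and symbols, which is easy to overlook when the intent is "any digit except 1". The sketch below contrasts the negated form with an explicit digit class; the names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NegationDemo
{
    // Negated class: the first character may be anything except '1'.
    public static bool FirstIsNotOneLoose(string input)
    {
        return Regex.IsMatch(input, @"^[^1]\d*$");
    }

    // Explicit class: the first character must be a digit other than 1.
    public static bool FirstIsNotOneStrict(string input)
    {
        return Regex.IsMatch(input, @"^[02-9]\d*$");
    }

    // Digits only, with no 1 anywhere in the sequence.
    public static bool HasNoOneAnywhere(string input)
    {
        return Regex.IsMatch(input, "^[02-9]*$");
    }
}
```

Both "first digit" methods accept "234" and reject "123", but only the loose one accepts "a23". HasNoOneAnywhere accepts "2034" and rejects "2314".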

Testing of Regular Expressions:
There are several ways of testing a regular expression:
a) You can write a Windows-based program.
b) You can write a web-based application.
c) You can even write a service-based application.


Windows-based sample code:
Here are the steps used for regular expression checking in .NET:

a) Import the System.Text.RegularExpressions namespace to use the Regex class.
b) Create a Regex object as follows:
Regex regDollar = new System.Text.RegularExpressions.Regex("^[0-9]*$");
c) Call the IsMatch(string) method of the Regex class, which returns true or false.
d) Depending on the return value, you can decide whether the passed string is valid for the regular expression or not.

Here is the code as a function:

// Requires: using System.Text.RegularExpressions;
public static bool IsValid(string regexpObj, string passedString)
{
    // Direct check with no exception handling: an invalid pattern
    // or a null argument will throw.
    Regex regDollar = new Regex(regexpObj);
    return regDollar.IsMatch(passedString);
}

With minor changes, the above function can be used in a Windows application, a web-based application, or even a service.
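
One such minor change, sketched here as an assumption and not as part of the original function, is to guard against a malformed pattern. The Regex class throws an ArgumentException when the pattern cannot be parsed, so a caller that accepts patterns from users may prefer a "try" style method. The names RegexValidator and TryIsValid are made up.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexValidator
{
    // Returns false when the pattern itself is invalid (or an argument is null);
    // otherwise returns true and reports the match result through isMatch.
    public static bool TryIsValid(string pattern, string input, out bool isMatch)
    {
        isMatch = false;
        try
        {
            isMatch = Regex.IsMatch(input, pattern);
            return true;
        }
        catch (ArgumentException)
        {
            // ArgumentNullException derives from ArgumentException,
            // so null arguments end up here as well.
            return false;
        }
    }
}
```

With the pattern "^[0-9]*$" and the input "2003" the method returns true and sets isMatch to true; with the unterminated pattern "[0-9" it returns false without throwing.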

Another way -- Online checking:
Finally, if you are fed up with the above, have an internet connection, and don’t have time to write a sample, use the following link to test online:

http://www.regexplib.com/RETester.aspx


MORE INFO:
You can find more information on these types of characters at

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpgenref/html/cpconcharacterescapes.asp
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpgenref/html/cpconcharacterclasses.asp
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpgenref/html/cpcongroupingconstructs.asp



--Here is the end of the article; I hope this basic foundation will definitely be useful for writing big and good regular expressions ---

Express your code with REGULAR EXPRESSIONS :))

posted on Thursday, October 23, 2003 10:39 AM

Feedback

# re: Learn how to write a Regular Expression:

An excellent article. Great writeup Raju

Some of the finer points I believe you can add to this article are.

1. The input is considered to be text which is parsed to return the matches.
2. The ^ regex literal matches the start of a LINE of input
3. The $ literal matches the end of a LINE of input
4. Hence the integer example you mentioned only validates whether the complete input is a numeric word; it will not match anything else.

If the input is "I am 123", it will not match, nor will it return the 123 within the input.
It will only return matches for inputs like "123" or "234".

So if you are trying to convey something like "This pattern is going to return all the numbers in the given input" (that is how I understood it after reading your article), then it should be modified to \d*; that's more than enough.

Another point I would like you to add: people who are new to regex should try tools like Expresso (http://www.codeproject.com/dotnet/Expresso.asp), with which you can build expressions and test them immediately. The beauty of this tool is that it uses plain English to help you build the expression; it is a must-have for a developer who is new to regex.

Regards,
Ansari
10/25/2003 7:45 AM | Tameem Ansari

# re: Learn how to write a Regular Expression:

Thanks, Ansari, for your input. I definitely agree with your four points. I tried to concentrate on an easy way of understanding, which can be applied to a small set of characters such as integers. You are right that I concentrated mostly on numeric input, except in the section

"Example on exclude options"

where I have given an example of excluding & from an alphanumeric string. At the end of the string it has printed some junk

“&”

actually it is & within double quotes; I think the site is not handling that part.

I should have mentioned this tool. Expresso is cool and very useful for beginners to play with. Thanks for reminding me.
10/27/2003 6:33 AM | Ramchander

# Please send me the regex for validating url in asp

Please send me the regex for validating url in asp
10/7/2004 11:30 AM | Deepak Chauhan

# re: Learn how to write a Regular Expression:

A very, very good article!

By the way, could you tell me how to validate a data field with Unicode support?
For example, I want to check that the input name contains only the characters (a-z, A-Z, 0-9).
So, I created the RegExp: ^\[a-zA-Z0-9]*$.
When I input a string such as "Smith", it is OK. But when I input
"Freinke's Tê", the regex doesn't work. Do you have any ideas?

Thanks so much!
3/11/2005 12:28 AM | Nghia Ngo

# re: Learn how to write a Regular Expression:

Nghia Ngo,

In your example the input contains a single quote, a special character that you have not included in your regular expression. As for Unicode support:

Look at the following lines from the Unicode organization:

Unicode is a large character set—regular expression engines that are only adapted to handle small character sets will not scale well.
Unicode encompasses a wide variety of languages which can have very different characteristics than English or other western European text.

The following link briefly outlines, with examples, how you can write an expression for your purpose with regard to Unicode:

http://www.unicode.org/reports/tr18/tr18-9.html


3/11/2005 8:55 AM | Ramchander

# re: Learn how to write a Regular Expression:

Hi there, useful article. Thanks.

Is it possible to check the following expression:
1. Matches any word ([a-zA-Z0-9])
2. Quantified: from 4 to 10, excluding the a, b, and c words.

Thank you
5/10/2005 8:19 PM | Ruslan

# re: Learn how to write a Regular Expression:

I've used this RegEx to find HTML remarks, even if they contain tags:

"(<!--(?>[^<>]+|<(?<NEST>)|>(?<-NEST>))*(?(NEST)(?!))>)"

The only thing is, I want to ignore RTF colour tags but still match the
rest of the expression. To find RTF colour tags I use:

"(\\cf\d{1})"

(Which matches '\cf2, \cf1 etc'). I cannot match it into its own group.
Is there a RegEx for exclusive matching?

OK, this may be the deep end, so I might have to settle for RegEx.Split...
5/16/2005 2:51 PM | Dan

# re: Learn how to write a Regular Expression:

I am trying to validate an email address but would like to exclude certain domains (e.g. hotmail, yahoo etc.). I am currently using the following, but it doesn't seem to work.


" *\\w+([-+.]\\w+)*@[^(hotmail|yahoo)]\\w+([-.]\\w+)*\\.\\w+([-.]\\w+)* *"

Could you help? Thanks!

Nirmal
5/26/2005 12:31 AM | Nirmal

# re: Learn how to write a Regular Expression:

How do I match double quotes since I am specifying the pattern inside the quotes?

It doesn't like it when I try to escape it:

For example, matching on a word inside quotes like below does not work.

RegExp myReg = new RegExp("\"w+\"");

6/21/2005 1:21 AM | Ali

# re: Learn how to write a Regular Expression:

Hi,
I have an email validation, but it does not allow hyphens in the domain name. I have tried inserting '-' into the [ ] like this: [-A-Za-z], but it still returns false. How do I allow hyphens?

if (str.match(/^(.*|[A-Za-z]\w*)@(\[\d{1,3}(\.\d{1,3}){3}]|[A-Za-z]\w*(\.[A-Za-z]\w*)+)$/) == null)
{
alert("The e-mail address seems incorrect.");
return false;
}
6/23/2005 4:57 PM | Shane

# re: Learn how to write a Regular Expression:

How do I write a regular expression (in PHP) for a password?
The password must have a capital letter, a digit, a small letter, and must not contain (ibm, sun, hp); the maximum length is 16.

Thanks
6/28/2005 3:17 PM | george
