Email Address Validation
Many email address validators will actually throw up errors when faced with a valid, but unusual, email address. Many, for example, assume that an email address with a domain name extension of more than three letters is invalid. However, new TLDs such as ".info", ".name" and ".aero" are perfectly valid but longer than three characters. Many email address validators fail to take into account that you do not necessarily need a domain name in an email address - an IP address is fine.
The first step to creating a PHP script for validating email addresses is to work out exactly what is and is not valid. RFC 2822, that specifies what is and is not allowed in an email address, states that the form of an email address must be of the form "local-part @ domain".
The "local-part" of an email address must be between 1 and 64 characters in length and may be made up in any one of three ways. It can be made up of a selection of characters (and only these characters) from the following selection (though the period can not be the first of these):
In order to check that email addresses conform to these guidelines, we'll need to use regular expressions. First, we need to match the three possible forms of the local part of an email address, using the two patterns below (we'll add in escape characters later, when we put the function together):
We can use the two patterns we've defined here to check for obsolete local parts of email addresses too, saving ourselves from needing a third pattern.
Next, we need to check the domain portion of the email address. It can either be an IP address or a domain name, so we can use the two patterns here to validate it:
The above pattern will match any valid domain name, but will also match an IP address, so we only need the above to check the "domain" portion of the email.
Putting it all together gives us the following function. Call it like any normal function, and you will get back a value of "true" if the string entered is a valid email address, or "false" if the input was an invalid email address.
Using the function above is relatively simple, as you can see:
You can now validate email addresses entered into your site against the specifications that define email addresses (more or less - domain names that start with a number are supposed to be invalid, but do exist).
Finally, please do remember that because an email looks valid does not mean it is in use. Using a script for validating email addresses is a good start to email address validation, but though it can tell you an email address is technically valid it cannot tell you if it is in use. You might benefit from checking in more depth, for example seeing if a domain name is registered. Even better, fire off an email to the address given by a user and get them to click a link to confirm it is real - the only way to be 100% sure.
The first step to creating a PHP script for validating email addresses is to work out exactly what is and is not valid. RFC 2822, that specifies what is and is not allowed in an email address, states that the form of an email address must be of the form "local-part @ domain".
The "local-part" of an email address must be between 1 and 64 characters in length and may be made up in any one of three ways. It can be made up of a selection of characters (and only these characters) from the following selection (though the period can not be the first of these):
- A to Z
- 0 to 9
- !
- #
- $
- %
- &
- '
- *
- +
- -
- /
- =
- ?
- ^
- _
- `
- {
- |
- }
- ~
- .
- dave
- +1~1+
- {_dave_}
- "[[ dave ]]"
- dave."dave" (Note that this is considered an obsolete form of address - new addresses created should not be of this form, but it is still considered valid.)
- -- dave -- (spaces are invalid unless enclosed in quotation marks)
- [dave] (square brackets are invalid, unless contained within quotation marks)
- .dave (the local part of a domain name cannot start with a period)
In order to check that email addresses conform to these guidelines, we'll need to use regular expressions. First, we need to match the three possible forms of the local part of an email address, using the two patterns below (we'll add in escape characters later, when we put the function together):
^[A-Za-z0-9!#$%&'*+-/=?^_`{|}~][A-Za-z0-9!#$%&'*+-/=?^_`{|}~\.]{0,63}$
^"[^(\|")]{0,62}"$
We can use the two patterns we've defined here to check for obsolete local parts of email addresses too, saving ourselves from needing a third pattern.
Next, we need to check the domain portion of the email address. It can either be an IP address or a domain name, so we can use the two patterns here to validate it:
^\[?[0-9\.]+\]?$
^[A-Za-z0-9][A-Za-z0-9-]*[A-Za-z0-9](.[A-Za-z0-9][A-Za-z0-9-]*[A-Za-z0-9])+$
The above pattern will match any valid domain name, but will also match an IP address, so we only need the above to check the "domain" portion of the email.
Putting it all together gives us the following function. Call it like any normal function, and you will get back a value of "true" if the string entered is a valid email address, or "false" if the input was an invalid email address.
function check_email_address($email) {
// First, we check that there's one @ symbol, and that the lengths are right
if (!ereg("^[^@]{1,64}@[^@]{1,255}$", $email)) {
// Email invalid because wrong number of characters in one section, or wrong number of @ symbols.
return false;
}
// Split it into sections to make life easier
$email_array = explode("@", $email);
$local_array = explode(".", $email_array[0]);
for ($i = 0; $i < sizeof($local_array); $i++) {
if (!ereg("^(([A-Za-z0-9!#$%&'*+/=?^_`{|}~-][A-Za-z0-9!#$%&'*+/=?^_`{|}~\.-]{0,63})|(\"[^(\\|\")]{0,62}\"))$", $local_array[$i])) {
return false;
}
}
if (!ereg("^\[?[0-9\.]+\]?$", $email_array[1])) { // Check if domain is IP. If not, it should be valid domain name
$domain_array = explode(".", $email_array[1]);
if (sizeof($domain_array) < 2) {
return false; // Not enough parts to domain
}
for ($i = 0; $i < sizeof($domain_array); $i++) {
if (!ereg("^(([A-Za-z0-9][A-Za-z0-9-]{0,61}[A-Za-z0-9])|([A-Za-z0-9]+))$", $domain_array[$i])) {
return false;
}
}
}
return true;
}
Using the function above is relatively simple, as you can see:
if (check_email_address($email)) {
echo $email . ' is a valid email address.';
} else {
echo $email . ' is not a valid email address.';
}
You can now validate email addresses entered into your site against the specifications that define email addresses (more or less - domain names that start with a number are supposed to be invalid, but do exist).
Finally, please do remember that because an email looks valid does not mean it is in use. Using a script for validating email addresses is a good start to email address validation, but though it can tell you an email address is technically valid it cannot tell you if it is in use. You might benefit from checking in more depth, for example seeing if a domain name is registered. Even better, fire off an email to the address given by a user and get them to click a link to confirm it is real - the only way to be 100% sure.