A cross-site scripting attack is one of the top 5 security attacks carried out on a daily basis across the Internet, and your PHP scripts may not be immune.
Also known as XSS, the attack is basically a type of code injection attack which is made possible by incorrectly validating user data, which usually gets inserted into the page through a web form or using an altered hyperlink. The code injected can be any malicious client-side code, such as JavaScript, VBScript, HTML, CSS, Flash, and others. The code is used to save harmful data on the server or perform a malicious action within the user’s browser.
Unfortunately, cross-site scripting attacks occurs mostly, because developers are failing to deliver secure code. Every PHP programmer has the responsibility to understand how attacks can be carried out against their PHP scripts to exploit possible security vulnerabilities. Reading this article, you’ll find out more about cross-site scripting attacks and how to prevent them in your code.

Learning by Example

Let’s take the following code snippet.
1   

2   
3   
4   

Here we have a simple form in which there is a text box for data input and a submit button. Once the form is submitted, it will submit the data to post.php for processing. Let’s say all post.php does is output the data like so:
echo $_POST["comment"];
Without any filtering, a hacker could submit the following through the form which will generates a popup in the browser with the message “hacked”.
This example, despite its being malicious in nature, does not seem to do much harm. But think about what could happen in the JavaScript code was written to steal a user’s cookie and extract sensitive information from it? There are far worse XSS attacks than a simple alert() call.
Cross-site scripting attacks can be grouped in two major categories, based on how they deliver the malicious payload: non-persistent XSS, and persistent XSS. Allow me to discuss each type in detail.

Non-persistent XSS

Also known as reflected XSS attack, meaning that the actual malicious code is not stored on the server but rather gets passed through it and presented to the victim, is the more popular XSS strategy of the two delivery methods. The attack is launched from an external source, such as from an e-mail message or a third-party website.
Here’s an example of a portion of a simple search result script:
1    
2     // Get search results based on the query
3     echo "You searched for: " . $_GET["query"];
4    
5     // List search results
6     ...
The example can be a very unsecure results page where the search query is displayed back to the user. The problem here is that the $_GET["query"] variable isn’t validated or escaped, therefore an attacker could send the following link to the victim:
http://example.com/search.php?query=
Without validation, the page would contain:
You searched for:

Persistent XSS

This type of attack happens when the malicious code has already slipped through the validation process and it is stored in a data store. This could be a comment, log file, notification message, or any other section on the website which required user input at one time. Later, when this particular information is presented on the website, the malicious code gets executed.
Let’s use the following example for a rudimentary file-based comment system. Assuming the same form I presented earlier, let’s say the receiving script simply appends the comment to a data file.
1    
2     file_put_contents("comments.txt", $_POST["comment"], FILE_APPEND);

Elsewhere the contents of comments.txt is shown to visitors:
1    
2     echo file_get_contents("comments.txt");
When a user submit a comment it gets saved to the data file. Then the entire file (thus the entire series of comments) is displayed to the readership. If malicious code is submitted then it will be saved and displayed as is without any validation or escaping.

Preventing Cross-Site Scripting Attacks

Fortunately, as easily as an XSS attack can carried out against an unprotected website, protecting against them are just as easy. Prevention must always be in your thoughts, though, even before you write a single line of code.
The first rule which needs to be “enforced” in any web environment (be it development, staging, or production) is never trust data coming from the user or from any other third party sources. This can’t be emphasized enough. Every bit of data must be validated on input and escaped on output. This is the golden rule of preventing XSS.
In order to implement solid security measures which prevents XSS attacks, we should be mindful of data validation, data sanitization, and output escaping.

Data Validation

Data validation is the process of ensuring that your application is running with correct data. If your PHP script expects an integer for user input, then any other type of data would be discarded. Every piece of user data must be validated when it is received to ensure it is of the corrected type, and discarded if it doesn’t pass the validation process.
If you wanted to validate a phone number, for example, you would discard any strings containing letters, because a phone number should consist of digits only. You should also take the length of the string into consideration. If you wanted to be more permissive, you could allow a limited set of special characters such as plus, parenthesis, and dashes which are often used in formatting phone numbers specific to your intended locale.
1    
2     // validate a US phone number
3     if (preg_match('/^((1-)?\d{3}-)\d{3}-\d{4}$/', $phone)) {
4        echo $phone . " is valid format.";
5     }

Data Sanitization

Data sanitization focuses on manipulating the data to make sure it is safe by removing any unwanted bits from the data and normalizing it to the correct form. For example, if you are expecting a plain text string as user input, you may want to remove any HTML markup from it.
Sometimes, data validation and sanitization/normalization can go hand in hand.
// escape output sent to the browser
echo "You searched for: " . htmlspecialchars($_GET["query"]);

Output Escaping

In order to protect the integrity of displayed/output data, you should escape the data when presenting it to the user. This prevents the browser from applying any unintended meaning to any special sequence of characters that may be found.
All Together Now!
To better understand the three aspects of data processing, let’s take another look at the file-based comment system from earlier and modify it to make sure it’s secure. The potential vulnerabilities in the code stem from the fact that $_POST["comment"] is blindly appended to the comments.txt file which is then displayed directly to the user. To secure it, the $_POST["comment"] value should be validated and sanitized before it is added to the file, and the file’s contents should be escaped when displayed to the user.
// validate comment
$comment = trim($_POST["comment"]);
if (empty($comment)) {
    exit("must provide a comment");
}

// sanitize comment
$comment = strip_tags($comment);

// comment is now safe for storage
file_put_contents("comments.txt", $comment, FILE_APPEND);

// escape comments before display
$comments = file_get_contents("comments.txt");
echo htmlspecialchars($comments);
The script first validates the incoming comment to make sure a non-zero length string as been provided by the user. After all, a blank comment isn’t very interesting.
Data validation needs to happen within a well defined context, meaning that if I expect an integer back from the user, then I validate it accordingly by converting the data into an integer and handle it as an integer. If this results in invalid data, then simply discard it and let the user know about it.
Then the script sanitizes the comment by removing any HTML tags it may contain.
And finally, the comments are retrieved, filtered, and displayed.
Generally the htmlspecialchars() function is sufficient for filtering output intended for viewing in a browser. If you’re using a character encoding in your web pages other than ISO-8859-1 or UTF-8, though, then you’ll want to use htmlentities(). For more information on the two functions, read their respective write-ups in the official PHP documentation.
Bear in mind that no single solution exists that is 100% secure on a constantly evolving medium like the Web. Test your validation code thoroughly with the most up to date XSS test vectors. Using the test data from the following sources should reveal if your code is still prone to XSS attacks.
  • RSnake XSS cheatsheet (a pretty comprehensive list of XSS vectors you can use to test your code)
  • Zend Framework’s XSS test data
  • XSS cheatsheet (makes use of HTML5 features)

Summary

Hopefully this article gave you a good explanation of what cross-site scripting attacks are and how you can prevent them from happening to your code. Never trust data coming from the user or from any other third party sources. You can protect yourself by validating the incoming values in a well defined context, sanitizing the data to protect your code, and escaping output to protect your users. After you’ve written your code, be sure your efforts work correctly by testing the code as thoroughly as you can.