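Today I ran into a problem: a URL inside an HTML page had been encoded, so our testing tool could not recognize the URL. That prompted me to read up on HTML, XML, and URL encoding.
In short: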
HTML character references
Character entity references have the format &name; where "name" is a case-sensitive alphanumeric string.
The character entity references &lt;, &gt;, &quot; and &amp; are predefined in HTML and SGML, because <, >, " and & are already used to delimit markup.
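To make this concrete, here is a minimal sketch using Python's standard html module. The example.com URL is a made-up placeholder rather than the one from my actual test page, and I am assuming the tool's problem was an escaped ampersand in the query string.

```python
import html

# Hypothetical URL with two query parameters (placeholder, not from the real page).
url = "http://example.com/search?a=1&b=2"

# Escaping for use inside HTML turns the bare & into &amp;.
escaped_url = html.escape(url)
print(escaped_url)

# A tool that reads the raw HTML source has to unescape before using the URL.
restored_url = html.unescape(escaped_url)
print(restored_url)

# html.escape also handles <, > and (by default) double quotes.
escaped_markup = html.escape('<a title="x">')
print(escaped_markup)
```

If the testing tool works on the raw page source, running the attribute value through something like html.unescape first is probably the simplest fix.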
XML character references
Unlike traditional HTML with its large range of character entity references, in XML there are only five predefined character entity references. These are used to escape characters that are markup sensitive in certain contexts: &amp; (&), &lt; (<), &gt; (>), &quot; ("), and &apos; (').
&amp; has the special problem that it starts with the character to be escaped. A simple Internet search finds thousands of sequences &amp;amp; … in HTML pages for which the algorithm to replace an ampersand by the corresponding character entity reference was applied too often.
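The double-escaping problem is easy to reproduce. The sketch below uses escape and unescape from Python's xml.sax.saxutils, which by default handle only &, < and >; the sample string is just an illustration.

```python
from xml.sax.saxutils import escape, unescape

text = "Tom & Jerry"

# Escaping once is correct.
once = escape(text)
print(once)

# Escaping the already-escaped text again escapes the & of "&amp;" itself.
twice = escape(once)
print(twice)

# One round of unescaping only undoes one round of escaping.
print(unescape(twice))
print(unescape(unescape(twice)))
```

The takeaway seems to be: escape exactly once, at the point where the text is written into the markup, and never on text that is already escaped.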
http://en.wikipedia.org/wiki/Character_encodings_in_HTML
List of XML and HTML character entity references
http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
URL encoding
ASCII Control characters
Why: These characters are not printable.
Non-ASCII characters
Why: These are by definition not legal in URLs since they are not in the ASCII set.
"Reserved characters"
Why: URLs use some characters for special use in defining their syntax. When these characters are not used in their special role inside a URL, they need to be encoded.
"Unsafe characters"
Why: Some characters present the possibility of being misunderstood within URLs for various reasons. These characters should also always be encoded.
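As a rough illustration of the reserved and unsafe categories, the sketch below percent-encodes a value that contains both kinds of characters, using quote and urlencode from Python's urllib.parse. The sample value and the parameter name q are made up for the example.

```python
from urllib.parse import quote, urlencode

# A value containing reserved characters (& = / ?) and an unsafe space.
value = "a&b=c d/e?"

# safe="" tells quote() not to leave "/" unencoded, which it does by default.
encoded_value = quote(value, safe="")
print(encoded_value)

# urlencode() builds a whole query string; it encodes spaces as "+".
query = urlencode({"q": value})
print(query)
```

Here the reserved characters are data rather than delimiters, so they all get encoded; the = that urlencode inserts between the name and the value is left alone because it is playing its special role.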
How are characters URL encoded?
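URL encoding of a character consists of a "%" symbol, followed by the two-digit hexadecimal representation (case-insensitive) of the ISO-Latin code point for the character. (This describes single-byte characters; current practice is to percent-encode each byte of the character's UTF-8 encoding instead.)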
Example
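A small sketch of the rule, assuming Python and its urllib.parse module. The percent_encode_char helper is hypothetical, written here only to show the "%" plus two hex digits mechanics for single-byte characters; quote and unquote are the real library functions. The last line decodes the page name from the OWASP link at the end of this post.

```python
from urllib.parse import quote, unquote

def percent_encode_char(ch: str) -> str:
    """Hand-rolled encoding of one character with a code point below 256."""
    code_point = ord(ch)
    if code_point > 0xFF:
        raise ValueError("only single-byte (ISO-Latin) characters are handled here")
    return "%{:02X}".format(code_point)

space = percent_encode_char(" ")
e_acute = percent_encode_char("\u00e9")  # e with acute accent

# The library gives the same result when told to use Latin-1 ...
e_acute_latin1 = quote("\u00e9", encoding="latin-1")
# ... but defaults to UTF-8, which yields two encoded bytes.
e_acute_utf8 = quote("\u00e9")

# The hex digits are case-insensitive when decoding.
decoded_lower = unquote("%2f")
decoded_upper = unquote("%2F")

# %28 and %29 are the parentheses.
title = unquote("XSS_%28Cross_Site_Scripting%29_Prevention_Cheat_Sheet")
print(space, e_acute, e_acute_latin1, e_acute_utf8, title)
```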
XSS (Cross Site Scripting) Prevention Cheat Sheet
http://www.owasp.org/index.php/XSS_%28Cross_Site_Scripting%29_Prevention_Cheat_Sheet