Handling Invalid Characters in an XML String

There are 5 predefined entity references in XML:

	&lt;	<	less than
	&gt;	>	greater than
	&amp;	&	ampersand
	&apos;	'	apostrophe
	&quot;	"	quotation mark

Note:
 Strictly speaking, only the characters "<" and "&" are illegal in XML text. Apostrophes, quotation marks, and greater-than signs are legal, but it is a good habit to replace them as well.



Recipe 15.7. Handling Invalid Characters in an XML String

Problem

You are creating an XML string. Before adding a tag containing a text element, you want to check it to determine whether the string contains any of the following invalid characters:

	<
	>
	"
	'
	&

If any of these characters are encountered, you want them to be replaced with their escaped form:

	&lt;
	&gt;
	&quot;
	&apos;
	&amp;

Solution

There are different ways to accomplish this, depending on which XML-creation approach you are using. If you are using XmlWriter, the WriteCData, WriteString, WriteAttributeString, WriteValue, and WriteElementString methods take care of this for you. If you are using XmlDocument and XmlElement, the XmlElement.InnerText property will handle these characters.

There are two ways to handle this with an XmlWriter. The WriteCData method wraps the text containing the invalid characters in a CDATA section, as shown in the creation of the InvalidChars1 element in the example that follows. The other way is to use the WriteElementString method, which automatically escapes the text for you, as shown in the creation of the InvalidChars2 element.

	// Set up a string with our invalid chars.
	string invalidChars = @"<>\&'";
	XmlWriterSettings settings = new XmlWriterSettings();
	settings.Indent = true;
	using (XmlWriter writer = XmlWriter.Create(Console.Out, settings))
	{
	    writer.WriteStartElement("Root");
	    writer.WriteStartElement("InvalidChars1");
	    writer.WriteCData(invalidChars);
	    writer.WriteEndElement();
	    writer.WriteElementString("InvalidChars2", invalidChars);
	    writer.WriteEndElement();
	}

The output from this is:

	<?xml version="1.0" encoding="IBM437"?>
	<Root>
	    <InvalidChars1><![CDATA[<>\&']]></InvalidChars1>
	    <InvalidChars2>&lt;&gt;\&amp;'</InvalidChars2>
	</Root>
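
WriteAttributeString is listed above but not demonstrated. The following is a minimal sketch of how it might be exercised, not part of the original recipe: the class name AttributeEscapingDemo, the helper names, and the attribute name "Value" are all made up for illustration. It writes to a StringBuilder rather than the console and reads the attribute back to confirm that the escaping is reversible.

```csharp
using System;
using System.Text;
using System.Xml;

public static class AttributeEscapingDemo
{
    // Hypothetical helper: writes <Root Value="..."/> and returns the markup.
    public static string WriteAttribute(string value)
    {
        StringBuilder sb = new StringBuilder();
        XmlWriterSettings settings = new XmlWriterSettings();
        // Skip the XML declaration so only the element itself is returned.
        settings.OmitXmlDeclaration = true;
        using (XmlWriter writer = XmlWriter.Create(sb, settings))
        {
            writer.WriteStartElement("Root");
            // The writer escapes whatever the attribute value requires.
            writer.WriteAttributeString("Value", value);
            writer.WriteEndElement();
        }
        return sb.ToString();
    }

    // Hypothetical helper: parses the markup and returns the unescaped value.
    public static string ReadAttribute(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.DocumentElement.GetAttribute("Value");
    }

    public static void Main()
    {
        // Same characters as the recipe, plus a double quote.
        string invalidChars = "<>\\&'\"";
        string xml = WriteAttribute(invalidChars);
        Console.WriteLine(xml);
        // Compare the value read back with the original string.
        Console.WriteLine(ReadAttribute(xml) == invalidChars);
    }
}
```

Reading the value back through a parser is a more dependable check than comparing the markup character by character, because a writer is free to choose which of the legal characters it escapes.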

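There are two ways you can handle this problem with XmlDocument and XmlElement. The first way is to wrap the text you are adding to the XML element in a CDATA section, either by appending a node created with XmlDocument.CreateCDataSection, as in the following snippet, or by assigning the CDATA markup to the InnerXml property of the XmlElement, as in the complete listing further on: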

	// Set up a string with our invalid chars.
	string invalidChars = @"<>\&'";
	XmlElement invalidElement1 = xmlDoc.CreateElement("InvalidChars1");
	invalidElement1.AppendChild(xmlDoc.CreateCDataSection(invalidChars));

The second way is to let the XmlElement class escape the data for you by assigning the text directly to the InnerText property like this:

	// Set up a string with our invalid chars.
	string invalidChars = @"<>\&'";
	XmlElement invalidElement2 = xmlDoc.CreateElement("InvalidChars2");
	invalidElement2.InnerText = invalidChars;

The whole XmlDocument is created with these XmlElements in this code:

	public static void HandlingInvalidChars( )
	{
	    // Set up a string with our invalid chars.
	    string invalidChars = @"<>\&'";

	    XmlDocument xmlDoc = new XmlDocument( );
	    // Create a root node for the document.
	    XmlElement root = xmlDoc.CreateElement("Root");
	    xmlDoc.AppendChild(root);

	    // Create the first invalid character node.
	    XmlElement invalidElement1 = xmlDoc.CreateElement("InvalidChars1");
	    // Wrap the invalid chars in a CDATA section and use the
	    // InnerXML property to assign the value as it doesn't
	    // escape the values, just passes in the text provided.
	    invalidElement1.InnerXml = "<![CDATA[" + invalidChars + "]]>";
	    // Append the element to the root node.
	    root.AppendChild(invalidElement1);

	    // Create the second invalid character node.
	    XmlElement invalidElement2 = xmlDoc.CreateElement("InvalidChars2");
	    // Add the invalid chars directly using the InnerText
	    // property to assign the value as it will automatically
	    // escape the values.
	    invalidElement2.InnerText = invalidChars;
	    // Append the element to the root node.
	    root.AppendChild(invalidElement2);

	    Console.WriteLine("Generated XML with Invalid Chars:\r\n{0}",xmlDoc.OuterXml); 
	    Console.WriteLine( ); 
	}

The XML created by this procedure (and output to the console) looks like this:

	Generated XML with Invalid Chars:
	<Root><InvalidChars1><![CDATA[<>\&']]></InvalidChars1><InvalidChars2>&lt;&gt;\&amp;'</InvalidChars2></Root>

Discussion

The CDATA node allows you to represent the items in the text section as character data, not as escaped XML, for ease of entry. Normally these characters would need to be in their escaped format (&lt; for < and so on), but the CDATA section allows you to enter them as regular text.
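
To a consumer of the document, the CDATA form and the escaped form carry the same text. The sketch below is an illustration rather than part of the original recipe: it loads the XML generated earlier and reads both elements back through InnerText. The class name RoundTripDemo and the helper ReadText are hypothetical.

```csharp
using System;
using System.Xml;

public static class RoundTripDemo
{
    // Hypothetical helper: returns the text of the named child of the root.
    public static string ReadText(string xml, string elementName)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        // The string indexer returns the first child element with that name.
        return doc.DocumentElement[elementName].InnerText;
    }

    public static void Main()
    {
        // The XML produced by HandlingInvalidChars in the Solution section.
        string xml = @"<Root><InvalidChars1><![CDATA[<>\&']]></InvalidChars1>" +
                     @"<InvalidChars2>&lt;&gt;\&amp;'</InvalidChars2></Root>";

        string fromCData = ReadText(xml, "InvalidChars1");
        string fromEscaped = ReadText(xml, "InvalidChars2");

        // Both elements should yield the original, unescaped string.
        Console.WriteLine(fromCData);
        Console.WriteLine(fromEscaped);
        Console.WriteLine(fromCData == fromEscaped);
    }
}
```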

When the CDATA tag is used in conjunction with the InnerXml property of the XmlElement class, you can submit characters that would normally need to be escaped first. The one sequence a CDATA section cannot contain is ]]>, which ends the section, so text that might include it should go through InnerText instead. The XmlElement class's InnerText property automatically escapes any markup found in the string assigned. This allows you to add these characters without having to worry about them.
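
If the XML is being assembled by plain string concatenation, with neither XmlWriter nor XmlDocument involved, the replacement described in the Problem section has to be done explicitly. One possible approach, sketched below as an assumption rather than as the recipe's own method, is the framework's System.Security.SecurityElement.Escape method, which replaces the five characters with their entity references. The class name ManualEscapeDemo, the helper BuildElement, and the element name InvalidChars3 are made up for this example.

```csharp
using System;
using System.Security;
using System.Xml;

public static class ManualEscapeDemo
{
    // Hypothetical helper: builds <name>escaped text</name> by concatenation.
    // The element name is assumed to be a valid XML name already.
    public static string BuildElement(string name, string text)
    {
        // SecurityElement.Escape replaces <, >, ", ' and & with entities.
        return "<" + name + ">" + SecurityElement.Escape(text) + "</" + name + ">";
    }

    public static void Main()
    {
        // Same characters as the recipe, plus a double quote.
        string invalidChars = "<>\\&'\"";
        string xml = BuildElement("InvalidChars3", invalidChars);
        Console.WriteLine(xml);

        // Parse the fragment to confirm it is well formed and that the
        // original text comes back unchanged.
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        Console.WriteLine(doc.DocumentElement.InnerText == invalidChars);
    }
}
```

Where an XmlWriter or XmlDocument is already available, letting it do the escaping, as in the Solution section, is usually the simpler choice, since it also keeps the rest of the markup well formed.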

See Also

See the "XmlDocument Class," "XmlWriter Class," "XmlElement Class," and "CDATA Sections" topics in the MSDN documentation.
