I get an exception if I include characters such as '\x1A', '\x1B', '\x1C', '\x1D', '\x1E' or '\x1F' in an XElement's content.
using System;
using System.Collections.Generic;
using System.Xml.Linq;

namespace LINQtoXMLInvalidChars
{
    class Program
    {
        private static readonly IReadOnlyCollection<char> InvalidCharactersInXml = new List<char>
        {
            '<',
            '>',
            '&',
            '\'',
            '\"',
            '\x1A',
            '\x1B',
            '\x1C',
            '\x1D',
            '\x1E',
            '\x1F'
        };

        static void Main()
        {
            foreach (var c in InvalidCharactersInXml)
            {
                var xEl = new XElement("tag", "Character: " + c);
                var xDoc = new XDocument(new XDeclaration("1.0", "utf-8", null), xEl);
                try
                {
                    Console.Write("Writing " + c + ": ");
                    Console.WriteLine(xDoc);
                }
                catch (Exception e)
                {
                    Console.WriteLine("Oops. " + e.Message);
                }
            }
            Console.ReadKey();
        }
    }
}
In Jon Skeet's answer to the question "String escape into XML" I read:
You set the text in a node, and it will automatically escape anything it needs to.
So now I'm confused. Do I misunderstand something?
Some background information: the string content of the XElement comes from the end user. I see two options for making my application robust:
1) Base64-encode the string before passing it to XElement, or
2) narrow the accepted set of characters to, e.g., alphanumeric characters.
Most of those characters simply aren't valid in XML 1.0 at all. Personally I wish that LINQ to XML would refuse to produce a document that it wouldn't be able to parse later, but basically you should avoid them.
I would also recommend avoiding \x as an escape sequence anyway, preferring \u: the fact that \x will take "up to" 4 hex digits can be very confusing.
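As a small illustration of why the "up to 4 hex digits" rule can bite, here is a minimal sketch. The class and member names (EscapeDemo, WithX, WithU, Describe) are made up for this example and are not part of any library.

```csharp
using System;

public static class EscapeDemo
{
    // \x greedily consumes up to four hex digits, and 'b' and 'c' are hex digits,
    // so this literal is the single character U+1ABC.
    public static readonly string WithX = "\x1Abc";

    // \u always consumes exactly four hex digits, so this literal is
    // U+001A followed by the letters 'b' and 'c'.
    public static readonly string WithU = "\u001Abc";

    // Lists each UTF-16 code unit of the string as U+XXXX, separated by spaces.
    public static string Describe(string s)
    {
        return string.Join(" ",
            Array.ConvertAll(s.ToCharArray(), ch => "U+" + ((int)ch).ToString("X4")));
    }
}
```

Calling EscapeDemo.Describe on the two strings shows one code unit for the \x form and three for the \u form, even though the two literals look almost identical in source.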
From the XML 1.0 spec:
Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
Now U+000D and U+000A are interesting cases - they won't be escaped in text nodes; they'll just be included verbatim. Whether or not that's then present when you parse the node will depend on parse settings (and whether there are non-whitespace characters around it).
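If you want to test a string against the Char production before handing it to LINQ to XML, something along these lines may help. It is only a sketch: XmlCharCheck and FirstInvalidIndex are hypothetical names, and it relies on XmlConvert.IsXmlChar and XmlConvert.IsXmlSurrogatePair from System.Xml (available from .NET Framework 4.0 onwards).

```csharp
using System.Xml;

public static class XmlCharCheck
{
    // Returns the index of the first UTF-16 code unit that is not allowed by the
    // XML 1.0 Char production, or -1 if the whole string is acceptable.
    public static int FirstInvalidIndex(string text)
    {
        for (int i = 0; i < text.Length; i++)
        {
            if (XmlConvert.IsXmlChar(text[i]))
            {
                continue;
            }

            // A well-formed surrogate pair encodes a code point in #x10000-#x10FFFF.
            // Note the argument order: low surrogate first, then high surrogate.
            if (i + 1 < text.Length && XmlConvert.IsXmlSurrogatePair(text[i + 1], text[i]))
            {
                i++; // skip the low surrogate as well
                continue;
            }

            return i;
        }
        return -1;
    }
}
```

For example, XmlCharCheck.FirstInvalidIndex("a\u001Ab") reports index 1, while tab, carriage return and line feed are accepted.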
In terms of how to handle this in your case: you definitely have options of:
We can't really tell which of these is most appropriate in your scenario.
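For illustration only, here is a rough sketch of two ways user input could be made safe: dropping the characters XML 1.0 doesn't allow, and the Base64 idea from the question. The names (UserTextToXml, StripInvalid, ToBase64Element, FromBase64Element) are hypothetical, and the choice of UTF-8 for the Base64 payload is an assumption, not something the question specifies.

```csharp
using System;
using System.Text;
using System.Xml;
using System.Xml.Linq;

public static class UserTextToXml
{
    // Lossy option: silently drops every code unit that XML 1.0 does not allow,
    // while keeping well-formed surrogate pairs intact.
    public static string StripInvalid(string text)
    {
        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char ch = text[i];
            if (XmlConvert.IsXmlChar(ch))
            {
                sb.Append(ch);
            }
            else if (i + 1 < text.Length && XmlConvert.IsXmlSurrogatePair(text[i + 1], ch))
            {
                sb.Append(ch).Append(text[i + 1]);
                i++; // the low surrogate has been copied too
            }
            // Anything else (for example U+001A) is dropped.
        }
        return sb.ToString();
    }

    // Option from the question: store the text as Base64 of its UTF-8 bytes
    // (UTF-8 is an assumption made for this sketch).
    public static XElement ToBase64Element(string name, string text)
    {
        return new XElement(name, Convert.ToBase64String(Encoding.UTF8.GetBytes(text)));
    }

    // Reverses ToBase64Element.
    public static string FromBase64Element(XElement element)
    {
        return Encoding.UTF8.GetString(Convert.FromBase64String(element.Value));
    }
}
```

Stripping keeps the XML human-readable but loses information; Base64 preserves the control characters at the cost of an opaque element value that every consumer has to decode.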