I am working on a project that requires me to process XHTML files to fix the content of certain tags. The fixing itself is not a problem; however, I have trouble when saving the files.
The code I am using is:
var spanNodesList = p.GetSpanNodesList(xDoc);
foreach (XElement span in spanNodesList)
{
    if (span.Value == null || span.Value == "")
    {
        span.Remove();
    }
    else
    {
        string[] words = p.SplitNodeText(span.Value);
        XElement parent = span.Parent;
        span.Remove();
        foreach (string word in words)
        {
            parent.Add(new XElement("span", word,
                new XAttribute("id", "w" + p.currentNodeID.ToString())));
            p.currentNodeID++;
        }
    }
}
List<XElement> GetSpanNodesList(XDocument file)
{
    //Get only 'word' nodes
    var spanNodes = file.Descendants("{http://www.w3.org/1999/xhtml}span");
    if (spanNodes != null)
    {
        var spanNodesList = spanNodes.ToList();
        spanNodesList.RemoveAll(x => ((x.Attribute("id") == null) || !x.Attribute("id").Value.Contains("w")));
        return spanNodesList;
    }
    else return null;
}
At first I couldn't get any elements, as the query yielded no results. I found out somewhere on SO that I might need to add a namespace reference, as in file.Descendants("{http://www.w3.org/1999/xhtml}span"). This did help, and I now get the nodes I want. However, the resulting output has two problems.
<span id="w1" xmlns="">Word one</span>
<span id="w2" xmlns="">Word two</span>
<span id="w3" xmlns="">Word three</span>
It adds the xmlns attribute, which I don't need (and which was not in the original file), and it adds the <?xml version="1.0" encoding="utf-8"?> header. I assume this is expected behaviour resulting from what I coded, so my question is: what can I do to remove these 'problems'? Or perhaps there's a better way of dealing with XHTML files? Also, I don't know if this is relevant, but the source files have references to a number of different namespaces...
Cheers Bartosz
When you add the span element, you're doing it without a namespace - whereas some ancestor element has set the default namespace. LINQ to XML therefore writes xmlns="" on each new element to put it back into "no namespace". All you need to do is use the right namespace for your new elements:
XNamespace ns = "http://www.w3.org/1999/xhtml";
...
parent.Add(new XElement(ns + "span", ...));
Likewise you can use:
var spanNodes = file.Descendants(ns + "span");
which is rather more readable, IMO. You almost certainly don't need to worry about the XML declaration.
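If you do want to drop the declaration anyway, one option is to save through an XmlWriter with OmitXmlDeclaration set. The following is only a minimal sketch of both points together, not your actual code: the class XhtmlSpanDemo, the helper AddWordSpans, and the sample input with a single p element are all made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;

public static class XhtmlSpanDemo
{
    // The XHTML namespace from the question.
    private static readonly XNamespace Ns = "http://www.w3.org/1999/xhtml";

    // Hypothetical helper (not from the question): appends one <span> per
    // word to the first <p> element and returns the serialized document.
    public static string AddWordSpans(string xhtml, string[] words)
    {
        XDocument doc = XDocument.Parse(xhtml);
        XElement parent = doc.Descendants(Ns + "p").First();

        int id = 1;
        foreach (string word in words)
        {
            // Creating the element in the XHTML namespace means it inherits
            // the default namespace already in scope, so no xmlns="" is written.
            parent.Add(new XElement(Ns + "span",
                new XAttribute("id", "w" + id),
                word));
            id++;
        }

        // OmitXmlDeclaration suppresses the <?xml ...?> header on save.
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        var output = new StringBuilder();
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            doc.Save(writer);
        }
        return output.ToString();
    }

    public static void Main()
    {
        // Assumed sample input; a real XHTML file would be much larger.
        string input = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body><p /></body></html>";
        Console.WriteLine(AddWordSpans(input, new[] { "one", "two" }));
    }
}
```

To write to disk instead of a string, XmlWriter.Create also accepts a file path together with the same settings object.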