I have XML like this:
<xml>
  <Test>
    <TestData>
      <TestData>
        <Name>Alex</Name>
      </TestData>
    </TestData>
  </Test>
  <Name>
    <NameData>
      <NameData>
        <Name>Chris</Name>
      </NameData>
    </NameData>
  </Name>
</xml>
I want to remove the duplicated TestData and NameData nodes so that the XML looks something like this:
<xml>
  <Test>
    <TestData>
      <Name>Alex</Name>
    </TestData>
  </Test>
  <Name>
    <NameData>
      <Name>Chris</Name>
    </NameData>
  </Name>
</xml>
I have tried searching for a clue, but every recommendation I come across requires the node name to be specified, and I have a large amount of XML to process. Is there any C# class or method I can use to remove the duplicates?
LINQ to XML makes this reasonably easy if some assumptions are met:
- There is only one level of redundancy, i.e. nothing like <TestData><TestData><TestData>. I'm sure it's feasible to work around that, but it's trickier.
- There is no mixed content (for example, where the outer TestData has text content as well as the nested TestData element).

In that case, it's just a matter of checking that there's exactly one child element, and that it's got the same name as the parent element... then replacing the parent with that child.
Here's some code to do exactly that:
using System;
using System.Linq;
using System.Xml.Linq;

class Program
{
    static void Main(string[] args)
    {
        var doc = XDocument.Load("test.xml");
        // Materialize the list first so the tree isn't modified
        // while Descendants() is still being enumerated.
        var replacements = doc.Descendants()
                              .Select(GetReplacementForParent)
                              .Where(r => r != null)
                              .ToList();
        foreach (var replacement in replacements)
        {
            replacement.Parent.ReplaceWith(replacement);
        }
        Console.WriteLine(doc);
    }

    // Returns the single same-named child that should replace the
    // given element, or null if the element should be left alone.
    static XElement GetReplacementForParent(XElement element)
    {
        var child = element.Elements(element.Name).FirstOrDefault();
        // TODO: Use a more efficient approach for counting children, maybe.
        // TODO: Check for non-element content? Check for attributes?
        return child != null && element.Elements().Count() == 1
            ? child : null;
    }
}
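If either assumption turns out not to hold, one possible workaround is sketched below. It is only a sketch, not a tested drop-in: the RedundantNesting class and its Collapse and IsRedundantWrapper methods are hypothetical names introduced here. It keeps unwrapping while an element still qualifies, so chains like <TestData><TestData><TestData> collapse to a single element, and it leaves an element alone if it has attributes or non-whitespace text, which is one way to resolve the TODOs above.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class RedundantNesting
{
    // Hypothetical helper: true when the element has no attributes, no
    // non-whitespace text, and exactly one child element with its own name.
    static bool IsRedundantWrapper(XElement element)
    {
        // Take(2) avoids counting every child just to learn "exactly one".
        var children = element.Elements().Take(2).ToList();
        return children.Count == 1
            && children[0].Name == element.Name
            && !element.HasAttributes
            && element.Nodes().OfType<XText>()
                      .All(t => string.IsNullOrWhiteSpace(t.Value));
    }

    // Collapses every chain of redundant wrappers below root, keeping the
    // innermost element of each chain. The root element itself is not examined.
    // Assumption: comments or processing instructions inside a wrapper
    // may be discarded along with it.
    public static void Collapse(XElement root)
    {
        // Snapshot the descendants so the tree can be edited during the loop.
        foreach (var start in root.Descendants().ToList())
        {
            var current = start;
            while (IsRedundantWrapper(current))
            {
                var child = current.Elements().First();
                // Detach the child first so ReplaceWith moves it
                // rather than inserting a clone.
                child.Remove();
                current.ReplaceWith(child);
                current = child;
            }
        }
    }
}
```

With this in place, Main could call RedundantNesting.Collapse(doc.Root) instead of building the replacements list. Elements that were already unwrapped earlier in the loop are detached and empty by the time the loop reaches them, so they are skipped.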