My name
is
Jon Skeet

Removing Duplicate Nodes from Xml using C#

I have an XML document like this:

<xml>
  <Test>
    <TestData>
      <TestData>
        <Name>Alex</Name>
      </TestData>
    </TestData>
  </Test>
  <Name>
    <NameData>
      <NameData>
        <Name>Chris</Name>
      </NameData>
    </NameData>
  </Name>
</xml>

I want to remove the duplicated TestData and NameData nodes so that the XML looks something like this:

<xml>
  <Test>
    <TestData>
      <Name>Alex</Name>
    </TestData>
  </Test>
  <Name>
    <NameData>
      <Name>Chris</Name>
    </NameData>
  </Name>
</xml>

I have tried searching for a clue, but every recommendation I come across has the node name specified. I have a large number of XML files to process. Is there any C# class or method I can use to remove the duplicates without naming each node?

LINQ to XML makes this reasonably easy if some assumptions are met:

There are no elements with "triple duplication", e.g. <TestData><TestData><TestData>. I'm sure it's feasible to work around that, but it's trickier.
We don't need to worry about non-element children (e.g. where TestData has text content as well as the nested TestData element).
We don't need to worry about attributes.

In that case, it's just a matter of checking that an element has exactly one child element, and that the child has the same name as its parent... then replacing the parent with that child.

Here's some code to do exactly that:

using System;
using System.Linq;
using System.Xml.Linq;

class Program
{
    static void Main(string[] args)
    {
        var doc = XDocument.Load("test.xml");
        // Find every child that should replace its parent. ToList() takes a
        // snapshot so the tree isn't modified while it's being queried.
        var replacements = doc.Descendants()
            .Select(GetReplacementForParent)
            .Where(r => r != null)
            .ToList();
        foreach (var replacement in replacements)
        {
            // Swap the outer (duplicate) element for its same-named child.
            replacement.Parent.ReplaceWith(replacement);
        }
        Console.WriteLine(doc);
    }

    // Returns the only child element if it has the same name as the
    // given element, or null if the element isn't a duplicate wrapper.
    static XElement GetReplacementForParent(XElement element)
    {
        var child = element.Elements(element.Name).FirstOrDefault();
        // TODO: Use a more efficient approach for counting children, maybe.
        // TODO: Check for non-element content? Check for attributes?
        return child != null && element.Elements().Count() == 1
            ? child : null;
    }
}
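
If the first assumption doesn't hold, one possible workaround is to keep collapsing an element until it no longer wraps a same-named child. What follows is only a sketch under my own assumptions, not a tested solution: the names DuplicateCollapser, Collapse and FindDuplicateChild are made up for illustration, and I've chosen to leave an element alone if either it or its child has attributes, or if it has any content other than the single child (text, comments and so on). Instead of replacing the parent with the child, it moves the child's content up into the parent, which should give the same result when no attributes are involved.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class DuplicateCollapser
{
    // Collapses every chain of same-named nested elements below (and
    // including) the given element, however deep the duplication goes.
    public static void Collapse(XElement element)
    {
        XElement child;
        while ((child = FindDuplicateChild(element)) != null)
        {
            // Detach the child and its content first, so that Add() moves
            // the grandchildren rather than cloning them.
            var grandchildren = child.Nodes().ToList();
            child.Remove();
            child.RemoveNodes();
            element.Add(grandchildren);
        }

        // Snapshot the children before recursing, as the recursion
        // modifies the tree.
        foreach (var e in element.Elements().ToList())
        {
            Collapse(e);
        }
    }

    // Returns the sole child node if it is an element with the same name
    // as its parent and neither has attributes; otherwise returns null.
    static XElement FindDuplicateChild(XElement element)
    {
        var nodes = element.Nodes().Take(2).ToList();
        var only = nodes.Count == 1 ? nodes[0] as XElement : null;
        if (only == null
            || only.Name != element.Name
            || element.HasAttributes
            || only.HasAttributes)
        {
            return null;
        }
        return only;
    }
}

class CollapseDemo
{
    static void Main()
    {
        // Inline sample with "triple duplication" of TestData.
        var root = XElement.Parse(
            "<xml><Test><TestData><TestData><TestData>" +
            "<Name>Alex</Name>" +
            "</TestData></TestData></TestData></Test></xml>");
        DuplicateCollapser.Collapse(root);
        Console.WriteLine(root);
    }
}
```

This relies on the default load options, which discard insignificant whitespace; if you load with LoadOptions.PreserveWhitespace, the whitespace text nodes would count as extra content and nothing would be collapsed. The recursion is also only as deep as the document, so extremely deep XML might want an explicit stack instead.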

See more on this question at Stack Overflow