I receive a string that contains CDATA sections, and I want to remove them.
Input: "<Text><![CDATA[Hello]]></Text><Text><![CDATA[World]]></Text>"
Desired output:
<text>Hello</text>
<text>World</text>
I want to take all the data between <text> and </text> and add it to a list.
The code I tried is:
private List<XElement> Foo(string input)
{
    string pattern = "<text>(.*?)</text>";
    input = "<Text><![CDATA[Hello]]></Text><Text><![CDATA[World]]></Text>"; // For testing
    var matches = Regex.Matches(input, pattern, RegexOptions.IgnoreCase);
    var a = matches.Cast<Match>().Select(m => m.Groups[1].Value.Trim()).ToArray();
    List<XElement> li = new List<XElement>();
    XElement xText;
    for (int i = 0; i < a.Length; i++)
    {
        xText = new XElement("text");
        xText.Add(System.Net.WebUtility.HtmlDecode(a[i]));
        li.Add(xText);
    }
    return li;
}
But here I get the following output:
<text><![CDATA[Hello]]></text>
<text><![CDATA[World]]></text>
Can anyone please help me?
It seems to me that you shouldn't be using a regular expression at all. Instead, construct a valid XML document by wrapping it all in a root element, then parse it and extract the elements you want.
You also want to replace all CDATA nodes with their equivalent text nodes. You can do that before or after you extract the elements into a list, but I've chosen to do it before:
using System;
using System.Linq;
using System.Xml.Linq;

class Test
{
    static void Main()
    {
        string input = "<Text><![CDATA[Hello]]></Text><Text><![CDATA[World]]></Text>";
        // Wrap the fragments in a single root element so they parse as one document.
        string xml = "<root>" + input + "</root>";
        var doc = XDocument.Parse(xml);
        // Copy the CDATA nodes to a list first, so the tree can be modified safely.
        var nodes = doc.DescendantNodes().OfType<XCData>().ToList();
        foreach (var node in nodes)
        {
            // Swap each CDATA node for a plain text node with the same content.
            node.ReplaceWith(new XText(node.Value));
        }
        var elements = doc.Root.Elements().ToList();
        elements.ForEach(Console.WriteLine);
    }
}
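If you need the result as a List<XElement>, as in the question's Foo method, the same steps could be wrapped up roughly like this. This is only a sketch: it assumes the input is always a sequence of well-formed XML fragments, and the class name CDataHelper is made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class CDataHelper // hypothetical helper class, not part of the original code
{
    public static List<XElement> Foo(string input)
    {
        // Wrap the fragments in a root element so they form one valid document.
        // XDocument.Parse throws XmlException if the input is not well-formed.
        var doc = XDocument.Parse("<root>" + input + "</root>");

        // Materialise the CDATA nodes before replacing them, so the tree is not
        // modified while it is still being enumerated.
        foreach (var cdata in doc.DescendantNodes().OfType<XCData>().ToList())
        {
            cdata.ReplaceWith(new XText(cdata.Value));
        }

        // The direct children of the wrapper are the original elements.
        return doc.Root.Elements().ToList();
    }
}
```

Note that the returned elements keep the names from the input (Text rather than text), because XML names are case-sensitive and nothing here renames them.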