I have the task to write a reader for a file format with the following specification:
an XML part, then a separator byte with value 29 (group separator in the ASCII table), then a final section.
I see two ways to read the XML part of the file. The first is to build a string byte by byte until I find the separator.
The other is to use some library that would parse the XML and automatically detect the end of the well-formed XML.
The question is: is there any .NET library that would stop automatically after the last closing tag in the XML?
(or, can anyone suggest a saner way to read this kind of file format?)
UPDATE: Following the answer from Peter Duniho, with slight modifications, I ended up with this (it works, though not thoroughly unit-tested yet).
int position = 0;
MemoryStream ms;
using (FileStream fs = File.OpenRead("file.xml"))
using (ms = new MemoryStream())
{
    int current;
    // ReadByte returns -1 at end of stream; 0 is a valid byte value.
    while ((current = fs.ReadByte()) != -1)
    {
        position++;
        if (current == 29)
            break;
        ms.WriteByte((byte)current);
    }
}
var xmlheader = new XmlDocument();
xmlheader.LoadXml(Encoding.UTF8.GetString(ms.ToArray()));
While the "read to the closing tag" approach sounds appealing, you'd need a parser that didn't end up buffering all the data.
I would read all the data into a byte[], then search for the separator there. Then you can split the binary data into two and parse each part appropriately. I would do that working entirely in binary, with no strings involved: you can create a MemoryStream for each section using new MemoryStream(byte[], int, int) and then pass that to an XML parser and to whatever your final section parser is. That way you don't need to worry about handling UTF-8, or about detecting whether a later version of the XML uses some other encoding.
So something like:
byte[] allData = File.ReadAllBytes(filename);
int separatorIndex = Array.IndexOf(allData, (byte) 29);
if (separatorIndex == -1)
{
    // No separator: the file does not follow the expected format.
    throw new InvalidDataException("Separator byte 29 not found.");
}
var xmlStream = new MemoryStream(allData, 0, separatorIndex);
var lastPartStream = new MemoryStream(
    allData, separatorIndex + 1, allData.Length - separatorIndex - 1);
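To show how the two streams might then be consumed, here is a minimal sketch that wraps the split in a helper and hands the first part straight to XmlDocument.Load(Stream), so the parser picks the encoding itself. The class name SeparatedFileReader and the method name Split are made up for illustration. It assumes byte 29 never occurs inside the XML part, which should hold for UTF-8 encoded XML 1.0 but not necessarily for UTF-16.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper, not part of any library.
public static class SeparatedFileReader
{
    private const byte GroupSeparator = 29; // ASCII GS

    // Splits the raw bytes at the first separator byte and parses
    // everything before it as XML. The bytes after the separator are
    // returned as a read-only stream over the same array (no copy).
    public static (XmlDocument Header, MemoryStream Rest) Split(byte[] allData)
    {
        int separatorIndex = Array.IndexOf(allData, GroupSeparator);
        if (separatorIndex == -1)
        {
            throw new InvalidDataException("Separator byte 29 not found.");
        }

        var header = new XmlDocument();
        using (var xmlStream = new MemoryStream(allData, 0, separatorIndex, writable: false))
        {
            // Load(Stream) detects the encoding from the BOM or the
            // XML declaration, so no string conversion is needed here.
            header.Load(xmlStream);
        }

        var rest = new MemoryStream(
            allData, separatorIndex + 1, allData.Length - separatorIndex - 1, writable: false);
        return (header, rest);
    }
}
```

A call would look like var (header, rest) = SeparatedFileReader.Split(File.ReadAllBytes(filename)); after which rest can be passed to whatever parses the final section.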