How to read XML data from the header of a mixed XML/binary file in C#

I need to write a reader for a file format with the following specification:

  1. The first section is plain XML with metadata (UTF-8);
  2. The last section is a stream of 16-bit values (binary);
  3. The two sections are separated by a single byte with value 29 (the group separator in the ASCII table).

I see two ways to read the XML part of the file. The first is to build a string byte by byte until I find the separator.

The other is to use a library that parses the XML and automatically detects the end of the well-formed document.

The question is: is there any .NET library that would stop automatically after the last closing tag in the XML?

(or, can anyone suggest a saner way to read this kind of file format?)


UPDATE: Following Peter Duniho's answer, with slight modifications, I ended up with this (it works, though it is not thoroughly unit-tested yet):

        int position = 0;
        MemoryStream ms;

        using (FileStream fs = File.OpenRead("file.xml"))
        using (ms = new MemoryStream())
        {
            int current;
            // ReadByte returns -1 at end of stream; testing "> 0" would also stop on a zero byte
            while ((current = fs.ReadByte()) != -1)
            {
                position++;

                if (current == 29)
                    break;

                ms.WriteByte((byte)current);
            }
        }

        var xmlheader = new XmlDocument();
        xmlheader.LoadXml(Encoding.UTF8.GetString(ms.ToArray()));
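
The loop above can also be wrapped in a helper that works on any stream and leaves it positioned at the first byte of the binary section, so the 16-bit values can be read from the same stream afterwards. This is only a minimal sketch: the names `HeaderReader` and `ReadHeader` are made up for illustration, and it loads the header from bytes rather than from a decoded string so that the XML parser detects the encoding itself.

```csharp
using System.IO;
using System.Xml;

public static class HeaderReader
{
    // Reads bytes up to (not including) the separator, parses them as XML,
    // and leaves 'input' positioned just past the separator.
    public static XmlDocument ReadHeader(Stream input, byte separator = 29)
    {
        using (var buffer = new MemoryStream())
        {
            int current;
            while ((current = input.ReadByte()) != -1 && current != separator)
                buffer.WriteByte((byte)current);

            if (current == -1)
                throw new InvalidDataException("Separator not found before end of stream.");

            // Rewind so the XML parser reads the collected header bytes from the start.
            buffer.Position = 0;
            var header = new XmlDocument();
            header.Load(buffer);
            return header;
        }
    }
}
```

After `ReadHeader(fs)` returns, wrapping the same `fs` in a `BinaryReader` would continue reading from the start of the binary section.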

Answer by Jon Skeet:

While the "read to the closing tag" approach sounds appealing, you'd need a parser that doesn't end up reading ahead and buffering data beyond the end of the XML.

I would read all the data into a byte[], then search for the separator there. Then you can split the binary data in two and parse each part appropriately. I would do that entirely in binary, with no strings involved: you can create a MemoryStream for each section using new MemoryStream(byte[], int, int), then pass one to an XML parser and the other to whatever your final section parser is. That way you don't need to worry about handling UTF-8, or about detecting whether a later version of the XML uses a different encoding.

So something like:

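    byte[] allData = File.ReadAllBytes(filename);
    int separatorIndex = Array.IndexOf(allData, (byte) 29);
    if (separatorIndex == -1)
    {
        // No separator: the file does not match the format
        throw new InvalidDataException("Separator byte (29) not found");
    }
    // XML header: everything before the separator
    var xmlStream = new MemoryStream(allData, 0, separatorIndex);
    // Binary section: everything after the separator
    var lastPartStream = new MemoryStream(
          allData, separatorIndex + 1, allData.Length - separatorIndex - 1);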
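
To show how the two streams might then be consumed, here is a rough sketch that parses the header with `XDocument` and reads the rest with `BinaryReader`. The class and method names (`MixedFileReader`, `Parse`) are hypothetical. The format description does not say whether the 16-bit values are signed or which byte order they use, so the sketch simply assumes signed little-endian values, which is what `BinaryReader.ReadInt16` reads.

```csharp
using System;
using System.IO;
using System.Xml.Linq;

public static class MixedFileReader
{
    private const byte Separator = 29; // ASCII group separator, per the format description

    // Splits the raw bytes at the first separator and parses both sections.
    public static (XDocument Header, short[] Samples) Parse(byte[] allData)
    {
        int separatorIndex = Array.IndexOf(allData, Separator);
        if (separatorIndex == -1)
            throw new InvalidDataException("Separator byte (29) not found.");

        XDocument header;
        using (var xmlStream = new MemoryStream(allData, 0, separatorIndex))
        {
            // Loading from a stream lets the XML parser detect the encoding itself.
            header = XDocument.Load(xmlStream);
        }

        int binaryStart = separatorIndex + 1;
        int binaryLength = allData.Length - binaryStart;
        if (binaryLength % 2 != 0)
            throw new InvalidDataException("Binary section has an odd number of bytes.");

        var samples = new short[binaryLength / 2];
        using (var reader = new BinaryReader(
            new MemoryStream(allData, binaryStart, binaryLength)))
        {
            for (int i = 0; i < samples.Length; i++)
                samples[i] = reader.ReadInt16(); // ASSUMPTION: signed, little-endian
        }

        return (header, samples);
    }
}
```

Searching for the first occurrence of byte 29 matters here: the binary section may legitimately contain that value, whereas a well-formed XML 1.0 document cannot contain a raw 0x1D character. A call would look like `var (header, samples) = MixedFileReader.Parse(File.ReadAllBytes(filename));`.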
