Encoding errors in embedded Json file

I have run into an issue and can't quite get my head around it.

I have this code:

    public List<NavigationModul> LoadNavigation()
    {
        byte[] navBytes = NavigationResources.Navigation;
        var encoding = GetEncoding(navBytes);            
        string json = encoding.GetString(navBytes);
        List<NavigationModul> navigation = JsonConvert.DeserializeObject<List<NavigationModul>>(json);
        return navigation;
    }

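    public static Encoding GetEncoding(byte[] textBytes)
    {
        // Length guards keep short or empty inputs from throwing IndexOutOfRangeException
        if (textBytes.Length >= 3 && textBytes[0] == 0x2b && textBytes[1] == 0x2f && textBytes[2] == 0x76) return Encoding.UTF7;
        if (textBytes.Length >= 3 && textBytes[0] == 0xef && textBytes[1] == 0xbb && textBytes[2] == 0xbf) return Encoding.UTF8;
        if (textBytes.Length >= 2 && textBytes[0] == 0xff && textBytes[1] == 0xfe) return Encoding.Unicode; //UTF-16LE
        if (textBytes.Length >= 2 && textBytes[0] == 0xfe && textBytes[1] == 0xff) return Encoding.BigEndianUnicode; //UTF-16BE
        // 00 00 FE FF is the big-endian UTF-32 BOM; Encoding.UTF32 is little-endian
        if (textBytes.Length >= 4 && textBytes[0] == 0 && textBytes[1] == 0 && textBytes[2] == 0xfe && textBytes[3] == 0xff) return new UTF32Encoding(true, true); //UTF-32BE
        return Encoding.ASCII;
    }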

The goal is to load an embedded JSON file (NavigationResources.Navigation) from a resource file. The navigation file is an embedded file; we are just using the ResourceManager to avoid magic strings.

After loading the bytes of the embedded file and checking their encoding, I decode the string and pass it to JsonConvert.DeserializeObject.

But unfortunately this fails because the JSON is invalid. Long story short: the loaded JSON string still starts with the encoding identification (the byte order mark, or BOM), and I can't figure out how to get rid of it.
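The failure can be reproduced in isolation. Below is a minimal sketch (the `BomDemo` class and its sample bytes are hypothetical) showing that `Encoding.UTF8.GetString` does not strip a byte order mark. The three bytes `EF BB BF` decode to the single character U+FEFF, which then sits in front of the JSON text.

```csharp
using System.Text;

// Hypothetical demo class: shows that decoding keeps the BOM as a character.
public static class BomDemo
{
    // EF BB BF is the UTF-8 byte order mark, followed here by the JSON text "[]"
    public static readonly byte[] JsonWithBom = { 0xEF, 0xBB, 0xBF, (byte)'[', (byte)']' };

    public static string Decode(byte[] bytes)
    {
        // GetString decodes every byte it is given, preamble included,
        // so the BOM comes through as the character U+FEFF.
        return Encoding.UTF8.GetString(bytes);
    }

    // True when the decoded text still begins with the BOM character
    public static bool StartsWithBom(string text) =>
        text.Length > 0 && text[0] == '\uFEFF';
}
```

`BomDemo.Decode(BomDemo.JsonWithBom)` returns a three-character string: U+FEFF followed by `[]`. That invisible leading character is what makes the deserializer report invalid JSON.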

I also tried converting the UTF-8 byte array to the default encoding before decoding the string, but that only turns the encoding bytes into a visible character.

I talked to my peers, and they told me they had run into the same problem when reading embedded batch files, which ended up broken. They didn't know how to fix it either, but came up with a workaround in the batch files themselves (adding a blank line to the batch file makes it work).

Any suggestions on how to fix this?

Jon Skeet

Here's a simpler approach, removing the BOM after decoding:

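    // Your data is always in UTF-8 apparently, so just rely on that.
    string text = Encoding.UTF8.GetString(data);
    // Ordinal comparison: culture-sensitive StartsWith can treat U+FEFF as ignorable
    if (text.StartsWith("\ufeff", StringComparison.Ordinal))
    {
        text = text.Substring(1);
    }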

This has the downside of copying the string, of course.

Or if you do want to skip the bytes:

    // Again, we're assuming UTF-8
    int start = (data.Length >= 3 && data[0] == 0xef &&
                 data[1] == 0xbb && data[2] == 0xbf)
                ? 3 : 0;
    string text = Encoding.UTF8.GetString(data, start, data.Length - start);

That way you don't need to use Skip and ToArray, and it avoids doing any extraneous copying.
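The same byte-skipping idea can be generalized to whatever encoding the question's `GetEncoding` detects, because `Encoding.GetPreamble()` returns the BOM bytes an encoding instance writes. Below is a minimal sketch, assuming the encoding is already known. `BomHelper.DecodeWithoutPreamble` is a hypothetical name, and the preamble is skipped only when the data actually starts with it.

```csharp
using System.Text;

// Hypothetical helper generalizing the "skip the BOM bytes" approach.
public static class BomHelper
{
    // Decodes data with the given encoding, skipping the encoding's preamble
    // (its byte order mark) only if the data really begins with it.
    // Note: the preamble comes from the Encoding instance, so e.g.
    // new UTF8Encoding(false) reports an empty preamble and nothing is skipped.
    public static string DecodeWithoutPreamble(byte[] data, Encoding encoding)
    {
        byte[] preamble = encoding.GetPreamble();
        int start = StartsWith(data, preamble) ? preamble.Length : 0;
        // Decode only the bytes after the preamble; no intermediate array copy
        return encoding.GetString(data, start, data.Length - start);
    }

    // True if data begins with the non-empty byte sequence prefix
    private static bool StartsWith(byte[] data, byte[] prefix)
    {
        if (prefix.Length == 0 || data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

In the question's `LoadNavigation` this would be `string json = BomHelper.DecodeWithoutPreamble(navBytes, GetEncoding(navBytes));`. Encodings with an empty preamble, such as `Encoding.ASCII`, decode the whole array unchanged.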


See more on this question at Stack Overflow.