I am implementing a TryParse() method for an ASCII string class. The method takes a string and converts it to a C-style string (i.e. a null-terminated ASCII string).
I had been using only a Parse() method, doing the conversion to ASCII using:
public static bool Parse(string s, out byte[] result)
{
    result = null;
    if (s == null || s.Length < 1)
        return false;
    byte[] d = new byte[s.Length + 1]; // Add space for null-terminator
    System.Text.Encoding.ASCII.GetBytes(s).CopyTo(d, 0);
    // GetBytes can throw exceptions
    // (so can CopyTo() but I can replace that with a loop)
    result = d;
    return true;
}
However, since part of the idea of a TryParse method is to remove the overhead of exceptions, and GetBytes() throws exceptions, I'm looking for a different method that does not do so.
Maybe there is a TryGetBytes()-like method?
Or maybe we can reason about the expected format of a standard .NET string and perform the change mathematically (I'm not overly familiar with UTF encodings)?
EDIT: I guess for non-ASCII chars in the string, the TryParse() method should return false.
EDIT: I expect when I get around to implementing the ToString() method for this class I may need to do the reverse there.
Two options:
You could just ignore Encoding entirely, and write the loop yourself:
public static bool TryParse(string s, out byte[] result)
{
    result = null;
    // TODO: It's not clear why you don't want to be able to convert an empty string
    if (s == null || s.Length < 1)
    {
        return false;
    }
    byte[] buffer = new byte[s.Length + 1]; // Add space for null-terminator
    for (int i = 0; i < s.Length; i++)
    {
        char c = s[i];
        if (c > 127) // Anything above U+007F is not ASCII
        {
            return false;
        }
        buffer[i] = (byte) c;
    }
    result = buffer;
    return true;
}
That's simple, but may be slightly slower than using Encoding.GetBytes.
The second option would be to use a custom EncoderFallback:
// Requires: using System.Text;
public static bool TryParse(string s, out byte[] result)
{
    result = null;
    // TODO: It's not clear why you don't want to be able to convert an empty string
    if (s == null || s.Length < 1)
    {
        return false;
    }
    var fallback = new CustomFallback();
    // Request an ASCII encoding with the tracking fallback installed
    var encoding = Encoding.GetEncoding("us-ascii", fallback, DecoderFallback.ExceptionFallback);
    byte[] buffer = new byte[s.Length + 1]; // Add space for null-terminator
    // Use overload of Encoding.GetBytes that writes straight into the buffer
    encoding.GetBytes(s, 0, s.Length, buffer, 0);
    if (fallback.HadErrors)
    {
        return false;
    }
    result = buffer;
    return true;
}
That would require writing CustomFallback, though: it would basically need to keep track of whether it had ever been asked to handle invalid input.
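As a rough idea of what such a fallback could look like, here is a minimal sketch. The names CustomFallback and HadErrors come from the code above; the nested TrackingBuffer class is a hypothetical helper introduced only for illustration. It assumes it is enough to record that a fallback happened and emit no replacement characters.

```csharp
using System.Text;

// Sketch only: records whether the encoder ever hit a character it could not encode.
public sealed class CustomFallback : EncoderFallback
{
    // Set to true the first time the encoder asks for a fallback; never reset.
    public bool HadErrors { get; private set; }

    // This fallback never produces replacement characters.
    public override int MaxCharCount
    {
        get { return 0; }
    }

    public override EncoderFallbackBuffer CreateFallbackBuffer()
    {
        return new TrackingBuffer(this);
    }

    // Hypothetical helper: flags the owning fallback and substitutes nothing.
    private sealed class TrackingBuffer : EncoderFallbackBuffer
    {
        private readonly CustomFallback owner;

        public TrackingBuffer(CustomFallback owner)
        {
            this.owner = owner;
        }

        public override bool Fallback(char charUnknown, int index)
        {
            owner.HadErrors = true;
            return false; // No replacement characters to offer
        }

        public override bool Fallback(char charUnknownHigh, char charUnknownLow, int index)
        {
            owner.HadErrors = true;
            return false; // No replacement characters to offer
        }

        public override char GetNextChar()
        {
            return '\0'; // Signals that there is nothing to read
        }

        public override bool MovePrevious()
        {
            return false;
        }

        public override int Remaining
        {
            get { return 0; }
        }
    }
}
```

Because HadErrors is never reset, a fresh CustomFallback should be created for each conversion, as the TryParse code above does.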
If you didn't mind an encoding processing the data twice, you could call Encoding.GetByteCount with a UTF-8-based encoding with a replacement fallback (with a non-ASCII replacement character), and check whether that returns the same number of bytes as the number of chars in the string. If it does, call Encoding.ASCII.GetBytes.
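A sketch of that two-pass idea follows. The class name AsciiConverter and the field Utf8WithReplacement are hypothetical names chosen for this example, and U+FFFD is assumed as the non-ASCII replacement character. It relies on UTF-8 using exactly one byte per char only for ASCII; every other char, including a replaced lone surrogate, needs more.

```csharp
using System.Text;

public static class AsciiConverter
{
    // UTF-8 encoding whose replacement character (U+FFFD) is itself non-ASCII,
    // so an invalid surrogate also pushes the byte count above the char count.
    private static readonly Encoding Utf8WithReplacement = Encoding.GetEncoding(
        "utf-8",
        new EncoderReplacementFallback("\uFFFD"),
        new DecoderReplacementFallback("\uFFFD"));

    public static bool TryParse(string s, out byte[] result)
    {
        result = null;
        if (string.IsNullOrEmpty(s))
        {
            return false;
        }
        // First pass: one byte per char in UTF-8 means every char is ASCII
        if (Utf8WithReplacement.GetByteCount(s) != s.Length)
        {
            return false;
        }
        // Second pass: the input is known to be ASCII, so no fallback is needed
        byte[] buffer = new byte[s.Length + 1]; // Add space for null-terminator
        Encoding.ASCII.GetBytes(s, 0, s.Length, buffer, 0);
        result = buffer;
        return true;
    }
}
```

This keeps the conversion inside the framework's encoders, at the cost of scanning the string twice.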
Personally I'd go for the first option unless you have reason to believe it's too slow.
See more on this question at Stack Overflow