Does Java read 0xA0 as 0xFFFD?

One of my data processing modules crashed while reading ANSI input. When I looked at the string in question in a hex viewer, there was a mysterious 0xA0 byte at the end of it.

Turns out this is a non-breaking space (U+00A0).

I tried replacing that:

s = s.replace("\u00A0", "");

But it didn't work.

I then printed out the value of that character using charAt, and Java reports

65533

or 0xFFFD

Plugging that into the replace code, I finally got rid of it!
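In other words, this is what finally removed it:

s = s.replace("\uFFFD", "");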

But why do I see a 0xA0 byte in the file while Java reads it as 0xFFFD? This is how I read the file:

// Read the file line by line, decoding the bytes as UTF-8
BufferedReader r = new BufferedReader(new InputStreamReader(new FileInputStream(path), "UTF-8"));
String line = r.readLine();
while (line != null) {
    // do stuff
    line = r.readLine();
}
Jon Skeet

U+FFFD is the "Unicode replacement character", which is generally used to represent "some binary data which couldn't be decoded correctly in the encoding you were using". (Sometimes ? is used for this instead, but U+FFFD is generally a better idea, as it's unambiguous.)
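As a small illustration (a sketch, using windows-1252 as just one plausible "ANSI" code page), decoding a lone 0xA0 byte under two charsets shows where U+FFFD comes from:

import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    public static void main(String[] args) {
        byte[] data = { (byte) 0xA0 };

        // As UTF-8, a lone 0xA0 is an invalid sequence, so the decoder
        // substitutes the replacement character U+FFFD.
        String utf8 = new String(data, StandardCharsets.UTF_8);
        System.out.printf("UTF-8:        U+%04X%n", (int) utf8.charAt(0));

        // As windows-1252 (one of the "ANSI" code pages), 0xA0 is a
        // non-breaking space, U+00A0.
        String cp1252 = new String(data, Charset.forName("windows-1252"));
        System.out.printf("windows-1252: U+%04X%n", (int) cp1252.charAt(0));
    }
}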

Its presence is usually a sign that you've tried to use the wrong encoding. You haven't specified which encoding you were using - or indeed how you were using it - but that's probably the problem. Check the encoding you're using and the encoding of the file. Be aware that "ANSI" isn't an encoding - there are lots of encodings which are known as ANSI encodings, and you'll need to pick the right one for your file.
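If the file does turn out to be windows-1252, say (that's an assumption - check against whatever produced the file), the reader would be constructed with that charset instead of UTF-8, along these lines:

BufferedReader r = new BufferedReader(new InputStreamReader(new FileInputStream(path), "windows-1252"));

With the right charset, 0xA0 decodes to U+00A0, and the original replace("\u00A0", "") works as expected.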


See more on this question at Stackoverflow