Java Unicode conversion on Linux not working on Mac OS X

I am writing a Java application on Ubuntu Linux that reads in a text file and creates an XML file from the data. Some of the text contains curly apostrophes and quotes, which I convert to straight apostrophes and quotes using the following code:

dataLine = dataLine.replaceAll( "[\u2018\u2019]", "'" ).replaceAll( "[\u201C\u201D]", "\"" );

This works fine on Linux, but when I run the jar file on a Mac OS X machine, I get three question marks where I should get straight apostrophes and quotes. I created a test application on the Mac using the same line of code to do the conversion and the same test file for input, and it worked fine. Why doesn't the jar file created on the Linux machine work correctly on a Mac? I thought Java was supposed to be cross-platform.

Jon Skeet

Chances are you're not reading the file correctly to start with. You haven't shown how you're reading the file, but my guess is that you're just using FileReader, or an InputStreamReader without specifying the encoding. In that case, the default platform encoding is used - and if that's not the actual encoding of the file, you won't be reading the right characters. You should be able to detect that without doing any replacement at all.
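As a rough illustration of the mismatch described above, the sketch below encodes a curly apostrophe as UTF-8 and then decodes those bytes with a different charset. ISO-8859-1 is only a stand-in here for "some platform default that isn't the file's encoding"; the actual default on the Mac in question is not known, and the class and method names are made up for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {

    // Decodes the UTF-8 bytes of `text` using `wrongCharset`, simulating a
    // reader whose encoding does not match the encoding of the file.
    public static String misdecode(String text, Charset wrongCharset) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, wrongCharset);
    }

    public static void main(String[] args) {
        // The encoding this JVM uses when none is specified.
        System.out.println("Default charset: " + Charset.defaultCharset());

        // Right single quotation mark, U+2019 (three bytes in UTF-8).
        String curlyApostrophe = "\u2019";

        // ISO-8859-1 is an assumed stand-in for a mismatched default.
        String garbled = misdecode(curlyApostrophe, StandardCharsets.ISO_8859_1);

        System.out.println("Characters after mis-decoding: " + garbled.length());
        System.out.println("Still contains U+2019? " + garbled.contains(curlyApostrophe));
    }
}
```

With that stand-in charset, the single apostrophe turns into three unrelated characters, none of which the replaceAll call would match. Printing the default charset on both machines is a quick way to see whether they differ.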

Instead, you should use a FileInputStream and wrap it in an InputStreamReader with the correct encoding - which is likely to be UTF-8 as it's XML. (You should be able to check this easily.)
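A minimal sketch of that approach, assuming the input file really is UTF-8 (worth verifying first). The class name, method names, and the "input.txt" fallback path are placeholders rather than anything from the question; the replacement is the question's conversion with character classes that list only the quote characters, since a | inside [...] is matched literally.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ExplicitEncodingReader {

    // Reads all lines of the file at `path`, decoding its bytes with the
    // given charset instead of the platform default.
    public static List<String> readLines(String path, Charset charset) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Converts curly apostrophes and quotes to their straight equivalents.
    public static String straightenQuotes(String dataLine) {
        return dataLine.replaceAll("[\u2018\u2019]", "'")
                       .replaceAll("[\u201C\u201D]", "\"");
    }

    public static void main(String[] args) throws IOException {
        // Placeholder path; pass the real file name as the first argument.
        String path = args.length > 0 ? args[0] : "input.txt";

        // UTF-8 is an assumption about the file, not a known fact.
        for (String line : readLines(path, StandardCharsets.UTF_8)) {
            System.out.println(straightenQuotes(line));
        }
    }
}
```

Because the charset is passed explicitly, the decoding step should behave the same on Linux and on the Mac regardless of each machine's default encoding. The same care applies when writing the XML out: an OutputStreamWriter with an explicit charset avoids depending on the default there too.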


See more on this question at Stack Overflow