My name
is
Jon Skeet

Replacing Special Characters in Strings

I've seen a lot of Q&A here on SO related to this question, and I have tried a few of the examples, but something just isn't working:

def input = 'now is thé timé'
println Normalizer.normalize(input, Normalizer.Form.NFD).replaceAll(/[^A-z0-9 ]/, "").replaceAll(/ +/, "-")

The output of the above is

now-is-th-tim

If I do the following:

String input = 'now is th\u00E9 tim\u00E9'
println Normalizer.normalize(input, Normalizer.Form.NFD).replaceAll(/[^A-z0-9 ]/, "").replaceAll(/ +/, "-")

I get

now-is-the-time

which is what I want. I even tried the following:

def input = groovy.json.StringEscapeUtils.escapeJavaScript('now is thé timé')
println Normalizer.normalize(input, Normalizer.Form.NFD).replaceAll(/[^A-z0-9 ]/, "").replaceAll(/ +/, "-")

but I get

now-is-th\u221A\u00A9-tim\u221A\u00A9

Any suggestions?

UPDATE: Based on the comments, I tried the following:

import java.text.Normalizer

def input = new File('file.txt').text
def results = Normalizer.normalize(input, Normalizer.Form.NFD).replaceAll(/[^A-z0-9 ]/, "")
    .replaceAll(/ +/, "-")
println results

file.txt contains the same text I had placed in the string literal, and that works as expected. So something is going on with the encoding of the string definition in Groovy.

Given that your second snippet works, I strongly suspect that for the first snippet the encoding you're using in your editor isn't the same as the encoding your Groovy interpreter/compiler is using.
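
To make that concrete, here is a small sketch. It assumes the literal reached Groovy as the characters shown in your escaped output (\u221A\u00A9 in place of each é), and it wraps your pipeline in a closure called slugify, which is only an illustrative name.

```groovy
import java.text.Normalizer

// Hypothetical helper: the same normalize/replace pipeline as in the question.
def slugify = { String s ->
    def decomposed = Normalizer.normalize(s, Normalizer.Form.NFD)
    decomposed.replaceAll(/[^A-z0-9 ]/, '').replaceAll(/ +/, '-')
}

// Intended input: U+00E9 decomposes to 'e' + U+0301, and only the accent is stripped.
println slugify('now is th\u00E9 tim\u00E9')              // now-is-the-time

// Assumed actual input: each é arrived as U+221A U+00A9, and both are stripped.
println slugify('now is th\u221A\u00A9 tim\u221A\u00A9')  // now-is-th-tim
```

U+221A U+00A9 is what the UTF-8 bytes of é (0xC3 0xA9) look like when decoded as Mac OS Roman. That would fit an editor saving UTF-8 and Groovy defaulting to MacRoman, but it is an inference from your output, not something I can confirm from here.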

In other words, the problem isn't in the second line of your code - it's in the first line. You're not starting with the input text that you think you are.
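
One way to check what that first line really holds is to print the code units of the string before doing anything else with it. This is a minimal sketch; codeUnits is a made-up helper name, and the literal below uses a \u escape only so that the example itself does not depend on the file encoding.

```groovy
// Hypothetical helper: list the UTF-16 code units of a string as U+XXXX.
def codeUnits = { String s ->
    (0..<s.length()).collect { i -> String.format('U+%04X', (int) s.charAt(i)) }
}

println codeUnits('th\u00E9')   // [U+0074, U+0068, U+00E9]
```

If you run it on your original literal and see something like U+221A, U+00A9 where you expected U+00E9, the source file was decoded with the wrong charset. In that case you could keep using \u escapes as in your second snippet, or tell Groovy the source encoding explicitly. As far as I know the groovy launcher accepts -c / --encoding and groovyc accepts --encoding, but check the help output of your version.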

See more on this question at Stack Overflow