I've seen a lot of Q&A here on SO related to this question, and I have tried a few of the examples, but something just isn't working:
def input = 'now is thé timé'
println Normalizer.normalize(input, Normalizer.Form.NFD).replaceAll(/[^A-z0-9 ]/, "").replaceAll(/ +/, "-")
The output of the above is
now-is-th-tim
If I do the following:
String input = 'now is th\u00E9 tim\u00E9'
println Normalizer.normalize(input, Normalizer.Form.NFD).replaceAll(/[^A-z0-9 ]/, "").replaceAll(/ +/, "-")
I get
now-is-the-time
which is what I want. I even tried the following:
def input = groovy.json.StringEscapeUtils.escapeJavaScript('now is thé timé')
println Normalizer.normalize(input, Normalizer.Form.NFD).replaceAll(/[^A-z0-9 ]/, "").replaceAll(/ +/, "-")
but I get
now-is-th\u221A\u00A9-tim\u221A\u00A9
Any suggestions?
UPDATE: Based on the comments, I tried the following:
import java.text.Normalizer
def input = new File('file.txt').text
def results = Normalizer.normalize(input, Normalizer.Form.NFD).replaceAll(/[^A-z0-9 ]/, "")
    .replaceAll(/ +/, "-")
println results
file.txt contains the same text I had placed in the string literal, and that works as expected. So something is going on with the encoding of the string definition in Groovy.
Given that your second snippet works, I strongly suspect that for the first snippet the encoding you're using in your editor isn't the same as the encoding your Groovy interpreter/compiler is using.
In other words, the problem isn't in the second line of your code - it's in the first line. You're not starting with the input text that you think you are.
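To make the suspected mismatch concrete, here is a minimal sketch in Java (the language `Normalizer` comes from; the same calls can be made from Groovy). It assumes the source file was saved as UTF-8 but read as MacRoman, which is only an inference from the `\u221A\u00A9` pair in the escaped output in the question. The class name `EncodingMismatchDemo` and the helper `misdecode` are hypothetical, and the `x-MacRoman` charset is assumed to be available in your JDK.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

public class EncodingMismatchDemo {

    // Same pipeline as in the question, except that the letter range is
    // written as A-Za-z (see the note after this block).
    static String slugify(String input) {
        return Normalizer.normalize(input, Normalizer.Form.NFD)
                .replaceAll("[^A-Za-z0-9 ]", "")
                .replaceAll(" +", "-");
    }

    // Hypothetical helper: simulates text saved in one charset
    // and decoded in another.
    static String misdecode(String text, Charset savedAs, Charset readAs) {
        return new String(text.getBytes(savedAs), readAs);
    }

    public static void main(String[] args) {
        // Escapes keep this literal independent of the source file encoding.
        String intended = "now is th\u00E9 tim\u00E9";

        // Assumption: the editor wrote UTF-8, the compiler read MacRoman.
        Charset macRoman = Charset.forName("x-MacRoman");
        String seen = misdecode(intended, StandardCharsets.UTF_8, macRoman);

        System.out.println(slugify(intended)); // now-is-the-time
        System.out.println(slugify(seen));     // now-is-th-tim

        // Each e-acute has become the two characters U+221A U+00A9,
        // neither of which decomposes into a plain letter under NFD.
        System.out.println(seen.contains("\u221A\u00A9")); // true
    }
}
```

If this is what is happening, keeping the `\u00E9` escapes, or making sure the editor and Groovy agree on the source encoding (for example through the JVM's `file.encoding` system property, which is an assumption about your setup), should give the expected result. As a side note, the range `A-z` in the original pattern also matches the characters `[`, `\`, `]`, `^`, `_` and the backtick, which sit between `Z` and `a`; that is why the backslashes survive in the `escapeJavaScript` output.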