Java UTF-16 conversion to UTF-8

Step 1: Making a REST call using HttpClient to a Twitter endpoint and getting a tweet message containing an emoticon. The Twitter API returns the string with UTF-8 encoding.
Example: Message = 😄;
Step 2: I am using Java to read the string with an InputStreamReader and the UTF-8 charset. Still, the string's length turns out to be 2 rather than 1.
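
My reading step boils down to roughly this sketch (the byte array here just stands in for the HTTP response body, and the names are placeholders I made up):

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    public static void main(String[] args) throws IOException {
        // UTF-8 bytes for U+1F604 (😄), standing in for the response body
        byte[] responseBytes = { (byte) 0xF0, (byte) 0x9F, (byte) 0x98, (byte) 0x84 };
        Reader reader = new InputStreamReader(
                new ByteArrayInputStream(responseBytes), StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            sb.append((char) c);
        }
        String message = sb.toString();
        System.out.println(message.length()); // prints 2
    }
}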
How can this be possible, when I am explicitly parsing it using UTF-8?
On the net I found several resources where it's mentioned that an emoticon is a high code point character, and thus Java considers it to be 2 characters (a surrogate pair), which doesn't make sense to me.
Can someone help me with this?

Jon Skeet

You've got a string with length 2 - because the length() method returns the number of UTF-16 code units, not the number of Unicode characters. Bear in mind that a String in Java is really a sequence of UTF-16 code units, not a sequence of characters.

As you say, that emoji is represented with a surrogate pair - it is U+1F604, represented in UTF-16 as U+D83D U+DE04.

If you call String.codePointCount instead of length(), you'll get 1:

public class Test {
    public static void main(String[] args) {
        String emoji = "\ud83d\ude04";
        System.out.println(emoji.length()); // 2
        System.out.println(emoji.codePointCount(0, emoji.length())); // 1
    }
}

Note that the fact that you created the string by decoding UTF-8 is entirely irrelevant to its content. Assuming you've got a string equal to the one in my sample code above, the decoding worked fine.
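
If you want to work with the string code point by code point rather than char by char (so that the emoji counts as a single unit), one option is the codePoints() stream added in Java 8. A quick sketch (the class name is just for illustration):

public class CodePointDemo {
    public static void main(String[] args) {
        String emoji = "\ud83d\ude04";
        // Iterate Unicode code points rather than UTF-16 code units
        emoji.codePoints().forEach(cp ->
            System.out.printf("U+%04X (%d chars in UTF-16)%n",
                cp, Character.charCount(cp)));
        // Output: U+1F604 (2 chars in UTF-16)
    }
}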


See more on this question at Stack Overflow