Is SQL Server 2008 storing UTF-8?

I have a Java servlet that uses UTF-8. I have entered numerous characters (Traditional Chinese, Russian, etc.) and they seem to be stored in and retrieved from SQL Server 2008 fine. Does SQL Server 2008 handle UTF-8 encoded strings? If not, why have I had no problems yet?

    psmt.setString(7, myString); //myString is UTF8 encoded
    psmt.executeUpdate(); 
Jon Skeet

No, your string is actually UTF-16-encoded - it's a Java string, and Java strings are sequences of UTF-16 code units.

It (mostly) doesn't matter how SQL Server stores the value internally, so long as it can represent the same character repertoire. Your SQL schema should determine what values can be stored - how they're stored is irrelevant.

There are potentially many important encodings here:

  • The encoding you use when you present data from your app, e.g. via HTML
  • The encoding used internally by Java (UTF-16, at least as it's exposed by the language)
  • The encoding used to transfer data to the database (which is under the control of the database driver; this may or may not need to be configured)
  • The encoding used to store the data within the database

All of these can be different - so long as they can store the same set of characters. So while the bytes used to represent ☃ (U+2603, Unicode Snowman) will be different when it's encoded in UTF-8 and UTF-16, so long as it can be encoded (and is encoded properly), that's all that matters.
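To make the snowman point concrete, here is a minimal sketch that encodes the same one-character Java string both ways and decodes it back. The class and method names (`SnowmanEncodings`, `utf8`, `utf16`, `hex`) are made up for illustration, and big-endian UTF-16 without a byte order mark is just one choice among several.

```java
import java.nio.charset.StandardCharsets;

public class SnowmanEncodings {
    // Hypothetical helper: encode the text as UTF-8 bytes.
    static byte[] utf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Hypothetical helper: encode the text as big-endian UTF-16, no byte order mark.
    static byte[] utf16(String text) {
        return text.getBytes(StandardCharsets.UTF_16BE);
    }

    // Render bytes as space-separated upper-case hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String snowman = "\u2603"; // U+2603, Unicode Snowman

        byte[] asUtf8 = utf8(snowman);
        byte[] asUtf16 = utf16(snowman);

        System.out.println("UTF-8:  " + hex(asUtf8));  // E2 98 83
        System.out.println("UTF-16: " + hex(asUtf16)); // 26 03

        // Different bytes, but both decode back to the same string.
        String fromUtf8 = new String(asUtf8, StandardCharsets.UTF_8);
        String fromUtf16 = new String(asUtf16, StandardCharsets.UTF_16BE);
        System.out.println(fromUtf8.equals(snowman) && fromUtf16.equals(snowman)); // true
    }
}
```

The byte sequences differ in both content and length, yet each round-trips to the same string, which is the only property the "can I store it?" question depends on.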

Now there are some cases where the abstraction falls down, of course - if you ask for the length of a string in Java with String.length(), that's the length in UTF-16 code units, not in Unicode characters. Similarly, if you've got SQL fields which are limited to a particular length, you should pay careful attention to what those units are. (I don't know offhand for SQL Server 2008, but I've seen databases where that's in terms of UTF-16 code units, and others where it's in terms of bytes required in the UTF-8 encoding.) So in that sort of situation, the difference can be important - but the general "Can I store my data?" question doesn't require the encodings to be the same.
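As a rough illustration of how those units diverge, the sketch below measures the same strings three ways: UTF-16 code units, Unicode code points, and UTF-8 bytes. The class and helper names (`StringLengths`, `utf16Units`, `codePoints`, `utf8Bytes`) are invented for this example, and it says nothing about which unit any particular database uses for its column limits.

```java
import java.nio.charset.StandardCharsets;

public class StringLengths {
    // Length as Java reports it: UTF-16 code units.
    static int utf16Units(String s) {
        return s.length();
    }

    // Number of Unicode code points (a surrogate pair counts once).
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Number of bytes the string would occupy when encoded as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    static void report(String label, String s) {
        System.out.println(label + ": units=" + utf16Units(s)
                + " codePoints=" + codePoints(s)
                + " utf8Bytes=" + utf8Bytes(s));
    }

    public static void main(String[] args) {
        // U+2603 lies in the Basic Multilingual Plane: one UTF-16 code unit.
        report("snowman", "\u2603");       // units=1 codePoints=1 utf8Bytes=3

        // U+1F600 lies outside the BMP: a surrogate pair in UTF-16.
        report("emoji", "\uD83D\uDE00");   // units=2 codePoints=1 utf8Bytes=4
    }
}
```

A column limited to "10" could therefore hold a different number of these characters depending on whether the limit counts code units, code points or bytes, so it is worth checking the documentation for the column type you use.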


See more on this question at Stack Overflow