{"id":25645,"date":"2018-12-18T10:00:57","date_gmt":"2018-12-18T18:00:57","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/?p=25645"},"modified":"2018-12-18T08:55:22","modified_gmt":"2018-12-18T16:55:22","slug":"introducing-utf-8-support-in-sql-server-2019-preview","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/2018\/12\/18\/introducing-utf-8-support-in-sql-server-2019-preview\/","title":{"rendered":"Introducing UTF-8 support in SQL Server 2019 preview"},"content":{"rendered":"

With the first public preview of SQL Server 2019<\/a>, we announced support for the widely used UTF-8 character encoding as an import or export encoding, and as a database-level or column-level collation for string data.\u00a0This is an asset for companies extending their businesses to a global scale, where providing global multilingual database applications and services is critical to meeting customer demands and specific market regulations. The benefits of UTF-8 support extend to scenarios where legacy applications require internationalization and use inline queries: the number of changes and the amount of testing involved in converting an application and its underlying database to UTF-16 can be costly, because the conversion requires complex string processing logic that affects application performance.<\/p>\n

To limit the number of changes required for the above scenarios, UTF-8 is enabled in the existing data types CHAR and VARCHAR. String data is automatically encoded to UTF-8 when creating or changing an object\u2019s collation to a collation with the \u201cUTF8\u201d suffix, for example from LATIN1_GENERAL_100_CI_AS_SC to LATIN1_GENERAL_100_CI_AS_SC_UTF8. Refer to Set or Change the Database Collation<\/a> and\u00a0Set or Change the Column Collation<\/a> for more details on how to perform those changes. NCHAR and NVARCHAR remain unchanged and only allow UTF-16 encoding.<\/p>\n
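As a minimal sketch of the column-level change described above, the script below creates a VARCHAR column with a non-UTF-8 collation and then switches it to the matching UTF-8 enabled collation. The table name dbo.CustomerNotes and its columns are hypothetical, chosen only for illustration; nullability is restated explicitly in the ALTER COLUMN statement so that the change does not alter it.

```sql
-- Hypothetical table and column names, for illustration only.
CREATE TABLE dbo.CustomerNotes
(
    NoteId   INT IDENTITY(1,1) PRIMARY KEY,
    NoteText VARCHAR(200) COLLATE Latin1_General_100_CI_AS_SC NULL
);

-- Switch the existing column to the UTF-8 enabled collation.
-- Data type, length, and nullability are restated so they stay the same.
ALTER TABLE dbo.CustomerNotes
    ALTER COLUMN NoteText VARCHAR(200)
        COLLATE Latin1_General_100_CI_AS_SC_UTF8 NULL;

-- Confirm which collation the column now uses.
SELECT name, collation_name
FROM sys.columns
WHERE object_id = OBJECT_ID(N'dbo.CustomerNotes')
  AND name = N'NoteText';
```

Columns that are referenced by indexes, constraints, or computed columns may need those objects dropped and recreated around the collation change, so test this on a copy of the data first.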

UTF-8 is available only for Windows collations that support supplementary characters<\/a>, as introduced in SQL Server 2012. You can see all available UTF-8 collations by executing the command below in your SQL Server 2019 CTP:<\/p>\n

SELECT Name, Description FROM fn_helpcollations() \r\nWHERE Name like '%UTF8';<\/pre>\n

Additionally, if your dataset uses primarily Latin characters, significant storage savings may also be achieved compared to UTF-16 data types. For example, changing an existing column data type from NCHAR(10) to CHAR(10) using a UTF-8 enabled collation translates into a nearly 50 percent reduction in storage requirements. This is because NCHAR(10) requires 22 bytes for storage, whereas CHAR(10) requires 12 bytes for the same Unicode string.<\/p>\n
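One way to observe the difference on your own data is to store the same string in a UTF-16 column and in a UTF-8 column and compare the byte counts with DATALENGTH. The sketch below assumes a temporary table named #EncodingDemo and a sample value, both hypothetical. DATALENGTH reports only the bytes used by the value itself, so the figures it returns do not include any per-row or per-column storage overhead.

```sql
-- Hypothetical temporary table holding the same text in both encodings.
CREATE TABLE #EncodingDemo
(
    Utf16Text NVARCHAR(10) COLLATE Latin1_General_100_CI_AS_SC,
    Utf8Text  VARCHAR(10)  COLLATE Latin1_General_100_CI_AS_SC_UTF8
);

-- Sample value containing only Latin (ASCII-range) characters.
INSERT INTO #EncodingDemo (Utf16Text, Utf8Text)
VALUES (N'SQL Server', N'SQL Server');

-- Compare the number of bytes each encoding uses for the same string.
SELECT
    DATALENGTH(Utf16Text) AS Utf16Bytes,
    DATALENGTH(Utf8Text)  AS Utf8Bytes
FROM #EncodingDemo;

DROP TABLE #EncodingDemo;
```

Repeating the comparison with text outside the ASCII range shows a smaller gap, because UTF-8 needs more than one byte for those characters.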

Getting started<\/h2>\n