Khmer Unicode on SQL Server

03 Oct

The physical storage of character strings in SQL Server is controlled by collations. A collation specifies the bit patterns that represent each character and the rules by which characters are sorted and compared. SQL Server supports objects that have different collations being stored in a single database. Separate SQL Server 2000 collations can be specified down to the level of columns. Each column in a table can be assigned different collations.

Microsoft SQL Server collation settings depend on the type of installation. In general, choose a SQL Server collation that supports the Microsoft Windows system locale most commonly used at your site. To identify your Windows system locale on computers running Microsoft Windows 2000, click Regional Settings in Control Panel, and then click the General tab if necessary to display the current system locale.

In most cases, a computer runs the Windows system locale that matches the language requirements of the user, so Microsoft SQL Server Setup automatically detects the Windows system locale and chooses the appropriate collation.

Notes: For Khmer Unicode, we must set collation of each column to: SQL_Latin1_General_CP850_BIN

Unicode Support of SQL Server
One problem with data types that use 1 byte to encode each character is that the data type can only represent 256 different characters. This forces multiple encoding specifications (or code pages) for different alphabets such as European alphabets, which are relatively small. It is also impossible to handle systems such as the Japanese Kanji or Korean Hangul alphabets that have thousands of characters.

The Unicode specification addresses this problem by using 2 bytes to encode each character. There are enough different patterns (65,536) in 2 bytes for a single specification covering the most common business languages. Because all Unicode systems consistently use the same bit patterns to represent all characters, there is no problem with characters being converted incorrectly when moving from one system to another. You can minimize character conversion issues by using Unicode data types throughout your system.

Unicode support was introduced to SQL Server starting with SQL Server 7.0. Microsoft SQL Server allows the storage of Unicode data only in UCS-2 encoding.
Data Types
In Microsoft SQL Server, these data types support Unicode data:

* nchar
* nvarchar
* ntext

Note: The n prefix for these data types comes from the SQL-92 standard for National (Unicode) data types. Use of nchar, nvarchar, and ntext is the same as char, varchar and text, respectively, except that:

* Unicode supports a wider range of characters
* More space is needed to store Unicode characters
* The maximum size of nchar and nvarchar columns is 4,000 characters, not 8,000 characters like char and varchar
* Unicode constants are specified with a leading N: N’A Unicode string’
* All Unicode data uses the same Unicode code page. Collations do not control the code page used for Unicode columns, only attributes such as comparison rules and case sensitivity


Posted by on October 3, 2009 in SQL Server


16 responses to “Khmer Unicode on SQL Server

  1. Samnang

    March 23, 2010 at 10:35 AM

    This is a good article for solving Khmer Unicode problem with SQL server.

    Thank you for posting this

  2. Chomnit Chouy

    April 24, 2010 at 11:26 AM

    i already change Collation to SQL_Latin1_General_CP850_BIN for each column need to store unicode string and set its data type to nvarchar(1000) or nvarchar(MAX). When I use insert statement i use N b4 the unicode string it is ok! I can see the string in khmer. BUT when I try to select using sql query with where condition it doesn’t work at all.

    • sochinda

      April 26, 2010 at 8:11 AM


      It really work, man.

      1- Change collation
      2- Insert: insert into (Field1, Field2, … ) values(value1, N’value2′);
      3- Select Query: select * from where Field2 = N”

      Just easy.

      • Heng Sillin

        October 6, 2011 at 4:36 PM

        Hi Sochinda,
        I also had this problem for few days a go. I already change collation, and insert data to the table but I cannot use the query to select unicode charater when execute the query the result come out with all of rows, i use this “Select * from TblTest where Name=N’ស៊ីលីន’ ” do you have any way to solve it?

      • sochinda

        October 11, 2011 at 11:17 AM

        I think, this way is perfect working good, so if you still in the trouble, please change to use Linq, it’s working with Khmer Unicode well.

  3. sochinda

    February 9, 2011 at 9:21 AM

    For this case, if you use LINQ there is any problem will come any more.

  4. Nak

    February 9, 2011 at 6:11 PM

    can we use khmer unicode as a primary key? let say i want the datatype of the field “ID” is NCHAR to store khmer unicode..


    • sochinda

      February 10, 2011 at 3:03 PM

      Oops! I think it can but not good.

  5. sovantha

    August 11, 2011 at 10:43 AM

    I found this very helpful. Thanks Sochinda.

  6. Yang

    September 14, 2011 at 10:24 AM

    sochinda :
    It really work, man.
    1- Change collation
    2- Insert: insert into (Field1, Field2, … ) values(value1, N’value2′);
    3- Select Query: select * from where Field2 = N”
    Just easy.

    If the unicode characters is come from Excelsheets imported to MSSQL,
    the data is also ok with your solution ?


  7. Naomi

    July 7, 2014 at 1:29 AM

    Good web site you have got here.. It’s hard to find high-quality writing like yours
    nowadays. I honestly appreciate individuals like you!
    Take care!!

  8. peeky Movie Reivews

    August 3, 2014 at 1:49 PM

    magnificent publish, very informative. I ponder why the opposite specialists of this sector don’t notice
    this. You must proceed your writing. I’m sure, you’ve a great readers’
    base already!


    September 28, 2014 at 1:45 PM

    Write more, thats all I have to say. Literally, it seems as though
    you relied on the video to make your point. You definitely know what
    youre talking about, why waste your intelligence on just posting videos to
    your weblog when you could be giving us something informative to read?

  10. marathon training

    December 22, 2014 at 11:15 AM

    After exploring a number of the blog posts on your site,
    I seriously appreciate your technique of blogging.
    I saved as a favorite it to my bookmark website list and will be checking back
    in the near future. Please visit my website as
    well and let me know what you think.

  11. stainless steel juicer steamer

    January 18, 2015 at 5:52 AM

    A quality juicer that works well at lower speeds
    is the ideal type to look for. Most high functioning juicers
    will cost somewhere in the range of $200 to $2,000.
    You might think that taking the time to juice your own fruit
    is time consuming and wasteful.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: