Khmer Unicode on SQL Server

Collations
The physical storage of character strings in SQL Server is controlled by collations. A collation specifies the bit patterns that represent each character and the rules by which characters are sorted and compared. SQL Server supports objects that have different collations being stored in a single database. Separate SQL Server 2000 collations can be specified down to the level of columns. Each column in a table can be assigned different collations.

Microsoft SQL Server collation settings depend on the type of installation. In general, choose a SQL Server collation that supports the Microsoft Windows system locale most commonly used at your site. To identify your Windows system locale on computers running Microsoft Windows 2000, click Regional Settings in Control Panel, and then click the General tab if necessary to display the current system locale.

In most cases, a computer runs the Windows system locale that matches the language requirements of the user, so Microsoft SQL Server Setup automatically detects the Windows system locale and chooses the appropriate collation.

Notes: For Khmer Unicode, we must set collation of each column to: SQL_Latin1_General_CP850_BIN

Unicode Support of SQL Server
One problem with data types that use 1 byte to encode each character is that the data type can only represent 256 different characters. This forces multiple encoding specifications (or code pages) for different alphabets such as European alphabets, which are relatively small. It is also impossible to handle systems such as the Japanese Kanji or Korean Hangul alphabets that have thousands of characters.

The Unicode specification addresses this problem by using 2 bytes to encode each character. There are enough different patterns (65,536) in 2 bytes for a single specification covering the most common business languages. Because all Unicode systems consistently use the same bit patterns to represent all characters, there is no problem with characters being converted incorrectly when moving from one system to another. You can minimize character conversion issues by using Unicode data types throughout your system.

Unicode support was introduced to SQL Server starting with SQL Server 7.0. Microsoft SQL Server allows the storage of Unicode data only in UCS-2 encoding.
Data Types
In Microsoft SQL Server, these data types support Unicode data:

* nchar
* nvarchar
* ntext

Note: The n prefix for these data types comes from the SQL-92 standard for National (Unicode) data types. Use of nchar, nvarchar, and ntext is the same as char, varchar and text, respectively, except that:

* Unicode supports a wider range of characters
* More space is needed to store Unicode characters
* The maximum size of nchar and nvarchar columns is 4,000 characters, not 8,000 characters like char and varchar
* Unicode constants are specified with a leading N: N’A Unicode string’
* All Unicode data uses the same Unicode code page. Collations do not control the code page used for Unicode columns, only attributes such as comparison rules and case sensitivity

16 comments

  1. Samnang · March 23, 2010

    This is a good article for solving Khmer Unicode problem with SQL server.

    Thank you for posting this

  2. Chomnit Chouy · April 24, 2010

    i already change Collation to SQL_Latin1_General_CP850_BIN for each column need to store unicode string and set its data type to nvarchar(1000) or nvarchar(MAX). When I use insert statement i use N b4 the unicode string it is ok! I can see the string in khmer. BUT when I try to select using sql query with where condition it doesn’t work at all.

    • sochinda · April 26, 2010

      Hi,

      It really work, man.

      1- Change collation
      2- Insert: insert into (Field1, Field2, … ) values(value1, N’value2′);
      3- Select Query: select * from where Field2 = N”

      Just easy.

      • Heng Sillin · October 6, 2011

        Hi Sochinda,
        I also had this problem for few days a go. I already change collation, and insert data to the table but I cannot use the query to select unicode charater when execute the query the result come out with all of rows, i use this “Select * from TblTest where Name=N’ស៊ីលីន’ ” do you have any way to solve it?

      • sochinda · October 11, 2011

        I think, this way is perfect working good, so if you still in the trouble, please change to use Linq, it’s working with Khmer Unicode well.

  3. sochinda · February 9, 2011

    For this case, if you use LINQ there is any problem will come any more.

  4. Nak · February 9, 2011

    can we use khmer unicode as a primary key? let say i want the datatype of the field “ID” is NCHAR to store khmer unicode..

    Thanks

    • sochinda · February 10, 2011

      Oops! I think it can but not good.

  5. sovantha · August 11, 2011

    I found this very helpful. Thanks Sochinda.

  6. Yang · September 14, 2011

    sochinda :
    Hi,
    It really work, man.
    1- Change collation
    2- Insert: insert into (Field1, Field2, … ) values(value1, N’value2′);
    3- Select Query: select * from where Field2 = N”
    Just easy.

    If the unicode characters is come from Excelsheets imported to MSSQL,
    the data is also ok with your solution ?

    Thanks

  7. Pingback: Khmer Unicode on MS SQL Server (2005) | Tiniss Blog
  8. Naomi · July 7, 2014

    Good web site you have got here.. It’s hard to find high-quality writing like yours
    nowadays. I honestly appreciate individuals like you!
    Take care!!

  9. peeky Movie Reivews · August 3, 2014

    magnificent publish, very informative. I ponder why the opposite specialists of this sector don’t notice
    this. You must proceed your writing. I’m sure, you’ve a great readers’
    base already!

  10. http://43crnetwork.co.uk · September 28, 2014

    Write more, thats all I have to say. Literally, it seems as though
    you relied on the video to make your point. You definitely know what
    youre talking about, why waste your intelligence on just posting videos to
    your weblog when you could be giving us something informative to read?

  11. marathon training · December 22, 2014

    After exploring a number of the blog posts on your site,
    I seriously appreciate your technique of blogging.
    I saved as a favorite it to my bookmark website list and will be checking back
    in the near future. Please visit my website as
    well and let me know what you think.

  12. stainless steel juicer steamer · January 18, 2015

    A quality juicer that works well at lower speeds
    is the ideal type to look for. Most high functioning juicers
    will cost somewhere in the range of $200 to $2,000.
    You might think that taking the time to juice your own fruit
    is time consuming and wasteful.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s