Khmer Unicode on SQL Server

Collations
The physical storage of character strings in SQL Server is controlled by collations. A collation specifies the bit patterns that represent each character and the rules by which characters are sorted and compared. SQL Server supports objects that have different collations being stored in a single database. Separate SQL Server 2000 collations can be specified down to the level of columns. Each column in a table can be assigned different collations.

Microsoft SQL Server collation settings depend on the type of installation. In general, choose a SQL Server collation that supports the Microsoft Windows system locale most commonly used at your site. To identify your Windows system locale on computers running Microsoft Windows 2000, click Regional Settings in Control Panel, and then click the General tab if necessary to display the current system locale.

In most cases, a computer runs the Windows system locale that matches the language requirements of the user, so Microsoft SQL Server Setup automatically detects the Windows system locale and chooses the appropriate collation.

Notes: For Khmer Unicode, we must set collation of each column to: SQL_Latin1_General_CP850_BIN

Unicode Support of SQL Server
One problem with data types that use 1 byte to encode each character is that the data type can only represent 256 different characters. This forces multiple encoding specifications (or code pages) for different alphabets such as European alphabets, which are relatively small. It is also impossible to handle systems such as the Japanese Kanji or Korean Hangul alphabets that have thousands of characters.

The Unicode specification addresses this problem by using 2 bytes to encode each character. There are enough different patterns (65,536) in 2 bytes for a single specification covering the most common business languages. Because all Unicode systems consistently use the same bit patterns to represent all characters, there is no problem with characters being converted incorrectly when moving from one system to another. You can minimize character conversion issues by using Unicode data types throughout your system.

Unicode support was introduced to SQL Server starting with SQL Server 7.0. Microsoft SQL Server allows the storage of Unicode data only in UCS-2 encoding.
Data Types
In Microsoft SQL Server, these data types support Unicode data:

* nchar
* nvarchar
* ntext

Note: The n prefix for these data types comes from the SQL-92 standard for National (Unicode) data types. Use of nchar, nvarchar, and ntext is the same as char, varchar and text, respectively, except that:

* Unicode supports a wider range of characters
* More space is needed to store Unicode characters
* The maximum size of nchar and nvarchar columns is 4,000 characters, not 8,000 characters like char and varchar
* Unicode constants are specified with a leading N: N’A Unicode string’
* All Unicode data uses the same Unicode code page. Collations do not control the code page used for Unicode columns, only attributes such as comparison rules and case sensitivity

Published by sochinda

I really like new technology!!

16 thoughts on “Khmer Unicode on SQL Server

  1. i already change Collation to SQL_Latin1_General_CP850_BIN for each column need to store unicode string and set its data type to nvarchar(1000) or nvarchar(MAX). When I use insert statement i use N b4 the unicode string it is ok! I can see the string in khmer. BUT when I try to select using sql query with where condition it doesn’t work at all.

    1. Hi,

      It really work, man.

      1- Change collation
      2- Insert: insert into (Field1, Field2, … ) values(value1, N’value2′);
      3- Select Query: select * from where Field2 = N”

      Just easy.

      1. Hi Sochinda,
        I also had this problem for few days a go. I already change collation, and insert data to the table but I cannot use the query to select unicode charater when execute the query the result come out with all of rows, i use this “Select * from TblTest where Name=N’ស៊ីលីន’ ” do you have any way to solve it?

      2. I think, this way is perfect working good, so if you still in the trouble, please change to use Linq, it’s working with Khmer Unicode well.

  2. can we use khmer unicode as a primary key? let say i want the datatype of the field “ID” is NCHAR to store khmer unicode..

    Thanks

  3. sochinda :
    Hi,
    It really work, man.
    1- Change collation
    2- Insert: insert into (Field1, Field2, … ) values(value1, N’value2′);
    3- Select Query: select * from where Field2 = N”
    Just easy.

    If the unicode characters is come from Excelsheets imported to MSSQL,
    the data is also ok with your solution ?

    Thanks

  4. Good web site you have got here.. It’s hard to find high-quality writing like yours
    nowadays. I honestly appreciate individuals like you!
    Take care!!

  5. magnificent publish, very informative. I ponder why the opposite specialists of this sector don’t notice
    this. You must proceed your writing. I’m sure, you’ve a great readers’
    base already!

  6. Write more, thats all I have to say. Literally, it seems as though
    you relied on the video to make your point. You definitely know what
    youre talking about, why waste your intelligence on just posting videos to
    your weblog when you could be giving us something informative to read?

  7. After exploring a number of the blog posts on your site,
    I seriously appreciate your technique of blogging.
    I saved as a favorite it to my bookmark website list and will be checking back
    in the near future. Please visit my website as
    well and let me know what you think.

  8. A quality juicer that works well at lower speeds
    is the ideal type to look for. Most high functioning juicers
    will cost somewhere in the range of $200 to $2,000.
    You might think that taking the time to juice your own fruit
    is time consuming and wasteful.

Leave a reply to stainless steel juicer steamer Cancel reply