Khmer Unicode on SQL Server

sochindaOctober 3, 2009October 4, 2009SQL Server

Collations
The physical storage of character strings in SQL Server is controlled by collations. A collation specifies the bit patterns that represent each character and the rules by which characters are sorted and compared. SQL Server supports objects that have different collations being stored in a single database. Separate SQL Server 2000 collations can be specified down to the level of columns. Each column in a table can be assigned different collations.

Microsoft SQL Server collation settings depend on the type of installation. In general, choose a SQL Server collation that supports the Microsoft Windows system locale most commonly used at your site. To identify your Windows system locale on computers running Microsoft Windows 2000, click Regional Settings in Control Panel, and then click the General tab if necessary to display the current system locale.

In most cases, a computer runs the Windows system locale that matches the language requirements of the user, so Microsoft SQL Server Setup automatically detects the Windows system locale and chooses the appropriate collation.

Notes: For Khmer Unicode, we must set collation of each column to: SQL_Latin1_General_CP850_BIN

Unicode Support of SQL Server
One problem with data types that use 1 byte to encode each character is that the data type can only represent 256 different characters. This forces multiple encoding specifications (or code pages) for different alphabets such as European alphabets, which are relatively small. It is also impossible to handle systems such as the Japanese Kanji or Korean Hangul alphabets that have thousands of characters.

The Unicode specification addresses this problem by using 2 bytes to encode each character. There are enough different patterns (65,536) in 2 bytes for a single specification covering the most common business languages. Because all Unicode systems consistently use the same bit patterns to represent all characters, there is no problem with characters being converted incorrectly when moving from one system to another. You can minimize character conversion issues by using Unicode data types throughout your system.

Unicode support was introduced to SQL Server starting with SQL Server 7.0. Microsoft SQL Server allows the storage of Unicode data only in UCS-2 encoding.
Data Types
In Microsoft SQL Server, these data types support Unicode data:

* nchar
* nvarchar
* ntext

Note: The n prefix for these data types comes from the SQL-92 standard for National (Unicode) data types. Use of nchar, nvarchar, and ntext is the same as char, varchar and text, respectively, except that:

* Unicode supports a wider range of characters
* More space is needed to store Unicode characters
* The maximum size of nchar and nvarchar columns is 4,000 characters, not 8,000 characters like char and varchar
* Unicode constants are specified with a leading N: N’A Unicode string’
* All Unicode data uses the same Unicode code page. Collations do not control the code page used for Unicode columns, only attributes such as comparison rules and case sensitivity

Published by sochinda

I really like new technology!! View more posts

16 thoughts on “Khmer Unicode on SQL Server”

Samnang says:

March 23, 2010 at 10:35 AM

This is a good article for solving Khmer Unicode problem with SQL server.

Thank you for posting this

Reply
Chomnit Chouy says:

April 24, 2010 at 11:26 AM

i already change Collation to SQL_Latin1_General_CP850_BIN for each column need to store unicode string and set its data type to nvarchar(1000) or nvarchar(MAX). When I use insert statement i use N b4 the unicode string it is ok! I can see the string in khmer. BUT when I try to select using sql query with where condition it doesn’t work at all.

Reply
1. sochinda says:
  
  April 26, 2010 at 8:11 AM
  
  Hi,
  
  It really work, man.
  
  1- Change collation
  2- Insert: insert into (Field1, Field2, … ) values(value1, N’value2′);
  3- Select Query: select * from where Field2 = N”
  
  Just easy.
  
  Reply
  1. Heng Sillin says:
    
    October 6, 2011 at 4:36 PM
    
    Hi Sochinda,
    I also had this problem for few days a go. I already change collation, and insert data to the table but I cannot use the query to select unicode charater when execute the query the result come out with all of rows, i use this “Select * from TblTest where Name=N’ស៊ីលីន’ ” do you have any way to solve it?
  2. sochinda says:
    
    October 11, 2011 at 11:17 AM
    
    I think, this way is perfect working good, so if you still in the trouble, please change to use Linq, it’s working with Khmer Unicode well.
sochinda says:

February 9, 2011 at 9:21 AM

For this case, if you use LINQ there is any problem will come any more.

Reply
Nak says:

February 9, 2011 at 6:11 PM

can we use khmer unicode as a primary key? let say i want the datatype of the field “ID” is NCHAR to store khmer unicode..

Thanks

Reply
1. sochinda says:
  
  February 10, 2011 at 3:03 PM
  
  Oops! I think it can but not good.
  
  Reply
sovantha says:

August 11, 2011 at 10:43 AM

I found this very helpful. Thanks Sochinda.

Reply
Yang says:

September 14, 2011 at 10:24 AM

sochinda :
Hi,
It really work, man.
1- Change collation
2- Insert: insert into (Field1, Field2, … ) values(value1, N’value2′);
3- Select Query: select * from where Field2 = N”
Just easy.

If the unicode characters is come from Excelsheets imported to MSSQL,
the data is also ok with your solution ?

Thanks

Reply
Pingback: Khmer Unicode on MS SQL Server (2005) | Tiniss Blog
Naomi says:

July 7, 2014 at 1:29 AM

Good web site you have got here.. It’s hard to find high-quality writing like yours
nowadays. I honestly appreciate individuals like you!
Take care!!

Reply
peeky Movie Reivews says:

August 3, 2014 at 1:49 PM

magnificent publish, very informative. I ponder why the opposite specialists of this sector don’t notice
this. You must proceed your writing. I’m sure, you’ve a great readers’
base already!

Reply
http://43crnetwork.co.uk says:

September 28, 2014 at 1:45 PM

Write more, thats all I have to say. Literally, it seems as though
you relied on the video to make your point. You definitely know what
youre talking about, why waste your intelligence on just posting videos to
your weblog when you could be giving us something informative to read?

Reply
marathon training says:

December 22, 2014 at 11:15 AM

After exploring a number of the blog posts on your site,
I seriously appreciate your technique of blogging.
I saved as a favorite it to my bookmark website list and will be checking back
in the near future. Please visit my website as
well and let me know what you think.

Reply
stainless steel juicer steamer says:

January 18, 2015 at 5:52 AM

A quality juicer that works well at lower speeds
is the ideal type to look for. Most high functioning juicers
will cost somewhere in the range of $200 to $2,000.
You might think that taking the time to juice your own fruit
is time consuming and wasteful.

Reply