Exploring NLS: How to decipher output from NLSSORT

  • From: "Charles Schultz" <sacrophyte@xxxxxxxxx>
  • To: "oracle-l@xxxxxxxxxxxxx" <oracle-l@xxxxxxxxxxxxx>
  • Date: Thu, 1 Jan 2009 21:04:11 -0600

Good day, List,

I am attempting to understand the linguistic nuances of how Oracle
implements various character sets (territory and language). I have found a
number of useful white papers, metalink notes (and of course the
Globalization online Docs are good), but I have stumbled upon an obstacle of
sorts (tongue in cheek).

Metalink note 13882.1 (*Linguistic Sorting of Data in Oracle7 and Oracle8*)
talks about major and minor values, and also mentions the WB# (used
internally). I am assuming this information is still valid in 9i and 10g, as
there are no other newer metalink documents that cover these specific
details (that I have found). So, for instance, capital A grave (À) is listed
with a major value of 20, minor 5 (WB# = 300). I also noticed that the
online documentation has a small sample glyph
table<http://download.oracle.com/docs/cd/B19306_01/server.102/b14225/ch5lingsort.htm#g1010269>with
slightly different values (and despite the little NOTE below the
table,
I get the same results in a non-unicode character set as a unicode one). How
do I expose that in sql? I assume NLSSORT gives me the proper numbers, but I
am not sure how to reverse engineer the output. The following example is
from 10gR2 AMERICAN_AMERICA.WE8ISO8859P1:

SQL > alter session set nls_comp = LINGUISTIC;

Session altered.

SQL > alter session set NLS_SORT=generic_m;

Session altered.

SQL > select nlssort('A') from dual;

NLSSORT('A')
--------------
01EA0000020006

SQL > select nlssort('À') from dual;

NLSSORT('À')
----------------
01EA0000020D0006


I set the nls_comp and nls_sort parameters because the default (binary
sorting) only gave me basic ascii values for the sort numbers, which is not
what I want. I am not overly familiar with either parameter yet, so it is
possible I could be using a "better" or easier nls_sort than generic_m.

I am assuming the baseletter is wrapped up in 01EA000002, since that is the
same for other A diacritics as well.

If anyone knows of a better way to get the major/minor sort values, please
let me know. The reason why I am pursuing this little nuance is for the
purpose of working up a lab that I demo to others, and I personally learn
much better when I have concrete examples I can put my hands on.


TIA

-- 
Charles Schultz

Other related posts:

  • » Exploring NLS: How to decipher output from NLSSORT - Charles Schultz