Character set conversion (re Metalink notes 260192.1 and 225938.1) - Why? - oracle-l

Character set conversion (re Metalink notes 260192.1 and 225938.1) - Why?

From: "Charles Schultz" <sacrophyte@xxxxxxxxx>
To: ORACLE-L <oracle-l@xxxxxxxxxxxxx>
Date: Wed, 22 Oct 2008 09:10:09 -0500

Good day list,

I filed this question with Oracle Support, but they have been slow in
responding.

Metalink note 260192.1 (*Changing WE8ISO8859P1/ WE8ISO8859P15 or
WE8MSWIN1252 to (AL32)UTF8*) says:
If there is any "Lossy" data then those rows contain code points that are
not currently stored correctly and they should be cleared up before you can
continue with the steps in this note. Please see the following note for
clearing up any "Lossy" data: Note
225938.1<http://metalink/plsql/showdoc?db=NOT&id=225938.1&blackframe=1>Database
Character Set Healthcheck

The thought is that in order to resolve lossy data when converting from a
WE8ISO8859P1 (aka ISO) character set to a (AL32)UTF8 character set, one
first has to convert to WE8MSWIN1252 (aka, WIN)  (the superset of
WE8ISO8859P1). Why?

I know the reasons that Oracle gives via the many Metalink notes
(341676.1,264294.1,260192.1,252352.1,225938.1,158577.1,123670.1,225912.1).
They boil down to the fact that the ISO set does not contain as many
codepoints as the WIN set. While this is true, it is also true that the WIN
set has *all* the codepoints that the ISO set does (hence the definition of
a superset).

So in essence, when one converts the character set from ISO to WIN, the
codepoints never change. If you store an "unsupported" character, say, the
Euro (128) or left double quote (147) in an ISO database, the ascii value
will remain the same (128 and 147 respectively) when you "convert".

But why? If I export my data from a WE8ISO8859P1 database with NLS_LANG =
WE8MSWIN1252 and then turn around and import that into a AL32UTF8 database,
all my lossy data is automagically converted. And so it should be. So why
does Oracle documentation tell me that I have to clean up the lossy data
first, when a much easier and foolproof method is available? What am I
missing?

Simple example

Source: OSEE 10.2.0.2 (EXPRT)
Target: OSEE 10.2.0.3 (PUMP)

Source:
$> export NLS_LANG=AMERICAN_AMERICA.WE8MSWIN1252

SQL> select VALUE from nls_database_parameters where PARAMETER =
'NLS_CHARACTERSET';

VALUE
--------------------
WE8ISO8859P1

SQL> create table oracle.lossy_test (a char(1));

Table created.

SQL> insert into oracle.lossy_test values (chr(128));

1 row created.

$> expdp directory=DPUMP dumpfile=EXPRT_lossy_test.dmp
logfile=EXPRT_lossy_test.log metrics=y TABLES=ORACLE.LOSSY_TEST



Target:
$> export NLS_LANG=AMERICAN_AMERICA.WE8MSWIN1252

SQL > select VALUE from nls_database_parameters where PARAMETER =
'NLS_CHARACTERSET';

VALUE
--------------------
AL32UTF8

SQL > show parameter seman
NAME                                 TYPE     VALUE
------------------------------------ -------- ------------------------------
nls_length_semantics                 string   CHAR

SQL > create table oracle.lossy_test (a char(1 CHAR));

Table created.

PUMP_SQL > select * from oracle.lossy_test;

no rows selected

$> impdp directory=DPUMP dumpfile=EXPRT_lossy_test.dmp
logfile=PUMP_lossy_test.log metrics=y TABLES=ORACLE.LOSSY_TEST
TABLE_EXISTS_ACTION=TRUNCATE


With a windows client (ie, sqlplusw), I can see that the Euro shows up just
fine in both databases.


-- 
Charles Schultz

Character set conversion (re Metalink notes 260192.1 and 225938.1) - Why?

Other related posts: