RE: Database Character set related question

  • From: "Justin Cave (DDBC)" <jcave@xxxxxxxxxxx>
  • To: <ashoke.k.mandal@xxxxxxxxxxxxx>, <oracle-l@xxxxxxxxxxxxx>
  • Date: Fri, 28 Oct 2005 13:52:26 -0600

Having 8-bit values in CHAR/ VARCHAR2 columns in a database whose
character set us US7ASCII is not supported by Oracle and can cause
significant problems, including irretrievably corrupt data.  

 

If the client's NLS_LANG and the server's character set match, Oracle
currently bypasses the character set conversion step at the network
level (it has been threatened that this will change in the future) and
trusts that the data being sent is, in fact, the declared character set.
If every application that accesses the data claims that the data is
US7ASCII but treats it internally as some other character set, it can
appear that things are working correctly.  If any application or client
machine deviates from that falsehood, very bad things happen.  If you
have two different applications that use different character sets
internally (i.e. one is using UTF8 and the other is using Windows-1252),
you can have a situation where half of your data is encoded using UTF-8
and the other half is encoded in Windows-1252 and Oracle has no way of
knowing what values are encoded in what character set.  Untangling data
that has been corrupted this way is a challenge to say the least.

 

You cannot use export & import to move data to a new machine with a
different character set.  If you do, Oracle will do a character set
conversion at some point and corrupt your data, since it will assume
that the data is valid US7ASCII and the conversion will fail for values
> 128.  

 

The first step is to figure out what character set the data in the
database really is.  You can do this either by looking at the
application(s) that connect to the database and seeing what character
set they use internally or by examining the data.  The quick & dirty
approach is to extract the data (or at least representative subsets) to
a flat file (ensuring that you do not do a character set conversion) and
try to open the file in Word, which will hopefully prompt you to choose
a character set to use to display the data.  Play around until you find
one that works (probably Windows-1252, ISO-8859-1, ISO-8859-15, or
UTF-8).  Note that many of these character sets are very similar, so you
have to be careful about jumping to conclusions.  If there are multiple
character sets represented, you've got a major problem.

 

If you follow the general instructions for character set conversion
(i.e. ALTER DATABASE CHARACTER SET), Oracle will allow you to change the
character set to just about anything, but that won't change the data
that is being stored and will almost certainly cause massive amounts of
things to break, particularly if you pick the wrong character set.  If
you pick the right character set (the character set that the data is
really encoded in), you'll just need to adjust NLS_LANG settings on all
the client machines.  If you pick the wrong character set, conversion
for the code points > 127 will universally fail regardless of client
settings.

 

Justin Cave

Distributed Database Consulting, Inc.

 

________________________________

From: oracle-l-bounce@xxxxxxxxxxxxx
[mailto:oracle-l-bounce@xxxxxxxxxxxxx] On Behalf Of Mandal, Ashoke
Sent: Thursday, October 27, 2005 12:42 PM
To: oracle-l@xxxxxxxxxxxxx
Subject: Database Character set related question

 

Greetings All, 

The database CHARACTER SET is US7ASCII. The data in the database
contains 8 bits  (> 127 ASCII )  values.  We cannot display these
values,  instead we see  "?" character via  the  sqlplus session.
We  want to upgrade this database to 9.2.0.4, but before  we do  that ,
we want to make sure  that we don't lose any characters that are
currently stored in the database.
 We are attempting to run CSSCAN to verify. We want to use UTF8 in the
new 9i 
database. How do  we  determine what the character set  was used to put
the data into the database.

It's definitely not US7ASCII!

Is there a object that  we can look at to see what another session is
currently using? It is difficult for  us  to get access to the user 's
workstation to see there configuration. 

Thanks,

Ashoke 

Other related posts: