RE: character set confusion

From: "Powell, Mark D" <mark.powell@xxxxxxx>
To: "oracle-l" <oracle-l@xxxxxxxxxxxxx>
Date: Tue, 17 Jul 2007 14:10:57 -0400
I thought that UTF8 should be considered obsolete, as it is not guaranteed
to match the emerging standard, and that AL32UTF8 was its replacement.
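
A rough illustration of where the two differ, in Python rather than Oracle. It assumes (as I read the Oracle globalization docs) that the legacy UTF8 character set stores supplementary characters CESU-8 style, as two 3-byte surrogate sequences, while AL32UTF8 follows standard UTF-8 and uses 4 bytes. The helper cesu8_encode is hypothetical and only mimics that layout:

```python
def cesu8_encode(text: str) -> bytes:
    """Encode text CESU-8 style (assumed layout of Oracle's legacy UTF8).

    Each UTF-16 code unit is encoded on its own, so a character outside
    the Basic Multilingual Plane becomes two 3-byte sequences (6 bytes).
    """
    utf16 = text.encode("utf-16-be")
    out = bytearray()
    for i in range(0, len(utf16), 2):
        unit = int.from_bytes(utf16[i:i + 2], "big")
        # surrogatepass lets lone surrogate code units through the UTF-8 encoder
        out += chr(unit).encode("utf-8", errors="surrogatepass")
    return bytes(out)


clef = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, a supplementary character

print(len(clef.encode("utf-8")))   # standard UTF-8 (AL32UTF8-like): 4
print(len(cesu8_encode(clef)))     # CESU-8 style (legacy UTF8-like): 6
print(cesu8_encode("abc") == "abc".encode("utf-8"))  # plain ASCII is identical: True
```

For anything inside the Basic Multilingual Plane the two encodings agree, so the difference only shows up with supplementary characters.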
 

-- Mark D Powell -- 
Phone (313) 592-5148 

 


________________________________

        From: oracle-l-bounce@xxxxxxxxxxxxx
[mailto:oracle-l-bounce@xxxxxxxxxxxxx] On Behalf Of Bobak, Mark
        Sent: Tuesday, July 17, 2007 12:40 PM
        To: robyn.sands@xxxxxxxxx; oracle-l
        Subject: RE: character set confusion
        
        

        Hi Robyn,

         

        Playing a bit of catch up on Oracle-L.

         

        I'm no expert on this subject, but here's what I (think I) know:

        Converting from US7ASCII to UTF8 should not be a problem,
because the latter is a superset of the former.  Having a source
database in UTF8 and a destination database in US7ASCII may be a problem:
if the UTF8 database stores characters that are not defined in US7ASCII,
they cannot be represented in the destination.  It seems to me you could
convert the destination database to UTF8 first.  Then, when the source
database is converted to UTF8 (from US7ASCII?), there's no issue, since
the destination can already hold anything the source can.
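
The superset argument can be sketched outside the database. This is only an illustration: Python's "ascii" and "utf-8" codecs stand in for US7ASCII and UTF8, the function names are made up, and Oracle's actual replacement character and conversion path may differ.

```python
def to_utf8(ascii_bytes: bytes) -> bytes:
    """Convert 7-bit ASCII data to UTF-8; succeeds for any valid ASCII input."""
    return ascii_bytes.decode("ascii").encode("utf-8")


def to_ascii_lossy(utf8_bytes: bytes) -> bytes:
    """Convert UTF-8 data to ASCII, substituting '?' for unmappable characters."""
    return utf8_bytes.decode("utf-8").encode("ascii", errors="replace")


plain = b"ProQuest, Ann Arbor"
accented = "Z\u00fcrich caf\u00e9".encode("utf-8")  # contains u-umlaut and e-acute

# ASCII -> UTF-8 leaves the bytes unchanged, because UTF-8 is a superset.
print(to_utf8(plain) == plain)     # True

# UTF-8 -> ASCII silently loses the two non-ASCII characters.
print(to_ascii_lossy(accented))    # b'Z?rich caf?'
```

The second result is the worrying case: nothing fails, the data just comes out different on the other side.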

         

        To confirm what can and can't be stored in various character
sets, Oracle provides a tool called csscan.  It may be worth
investigating.  Here's the link to the 10.2 csscan docs:

        
http://download.oracle.com/docs/cd/B19306_01/server.102/b14225/ch12scanner.htm
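
To give a feel for the kind of check involved, here is a minimal Python sketch that flags values which would not survive a move to a 7-bit character set. It is not csscan and does not reflect how csscan works internally; find_lossy_values and the sample rows are invented for illustration, and Python's "ascii" codec stands in for US7ASCII.

```python
def find_lossy_values(values, target_encoding="ascii"):
    """Return (index, value) pairs that cannot be encoded in target_encoding."""
    lossy = []
    for i, value in enumerate(values):
        try:
            value.encode(target_encoding)
        except UnicodeEncodeError:
            # At least one character has no representation in the target set.
            lossy.append((i, value))
    return lossy


# Hypothetical column contents pulled from a UTF8 source.
sample_rows = ["SNAP$_ORDERS", "Ann Arbor", "Z\u00fcrich", "price \u20ac10"]

for idx, val in find_lossy_values(sample_rows):
    print(idx, repr(val))
```

Rows that come back from a check like this are the ones that would make a useful test case for the US7ASCII target.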

         

        Hope that helps,

         

        -Mark

         

         

        --
        Mark J. Bobak
        Senior Database Administrator, System & Product Technologies
        ProQuest
        789 E. Eisenhower, Parkway, P.O. Box 1346
        Ann Arbor MI 48106-1346
        734.997.4059  or 800.521.0600 x 4059
        mark.bobak@xxxxxxxxxxxxxxx
        www.proquest.com
        www.csa.com
        
        ProQuest...Start here. 

         

        From: oracle-l-bounce@xxxxxxxxxxxxx
[mailto:oracle-l-bounce@xxxxxxxxxxxxx] On Behalf Of Robyn
        Sent: Friday, July 13, 2007 7:50 PM
        To: oracle-l
        Subject: character set confusion

         

        Hello all,
        
        What are the limitations of materialized views across character
sets?  We will be upgrading the source database for many, many
materialized views to Oracle 10.2.0.2 in a few months.  We will also be
converting the database to UTF8 although that will probably occur a few
months later.  The target database is, at the moment, 9.2.0.8 and
US7ASCII.  It too will be upgraded eventually, but I need to determine
if there is a reason to perform that upgrade simultaneously with the
source upgrade and/or the UTF8 conversion.  Both databases have been around for
many years; about a third of the objects in question still use the SNAP$
convention. 
        
        It seems logical to me that there would be the potential for the
target to be unable to hold some of the data stored in the UTF8 source
database, but every test I've run has worked.  I did manage to hit the
bug with the big endian/little endian issue but once that patch was in,
no problems.  I've opened a case with Oracle, but their answer was brief
and not very reassuring.   Supposedly, if I upgrade both databases to
10g, I won't have to worry about any differences in character sets.
Somehow, that's not making sense to me and no logic was offered with the
answer. 
        
        So is there some kind of conversion that occurs in the
materialized view process?  Or would I eventually hit some bit of data
that could not be stored in the target database if it remains US7ASCII?
Would it make more sense to convert both to UTF8?  I've got time to plan
for this and I'd like to do it right, short of having to convert to a
completely new form of replication overnight. 
        
        Suggestions appreciated, including any test cases that might
conclusively prove the possibility of failure.  I'd rather find out now
than at 3:00 am on Feb 23rd 2009.
        
        tia ... Robyn