RE: character set confusion

  • From: "Bobak, Mark" <Mark.Bobak@xxxxxxxxxxxxxxx>
  • To: <robyn.sands@xxxxxxxxx>, "oracle-l" <oracle-l@xxxxxxxxxxxxx>
  • Date: Tue, 17 Jul 2007 12:40:00 -0400

Hi Robyn,


Playing a bit of catch up on Oracle-L.


I'm no expert on this subject, but here's what I (think I) know:

Converting from US7ASCII to UTF8 should not be a problem, because the
latter is a superset of the former.  Having a source database in UTF8
and a destination database in US7ASCII may be a problem: if the UTF8
database stores characters that are not defined in US7ASCII, the
destination has no way to represent them, so they would be lost or
replaced in conversion.  It seems to me you could convert the
destination database to UTF8 first.  Then, when the source database is
converted to UTF8 (from US7ASCII?), there's no issue: with the
destination already at UTF8, it can hold anything the source can.
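
To illustrate the superset argument outside the database, here is a
minimal Python sketch.  It assumes Python's "ascii" and "utf-8" codecs
are a reasonable stand-in for US7ASCII and UTF8; the helper names and
sample strings are made up for illustration, and Oracle's own
conversion may behave differently (for example, substituting a
replacement character where Python raises an error).

```python
def fits_in_ascii(text: str) -> bool:
    """Return True if every character in text is representable in 7-bit ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


def ascii_bytes_unchanged_in_utf8(text: str) -> bool:
    """For pure-ASCII text, the ASCII and UTF-8 byte sequences are identical."""
    return text.encode("ascii") == text.encode("utf-8")


def lossy_ascii(text: str) -> bytes:
    """Force text into ASCII, replacing anything unrepresentable with '?'."""
    return text.encode("ascii", errors="replace")


if __name__ == "__main__":
    # Hypothetical sample values, not taken from any real table.
    for value in ["Ann Arbor", "Zürich"]:
        print(repr(value), "fits in ASCII:", fits_in_ascii(value))
        print("  forced to ASCII:", lossy_ascii(value))
```

The first direction (ASCII to UTF-8) never loses anything; the second
only works as long as no non-ASCII character ever shows up in the
source data.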


To confirm what can and can't be stored in various character sets,
Oracle provides a tool called csscan.  It may be worth investigating.
Here's the link to the 10.2 csscan docs:


Hope that helps,





Mark J. Bobak
Senior Database Administrator, System & Product Technologies
789 E. Eisenhower Parkway, P.O. Box 1346
Ann Arbor MI 48106-1346
734.997.4059  or 800.521.0600 x 4059
mark.bobak@xxxxxxxxxxxxxxx <mailto:mark.bobak@xxxxxxxxxxxxxxx>

ProQuest...Start here. 


From: oracle-l-bounce@xxxxxxxxxxxxx
[mailto:oracle-l-bounce@xxxxxxxxxxxxx] On Behalf Of Robyn
Sent: Friday, July 13, 2007 7:50 PM
To: oracle-l
Subject: character set confusion


Hello all,

What are the limitations of materialized views across character sets?
We will be upgrading the source database for many, many materialized
views to Oracle in a few months.  We will also be converting
the database to UTF8 although that will probably occur a few months
later.  The target database is, at the moment, and US7ASCII.
It too will be upgraded eventually but I need to determine if there is a
reason to perform the upgrade simultaneously with the upgrade and/or the
UTF8 conversion.  Both databases have been around for many years; about
a third of the objects in question still use the SNAP$ convention. 

It seems logical to me that there would be the potential for the target
to be unable to hold some of the data stored in the UTF8 source
database, but every test I've run has worked.  I did manage to hit the
bug with the big endian/little endian issue but once that patch was in,
no problems.  I've opened a case with Oracle, but their answer was brief
and not very reassuring.   Supposedly, if I upgrade both databases to
10g, I won't have to worry about any differences in character sets.
Somehow, that's not making sense to me and no logic was offered with the

So is there some kind of conversion that occurs in the materialized view
process?  Or would I eventually hit some bit of data that could not be
stored in the target database if it remains US7ASCII?  Would it make
more sense to convert both to UTF8?  I've got time to plan for this and
I'd like to do it right, short of having to convert to a completely new
form of replication overnight. 

Suggestions appreciated, including any test cases that might
conclusively prove the possibility of failure.  I'd rather find out now
than at 3:00 am on Feb 23rd 2009.

tia ... Robyn
