Re: Replacement of US7ASCII character set in 11g?
- From: Janine Sisk <janine@xxxxxxxxxx>
- To: "Bobak, Mark" <Mark.Bobak@xxxxxxxxxxxx>
- Date: Mon, 11 Jan 2010 14:14:49 -0800
Mark, thanks for that! You gave me the idea to run CSSCAN and tell it
to convert the data from WE8MSWIN1252 to UTF8. Voila, no more
errors. At least now I know what Oracle thought it was doing when
this data was inserted.
So (I think) what I need to do is get the data out of 8.1.7, but have
Oracle treat it as WE8MSWIN1252 as-is, without converting it. If it
tries to convert it, I'll probably lose all those 128 and above
characters.
I am going to try experimenting with this, but if anyone has any
educated guesses I'd love to hear them.
Also, sadly I no longer have Metalink/MOS access; I am doing this as
an outside contractor to one tiny piece of a huge organization, and I
am not allowed to use their CSI. I used to have my own, but over the
years my smaller clients have all converted to Postgres; our database
usage is fairly minimal and doesn't justify the licensing expense.
So, no more Metalink for me. :(
janine
On Jan 11, 2010, at 11:49 AM, Bobak, Mark wrote:
Hi Janine,
Problem is that “LOSSY” may mean that the data is already corrupted
in the source database, so, Oracle has no idea what to convert it
to. In the case of US7ASCII, character values range from 0-127,
correct? So, if you have any values 128 or greater, Oracle has no
idea what character that value should map to. It would depend on
what *assumption* the client code was operating under when the data
was inserted. So, it doesn’t matter if AL32UTF8 is the “mother of
all character sets”. Even if AL32UTF8 *does* have the character you
need, it doesn’t matter, cause Oracle doesn’t know what that
character is in the source. If it can’t determine that, it can’t
map it to the correct character in the destination character set.
See Doc ID 444701.1 “CSSCAN Output Explained”, for more info,
particularly “B.4) LOSSY data”.
Hope that helps,
-Mark
From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx
] On Behalf Of Janine Sisk
Sent: Monday, January 11, 2010 2:37 PM
To: oracle-l L
Subject: Re: Replacement of US7ASCII character set in 11g?
Thanks to everyone who replied....
I ran CSSCAN on the original 8.1.7 database and, of course, ran into
trouble right away. The conversion from US7ASCII to WE8MSWIN1252 is
lossy in a number of places. This does not surprise me terribly;
Jared mentioned that you can put "invalid" data into a database of
type US7ASCII and I'm pretty sure that all of the programmers who
have worked on this site over the years have just assumed that if
the database didn't choke on it, then it was ok.
What concerns me is that CSSCAN reports that converting to UTF8 will
have the exact same lossy conversions. The two error files are
literally identical except for the value of TOCHAR. I thought that
UTF8 was the mother of all character sets, so where do I go from here?
As an example, one of the first errors reported looks like this when
I do a SELECT in sqlplus:
Lic. en medios de comunicaciÃ?3n
I will be digging further into this, with The Google and all, but if
anyone has any light to shed, please do!
janine
On Jan 7, 2010, at 3:07 PM, David Mann wrote:
When I had clients worried about character set conversions I usually
ran Character Set Scanner utility CSSCAN on a copy of the database
to check for differences. Here is the reference in the 10g
documentation, assuming it is still available in 11g but don't have
a link handy.
http://download.oracle.com/docs/cd/B19306_01/server.102/b14225/ch12scanner.htm
--
Dave Mann
www.brainio.us
www.ba6.us - Database Stuff - http://www.ba6.us/rss.xml
---
Janine Sisk
President/CEO of furfly, LLC
503-693-6407
Other related posts: