Re: Replacement of US7ASCII character set in 11g?

From: Janine Sisk <janine@xxxxxxxxxx>
To: "Bobak, Mark" <Mark.Bobak@xxxxxxxxxxxx>
Date: Mon, 11 Jan 2010 14:14:49 -0800

Mark, thanks for that! You gave me the idea to run CSSCAN and tell itto convert the data from WE8MSWIN1252 to UTF8. Voila, no moreerrors. At least now I know what Oracle thought it was doing whenthis data was inserted.

So (I think) what I need to do is get the data out of 8.1.7, but haveOracle treat it as WE8MSWIN1252 as-is, without converting it. If ittries to convert it, I'll probably lose all those 128 and abovecharacters.

I am going to try experimenting with this, but if anyone has anyeducated guesses I'd love to hear them.

Also, sadly I no longer have Metalink/MOS access; I am doing this asan outside contractor to one tiny piece of a huge organization, and Iam not allowed to use their CSI. I used to have my own, but over theyears my smaller clients have all converted to Postgres; our databaseusage is fairly minimal and doesn't justify the licensing expense.So, no more Metalink for me. :(


janine

On Jan 11, 2010, at 11:49 AM, Bobak, Mark wrote:

Hi Janine,
Problem is that “LOSSY” may mean that the data is already corruptedin the source database, so, Oracle has no idea what to convert itto. In the case of US7ASCII, character values range from 0-127,correct? So, if you have any values 128 or greater, Oracle has noidea what character that value should map to. It would depend onwhat *assumption* the client code was operating under when the datawas inserted. So, it doesn’t matter if AL32UTF8 is the “mother ofall character sets”. Even if AL32UTF8 *does* have the character youneed, it doesn’t matter, cause Oracle doesn’t know what thatcharacter is in the source. If it can’t determine that, it can’tmap it to the correct character in the destination character set.
See Doc ID 444701.1 “CSSCAN Output Explained”, for more info,particularly “B.4) LOSSY data”.
Hope that helps,

-Mark
From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx] On Behalf Of Janine Sisk
Sent: Monday, January 11, 2010 2:37 PM
To: oracle-l L
Subject: Re: Replacement of US7ASCII character set in 11g?

Thanks to everyone who replied....
I ran CSSCAN on the original 8.1.7 database and, of course, ran intotrouble right away. The conversion from US7ASCII to WE8MSWIN1252 islossy in a number of places. This does not surprise me terribly;Jared mentioned that you can put "invalid" data into a database oftype US7ASCII and I'm pretty sure that all of the programmers whohave worked on this site over the years have just assumed that ifthe database didn't choke on it, then it was ok.
What concerns me is that CSSCAN reports that converting to UTF8 willhave the exact same lossy conversions. The two error files areliterally identical except for the value of TOCHAR. I thought thatUTF8 was the mother of all character sets, so where do I go from here?
As an example, one of the first errors reported looks like this whenI do a SELECT in sqlplus:
Lic. en medios de comunicaci&Atilde;?3n
I will be digging further into this, with The Google and all, but ifanyone has any light to shed, please do!
janine

On Jan 7, 2010, at 3:07 PM, David Mann wrote:
When I had clients worried about character set conversions I usuallyran Character Set Scanner utility CSSCAN on a copy of the databaseto check for differences. Here is the reference in the 10gdocumentation, assuming it is still available in 11g but don't havea link handy.
http://download.oracle.com/docs/cd/B19306_01/server.102/b14225/ch12scanner.htm

--
Dave Mann
www.brainio.us
www.ba6.us - Database Stuff - http://www.ba6.us/rss.xml


---
Janine Sisk
President/CEO of furfly, LLC
503-693-6407

Follow-Ups:
- Re: Replacement of US7ASCII character set in 11g?
  - From: Robert Freeman
- Re: Replacement of US7ASCII character set in 11g?
  - From: amit bansal

References:
- Re: Replacement of US7ASCII character set in 11g?
  - From: David Mann
- Re: Replacement of US7ASCII character set in 11g?
  - From: Janine Sisk
- RE: Replacement of US7ASCII character set in 11g?
  - From: Bobak, Mark

Re: Replacement of US7ASCII character set in 11g?

Other related posts: