Re: Replacement of US7ASCII character set in 11g?

From: Robert Freeman <robertgfreeman@xxxxxxxxx>
To: janine@xxxxxxxxxx, "Bobak, Mark" <Mark.Bobak@xxxxxxxxxxxx>
Date: Mon, 11 Jan 2010 14:32:21 -0800 (PST)

>> Also, sadly I no longer have Metalink/MOS access;  I am doing this as
an outside contractor to one tiny >> piece of a huge organization, and I
am not allowed to use their CSI.
All I can say is that is an organization that would not get my consulting 
services! :) I believe strongly in having the right tools for the job, or not 
doing the job at all. That's just me... :)

RF

Robert G. Freeman
Oracle ACE
Ask me about on-site Oracle Training! RMAN, DBA, Tuning, you name it!
Author:
Oracle Database 11g RMAN Backup and Recovery (Oracle Press) - ON ITS WAY SOON!
OCP: Oracle Database 11g Administrator Certified Professional Study Guide 
(Sybex)
Oracle Database 11g New Features (Oracle Press)
Oracle Database 10g New Features (Oracle Press)
Other various titles
Blog: http://robertgfreeman.blogspot.com
Check out my new blog series on installing Oracle Database 11gR2 on Windows 
using VMWare!

________________________________
From: Janine Sisk <janine@xxxxxxxxxx>
To: "Bobak, Mark" <Mark.Bobak@xxxxxxxxxxxx>
Cc: oracle-l L <oracle-l@xxxxxxxxxxxxx>
Sent: Mon, January 11, 2010 3:14:49 PM
Subject: Re: Replacement of US7ASCII character set in 11g?

Mark, thanks for that!  You gave me the idea to run CSSCAN and tell it to 
convert the data from WE8MSWIN1252 to UTF8.  Voila, no more errors.  At least 
now I know what Oracle thought it was doing when this data was inserted.

So (I think) what I need to do is get the data out of 8.1.7, but have Oracle 
treat it as WE8MSWIN1252 as-is, without converting it.  If it tries to convert 
it, I'll probably lose all those 128 and above characters.

I am going to try experimenting with this, but if anyone has any educated 
guesses I'd love to hear them.

Also, sadly I no longer have Metalink/MOS access;  I am doing this as an 
outside contractor to one tiny piece of a huge organization, and I am not 
allowed to use their CSI.  I used to have my own, but over the years my smaller 
clients have all converted to Postgres;  our database usage is fairly minimal 
and doesn't justify the licensing expense.  So, no more Metalink for me. :(

janine

On Jan 11, 2010, at 11:49 AM, Bobak, Mark wrote:

Hi Janine,
> 
>Problem is that “LOSSY” may mean that the data is already corrupted in the 
>source database, so, Oracle has no idea what to convert it to.  In the case of 
>US7ASCII, character values range from 0-127, correct?  So, if you have any 
>values 128 or greater, Oracle has no idea what character that value should map 
>to.  It would depend on what *assumption* the client code was operating under 
>when the data was inserted.  So, it doesn’t matter if AL32UTF8 is the “mother 
>of all character sets”.  Even if AL32UTF8 *does* have the character you need, 
>it doesn’t matter, cause Oracle doesn’t know what that character is in the 
>source.  If it can’t determine that, it can’t map it to the correct character 
>in the destination character set.
> 
>See Doc ID 444701.1 “CSSCAN Output Explained”, for more info, particularly 
>“B.4)  LOSSY data”.
> 
>Hope that helps,
> 
>-Mark
> 
>From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx] On 
>Behalf Of Janine Sisk
>Sent: Monday, January 11, 2010 2:37 PM
>To: oracle-l L
>Subject: Re: Replacement of US7ASCII character set in 11g?
> 
>Thanks to everyone who replied....
> 
>I ran CSSCAN on the original 8.1.7 database and, of course, ran into trouble 
>right away.  The conversion from US7ASCII to WE8MSWIN1252 is lossy in a number 
>of places.  This does not surprise me terribly;  Jared mentioned that you can 
>put "invalid" data into a database of type US7ASCII and I'm pretty sure that 
>all of the programmers who have worked on this site over the years have just 
>assumed that if the database didn't choke on it, then it was ok.
> 
>What concerns me is that CSSCAN reports that converting to UTF8 will have the 
>exact same lossy conversions.  The two error files are literally identical 
>except for the value of TOCHAR.  I thought that UTF8 was the mother of all 
>character sets, so where do I go from here?
> 
>As an example, one of the first errors reported looks like this when I do a 
>SELECT in sqlplus:
> 
>Lic. en medios de comunicaci&Atilde;?3n
> 
>I will be digging further into this, with The Google and all, but if anyone 
>has any light to shed, please do!
> 
>janine
> 
>On Jan 7, 2010, at 3:07 PM, David Mann wrote:
>
>
>
>When I had clients worried about character set conversions I usually ran 
>Character Set Scanner utility CSSCAN on a copy of the database to check for 
>differences. Here is the reference in the 10g documentation, assuming it is 
>still available in 11g but don't have a link handy. 
>
>http://download.oracle.com/docs/cd/B19306_01/server.102/b14225/ch12scanner.htm
>
>-- 
>Dave Mann
>www.brainio.us
>www.ba6.us - Database Stuff - http://www.ba6.us/rss.xml
> 

---
Janine Sisk
President/CEO of furfly, LLC
503-693-6407

References:
- Re: Replacement of US7ASCII character set in 11g?
  - From: David Mann
- Re: Replacement of US7ASCII character set in 11g?
  - From: Janine Sisk
- RE: Replacement of US7ASCII character set in 11g?
  - From: Bobak, Mark
- Re: Replacement of US7ASCII character set in 11g?
  - From: Janine Sisk

Re: Replacement of US7ASCII character set in 11g?

Other related posts: