RE: Change database character set

  • From: Kenny Payton <k3nnyp@xxxxxxxxx>
  • To: Scott Canaan <srcdco@xxxxxxx>
  • Date: Fri, 24 Oct 2014 16:22:32 -0400

This could be because the character's code point is valid in both character
sets.  WE8MSWIN1252 is a superset of ISO-8859-1.
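
If you want to see exactly which bytes are in play, a DUMP() of one of the
flagged values shows the raw code points; something like this (table, column
and rowid are placeholders):

    select rowid, your_col,
           dump(your_col, 1016) raw_bytes   -- hex plus the character set tag
    from your_schema.your_table
    where rowid = :flagged_rowid;

Bytes in the 0x80-0x9F range are usually the tell: printable punctuation in
WE8MSWIN1252, control codes in WE8ISO8859P1.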

You're lucky.  9 affected rows is a walk in the park.  Reloading is
certainly the way to go.

I wrote an interactive Perl CGI script to query the database and let me
select character sets from a drop-down until I had visually identified the
correct character set for the problem characters.  I then wrote a separate
process to correct millions of rows as part of the conversion.
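
In case it's useful, the trick boils down to running the same value through
CONVERT() from a handful of candidate source character sets, converting to
something your browser or terminal can display (UTF-8 here), and eyeballing
which rendering looks right; something along these lines (names are made up,
pick whatever candidates fit your data):

    select rowid,
           convert(your_col, 'AL32UTF8', 'WE8ISO8859P1') as from_8859p1,
           convert(your_col, 'AL32UTF8', 'WE8MSWIN1252') as from_win1252,
           convert(your_col, 'AL32UTF8', 'WE8PC850')     as from_pc850
    from your_schema.your_table
    where rowid = :flagged_rowid;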

DMU provides a lot of this functionality now but was pretty immature and
buggy when I was attempting to use it.

I was also dealing with 20T+ databases and a 6-hour window, so I had to get
pretty creative.  Unfortunately csalter requires a clean scan, and that was
the hardest hurdle to circumvent.
On Oct 24, 2014 4:08 PM, "Scott Canaan" <srcdco@xxxxxxx> wrote:

>  Actually, the report lists the character sets at the end.
> Unfortunately, in my case, it lists 2 different character sets for the
> “bad” 9 rows:
>
>
>
> CHARACTER SET                            NUMBER       PERCENTAGE
> ------------------------------ ---------------- ----------------
> UNKNOWN                                  350400          99.997%
> WE8ISO8859P1                                  7           0.002%
> WE8MSWIN1252                                  2           0.001%
> ------------------------------ ---------------- ----------------
> Total                                    350409         100.000%
>
>
>
> [Language detection result]
>
>
>
> LANGUAGE                                 NUMBER       PERCENTAGE
> ------------------------------ ---------------- ----------------
> UNKNOWN                                  350400          99.997%
> ENGLISH                                       9           0.003%
> ------------------------------ ---------------- ----------------
> Total                                    350409         100.000%
>
>
>
> Which tells me that the data was entered on 2 different PCs with different
> character sets.  I’m going to have to insist that they re-enter this data
> once the conversion (and application upgrade) is complete.
>
>
>
> Scott Canaan ’88 (srcdco@xxxxxxx)
>
> (585) 475-7886 – work                (585) 339-8659 – cell
>
> “Life is like a sewer, what you get out of it depends on what you put into
> it.” – Tom Lehrer
>
>
>
> *From:* Kenny Payton [mailto:k3nnyp@xxxxxxxxx]
> *Sent:* Friday, October 24, 2014 11:13 AM
> *To:* Scott Canaan
> *Cc:* Niall Litchfield; oracle.unknowns@xxxxxxxxx; Chris Taylor;
> oracle-l@xxxxxxxxxxxxx
> *Subject:* Re: Change database character set
>
>
>
> There is an option to csscan that will populate some columns in one of its
> tables, I believe csm$errors.  If I recall correctly the column names are
> something like CSIDLED1, ...2, ...3.  It will provide up to 3 educated
> guesses as to which character set the data might have come from.
>
>
> LCSDDATA=LOSSY LCSD=Y
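>
> i.e. tack those onto an otherwise normal csscan run; something like this
> (connect string, scope and target character set are placeholders):
>
>     csscan \"sys/<password> as sysdba\" FULL=Y TOCHAR=<target_charset> LOG=lcsdscan LCSD=Y LCSDDATA=LOSSY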
>
> You can also look at SYS_OP_CSCONV to convert the data.
>
>     $update = sprintf("update %s.%s set %s = SYS_OP_CSCONV(%s, '%s', '%s') where rowid = ?",
>                       $err->own, $err->tab, $err->col, $err->col, $tocs, $cs);
>
> We had millions of rows with embedded characters, such as the fancy quotes
> from Word documents, in VARCHAR2 fields and had to convert them manually.
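>
> If you need to hunt for that kind of thing yourself: in a single-byte
> database character set the Word smart quotes land in bytes 145-148, so a
> query along these lines (table/column names invented) flags candidate rows:
>
>     -- chr(145)..chr(148) are the Windows-1252 curly quotes
>     select rowid, your_col
>     from your_schema.your_table
>     where instr(your_col, chr(145)) > 0
>        or instr(your_col, chr(146)) > 0
>        or instr(your_col, chr(147)) > 0
>        or instr(your_col, chr(148)) > 0;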
>
>
>
>
>
>
> On Fri, Oct 24, 2014 at 10:26 AM, Scott Canaan <srcdco@xxxxxxx> wrote:
>
> What I ended up doing was to create a blackout in OEM, then kill the
> dbsnmp connections and truncate the reg$ table.  That took care of that
> error.  The 9 rows in the application data remain.  I’ve been working with
> Oracle support and they suggested checking against WE8MSWIN1252.  I did
> that and got the same 9 rows, so I still don’t know what character set the
> data was entered with and I don’t know how to find out.
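>
> For the record, the cleanup was roughly this once the blackout was in place
> (sid/serial# come from the first query):
>
>     select sid, serial# from v$session where username = 'DBSNMP';
>     alter system kill session '<sid>,<serial#>';
>     truncate table sys.reg$;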
>
> I’ve asked the person who entered the data if she still has the files and
> can re-enter the data after the conversion and upgrade.  I hope so; that
> would simplify things a lot.
>
>
>
> Scott Canaan ’88 (srcdco@xxxxxxx)
>
> (585) 475-7886 – work                (585) 339-8659 – cell
>
> “Life is like a sewer, what you get out of it depends on what you put into
> it.” – Tom Lehrer
>
>
>
> *From:* Niall Litchfield [mailto:niall.litchfield@xxxxxxxxx]
> *Sent:* Friday, October 24, 2014 8:49 AM
> *To:* Kenny Payton
> *Cc:* oracle.unknowns@xxxxxxxxx; Scott Canaan; Chris Taylor;
> oracle-l@xxxxxxxxxxxxx
>
>
> *Subject:* Re: Change database character set
>
>
>
> Now I'm at my desk
>
> *Lossy or Convertible Conversion Data In SYS.REG$ Table (Doc ID 1490394.1)*
> is the note I meant.
>
> The sys.reg$ table data is related to AQ or the EM agent.  This table
> should not contain any data if there are no connections to the database.
>
> Ideally there should not be any sessions in the database while running
> csscan, as it can cause problems when you do the convert if rows are
> deleted.
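>
> A quick sanity check before kicking csscan off doesn't hurt; something like:
>
>     select count(*) from sys.reg$;
>     select username, program, count(*)
>       from v$session
>      where type = 'USER' and username is not null
>      group by username, program;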
>
>
>
> On Fri, Oct 24, 2014 at 1:32 PM, Kenny Payton <k3nnyp@xxxxxxxxx> wrote:
>
> I don't believe that worked for us.
>
> On Oct 24, 2014 8:30 AM, "Niall Litchfield" <niall.litchfield@xxxxxxxxx>
> wrote:
>
> You can verify the Agent issue by stopping it (or dbconsole) and re-running
> the scan.
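>
> Something like:
>
>     emctl stop agent        # or: emctl stop dbconsole, if you're on DB Control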
>
> On 24 Oct 2014 13:10, "Kenny Payton" <k3nnyp@xxxxxxxxx> wrote:
>
> It sounds like you're getting off easy.  As for the sys.reg$ row, we had a
> similar one and it was related to the emagent.  We probably could have
> gotten away with just deleting it, but what we did was create a table in a
> user schema with the row in it, export that table, delete the row from both
> tables, and then reload the row after the csalter.  In hindsight this
> probably wasn't necessary but it worked.
>
> You could probably do the same with the clob data as well.
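>
> In SQL terms the reg$ shuffle was roughly this (schema name and predicate
> are placeholders for however you identify the row):
>
>     -- before the final scan/csalter: park a copy, export it, then delete the row from both tables
>     create table scratch.reg_backup as
>         select * from sys.reg$ where <predicate for the offending row>;
>     -- (exp/expdp scratch.reg_backup here)
>     delete from sys.reg$ where <predicate for the offending row>;
>     delete from scratch.reg_backup;
>     commit;
>     -- after csalter: import scratch.reg_backup back, then reload the row
>     insert into sys.reg$ select * from scratch.reg_backup;
>     commit;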
>
>
>
> On Fri, Oct 24, 2014 at 8:04 AM, Scott Canaan <srcdco@xxxxxxx> wrote:
>
> Actually, I’ve been considering doing something similar to this.  The 9
> rows in that table (out of about 20) that are affected were added in
> August, and the person who added them still has the original documents, so
> they could be reloaded after the conversion.  I believe that what is in the
> HUGECLOB in that table is just a Word document that was cut and pasted into
> the field.
>
>
>
> The only row that really concerns me is the one in SYS.REG$.
>
>
>
> Scott Canaan ’88 (srcdco@xxxxxxx)
>
> (585) 475-7886 – work                (585) 339-8659 – cell
>
> “Life is like a sewer, what you get out of it depends on what you put into
> it.” – Tom Lehrer
>
>
>
> *From:* Chris Taylor [mailto:christopherdtaylor1994@xxxxxxxxx]
> *Sent:* Friday, October 24, 2014 8:01 AM
> *To:* Niall Litchfield
> *Cc:* Scott Canaan; oracle.unknowns@xxxxxxxxx; oracle-l@xxxxxxxxxxxxx
> *Subject:* Re: Change database character set
>
>
>
> Based on Niall's analysis, another option could be to export the
> SCRIPT_GROUP table, truncate it, do the conversion and reload it (depending
> on the size).
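>
> Something like this with Data Pump, assuming the owning schema (APPOWNER
> below is made up) and that the table isn't huge:
>
>     expdp system/<password> tables=APPOWNER.SCRIPT_GROUP directory=DATA_PUMP_DIR dumpfile=script_group.dmp
>     # truncate appowner.script_group, get a clean csscan, run csalter, then:
>     impdp system/<password> tables=APPOWNER.SCRIPT_GROUP directory=DATA_PUMP_DIR dumpfile=script_group.dmp table_exists_action=append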
>
>
>
> Chris
>
>
>
>
>
> On Fri, Oct 24, 2014 at 5:02 AM, Niall Litchfield <
> niall.litchfield@xxxxxxxxx> wrote:
>
> Scott
>
>
>
> I'm pretty sure you'll find the session_key row is transient and due to an
> emagent (either grid/cloud control or dbconsole) - there's a note on csscan
> dictionary errors that will tell you.  That leaves you with 9 documents
> (they look like Word docs/emails).  Can you grab those from the app and
> re-enter the data post-conversion?
>
>
>
>
>
>
>
>
>
> --
> Niall Litchfield
> Oracle DBA
> http://www.orawin.info
>
>
>
