RE: Determining which rows have characters outside of the standard ASCII (0-127)

  • From: "Johnson, William L (TEIS)" <WLJohnson@xxxxxxxxxxxxxxxxxxx>
  • To: Herring Dave - dherri <Dave.Herring@xxxxxxxxxx>, "Mark.Bobak@xxxxxxxxxxxx" <Mark.Bobak@xxxxxxxxxxxx>, "oracle-l@xxxxxxxxxxxxx" <oracle-l@xxxxxxxxxxxxx>
  • Date: Tue, 9 Nov 2010 09:11:39 -0500

Is this a partitioned table?
Bug 9488049: CSSCAN DOES NOT SCAN PARTITIONED IOT TABLE.

-----Original Message-----
From: Herring Dave - dherri [mailto:Dave.Herring@xxxxxxxxxx] 
Sent: Tuesday, November 09, 2010 9:10 AM
To: Mark.Bobak@xxxxxxxxxxxx; Johnson, William L (TEIS); oracle-l@xxxxxxxxxxxxx
Subject: RE: Determining which rows have characters outside of the standard 
ASCII (0-127)

Mark,

Could something like the following work for you?

select doc_id, doc_authors 
  from documents 
 where doc_authors != convert(doc_authors, 'us7ascii');

Dave Herring  | DBA
Acxiom Global Technology Solutions   

630-944-4762 office | 630-430-5988 cell | 630-944-4989 fax
1501 Opus Pl | Downers Grove, IL, 60515 | U.S.A. | www.acxiom.com
Service Desk: 888-243-4566, https://servicedesk.acxiom.com, GSCA@xxxxxxx


From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx] On 
Behalf Of Bobak, Mark
Sent: Tuesday, November 09, 2010 7:44 AM
To: Johnson, William L (TEIS); oracle-l@xxxxxxxxxxxxx
Subject: RE: Determining which rows have characters outside of the standard 
ASCII (0-127)

Hi Bill,

I tried your suggestion, but, it doesn't seem to work as expected:

pqrac101:[pqprd1]:(/home/oracle):$csscan


Character Set Scanner v2.1 : Release 10.2.0.3.0 - Production on Tue Nov 9 
08:16:45 2010

Copyright (c) 1982, 2005, Oracle.  All rights reserved.


Username: adds@prd1

Password:

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters and Data Mining options

(1)Full database, (2)User, (3)Table, (4)Column: 1 > 4

Current database character set is WE8ISO8859P1.

Enter new database character set name: > US7ASCII

Enter array fetch buffer size: 1024000 >

Enter number of scan processes to utilize(1..32): 1 >

Enter column name to scan: > DOCUMENTS.DOC_AUTHORS

Enter column name to scan: >

Enumerating tables to scan...

table(s) contain no character type columns

Scanner terminated successfully.

Not sure what "contain no character type columns" actually means.....

-Mark


From: Johnson, William L (TEIS) [mailto:WLJohnson@xxxxxxxxxxxxxxxxxxx] 
Sent: Tuesday, November 09, 2010 8:04 AM
To: Bobak, Mark; oracle-l@xxxxxxxxxxxxx
Subject: RE: Determining which rows have characters outside of the standard 
ASCII (0-127)

Could you try using Oracle's csscan utility and pick a US7ASCII database as the 
target for the migration?  This should identify any rows with data outside the 
normal US7ASCII character set...

Bill

________________________________________
From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx] On 
Behalf Of Bobak, Mark
Sent: Tuesday, November 09, 2010 7:56 AM
To: oracle-l@xxxxxxxxxxxxx
Subject: Determining which rows have characters outside of the standard ASCII 
(0-127)

Hi all,

I'm trying to (efficiently) determine which rows have column values with 
characters outside of the range of 0-127.

My first attempt was something like this:

select doc_id,doc_authors from documents where 
regexp_instr(doc_authors,'[0x80-0xFF]') > 0;

But, that seems to select every row in the documents table, not just the ones 
containing characters with values in the range 128-255.

Looking at one of the rows returned from the query above, with 
dump(doc_authors) confirms that rows being returned don't have characters in 
the range 128-255.

This is a Unicode database, so, I also tried:

select doc_id,doc_authors from documents where 
regexp_instr(doc_authors,'[0c0080-0cFFFF]') > 0;

but, again, this seems to return every row.

So, can someone offer me a clue here?

Honestly, this is the first time I've tried using any of Oracle's REGEXP 
functions.

I'm sure I'm just doing something stupid, but I don't have a clue what it is, 
and the examples I've run across in the manuals and on the web, don't have 
anything similar to what I'm trying to do.


AdvThanksance,

-Mark


***************************************************************************
The information contained in this communication is confidential, is
intended only for the use of the recipient named above, and may be legally
privileged.

If the reader of this message is not the intended recipient, you are
hereby notified that any dissemination, distribution or copying of this
communication is strictly prohibited.

If you have received this communication in error, please resend this
communication to the sender and delete the original message or any copy
of it from your computer system.

Thank You.
****************************************************************************

--
//www.freelists.org/webpage/oracle-l


Other related posts: