RE: Determining which rows have characters outside of the standard ASCII (0-127)

From: "Johnson, William L (TEIS)" <WLJohnson@xxxxxxxxxxxxxxxxxxx>
To: "Mark.Bobak@xxxxxxxxxxxx" <Mark.Bobak@xxxxxxxxxxxx>, "oracle-l@xxxxxxxxxxxxx" <oracle-l@xxxxxxxxxxxxx>
Date: Tue, 9 Nov 2010 08:03:53 -0500

Could you try using Oracle's csscan utility and pick a US7ASCII database as the 
target for the migration?  This should identify any rows with data outside the 
normal US7ASCII character set...

Bill

________________________________
From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx] On 
Behalf Of Bobak, Mark
Sent: Tuesday, November 09, 2010 7:56 AM
To: oracle-l@xxxxxxxxxxxxx
Subject: Determining which rows have characters outside of the standard ASCII 
(0-127)

Hi all,

I'm trying to (efficiently) determine which rows have column values with 
characters outside of the range of 0-127.

My first attempt was something like this:

select doc_id,doc_authors from documents where 
regexp_instr(doc_authors,'[0x80-0xFF]') > 0;

But, that seems to select every row in the documents table, not just the ones 
containing characters with values in the range 128-255.

Looking at one of the rows returned from the query above, with 
dump(doc_authors) confirms that rows being returned don't have characters in 
the range 128-255.

This is a Unicode database, so, I also tried:

select doc_id,doc_authors from documents where 
regexp_instr(doc_authors,'[0c0080-0cFFFF]') > 0;

but, again, this seems to return every row.

So, can someone offer me a clue here?

Honestly, this is the first time I've tried using any of Oracle's REGEXP 
functions.

I'm sure I'm just doing something stupid, but I don't have a clue what it is, 
and the examples I've run across in the manuals and on the web, don't have 
anything similar to what I'm trying to do.


AdvThanksance,

-Mark

Follow-Ups:
- RE: Determining which rows have characters outside of the standard ASCII (0-127)
  - From: Bobak, Mark

References:
- Determining which rows have characters outside of the standard ASCII (0-127)
  - From: Bobak, Mark

RE: Determining which rows have characters outside of the standard ASCII (0-127)

Other related posts: