RE: [UNCLASSIFIED] - RE: Determining which rows have characters outside of the standard ASCII (0-127)

  • From: "Bobak, Mark" <Mark.Bobak@xxxxxxxxxxxx>
  • To: Ian Cary <ian.cary@xxxxxxxxxxxxxx>
  • Date: Tue, 9 Nov 2010 09:57:25 -0500

Ian,

Thanks for that.  I'll give it a try.  Dave's solution of 
doc_authors!=convert(doc_authors,'us7ascii') seems to be working.

-Mark

-----Original Message-----
From: Ian Cary [mailto:ian.cary@xxxxxxxxxxxxxx] 
Sent: Tuesday, November 09, 2010 9:39 AM
To: Bobak, Mark
Cc: Dave.Herring@xxxxxxxxxx; oracle-l@xxxxxxxxxxxxx; 
oracle-l-bounce@xxxxxxxxxxxxx; WLJohnson@xxxxxxxxxxxxxxxxxxx
Subject: [UNCLASSIFIED] - RE: Determining which rows have characters outside of 
the standard ASCII (0-127)

Hi Mark,

Would regexp_instr(doc_authors,'[[:alnum:][:blank:][:cntrl:][:punct:]]')
help as I believe this covers ascii(0-127)

Cheers,

Ian



|---------+----------------------------->
|         |           Mark.Bobak@proques|
|         |           t.com             |
|         |           Sent by:          |
|         |           oracle-l-bounce@fr|
|         |           eelists.org       |
|         |                             |
|         |                             |
|         |           09/11/2010 14:12  |
|         |           Please respond to |
|         |           Mark.Bobak        |
|         |                             |
|---------+----------------------------->
  
>--------------------------------------------------------------------------------------------------------------------------------------------------|
  |                                                                             
                                                                     |
  |       To:       WLJohnson@xxxxxxxxxxxxxxxxxxx, Dave.Herring@xxxxxxxxxx, 
oracle-l@xxxxxxxxxxxxx                                                   |
  |       cc:                                                                   
                                                                     |
  |       Subject:  RE: Determining which rows have characters outside of the 
standard ASCII (0-127)                                                 |
  |                                                                             
                                                                     |
  |                                                                             
                                                                     |
  
>--------------------------------------------------------------------------------------------------------------------------------------------------|




Nope.

-----Original Message-----
From: Johnson, William L (TEIS) [mailto:WLJohnson@xxxxxxxxxxxxxxxxxxx]
Sent: Tuesday, November 09, 2010 9:12 AM
To: Herring Dave - dherri; Bobak, Mark; oracle-l@xxxxxxxxxxxxx
Subject: RE: Determining which rows have characters outside of the standard
ASCII (0-127)

Is this a partitioned table?
Bug 9488049: CSSCAN DOES NOT SCAN PARTITIONED IOT TABLE.

-----Original Message-----
From: Herring Dave - dherri [mailto:Dave.Herring@xxxxxxxxxx]
Sent: Tuesday, November 09, 2010 9:10 AM
To: Mark.Bobak@xxxxxxxxxxxx; Johnson, William L (TEIS);
oracle-l@xxxxxxxxxxxxx
Subject: RE: Determining which rows have characters outside of the standard
ASCII (0-127)

Mark,

Could something like the following work for you?

select doc_id, doc_authors
  from documents
 where doc_authors != convert(doc_authors, 'us7ascii');

Dave Herring  | DBA
Acxiom Global Technology Solutions

630-944-4762 office | 630-430-5988 cell | 630-944-4989 fax
1501 Opus Pl | Downers Grove, IL, 60515 | U.S.A. | www.acxiom.com
Service Desk: 888-243-4566, https://servicedesk.acxiom.com, GSCA@xxxxxxx


From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx]
On Behalf Of Bobak, Mark
Sent: Tuesday, November 09, 2010 7:44 AM
To: Johnson, William L (TEIS); oracle-l@xxxxxxxxxxxxx
Subject: RE: Determining which rows have characters outside of the standard
ASCII (0-127)

Hi Bill,

I tried your suggestion, but, it doesn't seem to work as expected:

pqrac101:[pqprd1]:(/home/oracle):$csscan


Character Set Scanner v2.1 : Release 10.2.0.3.0 - Production on Tue Nov 9
08:16:45 2010

Copyright (c) 1982, 2005, Oracle.  All rights reserved.


Username: adds@prd1

Password:

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit
Production
With the Partitioning, Real Application Clusters and Data Mining options

(1)Full database, (2)User, (3)Table, (4)Column: 1 > 4

Current database character set is WE8ISO8859P1.

Enter new database character set name: > US7ASCII

Enter array fetch buffer size: 1024000 >

Enter number of scan processes to utilize(1..32): 1 >

Enter column name to scan: > DOCUMENTS.DOC_AUTHORS

Enter column name to scan: >

Enumerating tables to scan...

table(s) contain no character type columns

Scanner terminated successfully.

Not sure what "contain no character type columns" actually means.....

-Mark


From: Johnson, William L (TEIS) [mailto:WLJohnson@xxxxxxxxxxxxxxxxxxx]
Sent: Tuesday, November 09, 2010 8:04 AM
To: Bobak, Mark; oracle-l@xxxxxxxxxxxxx
Subject: RE: Determining which rows have characters outside of the standard
ASCII (0-127)

Could you try using Oracle's csscan utility and pick a US7ASCII database as
the target for the migration?  This should identify any rows with data
outside the normal US7ASCII character set...

Bill

________________________________________
From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx]
On Behalf Of Bobak, Mark
Sent: Tuesday, November 09, 2010 7:56 AM
To: oracle-l@xxxxxxxxxxxxx
Subject: Determining which rows have characters outside of the standard
ASCII (0-127)

Hi all,

I'm trying to (efficiently) determine which rows have column values with
characters outside of the range of 0-127.

My first attempt was something like this:

select doc_id,doc_authors from documents where
regexp_instr(doc_authors,'[0x80-0xFF]') > 0;

But, that seems to select every row in the documents table, not just the
ones containing characters with values in the range 128-255.

Looking at one of the rows returned from the query above, with
dump(doc_authors) confirms that rows being returned don't have characters
in the range 128-255.

This is a Unicode database, so, I also tried:

select doc_id,doc_authors from documents where
regexp_instr(doc_authors,'[0c0080-0cFFFF]') > 0;

but, again, this seems to return every row.

So, can someone offer me a clue here?

Honestly, this is the first time I've tried using any of Oracle's REGEXP
functions.

I'm sure I'm just doing something stupid, but I don't have a clue what it
is, and the examples I've run across in the manuals and on the web, don't
have anything similar to what I'm trying to do.


AdvThanksance,

-Mark


***************************************************************************
The information contained in this communication is confidential, is
intended only for the use of the recipient named above, and may be legally
privileged.

If the reader of this message is not the intended recipient, you are
hereby notified that any dissemination, distribution or copying of this
communication is strictly prohibited.

If you have received this communication in error, please resend this
communication to the sender and delete the original message or any copy
of it from your computer system.

Thank You.
****************************************************************************




--
//www.freelists.org/webpage/oracle-l






For the latest data on the economy and society consult National Statistics at 
http://www.ons.gov.uk

*********************************************************************************


Please Note:  Incoming and outgoing email messages are routinely monitored for 
compliance with our policy on the use of electronic communications
*********************************************************************************


Legal Disclaimer  :  Any views expressed by the sender of this message are not 
necessarily those of the Office for National Statistics
*********************************************************************************



The original of this email was scanned for viruses by the Government Secure 
Intranet virus scanning service supplied by Cable&Wireless Worldwide in 
partnership with MessageLabs. (CCTM Certificate Number 2009/09/0052.) On 
leaving the GSi this email was certified virus free.
Communications via the GSi may be automatically logged, monitored and/or 
recorded for legal purposes.


--
//www.freelists.org/webpage/oracle-l


Other related posts: