Finding illegal UTF8 sequences

  • From: "Weaver, Walt" <wweaver@xxxxxxxxxxxx>
  • To: <oracle-l@xxxxxxxxxxxxx>
  • Date: Thu, 27 May 2004 12:37:43 -0600

Is anyone experienced with finding illegal UTF8 sequences and doing
something about them?

We have a UTF8 database containing Japanese data. One of the customers
appears to have randomly malformed data; when displayed, it shows up as
random characters rather than Kanji characters.

Using the dump() function I've found sequences where there appears to
be, say, a valid trail byte with no associated lead byte. I've also found
a valid lead byte for a three-byte character with no trail bytes
following it, and so on.
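
For illustration, the kind of check described above can be sketched in
Python. This is only a rough sketch under stated assumptions, not a
known utility: the function name find_illegal_utf8 is made up for this
example, the input is assumed to be the raw bytes of a column value
(for instance, the hex values shown by dump() pasted together), and it
checks only the lead-byte/trail-byte structure. It does not check for
overlong encodings or surrogate ranges.

```python
def find_illegal_utf8(data: bytes):
    """Return (offset, hex_bytes, reason) for each structurally bad sequence.

    Hypothetical helper: checks only lead/trail byte structure.
    """
    problems = []
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:                      # plain ASCII, always fine
            i += 1
            continue
        if 0x80 <= b <= 0xBF:             # trail byte appearing on its own
            problems.append((i, data[i:i + 1].hex(),
                             "trail byte with no lead byte"))
            i += 1
            continue
        if 0xC2 <= b <= 0xDF:             # lead byte of a 2-byte sequence
            need = 1
        elif 0xE0 <= b <= 0xEF:           # lead byte of a 3-byte sequence
            need = 2
        elif 0xF0 <= b <= 0xF4:           # lead byte of a 4-byte sequence
            need = 3
        else:                             # C0, C1, F5..FF are never valid
            problems.append((i, data[i:i + 1].hex(),
                             "byte can never appear in UTF-8"))
            i += 1
            continue
        # Consume up to `need` trail bytes following the lead byte.
        j = i + 1
        while j < n and j - i <= need and 0x80 <= data[j] <= 0xBF:
            j += 1
        got = j - i - 1
        if got < need:
            problems.append((i, data[i:j].hex(),
                             f"lead byte expects {need} trail byte(s), found {got}"))
        i = j
    return problems


if __name__ == "__main__":
    # One valid Kanji (e6 97 a5), a stray trail byte, then a truncated sequence.
    sample = bytes.fromhex("e697a5 a5 e697")
    for offset, hex_bytes, reason in find_illegal_utf8(sample):
        print(offset, hex_bytes, reason)
```

Each reported offset is a byte position within the value, so it can be
matched against the byte list that dump() prints for the same row.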

At least, I think that's what I've found.

At this point I'm still in a bit of learning mode here and am still
trying to figure out what I'm looking at and what I'm going to do.

This problem is isolated to one customer and may be the result of a data
import that was done some time ago.

So, does anyone know of any utilities that can find and print out
illegal UTF8 sequences? Or am I going to have to hire someone to do it
for me (I'm not smart enough to be able to do that sort of thing)?

Thanks,
--Walt Weaver
  Bozeman, Montana
