RE: Of Character Sets, Performance and Storage...

  • From: "Orr, Steve" <sorr@xxxxxxxxxxxx>
  • To: "Justin Cave (DDBC)" <jcave@xxxxxxxxxxx>, "oracle-l" <oracle-l@xxxxxxxxxxxxx>
  • Date: Tue, 18 Jan 2005 11:43:32 -0700

What about tables (or table partitions) and indexes created with the
compress option? Any known effects? I'm wondering if anyone has already
looked into this but I'll probably test this myself in a personal
attempt to discover truth.=20

Steve Orr

-----Original Message-----
From: Justin Cave (DDBC) [mailto:jcave@xxxxxxxxxxx]=20
Sent: Tuesday, January 18, 2005 11:32 AM
To: Orr, Steve; oracle-l
Subject: RE: Of Character Sets, Performance and Storage...

You will not see any difference in performance or storage using US7ASCII
rather than WE8ISO8859P1.  Both are single-byte character sets.

If the debate was between storing data in something like AL32UTF8 or
WE8ISO8859P1, there would be some differences.  Since UTF-8 is a
variable length character sets (English characters take 1 byte, European
characters generally take 2 bytes, Asian characters generally take 3
bytes), it is relatively space efficient when you have mostly English
and European data.  On the other hand, searching the data is a little
less efficient, because Oracle has to spend cycles figuring out where
character boundaries are as it processes each row.  If you do a lot of
substring searches in data, this can be a noticeable performance impact.

Justin Cave  <jcave@xxxxxxxxxxx>
Distributed Database Consulting, Inc.

-----Original Message-----
From: oracle-l-bounce@xxxxxxxxxxxxx
[mailto:oracle-l-bounce@xxxxxxxxxxxxx] On Behalf Of Orr, Steve
Sent: Tuesday, January 18, 2005 12:01 PM
To: oracle-l
Subject: Of Character Sets, Performance and Storage...

I've got this data warehouse with an nls_characterset of WE8ISO8859P1
and an nls_nchar_characterset of AL16UTF16 but all the stored data is
only US-ASCII. (The data is from Apache logs capturing tons of web
application activity much of which comes from URI's which by W3C
definition is US-ASCII.)

The Oracle9i Globalization manual states: "Single-byte character sets
result in better performance than multi-byte character sets, and they
are also more efficient in terms of space requirements."

So... The obvious question is, How much more efficient and better
performing are they in actual practice? Also, I'm thinking US7ASCII and
WE8ISO8859P1 are both single-byte and possess equivalent storage
requirements and performance characteristics even though one is 7 bit
and the other is 8 bit. Is this true?

The current DW is not too big to recreate in the downsized US7ASCII
character set but is it even worth it? The DW will be getting quite
large so it's important now to address the bits and bytes storage and
performance concerns.

Of Mice and DBA's,
Steve Orr,
Bozeman, Montana


Other related posts: