Re: Why "Separating Data and Indexes improves performance" is amyth?

From: Tim Gorman <tim@xxxxxxxxxxxxx>
To: <oracle-l@xxxxxxxxxxxxx>
Date: Fri, 23 Apr 2004 11:15:21 -0600
Juan Carlos,

Apologies in advance for the lengthy explanation, but I've not yet found a
way to explain this more concisely...

By far, the most common types of indexed access is UNIQUE and RANGE SCAN.
The other two methods, FULL SCAN and FAST FULL SCAN, are not often used
during queries, but are most often used during ALTER INDEX ... REBUILD or
when index usage is explicitly hinted but not really appropriate.  I've
recently heard of something called a SKIP SCAN, but as I understand it, it
is a variation on RANGE SCAN, so (until corrected) I don't differentiate.

UNIQUE SCAN, RANGE SCAN, and FULL SCAN all utilize random-access,
single-block reads (a.k.a. "db file sequential reads" wait-event) while FAST
FULL SCAN utilizes sequential-access, multi-block reads (a.k.a. "db file
scattered read" wait-event).  Thus, the overwhelming majority of indexed
access is performed using random-access, single-block reads, which as the
wait-event name suggests, is processed sequentially with no opportunity for
parallelism or asynchronicity.

So, then the question becomes:  why is indexed I/O a series of
sequentially-executed, random-access, single-block reads?  Why is there no
opportunity for executing these indexed scans in parallel within a single
database session?

Because performing a UNIQUE SCAN, RANGE SCAN, or FULL SCAN on a B*Tree index
structure is very much like the "treasure hunt" games you may have played as
a child (or as an adult on a work-related "team-building exercise").  In a
treasure hunt, you find a clue, read it to find where to go next, and then
go there.  When you get there, you find another clue, read it to find where
to go next, and then go there.  And so on until you reach the prize...

Navigating from root node through branch nodes to leaf nodes and then to
table blocks using a B*Tree index is very similar to that.  You have to
start at the root node, read it to know where to go next, and then go there.
When you get there, you read the node, and then move on.  And so on...

So, for a single database session, hopping from index block to table blocks
is not something that can be "accomplished in parallel", nor is the type of
I/O different for indexes or tables -- it is all single-block I/O.

Thus, there is no benefit to performance when tables and indexes are
separated into different tablespaces or (by extension) onto different
volumes or devices.

However, there are other reasons to separate tables and indexes into
different tablespaces, namely recoverability:

  * Tables contain data, indexes do not contain data (only pointers),
    so tables should be more carefully protected than indexes.  Lose
    a table, and you've lost data.  Lose an index, and you can just
    rebuild it.

So, the main reason to separate tables from indexes into different
tablespaces is data protection and data recovery.

To bear that out, my very first experience as a DBA involved a system which
was growing at a high rate but the company was not buying more storage for
it -- it was being decommissioned in favor of a replacement system.  So, to
accommodate growth, we unmirrored volumes underneath index tablespaces and
used those mirrors to create new mirrored volumes for table tablespaces and
new unmirrored volumes for index tablespaces.  This left index tablespaces
less protected than the table tablespaces.  Yes, we lost some indexes that
way, but it enabled us to continue to grow without acquiring any net new
storage and also avoid most database recoveries (just index rebuilds) until
the replacement system came online.  Of course, I eventually got nailed by a
series of recoveries anyway, but that's another story...  :-(

Counterpoint #1:  It is true that tables can also be accessed using FULL
table scans, which utilize sequential-access, multi-block reads (a.k.a "db
file scattered reads" wait-event).  But this fact is still not a reason to
separate tables from indexes.  Think it through:  if you cared enough about
optimizing the I/O of FULL table scans to optimize a specially-configured
set of I/O devices just for that type of I/O, then chances are good that
*any* random single-block I/O from indexed activity will mess things up.
Thus indexes would, in that scenario, be altogether undesirable and would
probably not be used at all.  So again, performance is not a reason to
separate tables from indexes.

Counterpoint #2:  This explanation of indexed activity does indeed explain
why parallel read access is not possible within a single database session,
but it does not take into account any opportunities for performance
improvement due to numerous concurrent sessions.  My response:  yeah, and so
what is the point being made?  The fact that dozens, hundreds, or thousands
of sessions may be performing indexed I/O's still does not indicate that
there is a performance benefit to separating tables and indexes to different
tablespaces or different devices.  If this were the case, an equally valid
case could be made for separating the different "branch levels" (a.k.a.
BLEVELs) within an index to different tablespaces or devices as well...

Hope this helps...

-Tim



on 4/23/04 10:13 AM, Juan Cachito Reyes Pacheco at jreyes@xxxxxxxxxxxxxxxx
wrote:

> Hi
> Does any one kwnows please,
> Why "Separating Data and Indexes improves performance" is a myth?
> 
> If they are both accesed at the same time, it will improve performance or
> not,
> I don't understand exactly what is this myth about?
> 
> http://www.niall.litchfield.dial.pipex.com/OracleMyths.zip
> 
> Thanks
> 
> 
> Juan Carlos Reyes Pacheco
> OCP
> Database 9.2 Standard Edition

----------------------------------------------------------------
Please see the official ORACLE-L FAQ: http://www.orafaq.com
----------------------------------------------------------------
To unsubscribe send email to:  oracle-l-request@xxxxxxxxxxxxx
put 'unsubscribe' in the subject line.
--
Archives are at //www.freelists.org/archives/oracle-l/
FAQ is at //www.freelists.org/help/fom-serve/cache/1.html
-----------------------------------------------------------------
References:
- Why "Separating Data and Indexes improves performance" is a myth?
  - From: Juan Cachito Reyes Pacheco
Re: Why "Separating Data and Indexes improves performance" is amyth?

Other related posts: