Re: That crazy CBO.....

From: Dan Tow <dantow@xxxxxxxxxxxxxx>
To: Mark.Bobak@xxxxxxxxxxxxxxx
Date: Wed, 27 Oct 2004 16:43:38 -0500
I can't say why you're getting diferent stats for the two tests, but I have some
general advice how you might do better than *either*:

You appear to have an example of a special case I see sometimes:

A table is laid out with one row per range of values, where the ranges of values
are non-overlapping and (usually) inclusive, meaning they leave no "holes" in
the midst of the ranges if the ranges are laid out end-to-end, so to speak. The
most common examples of this involve date ranges, but you seem to have a case
for address ranges. Thus, to point to a row, you say

:x between lowcol and highcol

or (where on ranges highcol is the next range's lowcol)

(:x > lowcol and :x <= highcol).

Most commonly, the designer chooses to create an index on (lowcol, highcol), but
sometimes the order is reversed, or just one of these is indexed.

The problem is that any such index strategy means that the index range scan to
find the single range that meets the condition is likely to have to read about
*half* the index, if you ask for a range in the middle of the whole set of
ranges. (With date ranges, you can often escape most of the dilemma because
applications tend to ask for the most-recent range, or one of the most-recent
ranges, anyway, so an index on *highcol* will see a very short range scan,
usually.)

Date or Codd (can't recall which, off-hand) discussed a handy set of
comparators, greatest-less-than, least-greater-than,
greatest-less-than-or-equal-to, least-greater-than-or-equal-to. With these
available, you'd just ask for an index range scan with

highcol least-greater-than-or-equal-to :x

and the database would find the very first (least) value of highcol in an index
range scan on a conventional index of highcol, then would quit, because that's
the row you want. The cost, if you look at the I/O, would be no different than
an index unique scan!

Sadly, I've never found a database that has implemented this lovely concept.
Instead, the best I've found is the following admittedly hacky solution:

(Forewarning: the following solution may offend the relational purists among
you. Yes, I *know* it's a hack.) Let's say the index on highcol is called
highcolind. If you want to quickly find the row you're after, the query you
need is

select /*+ index(t highcolind) */ <whatever>
from rangetable t
where highcol >= :x
and lowcol <= :x
and rownum=1

Here, the database won't keep searching for more ranges that might enclose :x
once it finds the first one, and the range scan on highcolind fidns the first
one in the first index entry it sees, as long as the database follows the hint!
(If it fails to follow the hint, performance will suffer, clearly, but the
database will still return the right row, since it will keep looking until it
satisfies both conditions on :x. This will *not* find multiple ranges that
include :x, though, if the table contains overlapping ranges - I have no easy
answer of finding multiple ranges if your ranges overlap. If your ranges have
holes, it will also have to do the entire range scan from :x to the end of the
range before it will conclude that no range contains the particular value of :x
you're looking with - that's why no-holes helps.)

To build this into a more-complex query, you'd do something like

Select /*+ ordered <maybe other hints> */ <whatever>
from (select /*+ index(t highcolind) */ *
        from rangetable t
       where highcol >= :x
         and lowcol <= :x
         and rownum=1) t, taba a, tabb b, ...
where t.fkey=a.pkey
...

In your particular case, you should have to do probably 1-3 I/Os to reach the
rowid you want from AIA_INDX_PR01, assuming that index leads with
AIA_IP_ADDRESS_END, then a single I/O for the table row, then about the same
number to reach the row you need in ADDS_USERS, probably no more than 8 I/Os,
likely all of them cached. However, the *biggest* win isn't going to be the
number of logical I/Os! I have found that I/Os to do the sort of large index
range scan you're liely getting, now, are *way* more expensive than normal
logical I/Os, because they have to do the work of examining every entry in each
leaf block they hit, which is much more expensive than a logical I/O to read a
single row or to reach a single rowid in a UNIQUE scan.

Hope it helps.

Thanks,

Dan Tow
650-858-1557
www.singingsql.com


Quoting "Bobak, Mark" <Mark.Bobak@xxxxxxxxxxxxxxx>:

> Ok, I'm a bit at a loss to explain this.....
>
> I have two tables, ADDS_USERS, which has AUSR_ID as the primary key, and
> AUTHORIZED_IP_ADDRESSES.
>
> Now, my original query looks like this:
> SELECT A.AUSR_LOGIN_SCREEN_NAME FROM ADDS_USERS A,
> AUTHORIZED_IP_ADDRESSES B
> WHERE A.AUSR_ID =3D B.AUSR_ID AND :B1 BETWEEN B.AIA_IP_ADDRESS_START AND
> B.AIA_IP_ADDRESS_END;
>
> and when run w/ autotrace set to traceonly, produces the following
> output:
>
> Execution Plan
> ----------------------------------------------------------
>    0      SELECT STATEMENT Optimizer=3DCHOOSE (Cost=3D13 Card=3D110 =
> Bytes=3D46
>           20)
>
>    1    0   NESTED LOOPS (Cost=3D13 Card=3D110 Bytes=3D4620)
>    2    1     INDEX (RANGE SCAN) OF 'AIA_INDX_PR01' (NON-UNIQUE) (Cost
>           =3D4 Card=3D110 Bytes=3D3080)
>
>    3    1     TABLE ACCESS (BY INDEX ROWID) OF 'ADDS_USERS' (Cost=3D2 Ca
>           rd=3D1 Bytes=3D14)
>
>    4    3       INDEX (UNIQUE SCAN) OF 'AUSR_PK' (UNIQUE)
>
>
>
>
> Statistics
> ----------------------------------------------------------
>           0  recursive calls
>           0  db block gets
>        1321  consistent gets
>         864  physical reads
>           0  redo size
>         513  bytes sent via SQL*Net to client
>         652  bytes received via SQL*Net from client
>           2  SQL*Net roundtrips to/from client
>           0  sorts (memory)
>           0  sorts (disk)
>           1  rows processed
>
> Now, that plan looks quite reasonable, but, 1321 buffer gets is too
> many, not to mention the physical I/O.  This is a very frequently called
> SQL statement.
>
> So, I tried:
> re-creating ADDS_USERS ordered by AUSR_ID, to improve AUSR_PK index
> clustering factor.  No use.
> creating AUTHORIZED_IP_ADDRESSES as an IOT.  Also useless.
>
> So, I went back to SQL hacking, and finally came up with the somewhat
> odd looking, but effective:
> SELECT (select A.AUSR_LOGIN_SCREEN_NAME FROM ADDS_USERS A where
> a.ausr_id =3D b.ausr_id) from  AUTHORIZED_IP_ADDRESSES B
> WHERE :B1 BETWEEN B.AIA_IP_ADDRESS_START AND B.AIA_IP_ADDRESS_END;
>
> which produces an output like:
> Execution Plan
> ----------------------------------------------------------
>    0      SELECT STATEMENT Optimizer=3DCHOOSE (Cost=3D2 Card=3D110 =
> Bytes=3D308
>           0)
>
>    1    0   TABLE ACCESS (BY INDEX ROWID) OF 'ADDS_USERS' (Cost=3D2 Card
>           =3D1 Bytes=3D14)
>
>    2    1     INDEX (UNIQUE SCAN) OF 'AUSR_PK' (UNIQUE) (Cost=3D1 =
> Card=3D1
>           )
>
>    3    0   INDEX (RANGE SCAN) OF 'AIA_INDX_PR01' (NON-UNIQUE) (Cost=3D4
>            Card=3D110 Bytes=3D3080)
>
>
>
>
>
> Statistics
> ----------------------------------------------------------
>           0  recursive calls
>           0  db block gets
>          67  consistent gets
>           0  physical reads
>           0  redo size
>         562  bytes sent via SQL*Net to client
>         652  bytes received via SQL*Net from client
>           2  SQL*Net roundtrips to/from client
>           0  sorts (memory)
>           0  sorts (disk)
>           1  rows processed
>
> Now, that's a bit odd looking plan, but, look at those consistent gets!
> Down to 67 from 1321!
>
> So, I guess what I'm wondering is why this re-write is so effective?
> I'm happy that I found a solution, but, I'm curious what's going on
> here.  Intuitively, I'm thinking that all I'm doing with the re-write is
> forcing the optimizer to do what it seems to claim to be already doing
> with the original version .  With the scalar subquery, it seems the
> nested loop join is implicit in the select statement.
>
> Well, anyhow, I thought this was pretty odd.
>
> Thoughts, anyone?
>
> -Mark
> --
> Mark J. Bobak
> Oracle DBA
> ProQuest Company
> Ann Arbor, MI
> "On two occasions, I have been asked [by members of Parliament], "Pray,
> Mr. Babbage, if you put into the machine wrong figures, will the right
> answers come out?'  I am not able to rightly apprehend the kind of
> confusion of ideas that could provoke such a question."
> -- Charles Babbage (1791-1871)
>
> --
> //www.freelists.org/webpage/oracle-l
>

--
//www.freelists.org/webpage/oracle-l
References:
- That crazy CBO.....
  - From: Bobak, Mark
Re: That crazy CBO.....

Other related posts: