Re: reverse key index -- Sorting expected?

From: Jonathan Lewis <jonathan@xxxxxxxxxxxxxxxxxx>
To: Oracle L <oracle-l@xxxxxxxxxxxxx>
Date: Wed, 8 Jan 2020 12:18:10 +0000

Vishnu,

Two things -
First - don't expect Oracle to do the perfect thing just because you can see an
obvious improvement.
Second - if you spot an anomaly, see if you can reproduce it in the simplest
possible way.

So (1st) In this case the "generic case" that Oracle is using is that an IN
subquery (or EXISTS) can be unnested and used as an inline view with distinct
to drive a join.  The fact that your example is guaranteed to give Oracle a
unique set is a little special and the optimizer could, in principle, derive
this fact by sufficiently close examination of the inline view - but that means
adding a piece of complexity to the optimizer that might be irrelevant in
almost all cases (generally if you run an IN subquery it's probably because you
expect there to be multiple rows for each input and you want to stop after 1).

(2nd) Isolate the principle of the inline view, try this:
select distinct a.student_id from students a, temp b where a.student_id in
(1,2,3,4,5,6,7,8) and b.student_id = a.student_id;

It's "obvious" that this is going to generate a list of unique values even if
the "distinct" wasn't there, so the optimizer ought to avoid a hash/sort unique.
Here's the execution plan (19.3)

-----------------------------------------------------------------------------------------------------
| Id  | Operation            | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |
A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |       |      1 |        |     2 (100)|      8
|00:00:00.01 |      15 |
|   1 |  SORT UNIQUE NOSORT  |       |      1 |      1 |     2  (50)|      8
|00:00:00.01 |      15 |
|   2 |   NESTED LOOPS SEMI  |       |      1 |      8 |     1   (0)|      8
|00:00:00.01 |      15 |
|   3 |    INLIST ITERATOR   |       |      1 |        |            |      8
|00:00:00.01 |       5 |
|*  4 |     INDEX UNIQUE SCAN| T_PK  |      8 |      8 |     1   (0)|      8
|00:00:00.01 |       5 |
|*  5 |    INDEX UNIQUE SCAN | ST_PK |      8 |      8 |     0   (0)|      8
|00:00:00.01 |      10 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
   4 - access(("B"."STUDENT_ID"=1 OR "B"."STUDENT_ID"=2 OR "B"."STUDENT_ID"=3 OR
              "B"."STUDENT_ID"=4 OR "B"."STUDENT_ID"=5 OR "B"."STUDENT_ID"=6 OR
"B"."STUDENT_ID"=7 OR
              "B"."STUDENT_ID"=8))
   5 - access("B"."STUDENT_ID"="A"."STUDENT_ID")
       filter(("A"."STUDENT_ID"=1 OR "A"."STUDENT_ID"=2 OR "A"."STUDENT_ID"=3 OR
              "A"."STUDENT_ID"=4 OR "A"."STUDENT_ID"=5 OR "A"."STUDENT_ID"=6 OR
"A"."STUDENT_ID"=7 OR
              "A"."STUDENT_ID"=8))

(Students is "create as select rownum student_id, ao.* from all_objects where
rownum <= 50000", temp is your definition.)
Note, by the way, that the optimizer has used transitive closure to copy the
inlist from students to temp. In general this is reasonable, in this specific
case (with the PK declared) it's pointless but another little enhancement that
would require a lot of careful coding and testing to avoid errors in all cases.

You've probably spotted that Oracle has used a sort unique here instead of a
hash unique - it's just a minor detail in the costing arithmetic. You'll also
notice that the cardinality estimate for the unique data is very silly.  The
latter is a severe limitation of the optimizer in the general case (and another
indication that it hasn't tried to look for the uniqueness conditions that hold
in this query)

It's a sad fact that it's easy to find examples where the optimizer doesn't do
the best thing, or gets the numbers wrong - sometimes it's a case of the
historical momentum; sometimes it's due to using a generic approach for a
problem that has a few special cases; sometimes it's because the statistics
that have been gathered don't describe the data well; sometimes it's simply
because the problem is too hard.  And sometimes it's a bug - in which case if
you can create a very simple reproducible demo it's nice to let Oracle Corp.
know.

If you ever need it in a production there's a
"no_use_hash_aggregation(@qb_name)" hint that you could use in your examples to
stop Oracle from using a hash unique when you want it to do a sort unique.

Regards
Jonathan Lewis

P.S.  If you do come across other things that look like defects it's best to
see if you can create a model that uses pure SQL (or PL/SQL) because it's hard
enough to persuade first-line support to run SQL scripts, let alone do
something with Java.  (And I've never bothered to acquaint myself with setting
up java on my sandbox, so I wouldn't be able to recreate you data without a lot
of hassle that I don't need.)

________________________________________
From: Vishnu Potukanuma <vishnupotukanuma@xxxxxxxxx>
Sent: 08 January 2020 10:51
To: Jonathan Lewis
Cc: Oracle L
Subject: Re: reverse key index -- Sorting expected?

Hi Jonathan,

I completely agree with you regarding the optimisation that oracle performs
when it comes to using hashing or sort unique or sometimes no hashing or sort
unique.. but the following still bugs me and cannot find a suitable explanation
as to why this is happening -----  if the query in the IN - clause uses join -
it is using hash unique.... and we cannot prevent it even though if it is not
required in some cases.....

considering the same example as above:
I created two additional tables temp and temp2 as follows:
create table temp as select student_id from students where student_id < 100;
create table temp2  as select student_id from students where student_id < 100;
alter table temp add constraint temp_pk primary key (student_id);
alter table temp2 add constraint temp2_pk primary key (student_id);
gathered statistics.... even histograms as well..

for all the below statements....
given every possible way....
select student_id from students where student_id in (select
a.student_id from temp a, temp2 b where a.student_id = b.student_id and
a.student_id in (1,2,3,4,5,6,7,8,9,10) and b.student_id in
(1,2,3,4,5,6,7,8,9,10))

Plan hash value: 1363404670

-----------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation       | Name     | Starts | E-Rows |E-Bytes| Cost (%CPU)|
E-Time   | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |     |    1 |        |       |     2 (100)|
  |     10 |00:00:00.01 |    12 |       |       |  |
|   1 |  NESTED LOOPS       |     |    1 |      1 |    16 |     2 (50)|
00:00:01 |     10 |00:00:00.01 |    12 |       |       |  |
|   2 |   VIEW       | VW_NSO_1    |    1 |      1 |    13 |     1 (0)|
00:00:01 |     10 |00:00:00.01 |     8 |       |       |  |
|   3 |    HASH UNIQUE       |     |    1 |      1 |     6 |    |       |
10 |00:00:00.01 |     8 |  2294K|  2294K| 1041K (0)|
|   4 |     NESTED LOOPS SEMI  |     |    1 |      1 |     6 |     1 (0)|
00:00:01 |     10 |00:00:00.01 |     8 |       |       |  |
|   5 |      INLIST ITERATOR   |     |    1 |        |       |    |       |
10 |00:00:00.01 |     4 |       |       |  |
|*  6 |       INDEX UNIQUE SCAN| TEMP_PK     |   10 |     10 |    30 |     1
(0)| 00:00:01 |     10 |00:00:00.01 |     4 |       |       |  |
|*  7 |      INDEX UNIQUE SCAN | TEMP2_PK    |   10 |      1 |     3 |     0
(0)|       |     10 |00:00:00.01 |     4 |       |       |  |
|*  8 |   INDEX UNIQUE SCAN    | STUDENT_IDX |   10 |      1 |     3 |     0
(0)|       |     10 |00:00:00.01 |     4 |       |       |  |
-----------------------------------------------------------------------------------------------------------------------------------------------------------

select student_id from students where student_id in (select
a.student_id from temp a, temp2 b where a.student_id = b.student_id and
a.student_id in (1,2,3,4,5,6,7,8,9,10))

Plan hash value: 1363404670

-----------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation       | Name     | Starts | E-Rows |E-Bytes| Cost (%CPU)|
E-Time   | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |     |    1 |        |       |     2 (100)|
  |     10 |00:00:00.01 |    17 |       |       |  |
|   1 |  NESTED LOOPS       |     |    1 |      1 |    18 |     2 (50)|
00:00:01 |     10 |00:00:00.01 |    17 |       |       |  |
|   2 |   VIEW       | VW_NSO_1    |    1 |      1 |    13 |     1 (0)|
00:00:01 |     10 |00:00:00.01 |     8 |       |       |  |
|   3 |    HASH UNIQUE       |     |    1 |      1 |     6 |    |       |
10 |00:00:00.01 |     8 |  2294K|  2294K| 1044K (0)|
|   4 |     NESTED LOOPS SEMI  |     |    1 |      1 |     6 |     1 (0)|
00:00:01 |     10 |00:00:00.01 |     8 |       |       |  |
|   5 |      INLIST ITERATOR   |     |    1 |        |       |    |       |
10 |00:00:00.01 |     4 |       |       |  |
|*  6 |       INDEX UNIQUE SCAN| TEMP_PK     |   10 |     10 |    30 |     1
(0)| 00:00:01 |     10 |00:00:00.01 |     4 |       |       |  |
|*  7 |      INDEX UNIQUE SCAN | TEMP2_PK    |   10 |      1 |     3 |     0
(0)|       |     10 |00:00:00.01 |     4 |       |       |  |
|*  8 |   INDEX UNIQUE SCAN    | STUDENT_IDX |   10 |      1 |     5 |     0
(0)|       |     10 |00:00:00.01 |     9 |       |       |  |
-----------------------------------------------------------------------------------------------------------------------------------------------------------

select student_id from students where student_id in (select
a.student_id from temp a, temp2 b where a.student_id = b.student_id and
a.student_id < 10)

Plan hash value: 2866610440

----------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation      | Name    | Starts | E-Rows |E-Bytes| Cost (%CPU)|
E-Time   | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |    |   1 |      |       |     2 (100)|      |
   9 |00:00:00.01 |   14 |       |       | |
|   1 |  NESTED LOOPS      |    |   1 |    8 |   144 |     2  (50)| 00:00:01 |
    9 |00:00:00.01 |   14 |       |       | |
|   2 |   VIEW      | VW_NSO_1    |   1 |    8 |   104 |     1 (0)| 00:00:01 |
    9 |00:00:00.01 |    5 |       |       | |
|   3 |    HASH UNIQUE      |    |   1 |    8 |    48 |   |      |      9
|00:00:00.01 |    5 |  2294K|  2294K| 1047K (0)|
|   4 |     NESTED LOOPS SEMI |    |   1 |    8 |    48 |     1 (0)| 00:00:01 |
     9 |00:00:00.01 |    5 |       |       | |
|*  5 |      INDEX RANGE SCAN | TEMP_PK     |   1 |    9 |    27 |     1 (0)|
00:00:01 |      9 |00:00:00.01 |    1 |       |       | |
|*  6 |      INDEX UNIQUE SCAN| TEMP2_PK    |   9 |    9 |    27 |     0 (0)|
   |      9 |00:00:00.01 |    4 |       |       | |
|*  7 |   INDEX UNIQUE SCAN   | STUDENT_IDX |   9 |    1 |     5 |     0 (0)|
   |      9 |00:00:00.01 |    9 |       |       | |
----------------------------------------------------------------------------------------------------------------------------------------------------------

select student_id from students where student_id in ( select  a.student_id from
temp a, temp2 b where a.student_id = b.student_id and a.student_id between 1
and 10 and b.student_id between 1 and 10)
-------------------------------------------------------------------------------------
| Id  | Operation      | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |    |  9 | 144 |  2  (50)| 00:00:01 |
|   1 |  NESTED LOOPS      |    |  9 | 144 |  2  (50)| 00:00:01 |
|   2 |   VIEW      | VW_NSO_1    |  9 | 117 |  1   (0)| 00:00:01 |
|   3 |    HASH UNIQUE      |    |  9 | 54 | |    |
|   4 |     NESTED LOOPS SEMI |    |  9 | 54 |  1   (0)| 00:00:01 |
|*  5 |      INDEX RANGE SCAN | TEMP_PK     | 10 | 30 |  1   (0)| 00:00:01 |
|*  6 |      INDEX UNIQUE SCAN| TEMP2_PK    | 10 | 30 |  0   (0)| 00:00:01 |
|*  7 |   INDEX UNIQUE SCAN   | STUDENT_IDX |  1 |  3 |  0   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

we know that the values are unique.... when it comes to estimated rows....
NESTED LOOPS SEMI ---> it always thinks/estimates that there is 1 row less...

This is one aspect of it...
another aspect of what i was especially concerned is the use of SORT - ORDER
BY... just adding a ORDER BY STUDENT_ID to the main statement, we always end up
sorting the results which can be explained with use of HASH UNIQUE... it is
using HASH UNIQUE even for very small resultsets

----------------------------------------------------------
Plan hash value: 2349408382

--------------------------------------------------------------------------------------
| Id  | Operation       | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |     |   9 |  54 |   2  (50)| 00:00:01 |
|   1 |  SORT ORDER BY       |     |   9 |  54 |   2  (50)| 00:00:01 |
|   2 |   NESTED LOOPS       |     |   9 |  54 |   2  (50)| 00:00:01 |
|   3 |    VIEW       | VW_NSO_1    |   9 |  27 |   1   (0)| 00:00:01 |
|   4 |     HASH UNIQUE        |     |   9 |  54 |  |     |
|   5 |      NESTED LOOPS SEMI |     |   9 |  54 |   1   (0)| 00:00:01 |
|*  6 |       INDEX RANGE SCAN | TEMP_PK     |  10 |  30 |   1   (0)| 00:00:01 |
|*  7 |       INDEX UNIQUE SCAN| TEMP2_PK    |  10 |  30 |   0   (0)| 00:00:01 |
|*  8 |    INDEX UNIQUE SCAN   | STUDENT_IDX |   1 |   3 |   0   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

if it had done SORT UNIQUE here instead of hash unique considering the cost..
we could have easily avoided the additional PGA related operation while sorting
the final results from the students table...

if we notice it uses pga both the cases... sort order by and hash unique....
-----------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation       | Name     | Starts | E-Rows |E-Bytes| Cost (%CPU)|
E-Time   | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |     |    1 |        |       |     2 (100)|
  |     10 |00:00:00.01 |     9 |       |       |  |
|   1 |  SORT ORDER BY       |     |    1 |      9 |    54 |     2 (50)|
00:00:01 |     10 |00:00:00.01 |     9 |  2048 |  2048 | 2048  (0)|
|   2 |   NESTED LOOPS       |     |    1 |      9 |    54 |     2 (50)|
00:00:01 |     10 |00:00:00.01 |     9 |       |       |  |
|   3 |    VIEW       | VW_NSO_1    |    1 |      9 |    27 |     1 (0)|
00:00:01 |     10 |00:00:00.01 |     5 |       |       |  |
|   4 |     HASH UNIQUE        |     |    1 |      9 |    54 |    |       |
10 |00:00:00.01 |     5 |  2294K|  2294K| 1064K (0)|
|   5 |      NESTED LOOPS SEMI |     |    1 |      9 |    54 |     1 (0)|
00:00:01 |     10 |00:00:00.01 |     5 |       |       |  |
|*  6 |       INDEX RANGE SCAN | TEMP_PK     |    1 |     10 |    30 |     1
(0)| 00:00:01 |     10 |00:00:00.01 |     1 |       |       |  |
|*  7 |       INDEX UNIQUE SCAN| TEMP2_PK    |   10 |     10 |    30 |     0
(0)|       |     10 |00:00:00.01 |     4 |       |       |  |
|*  8 |    INDEX UNIQUE SCAN   | STUDENT_IDX |   10 |      1 |     3 |     0
(0)|       |     10 |00:00:00.01 |     4 |       |       |  |
-----------------------------------------------------------------------------------------------------------------------------------------------------------

Please do let me know if I missed something...  regarding this entire thing...

Thanks,
Vishnu

On Tue, Jan 7, 2020 at 4:32 PM Jonathan Lewis
<jonathan@xxxxxxxxxxxxxxxxxx<mailto:jonathan@xxxxxxxxxxxxxxxxxx>> wrote:

example of hash function that kicks in during execution:
select roll from randomload where roll in (select a.roll from temp a, temp b
where a.roll = b.roll) // this case sorting doesn't happen - only hashing to
eliminate duplicates...
select roll from randomload where roll in (select a.roll from temp) // both
sorting and hashing to eliminate the duplicates .

The appearance of ant hashing in one of these examples is probably a pure
costing decision based on the cardinality and clustering estimate - here's a
plan, pulled from memory,  that shows no hashing at all:

select object_name from t1 where object_id in (select object_id from t2)

Plan hash value: 2010303231

--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)|
Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |       |       |    65 (100)|
     |
|   1 |  NESTED LOOPS                |       |    62 |  2976 |    65   (2)|
00:00:01 |
|   2 |   NESTED LOOPS               |       |    62 |  2976 |    65   (2)|
00:00:01 |
|   3 |    SORT UNIQUE               |       |    62 |   248 |     2   (0)|
00:00:01 |
|   4 |     TABLE ACCESS FULL        | T2    |    62 |   248 |     2   (0)|
00:00:01 |
|*  5 |    INDEX RANGE SCAN          | T1_I1 |     1 |       |     1   (0)|
00:00:01 |
|   6 |   TABLE ACCESS BY INDEX ROWID| T1    |     1 |    44 |     2   (0)|
00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("OBJECT_ID"="OBJECT_ID")

The optimizer has used a sort unique rather than a hash unique on T2 because it
expects to find a small number of rows, and sorting the values for uniqueness
may result in more efficient access for the nested loop join.

Bottom line - Oracle uses a sort to ensure uniqueness (and minimal iterations)
for an IN-list; but for an in subquery it simply picks the best optimizer
strategy for handling the join and subsequent activity.  Here's another plan
that doesn't even try for uniqueness (using the same two tables, reversing the
roles), but switches to a semi-join:

select object_name from t2 where object_id in (select object_id from t1)

Plan hash value: 3107539380

-------------------------------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |       |       |    25 (100)|          |
|*  1 |  HASH JOIN SEMI       |       |    62 |  2790 |    25  (24)| 00:00:01 |
|   2 |   TABLE ACCESS FULL   | T2    |    62 |  2480 |     2   (0)| 00:00:01 |
|   3 |   INDEX FAST FULL SCAN| T1_I1 | 56753 |   277K|    20  (15)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("OBJECT_ID"="OBJECT_ID")

Regards
Jonathan Lewis

PS  19.3.0.0
create table t1 as select * from all_objects;
create index t1_i1 on t1(object_id);
create table t2 as select * from t1 where mod(object_id,1000) = 0;

________________________________________
From: oracle-l-bounce@xxxxxxxxxxxxx<mailto:oracle-l-bounce@xxxxxxxxxxxxx>
<oracle-l-bounce@xxxxxxxxxxxxx<mailto:oracle-l-bounce@xxxxxxxxxxxxx>> on behalf
of Vishnu Potukanuma
<vishnupotukanuma@xxxxxxxxx<mailto:vishnupotukanuma@xxxxxxxxx>>
Sent: 06 January 2020 08:55
To: Andy Sayer
Cc: Oracle-L Freelists
Subject: Re: reverse key index -- Sorting expected?

Hi Andy,

Now that I look at the sentence I have written - it doesn't make sense..
regarding hash function... Let me rephrase things more clearly and precisely
the way i am looking at...
I was referring to the functionality used in eliminating the duplicate values
in the IN list  ( )...  as it makes the final transformed query... (i believe
it is using hash function due to the following)...

example of hash function that kicks in during execution:
select roll from randomload where roll in (select a.roll from temp a, temp b
where a.roll = b.roll) // this case sorting doesn't happen - only hashing to
eliminate duplicates...
select roll from randomload where roll in (select a.roll from temp) // both
sorting and hashing to eliminate the duplicates .

db file sequential read event was basically to see the ordering of blocks read
from extent to extent and based on the index treedump to see precisely whether
it is reading from left most index leaf block to the right most index leaf
block... now that looking at it from a different perspective.... even ordering
the values in the ascending order doesn't make sense after a certain point as
the probability or even potentially benefits of this diminishes as the number
of entries in the leaf blocks reduces due to
-  50-50 block splits or
-  nature of data load (eg: orders from a particular store increase
considerably during a certain time when they give offers)
- due to the size of the indexed column
- event multi column indexes and data type of the column (char or string)
- in which inherently less number of entries in leaf blocks - (also block size
dependent) -

the wired part of this is that it even sorts character data
select * from students where name in ('Vishnu','VISHNU','VIshnu'); // doesn't
really make sense...

Unless I have missed something completely - i cant see a valid purpose of
sorting entries in the IN- List for a wide majority of the use cases (unless
the entries lists in the IN clause are extremely close to each other) - which
reduces the latch gets by only 1 even in this case - this raises another
question - does this even sort if we use bind variables have to test this...

Thanks,
Vishnu

On Mon, Jan 6, 2020 at 1:24 PM Andy Sayer
<andysayer@xxxxxxxxx<mailto:andysayer@xxxxxxxxx><mailto:andysayer@xxxxxxxxx<mailto:andysayer@xxxxxxxxx>>>
wrote:
" Is this a bug? from what I look at applying a hash function to eliminate the
duplicates and sorting the results especially for reverse key indexes is
basically an unnecessary step"

What hash function are you referring to and how do the db file sequential read
events indicate this is happening?

How did you populate this students table? If the searched column was generated
by a sequence with no gaps then one would expect the keys you're looking for to
be in different leaf blocks for the normal indexes so when you do the index
traversal you might be more likely to require reading from disk more. The
reverse key index might place the index keys closer together or further aspart
depending on the implementation. Neither can really be proved just by looking
at the physical block gets. Given the fetch in between sequential reads it
looks like a different amount of data is actually being requested, does that
mean that you used a differently populated table for the two tests?

Additionally we cannot see which object your sequential reads are being
reported against, it could be the table or the index. The trace data includes
the object id so you can look it up for us.

Reverse key indexes are usually related to increasing physical IO for standard
operations because the working set you usually look at in a table are the few
most recently inserted rows. For a reverse key index using a sequence to
generate the value, these last few values are going to be scattered far around
the index (which is why they're usually naively used to help concurrency). They
also restrict the use of range predicates because the values are not sorted in
the same way that the index will be. What is your use case here? Or are you
just experimenting?

Thanks,
Andrew

On Mon, 6 Jan 2020 at 06:39, Vishnu Potukanuma
<vishnupotukanuma@xxxxxxxxx<mailto:vishnupotukanuma@xxxxxxxxx><mailto:vishnupotukanuma@xxxxxxxxx<mailto:vishnupotukanuma@xxxxxxxxx>>>
wrote:
Hi,

consider the following case:
create table randomload(roll number, name varchar2(20), mark1 number);
populate the table with randomdata.
create index roll_desc_idx on randomload(roll desc);
or even ascending index...
roll is a unique key monotonically increasing.

so if the maximum value is 1000000 - this value is stored in the left most leaf
block in the index btree structure.

Now consider the following situation:
select * from students where student_id in (1,100000, 200000, 300000, 400000);
we can use the above values in any order we want, here is precisely what oracle
is doing even before the final query transformation is done, it eliminates the
duplicate values and sorts the values in the ascending order. the query after
final transformation becomes..
select * from students where student_id =1 or student_id = 100000 or
student_id=200000 or student_id = 300000 or student_id = 400000;

this works precisely as expected for all normal indexes and descending indexes,
since sorting them in ascending order as oracle can minimize the IO if the
values are adjacent to each other. but works horrible for reverse key
indexes.... as there is no order to which the oracle scans the leaf blocks.

I mean by default Oracle sorts these values in ascending order regardless of
what, for the descending index: there is no issue with the descending indexes
as well (but not in all the cases) ...
but for reverse key indexes?

trace of index leaf block reads for a descending index...
WAIT #140268739421912: nam='db file sequential read' ela= 90 file#=7
block#=30099 blocks=1 obj#=74642 tim=5089544852
WAIT #140268739421912: nam='db file sequential read' ela= 95 file#=7
block#=29904 blocks=1 obj#=74642 tim=5089545009
WAIT #140268739421912: nam='db file sequential read' ela= 83 file#=7
block#=29667 blocks=1 obj#=74642 tim=5089545184
WAIT #140268739421912: nam='db file sequential read' ela= 101 file#=7
block#=29430 blocks=1 obj#=74642 tim=5089545354
WAIT #140268739421912: nam='db file scattered read' ela= 199 file#=7
block#=29424 blocks=6 obj#=74642 tim=5089545650
WAIT #140268739421912: nam='db file sequential read' ela= 86 file#=7
block#=29192 blocks=1 obj#=74642 tim=5089545807

the following is the trace for reverse key index:
WAIT #140452225803960: nam='db file sequential read' ela= 215 file#=7
block#=17470 blocks=1 obj#=74643 tim=5897433740
WAIT #140452225803960: nam='db file sequential read' ela= 213 file#=7
block#=16909 blocks=1 obj#=74643 tim=5897434021
FETCH
#140452225803960:c=631,e=1033,p=10,cr=3,cu=0,mis=0,r=1,dep=0,og=1,plh=3725354939,tim=5897434099
WAIT #140452225803960: nam='SQL*Net message from client' ela= 326 driver
id=1650815232 #bytes=1 p3=0 obj#=74643 tim=5897434480
WAIT #140452225803960: nam='db file sequential read' ela= 228 file#=7
block#=18162 blocks=1 obj#=74643 tim=5897434785
WAIT #140452225803960: nam='db file sequential read' ela= 221 file#=7
block#=18080 blocks=1 obj#=74643 tim=5897435085
WAIT #140452225803960: nam='SQL*Net message to client' ela= 1 driver
id=1650815232 #bytes=1 p3=0 obj#=74643 tim=5897435152
WAIT #140452225803960: nam='db file sequential read' ela= 214 file#=7
block#=19545 blocks=1 obj#=74643 tim=5897435399
WAIT #140452225803960: nam='db file sequential read' ela= 213 file#=7
block#=19383 blocks=1 obj#=74643 tim=5897435681
WAIT #140452225803960: nam='db file sequential read' ela= 209 file#=7
block#=20929 blocks=1 obj#=74643 tim=5897435960
WAIT #140452225803960: nam='db file sequential read' ela= 192 file#=7
block#=20686 blocks=1 obj#=74643 tim=5897436213
WAIT #140452225803960: nam='db file sequential read' ela= 183 file#=7
block#=22313 blocks=1 obj#=74643 tim=5897436471
WAIT #140452225803960: nam='db file sequential read' ela= 202 file#=7
block#=21989 blocks=1 obj#=74643 tim=5897436735
WAIT #140452225803960: nam='db file sequential read' ela= 223 file#=7
block#=23696 blocks=1 obj#=74643 tim=5897437039
WAIT #140452225803960: nam='db file sequential read' ela= 207 file#=7
block#=23292 blocks=1 obj#=74643 tim=5897437320

Is this a bug? from what I look at applying a hash function to eliminate the
duplicates and sorting the results especially for reverse key indexes is
basically an unnecessary step....

Thanks,
Vishnu

--
//www.freelists.org/webpage/oracle-l

--
//www.freelists.org/webpage/oracle-l

Follow-Ups:
- Re: reverse key index -- Sorting expected?
  - From: Vishnu Potukanuma

References:
- reverse key index -- Sorting expected?
  - From: Vishnu Potukanuma
- Re: reverse key index -- Sorting expected?
  - From: Andy Sayer
- Re: reverse key index -- Sorting expected?
  - From: Vishnu Potukanuma
- Re: reverse key index -- Sorting expected?
  - From: Jonathan Lewis
- Re: reverse key index -- Sorting expected?
  - From: Vishnu Potukanuma

Re: reverse key index -- Sorting expected?

Other related posts: