Re: FW: Bitmap index not used when joining tables

  • From: K Gopalakrishnan <kaygopal@xxxxxxxxx>
  • To: chris.saxon@xxxxxxxxx
  • Date: Mon, 2 Jun 2014 14:25:56 -0500

Chris,

(From my Oracle Wait Interface Book: Page 278...Bitmap index costing,
changing the costing methods,etc)

//www.freelists.org/post/oracle-l/BITMAP-index-cost-10053-trace,4

-Gopal




On Sun, Jun 1, 2014 at 4:32 AM, Chris Saxon <chris.saxon@xxxxxxxxx> wrote:

> Thanks for the explanation Jonathan.
>
> Could you explain where the 20% value comes from please? Is this a default
> for bitmap indexes, or something you've inferred from the info posted?
>
>
> On 31 May 2014 20:24, Mark W. Farnham <mwf@xxxxxxxx> wrote:
>
>> hotel email bounce; trying again
>>
>>
>>
>> *From:* Mark W. Farnham [mailto:mwf@xxxxxxxx]
>> *Sent:* Saturday, May 31, 2014 3:14 PM
>> *To:* 'jonathan@xxxxxxxxxxxxxxxxxx'; 'oracle-l@xxxxxxxxxxxxx'
>> *Subject:* RE: Bitmap index not used when joining tables
>>
>>
>>
>> I do wonder about what your results might be for these two queries if you
>> modified your query slightly as:
>>
>>
>>
>> select
>>
>> --+ gather_plan_statistics
>>
>>  s.* from sales_fact s
>>
>>        where s.rowid in (
>>
>>        select
>>
>>        --+ no_merge
>>
>>           s2.rowid from sales_fact s2
>>
>>           where s2.date_id in (
>>
>>           select
>>
>>           --+ no_merge
>>
>>              d.date_id from date_d d
>>
>>              where calendar_date between date’2013-12-31’ and
>> date’2014-01-01’
>>
>>           )
>>
>>        );
>>
>>
>>
>> The idea being that if the only column it needs in the unmerged subquery
>> (rowid) can be sourced from the index, this should be less total work.
>>
>> Possibly this is also optimized if the subquery is ordered by rowid, but
>> that’s not the curiosity at this point. Of course even if this works you
>> would only want to form it up this way when you knew it was to your benefit
>> (knowledge the CBO can only guess by way of the thumbrule quoted by JL, and
>> some other thumbrule for a literal range.)
>>
>>
>>
>> I just typed this in and didn’t test it, and working is release
>> dependent, so you and I got lucky if this works via cut and paste. Once
>> debugged I would sort of expect this to use the index in both cases to
>> fetch back the rowid list.
>>
>>
>>
>> mwf
>>
>> *From:* oracle-l-bounce@xxxxxxxxxxxxx [
>> mailto:oracle-l-bounce@xxxxxxxxxxxxx <oracle-l-bounce@xxxxxxxxxxxxx>] *On
>> Behalf Of *Jonathan Lewis
>> *Sent:* Saturday, May 31, 2014 1:11 PM
>> *To:* oracle-l@xxxxxxxxxxxxx
>> *Subject:* RE: Bitmap index not used when joining tables
>>
>>
>>
>>
>>
>> It's not surprising for the example you've given.
>>
>>
>>
>> The basic principle is that the cost of a single table access via a
>> bitmap index is likely to be different from the cost of access via the
>> equivalent btree index because the bitmap index has no information about
>> data clustering (i.e no clustering_factor) so it uses a guess.  Sometimes
>> this means the cost of the bitmap will be higher, sometimes lower,
>> sometimes the same as for a btree.  In your case the btree clustering
>> factor will be very low (because of the order by in the CTAS) while the
>> guess basically assumes that 20% of the data will be very widely scattered
>> - in your case that probably means 400 blocks (20% of 2000 rows => 400
>> blocks).
>>
>>
>>
>>
>>
>>
>>
>>
>> Regards
>> Jonathan Lewis
>> http://jonathanlewis.wordpress.com
>> @jloracle
>> ------------------------------
>>
>> *From:* oracle-l-bounce@xxxxxxxxxxxxx [oracle-l-bounce@xxxxxxxxxxxxx] on
>> behalf of Chris Saxon [chris.saxon@xxxxxxxxx]
>> *Sent:* 31 May 2014 16:21
>> *To:* oracle-l@xxxxxxxxxxxxx
>> *Subject:* Bitmap index not used when joining tables
>>
>> Hi,
>>
>>
>>
>> I've been testing using bitmap indexes on 11.2.0.2 EE. When joining two
>> tables on a column with a BTree index, the index is used in the execution
>> plan. If this index is changed to a bitmap index, Oracle no longer uses the
>> index when executing the query! It assigns a higher cost to using the
>> bitmap index when joining, despite this being a cheaper approach (in terms
>> of consistent gets).
>>
>>
>>
>> To see this, I created a date dimension table with 515 days (rows) and a
>> sales fact table with 1,000 rows for each day:
>>
>>
>>
>> create table date_d (
>>
>>   date_id integer not null primary key,
>>
>>   calendar_date date not null unique
>>
>> );
>>
>>
>>
>> create table sales_fact (
>>
>>   date_id integer not null
>>
>>     references date_d (date_id),
>>
>>   quantity number not null,
>>
>>   total_value number not null
>>
>> );
>>
>>
>>
>> insert into date_d
>>
>>   select rownum, date'2013-01-01'-1+rownum
>>
>>   from   dual
>>
>>   connect by level <= sysdate - date'2013-01-01';
>>
>>
>>
>> insert into sales_fact
>>
>> with rws as (select * from dual connect by level <= 1000)
>>
>>   select d.date_id, round(dbms_random.value(1, 20)),
>> round(dbms_random.value(10, 100), 2)
>>
>>   from   date_d d
>>
>>   cross  join rws
>>
>>   order  by d.date_id;
>>
>>
>>
>> begin
>>
>>   dbms_stats.gather_table_stats(user, 'sales_fact');
>>
>>   dbms_stats.gather_table_stats(user, 'date_d');
>>
>> end;
>>
>> /
>>
>>
>>
>> If I create a BTree index on SALES_FACT.DATE_ID, then join from the date
>> dim to the fact table, restricting to two days, Oracle uses the index on
>> the fact table as I would expect (as we're fetching 2,000 of 515,000 rows):
>>
>>
>>
>> create index safa_date_id on sales_fact (date_id);
>>
>>
>>
>> set autotrace trace
>>
>> select s.* from sales_fact s join date_d d
>>
>> on     d.date_id = s.date_id
>>
>> where  calendar_date between date'2013-12-31' and date'2014-01-01';
>>
>>
>>
>> set autotrace off
>>
>>
>>
>> Execution Plan
>>
>> ----------------------------------------------------------
>>
>> Plan hash value: 2189554905
>>
>>
>>
>>
>> ----------------------------------------------------------------------------------------------
>>
>> | Id  | Operation                     | Name         | Rows  | Bytes |
>> Cost (%CPU)| Time     |
>>
>>
>> ----------------------------------------------------------------------------------------------
>>
>> |   0 | SELECT STATEMENT              |              |  3002 | 69046 |
>>  21   (0)| 00:00:01 |
>>
>> |   1 |  NESTED LOOPS                 |              |       |       |
>>          |          |
>>
>> |   2 |   NESTED LOOPS                |              |  3002 | 69046 |
>>  21   (0)| 00:00:01 |
>>
>> |   3 |    TABLE ACCESS BY INDEX ROWID| DATE_D       |     3 |    36 |
>>   3   (0)| 00:00:01 |
>>
>> |*  4 |     INDEX RANGE SCAN          | SYS_C0037151 |     3 |       |
>>   2   (0)| 00:00:01 |
>>
>> |*  5 |    INDEX RANGE SCAN           | SAFA_DATE_ID |  1000 |       |
>>   3   (0)| 00:00:01 |
>>
>> |   6 |   TABLE ACCESS BY INDEX ROWID | SALES_FACT   |  1000 | 11000 |
>>   6   (0)| 00:00:01 |
>>
>>
>> ----------------------------------------------------------------------------------------------
>>
>>
>>
>> Predicate Information (identified by operation id):
>>
>> ---------------------------------------------------
>>
>>
>>
>>    4 - access("D"."CALENDAR_DATE">=TO_DATE(' 2013-12-31 00:00:00',
>> 'syyyy-mm-dd
>>
>>               hh24:mi:ss') AND "D"."CALENDAR_DATE"<=TO_DATE(' 2014-01-01
>> 00:00:00', 'syyyy-mm-dd
>>
>>               hh24:mi:ss'))
>>
>>    5 - access("D"."DATE_ID"="S"."DATE_ID")
>>
>>
>>
>>
>>
>> Statistics
>>
>> ----------------------------------------------------------
>>
>>           1  recursive calls
>>
>>           0  db block gets
>>
>>         290  consistent gets
>>
>>           6  physical reads
>>
>>           0  redo size
>>
>>       36195  bytes sent via SQL*Net to client
>>
>>        1839  bytes received via SQL*Net from client
>>
>>         135  SQL*Net roundtrips to/from client
>>
>>           0  sorts (memory)
>>
>>           0  sorts (disk)
>>
>>        2000  rows processed
>>
>>
>>
>> However, if I drop the normal index and re-create it as a bitmap index
>> the query above changes to a FTS on the SALES_FACT table:
>>
>>
>>
>> set autotrace off
>>
>> drop index safa_date_id;
>>
>> create bitmap index safa_date_id on sales_fact (date_id);
>>
>>
>>
>> set autotrace trace
>>
>> select s.* from sales_fact s join date_d d
>>
>> on     d.date_id = s.date_id
>>
>> where  calendar_date between date'2013-12-31' and date'2014-01-01';
>>
>>
>>
>> Execution Plan
>>
>> ----------------------------------------------------------
>>
>> Plan hash value: 525754326
>>
>>
>>
>>
>> ---------------------------------------------------------------------------------------------
>>
>> | Id  | Operation                    | Name         | Rows  | Bytes |
>> Cost (%CPU)| Time     |
>>
>>
>> ---------------------------------------------------------------------------------------------
>>
>> |   0 | SELECT STATEMENT             |              |  3002 | 69046 |
>> 315   (2)| 00:00:04 |
>>
>> |*  1 |  HASH JOIN                   |              |  3002 | 69046 |
>> 315   (2)| 00:00:04 |
>>
>> |   2 |   TABLE ACCESS BY INDEX ROWID| DATE_D       |     3 |    36 |
>> 3   (0)| 00:00:01 |
>>
>> |*  3 |    INDEX RANGE SCAN          | SYS_C0037151 |     3 |       |
>> 2   (0)| 00:00:01 |
>>
>> |   4 |   TABLE ACCESS FULL          | SALES_FACT   |   515K|  5532K|
>> 310   (1)| 00:00:04 |
>>
>>
>> ---------------------------------------------------------------------------------------------
>>
>>
>>
>> Predicate Information (identified by operation id):
>>
>> ---------------------------------------------------
>>
>>
>>
>>    1 - access("D"."DATE_ID"="S"."DATE_ID")
>>
>>    3 - access("D"."CALENDAR_DATE">=TO_DATE(' 2013-12-31 00:00:00',
>> 'syyyy-mm-dd
>>
>>               hh24:mi:ss') AND "D"."CALENDAR_DATE"<=TO_DATE(' 2014-01-01
>> 00:00:00', 'syyyy-mm-dd
>>
>>               hh24:mi:ss'))
>>
>>
>>
>>
>>
>> Statistics
>>
>> ----------------------------------------------------------
>>
>>           1  recursive calls
>>
>>           0  db block gets
>>
>>        1267  consistent gets
>>
>>           0  physical reads
>>
>>           0  redo size
>>
>>       36195  bytes sent via SQL*Net to client
>>
>>        1839  bytes received via SQL*Net from client
>>
>>         135  SQL*Net roundtrips to/from client
>>
>>           0  sorts (memory)
>>
>>           0  sorts (disk)
>>
>>        2000  rows processed
>>
>>
>>
>> If we hint the query to use the bitmap index, we can see it has a higher
>> cost. The autotrace stats report significantly fewer consistent gets though:
>>
>>
>>
>> Execution Plan
>>
>> ----------------------------------------------------------
>>
>> Plan hash value: 1520624055
>>
>>
>>
>>
>> ----------------------------------------------------------------------------------------------
>>
>> | Id  | Operation                     | Name         | Rows  | Bytes |
>> Cost (%CPU)| Time     |
>>
>>
>> ----------------------------------------------------------------------------------------------
>>
>> |   0 | SELECT STATEMENT              |              |  3002 | 69046 |
>> 416   (0)| 00:00:05 |
>>
>> |   1 |  NESTED LOOPS                 |              |       |       |
>>          |          |
>>
>> |   2 |   NESTED LOOPS                |              |  3002 | 69046 |
>> 416   (0)| 00:00:05 |
>>
>> |   3 |    TABLE ACCESS BY INDEX ROWID| DATE_D       |     3 |    36 |
>>   3   (0)| 00:00:01 |
>>
>> |*  4 |     INDEX RANGE SCAN          | SYS_C0037151 |     3 |       |
>>   2   (0)| 00:00:01 |
>>
>> |   5 |    BITMAP CONVERSION TO ROWIDS|              |       |       |
>>          |          |
>>
>> |*  6 |     BITMAP INDEX SINGLE VALUE | SAFA_DATE_ID |       |       |
>>          |          |
>>
>> |   7 |   TABLE ACCESS BY INDEX ROWID | SALES_FACT   |  1000 | 11000 |
>> 416   (0)| 00:00:05 |
>>
>>
>> ----------------------------------------------------------------------------------------------
>>
>>
>>
>> Predicate Information (identified by operation id):
>>
>> ---------------------------------------------------
>>
>>
>>
>>    4 - access("D"."CALENDAR_DATE">=TO_DATE(' 2013-12-31 00:00:00',
>> 'syyyy-mm-dd
>>
>>               hh24:mi:ss') AND "D"."CALENDAR_DATE"<=TO_DATE(' 2014-01-01
>> 00:00:00', 'syyyy-mm-dd
>>
>>               hh24:mi:ss'))
>>
>>    6 - access("D"."DATE_ID"="S"."DATE_ID")
>>
>>
>>
>>
>>
>> Statistics
>>
>> ----------------------------------------------------------
>>
>>           1  recursive calls
>>
>>           0  db block gets
>>
>>         152  consistent gets
>>
>>           1  physical reads
>>
>>           0  redo size
>>
>>       36195  bytes sent via SQL*Net to client
>>
>>        1839  bytes received via SQL*Net from client
>>
>>         135  SQL*Net roundtrips to/from client
>>
>>           0  sorts (memory)
>>
>>           0  sorts (disk)
>>
>>        2000  rows processed
>>
>>
>>
>> If you remove the join to the date dim and just use the date ids, Oracle
>> uses the index as expected:
>>
>>
>>
>> select * from sales_fact
>>
>> where  date_id between 365 and 366;
>>
>>
>>
>> Execution Plan
>>
>> ----------------------------------------------------------
>>
>> Plan hash value: 2749560877
>>
>>
>>
>>
>> ---------------------------------------------------------------------------------------------
>>
>> | Id  | Operation                    | Name         | Rows  | Bytes |
>> Cost (%CPU)| Time     |
>>
>>
>> ---------------------------------------------------------------------------------------------
>>
>> |   0 | SELECT STATEMENT             |              |  3002 | 33022 |
>> 221   (0)| 00:00:03 |
>>
>> |   1 |  TABLE ACCESS BY INDEX ROWID | SALES_FACT   |  3002 | 33022 |
>> 221   (0)| 00:00:03 |
>>
>> |   2 |   BITMAP CONVERSION TO ROWIDS|              |       |       |
>>        |          |
>>
>> |*  3 |    BITMAP INDEX RANGE SCAN   | SAFA_DATE_ID |       |       |
>>        |          |
>>
>>
>> ---------------------------------------------------------------------------------------------
>>
>>
>>
>> Predicate Information (identified by operation id):
>>
>> ---------------------------------------------------
>>
>>
>>
>>    3 - access("DATE_ID">=365 AND "DATE_ID"<=366)
>>
>>
>>
>>
>>
>> Statistics
>>
>> ----------------------------------------------------------
>>
>>           1  recursive calls
>>
>>           0  db block gets
>>
>>         144  consistent gets
>>
>>           0  physical reads
>>
>>           0  redo size
>>
>>       36195  bytes sent via SQL*Net to client
>>
>>        1839  bytes received via SQL*Net from client
>>
>>         135  SQL*Net roundtrips to/from client
>>
>>           0  sorts (memory)
>>
>>           0  sorts (disk)
>>
>>        2000  rows processed
>>
>>
>>
>> Looking at the 10053 trace, I can see this is because the bm index join
>> cost is calculated as higher than the FTS of SALES_FACT:
>>
>>
>>
>> NL Join
>>
>>   Outer table: Card: 3.00  Cost: 3.00  Resp: 3.00  Degree: 1  Bytes: 12
>>
>> Access path analysis for SALES_FACT
>>
>>   Inner table: SALES_FACT  Alias: S
>>
>>   Access Path: TableScan
>>
>>     NL Join:  Cost: 929.82  Resp: 929.82  Degree: 1
>>
>>       Cost_io: 920.00  Cost_cpu: 317629069
>>
>>       Resp_io: 920.00  Resp_cpu: 317629069
>>
>>   ****** trying bitmap/domain indexes ******
>>
>>   Access Path: index (AllEqJoinGuess)
>>
>>     Index: SAFA_DATE_ID
>>
>>     resc_io: 1.00  resc_cpu: 8171
>>
>>     ix_sel: 0.001942  ix_sel_with_filters: 0.001942
>>
>>     NL Join : Cost: 6.00  Resp: 6.00  Degree: 1
>>
>>       Cost_io: 6.00  Cost_cpu: 47359
>>
>>       Resp_io: 6.00  Resp_cpu: 47359
>>
>>   Bitmap nodes:
>>
>>     Used SAFA_DATE_ID
>>
>>       Cost = 6.001464, sel = 0.001942
>>
>>   Access path: Bitmap index - accepted
>>
>>     Cost: 416.121488 Cost_io: 415.999594 Cost_cpu: 3943626.145192 Sel:
>> 0.001942
>>
>>     Not Believed to be index-only
>>
>>   ****** finished trying bitmap/domain indexes ******
>>
>>
>>
>>   Best NL cost: 416.12
>>
>>           resc: 416.12  resc_io: 416.00  resc_cpu: 3943626
>>
>>           resp: 416.12  resp_io: 416.00  resc_cpu: 3943626
>>
>>
>>
>> However, same section of the 10053 trace shows the following when using a
>> BTree index on the SALES_FACT.DATE_ID column:
>>
>>
>>
>> NL Join
>>
>>   Outer table: Card: 3.00  Cost: 3.00  Resp: 3.00  Degree: 1  Bytes: 12
>>
>> Access path analysis for SALES_FACT
>>
>>   Inner table: SALES_FACT  Alias: S
>>
>>   Access Path: TableScan
>>
>>     NL Join:  Cost: 929.82  Resp: 929.82  Degree: 1
>>
>>       Cost_io: 920.00  Cost_cpu: 317629069
>>
>>       Resp_io: 920.00  Resp_cpu: 317629069
>>
>>   Access Path: index (AllEqJoinGuess)
>>
>>     Index: SAFA_DATE_ID
>>
>>     resc_io: 6.00  resc_cpu: 433579
>>
>>     ix_sel: 0.001942  ix_sel_with_filters: 0.001942
>>
>>     NL Join : Cost: 21.04  Resp: 21.04  Degree: 1
>>
>>       Cost_io: 21.00  Cost_cpu: 1323580
>>
>>       Resp_io: 21.00  Resp_cpu: 1323580
>>
>>
>>
>>   Best NL cost: 21.04
>>
>>           resc: 21.04  resc_io: 21.00  resc_cpu: 1323580
>>
>>           resp: 21.04  resp_io: 21.00  resc_cpu: 1323580
>>
>>
>>
>> Why does this happen? Is this a bug or expected behaviour?
>>
>>
>>
>> Thanks,
>>
>> Chris
>>
>> www.sqlfail.com
>>
>
>

Other related posts: