Some weird behavior with a collection in a subquery.

Hello.

asktom is busy I guess,

Let me try my luck here.

Can you please explain=20

1. Why CBO considers semi-join on a collection in "IN-LIST". Such course of=
=20
actions propagates incorrect cardinality to the further steps of the execut=
ion=20
path and eventually leads to a bad plan with high cardinality of the
collection.


Artificial setup environment can be achieved by running  following:

create table t1 as select * from all_objects;

create unique index t1_i on t1(object_id);

create table t2 as select * from all_objects;

create index t2_i on t2(object_id);

update t1 set status =3D 0;

update t2 set status =3D 0;

create or replace type table_of_number as table of number;
/

create or replace function getnumericlist(card_n number) return table_of_nu=
mber
as
    l_numeric_list table_of_number;
begin
    select object_id=20
      bulk collect into l_numeric_list
      from (select distinct object_id from t1)=20
     where rownum <=3D card_n;

    return l_numeric_list;                                 =20
end getnumericlist;
/

begin
    dbms_stats.delete_table_stats(null, 't1');
    dbms_stats.gather_table_stats(null, 't1',=20
        estimate_percent =3D> 100,=20
        method_opt =3D> 'for all columns size 254',=20
        cascade =3D> true);

    dbms_stats.delete_table_stats(null, 't2');
    dbms_stats.gather_table_stats(null, 't2',=20
        estimate_percent =3D> 100,=20
        method_opt =3D> 'for all columns size 254',=20
        cascade =3D> true);
end;
/


Test query:
select count(*) from (
select t1.object_name,t1.object_type, t2.object_name
  from t1, t2
where t1.object_id in (select /*+ cardinality(nlist 1000) */ *
                          from table(cast(getnumericlist(1000) as=20
table_of_number)) nlist)
   and t2.object_id =3D t1.object_id
   and t1.status =3D 0
   and t2.status =3D 0);

(for proper test use same value for getnumericlist function as for cardinal=
ity=20
parameter)

Plan for low carinality of the collection:
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=3DCHOOSE (Cost=3D71 Card=3D1 Bytes=3D1=
6)
   1    0   SORT (AGGREGATE)
   2    1     TABLE ACCESS (BY INDEX ROWID) OF 'T2' (Cost=3D2 Card=3D1 Byte=
s=3D7)
   3    2       NESTED LOOPS (Cost=3D71 Card=3D10 Bytes=3D160)
   4    3         NESTED LOOPS (Cost=3D51 Card=3D10 Bytes=3D90)
   5    4           SORT (UNIQUE)
   6    5             COLLECTION ITERATOR (PICKLER FETCH) OF 'GETNUMERICLIS=
T'
   7    4           TABLE ACCESS (BY INDEX ROWID) OF 'T1' (Cost=3D1 Card=3D=
1=20
Bytes=3D7)
   8    7             INDEX (UNIQUE SCAN) OF 'T1_I' (UNIQUE)
   9    3         INDEX (RANGE SCAN) OF 'T2_I' (NON-UNIQUE) (Cost=3D1 Card=
=3D1)

Offending plan for collection cardinality > 13 (my case.. magic number!? :)=
) :
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=3DCHOOSE (Cost=3D93 Card=3D1 Bytes=3D1=
6)
   1    0   SORT (AGGREGATE)
   2    1     TABLE ACCESS (BY INDEX ROWID) OF 'T2' (Cost=3D2 Card=3D1 Byte=
s=3D7)
   3    2       NESTED LOOPS (Cost=3D93 Card=3D1 Bytes=3D16)
   4    3         HASH JOIN (SEMI) (Cost=3D91 Card=3D1 Bytes=3D9)
   5    4           TABLE ACCESS (FULL) OF 'T1' (Cost=3D51 Card=3D30295=20
Bytes=3D212065)
   6    4           COLLECTION ITERATOR (PICKLER FETCH) OF 'GETNUMERICLIST'
   7    3         INDEX (RANGE SCAN) OF 'T2_I' (NON-UNIQUE) (Cost=3D1 Card=
=3D1)

As you can see the cardinality of the outcome from collection and T1
is 1. It is
always evaluated as 1 regardless of the collection cardinality. In fact it =
is=20
evaluated as "card(t1) * selectivity(t1.object_id)" from 10053 trace.

Result of the offending query with collection cardinality =3D 100000

Statistics
----------------------------------------------------------
          6  recursive calls
          0  db block gets
      55746  consistent gets
          0  physical reads
          0  redo size
        211  bytes sent via SQL*Net to client
        344  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          3  sorts (memory)
          0  sorts (disk)
          1  rows processed

Probably one can predict that hash join on T2 reduces LIOs substantially:

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=3DCHOOSE (Cost=3D143 Card=3D1 Bytes=3D=
16)
   1    0   SORT (AGGREGATE)
   2    1     HASH JOIN (Cost=3D143 Card=3D1 Bytes=3D16)
   3    2       HASH JOIN (SEMI) (Cost=3D91 Card=3D1 Bytes=3D9)
   4    3         TABLE ACCESS (FULL) OF 'T1' (Cost=3D51 Card=3D30295 Bytes=
=3D212065)
   5    3         COLLECTION ITERATOR (PICKLER FETCH) OF 'GETNUMERICLIST'
   6    2       TABLE ACCESS (FULL) OF 'T2' (Cost=3D51 Card=3D30296 Bytes=
=3D212072)


Statistics
----------------------------------------------------------
          6  recursive calls
          0  db block gets
        456  consistent gets
          0  physical reads
          0  redo size
        211  bytes sent via SQL*Net to client
        344  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          3  sorts (memory)
          0  sorts (disk)
          1  rows processed


2. This doesn't happen with real tables. Is it possible to suppress semi-jo=
in=20
for this particular type of queries?=20

Side note: always_semi_join=3Doff isn't working for me in some versions=20
(9.2.0.5+SA #68, 9.2.0.6); 10.1.0.2 works by I'd rather find the conceptual=
=20
workaround that will allow legitimate semi-joins take place.

Thanks.

- Vladimir

P.S. after posting this found that always_semi_join obsolete and
substituted with _always_semi_join, so please disregard note part.
--
http://www.freelists.org/webpage/oracle-l

Other related posts: