hash group by estimates wrong cardinality with statistics

From: Ls Cheng <exriscer@xxxxxxxxx>
To: Oracle Mailinglist <oracle-l@xxxxxxxxxxxxx>
Date: Thu, 30 Apr 2015 00:59:29 +0200

Hi

I have noticed in a simple query between two tables when table have no
statistics dynamic samping estimates good cardinality for hash group by
however when statistics are gathered ithe CBO estimates one row for the
hash gruop by operation.

This is a test case:

drop table t100;
drop table t101;

create table t100 as select * from dba_source where 1 = 0;
create table t101 as select * from dba_source where 1 = 0;

insert into t100 select * from dba_source;
delete t100 where owner = 'SYS' and rownum <49457;
insert into t101 select * from dba_source where owner = 'SYS' and rownum <
100001;

select a_owner, count(*)
from (
select a.owner a_owner, a.name, b.name b_name
from t100 a left outer join t101 b
on a.owner = b.owner
and a.name = b.name
and a.type = b.type
and a.line = b.line)
where b_name is null
group by a_owner;

no statistics gives this plan:

Plan hash value: 908433588

----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost
(%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 258K| 27M| | 2929
(1)| 00:00:36 |
*| 1 | HASH GROUP BY | | 258K| 27M| | 2929
(1)| 00:00:36 |*
|* 2 | FILTER | | | | |
| |
|* 3 | HASH JOIN RIGHT OUTER| | 258K| 27M| 6400K| 2915
(1)| 00:00:35 |
| 4 | TABLE ACCESS FULL | T101 | 97791 | 5252K| | 582
(1)| 00:00:07 |
| 5 | TABLE ACCESS FULL | T100 | 258K| 13M| | 1200
(1)| 00:00:15 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("B"."NAME" IS NULL)
3 - access("A"."LINE"="B"."LINE"(+) AND "A"."TYPE"="B"."TYPE"(+) AND
"A"."NAME"="B"."NAME"(+) AND "A"."OWNER"="B"."OWNER"(+))

Note
-----
- dynamic sampling used for this statement (level=2)

with statistics

Execution Plan
----------------------------------------------------------
Plan hash value: 908433588

----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost
(%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 68 | | 2575
(1)| 00:00:31 |
*| 1 | HASH GROUP BY | | 1 | 68 | | 2575
(1)| 00:00:31 |*
|* 2 | FILTER | | | | |
| |
|* 3 | HASH JOIN RIGHT OUTER| | 220K| 14M| 4304K| 2563
(1)| 00:00:31 |
| 4 | TABLE ACCESS FULL | T101 | 100K| 3125K| | 650
(1)| 00:00:08 |
| 5 | TABLE ACCESS FULL | T100 | 220K| 7746K| | 1199
(1)| 00:00:15 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("B"."NAME" IS NULL)
3 - access("A"."LINE"="B"."LINE"(+) AND "A"."TYPE"="B"."TYPE"(+) AND
"A"."NAME"="B"."NAME"(+) AND "A"."OWNER"="B"."OWNER"(+))

This is 11.2.0.4 in Linux. Any lights?

Thanks

Follow-Ups:
- Re: hash group by estimates wrong cardinality with statistics
  - From: Martin Preiss

hash group by estimates wrong cardinality with statistics

Other related posts: