Re: hash group by estimates wrong cardinality with statistics

  • From: Martin Preiss <mtnpreiss@xxxxxx>
  • To: oracle-l@xxxxxxxxxxxxx
  • Date: Thu, 30 Apr 2015 08:24:55 +0200

Hello,

some comments without a particular order:

* replaying your example in 12.1.0.2 I get similar results: the same
plan for both queries and a hash group by cardinality of 1 with
statistics and equal to the cardinality of the preceding hash join
without statistics.
* given that the operation returns 19 rows in my system both estimates
are questionable: maybe the 1 is even the better guess.
* as far as I know 11g introduced some changes in the calculation of
the cardinality of group by operations. At least that's what
Alexandr Antonov mentioned in

http://antonoal.blogspot.de/2012/12/11g-group-by-selectivity-improvements.html
in 2012. According to this article the change is controlled by the
parameter _optimizer_improve_selectivity.
* in older releases the cardinality was determined by the following
strategy: "In general, the optimizer estimates the number of
distinct combinations of N columns by multiplying the individual
num_distinct values, and then dividing by the square root of 2 (N-1)
times." (Jonathan Lewis: Cost Based Oracle Fundamentals, p. 388)
* taking a look at a CBO trace (event 10053) for both operations I see
that the the plan with statistics contains the following lines:
SINGLE TABLE ACCESS PATH
Single Table Cardinality Estimation for T101[B]
...
Grouping column cardinality [ OWNER] 19
...
***************
Now joining: T101[B]#0
***************
NL Join
...
Grouping column cardinality [ OWNER] 1
* so the CBO has the correct cardinality for the single table, but for
the join the value drops to 1 - and I don't remember (or never knew)
how the optimizer calculates the number of distinct combinations in
this situation.
* the Grouping column cardinality are not in the plan with dynamic
sampling.

I know there are a lot of gaps in this list of observations... - maybe someone else can close them...

Regards

Martin Preiss


Am 30.04.2015 um 00:59 schrieb Ls Cheng:

Hi

I have noticed in a simple query between two tables when table have no statistics dynamic samping estimates good cardinality for hash group by however when statistics are gathered ithe CBO estimates one row for the hash gruop by operation.

This is a test case:

drop table t100;
drop table t101;

create table t100 as select * from dba_source where 1 = 0;
create table t101 as select * from dba_source where 1 = 0;

insert into t100 select * from dba_source;
delete t100 where owner = 'SYS' and rownum <49457;
insert into t101 select * from dba_source where owner = 'SYS' and rownum < 100001;

select a_owner, count(*)
from (
select a.owner a_owner, a.name <http://a.name>, b.name <http://b.name> b_name
from t100 a left outer join t101 b
on a.owner = b.owner
and a.name <http://a.name> = b.name <http://b.name>
and a.type = b.type
and a.line = b.line)
where b_name is null
group by a_owner;

no statistics gives this plan:

Plan hash value: 908433588

----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 258K| 27M| | 2929 (1)| 00:00:36 |
*| 1 | HASH GROUP BY | | 258K| 27M| | 2929 (1)| 00:00:36 |*
|* 2 | FILTER | | | | | | |
|* 3 | HASH JOIN RIGHT OUTER| | 258K| 27M| 6400K| 2915 (1)| 00:00:35 |
| 4 | TABLE ACCESS FULL | T101 | 97791 | 5252K| | 582 (1)| 00:00:07 |
| 5 | TABLE ACCESS FULL | T100 | 258K| 13M| | 1200 (1)| 00:00:15 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("B"."NAME" IS NULL)
3 - access("A"."LINE"="B"."LINE"(+) AND "A"."TYPE"="B"."TYPE"(+) AND
"A"."NAME"="B"."NAME"(+) AND "A"."OWNER"="B"."OWNER"(+))

Note
-----
- dynamic sampling used for this statement (level=2)

with statistics

Execution Plan
----------------------------------------------------------
Plan hash value: 908433588

----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 68 | | 2575 (1)| 00:00:31 |
*| 1 | HASH GROUP BY | | 1 | 68 | | 2575 (1)| 00:00:31 |*
|* 2 | FILTER | | | | | | |
|* 3 | HASH JOIN RIGHT OUTER| | 220K| 14M| 4304K| 2563 (1)| 00:00:31 |
| 4 | TABLE ACCESS FULL | T101 | 100K| 3125K| | 650 (1)| 00:00:08 |
| 5 | TABLE ACCESS FULL | T100 | 220K| 7746K| | 1199 (1)| 00:00:15 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("B"."NAME" IS NULL)
3 - access("A"."LINE"="B"."LINE"(+) AND "A"."TYPE"="B"."TYPE"(+) AND
"A"."NAME"="B"."NAME"(+) AND "A"."OWNER"="B"."OWNER"(+))


This is 11.2.0.4 in Linux. Any lights?

Thanks



Other related posts: