Re: Wrong cardinality estimates if group by (and possibly a hash join)

  • From: Laimutis.Nedzinskas@xxxxxx
  • To: oracle-l@xxxxxxxxxxxxx
  • Date: Fri, 27 Apr 2012 12:58:32 +0300

>This really isn't an issue with the group by cardinality as the output of
the GBY


I pasted the test query and plan again at the end of the email.

Consider those two fragments, same tables, same number of rows, same
predicates:

select with materialize:
|*  3 |    HASH JOIN RIGHT SEMI    |                             |  1917K|
334M|       | 12038   (1)| 00:02:25 |
|   4 |     INDEX FAST FULL SCAN   | I_CL_ID                     |   109K|
640K|       |    69   (2)| 00:00:01 |
|   5 |     TABLE ACCESS FULL      | IP                          |  1917K|
323M|       | 11957   (1)| 00:02:24 |

select without materialize, i.e. transformations allowed:

|*  2 |   HASH JOIN             |         |    53 |   954 | 12341   (1)|
00:02:29 |
|   3 |    SORT UNIQUE          |         |   109K|   640K|    69   (2)|
00:00:01 |
|   4 |     INDEX FAST FULL SCAN| I_CL_ID |   109K|   640K|    69   (2)|
00:00:01 |
|   5 |    TABLE ACCESS FULL    | IP      |  1917K|    21M| 11902   (1)|
00:02:23 |


(btw, "Sort unique" seems to have no effect)

IMHO, the later HASH JOIN uses this (10053 trace) to estimate the
cardinality:

Join order[2]:  CL[CL]#1  IP[IP]#0
...
Outer table: Card: 3.00  Cost: 428.52  Resp: 428.52  Degree: 1  Bytes: 6
Inner table: IP  Alias: IP
...
Join Card:  52.63 = outer (3.00) * inner (1917174.00) * sel (9.1504e-06)
Join Card - Rounded: 53 Computed: 52.63

This sel (9.1504e-06) seems to be comming from this (density ~ 1/NDV):

Column (#1): CL_ID(NUMBER)
    AvgLen: 6.00 NDV: 109285 Nulls: 0 Density: 9.1504e-06 Min: -4 Max:
2109168


But where in the world from comes the  outer (3.00) ??

The CL table is:

Table Stats::
  Table: CL  Alias: CL
    #Rows: 109285  #Blks:  1824  AvgRowLen:  124.00
  Column (#1): CL_ID(NUMBER)
    AvgLen: 6.00 NDV: 109285 Nulls: 0 Density: 9.1504e-06 Min: -4 Max:
2109168
    Histogram: HtBal  #Bkts: 254  UncompBkts: 254  EndPtVals: 255

value 3 is NDV of join predicate column IDV_R_F: 2 - access
("CL"."CL_ID"="IP"."IDV_R_F")

Table Stats::
  Table: IP  Alias: IP
    #Rows: 1917174  #Blks:  43757  AvgRowLen:  177.00
  Column (#9): IDV_R_F(NUMBER)
    AvgLen: 6.00 NDV: 3 Nulls: 0 Density: 2.5603e-07 Min: 0 Max: 2050594
    Histogram: Freq  #Bkts: 3  UncompBkts: 5104  EndPtVals: 3


Hmmmm... I am lost.


p.s.
If asked why I am so crazy about this whole cardinality estimate issue is
this: the case is only a part of a larger case. The full query has a third
large table joined. Guess what: Cardinality of merely 53 rows yields a plan
with NL index join of this large table... 1917K instead of 53 random table
reads make quite a difference... The correct plan (which happens to happen
in 10gr2 under some special circumstances) has correct estimate and full
scan/hash join of this large table.

My guess: the problem lies in view transformation and cardinality estimates
and GBY.


---------------------------------------------------------------------------------------------------------------------------------------------------------
The select used is (with or w/o hint):

explain plan for
with v as (select /*+materialize */*     FROM sebim.ip
   WHERE
EXISTS (SELECT 1
                   FROM classif.cl
                  WHERE cl.cl_id = ip.idv_r_f
                  )
)
SELECT   ip_id from v
group by  ip_id
/




---------------------------------------------------------------------------------

Please consider the environment before printing this e-mail


                                                                                
                                                                   
  From:       Greg Rahn <greg@xxxxxxxxxxxxxxxxxx>                               
                                                                   
                                                                                
                                                                   
  To:         Laimutis.Nedzinskas@xxxxxx                                        
                                                                   
                                                                                
                                                                   
  Cc:         oracle-l@xxxxxxxxxxxxx                                            
                                                                   
                                                                                
                                                                   
  Date:       2012.04.26 17:42                                                  
                                                                   
                                                                                
                                                                   
  Subject:    *** SPAM ***  Re: Wrong cardinality estimates if group by (and 
possibly a hash join)                                                 
                                                                                
                                                                   





This really isn't an issue with the group by cardinality as the output of
the GBY isn't input into anything other than the final result set.
The difference in the plans has to do with the forced materialization and
the change of the join type.


On Thu, Apr 26, 2012 at 5:30 AM, <Laimutis.Nedzinskas@xxxxxx> wrote:.

      In Oracle 11.2.0.3 and 10.2.0.3 group by produces wrong
      cardinalities.  See
      the inner hash join cardinality bellow.

--
Regards,
Greg Rahn  |  blog  |  twitter  |  linkedin:


--
//www.freelists.org/webpage/oracle-l


Other related posts: