RE: need enhance feature for CBO :)

  • From: "Mark W. Farnham" <mwf@xxxxxxxx>
  • To: "'Ujang Jaenudin'" <ujang.jaenudin@xxxxxxxxx>, "'Michael Fontana'" <mfontana@xxxxxxxxxxx>
  • Date: Thu, 13 Nov 2008 06:21:45 -0500

This is just one more reason why folks shouldn't use small sample versions
to predict the performance of queries on large tables, especially
partitioned tables.

Please repeat your query timings on the development table using one
partition (just query the partition name directly).

From that you can compute the time of the overhead of the setup and results
reassembly on the "over 500 partitions."

I predict this time will be significant versus 14 million rows and
vanishingly small versus billions of rows.

If your production table was only 14 million rows, would you really even
consider having 500 partitions? But I believe it is true that Oracle does
*not* add in a setup cost to partition queries. Perhaps it should, but that
setup charge for your query is scalar against the number of partitions while
the dominating cost of your production query is bulk i/o with a 5 to 1
advantage for PVC_U1, so you would get different plans for the sample and
the production databases.

If you want the actual answer to the question of distinct seq values, you
have to visit all the seq values one way or the other. At least since you
are scanning a unique index you get

SORT UNIQUE NOSORT
    INDEX FULL SCAN INDEX(UNIQUE) PVC_U1.

so Oracle is being much more efficient in avoiding doing an actual SORT by
recognizing the SORT method as UNIQUE NOSORT.

Spawning parallel, as you observed just made overhead of the startup against
each partition worse, as well as the overhead of the reassembly of the
parallel pieces. Unless you have 2000+ CPUs I can't see how that would be to
your advantage against 500 paritions.

Of course your query might match quite well with Oracle's new exedata
machine technology.

I wonder what the speed difference would be if you force the issue by
generating the union ALL of the distinct query against each partition. Again
the setup cost on the small version would probably dominate. I'm interested
in the difference on the big IOT. Since your partition key is seq the
reassembly union ALL is legitimate without generating duplicates, which I'm
not sure the CBO can impute.

Now, if you want to settle for an estimate, write that query on the
dictionary yourself. I take it you would presume that you have at least one
of each seq value in each range partition. If you only have positive
integers as values of seq, that could be a useful estimate of your answer,
but you might include some values in the result that do not exist. That is
not an allowed result for a sql query, and Oracle has no way to know you
might want a slightly incorrect result set.

I wonder if your desired result is simply the enumeration of values less
than or equal to the maximum value you currently have. Since you have a
unique index with seq leading, getting the max value of seq should run
really fast, and you can spend your time figuring out the best way to
generate a list of values from 0 (or 1) through the max value. (That might
be perl [grinning in Jared's direction] if you count development cost). Is
that what you want?

Regards,

mwf

-----Original Message-----
From: Ujang Jaenudin [mailto:ujang.jaenudin@xxxxxxxxx] 
Sent: Thursday, November 13, 2008 12:01 AM
To: Michael Fontana
Cc: mwf@xxxxxxxx; jkstill@xxxxxxxxx; Oracle Discussion List
Subject: Re: need enhance feature for CBO :)

hi,

this table and its indexes never analyzed, due to huge rows (more than
billion rows).
of course PVC_U1 is smaller physical blocks rather than PVC_PK.

from development machine which has <14 million rows (through dba_segments):
PVC_PK = 1280 blocks per partitions
PVC_U1 = 256 blocks per partitions

in development:

select distinct seq from pvc;
using PVC_U1 with parallel process
-- > 5 mins never end :)

select /*+ noparallel_index(pvc) */ distinct seq from pvc;
using PVC_U1 with no parallel process
-- 3mins 31 sec

select /*+ index_ffs(pvc pvc_pk) */ distinct seq from pvc;
using PVC_PK with no parallel process
--46 sec

another parameter : we set dynamic_sampling to 2

-- 
thanks and regards
ujang | oracle dba
jakarta | http://ora62.wordpress.com

On Thu, Nov 13, 2008 at 5:10 AM, Michael Fontana <mfontana@xxxxxxxxxxx>
wrote:
> Perhaps this should not even be an IOT to begin with if the SEQ and ES
> columns guarantee uniqueness?
>
>
>
> Just make the unique key the primary.  That might just get you a little
> better performance.
>
>
>
> Perhaps also we can put this matter to bed if ujang will send us output
from
> the following query:
>
>
>
> Select * from dba_tables where table_name = 'PVC";
>
>
>
> That way we can validate statistics!?
>
>
>
>
>
>
>
>
>
>
>
>
>
> Jared asks an interesting question, but I'm even more interested in why
you
> think
>
>
>
> "I think scanning Primary key
>  more efficient rather than roundtrip read PVC_U1 index and then again
>  read PVC_PK index, due to IOT mapping..." ?
>
>
>
> Your query only needs the seq column. All the column seq values are in
> pvc_u1, which is smaller than the PK.
>
> So of course the CBO uses the smaller index. Since both SEQ and ES are in
> the PK, I can absolutely guarantee that pvc_u1 is smaller than the PK,
which
> contains them and additional columns.
>
>
>
> Now if you needed to look up values not contained in pvc_u1 (that is other
> columns than SEQ and ES) that are contained in the PK, or for that matter
> any non-overflow column in the case of an IOT, then your concern about
going
> back to the IOT would be valid. But the value you want IS in the index and
> that index is guaranteed to be the smaller object.
>
>
>
> Regards,
>
>
>
> mwf
>
>



--
//www.freelists.org/webpage/oracle-l


Other related posts: