Re: HASH Partitioning question

From: Tim Gorman <tim@xxxxxxxxx>
To: Deepak Sharma <sharmakdeep_oracle@xxxxxxxxx>, "dmarc-noreply@xxxxxxxxxxxxx" <dmarc-noreply@xxxxxxxxxxxxx>, ORACLE-L <oracle-l@xxxxxxxxxxxxx>
Date: Thu, 12 Feb 2015 21:09:29 -0700

Deepak,

One more question: when you say "/changing it to HASH partition/" from the present RANGE partitioning, do you mean changing it from RANGE to RANGE-HASH sub-partitioned, or just from RANGE to HASH, with no sub-partitioning?

I think that the latter choice, changing from RANGE to HASH with no sub-partitioning, is going to ruin your data loading scheme, if you're using the EXCHANGE PARTITION load technique. If you're not using the EXCHANGE PARTITION load technique, and I suspect that might be the case after you state "/no batch window for loads/", then that may explain the 6 TB of archive logs per day because you're unable to use direct-path loads, but that's a whole other matter, isn't it?

Anyway, hopefully you're planning to change from RANGE to RANGE-HASH sub-partitioned, so that you can continue to load by date range and purge by date range.
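
To make the purge argument concrete, here is a minimal Python sketch (not Oracle code) of how rows land in segments under the two layouts. The bucket count, the month labels, and the use of a simple modulo in place of Oracle's internal hash function are all assumptions made for illustration.

```python
from collections import defaultdict

HASH_BUCKETS = 4  # hypothetical number of HASH (sub-)partitions


def place_hash_only(rows):
    """HASH only: the segment is chosen by the hash of ABC alone."""
    parts = defaultdict(list)
    for load_month, abc in rows:
        parts[abc % HASH_BUCKETS].append((load_month, abc))
    return parts


def place_range_hash(rows):
    """RANGE-HASH: the segment is chosen by (load month, hash of ABC)."""
    parts = defaultdict(list)
    for load_month, abc in rows:
        parts[(load_month, abc % HASH_BUCKETS)].append((load_month, abc))
    return parts


def purge_plan(parts, month):
    """Count segments that can be dropped whole vs. those needing row deletes."""
    droppable, needs_delete = 0, 0
    for rows in parts.values():
        months = {m for m, _ in rows}
        if month not in months:
            continue
        if months == {month}:
            droppable += 1      # segment holds only the month being purged
        else:
            needs_delete += 1   # segment mixes months, so rows must be deleted
    return droppable, needs_delete


# Three load months, eight ABC values each (made-up data).
rows = [(m, abc) for m in ("2015-01", "2015-02", "2015-03") for abc in range(8)]

print("HASH only :", purge_plan(place_hash_only(rows), "2015-01"))
print("RANGE-HASH:", purge_plan(place_range_hash(rows), "2015-01"))
```

In this toy data set, the HASH-only layout mixes every month into every segment, so purging one month means deleting rows everywhere, while the RANGE-HASH layout keeps each month in its own group of segments that can be dropped as a unit.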

At any rate, the architecture of PX is relevant to this question. There is a single query-coordinator (QC) session and "N" parallel-execution worker sessions. The QC session serves as a collation point for all the data returned from the worker sessions. So, parallel-execution queries can work well when you're scanning a large volume of data, but only returning a small number of rows. That is their primary use-case - throwing lots of CPU and memory at the problem at once.

Now what will happen if you have a parallel query that is scanning a large volume of data and returning a large number of rows? The single QC session will be overwhelmed with huge volumes of data being returned from all of the PX worker sessions, and so queuing will result. In other words, the QC becomes the bottleneck, and the throughput of the parallel query drops to match the capability of that single QC session to return rows. That is, not very fast. In this situation, it is probably faster to dispense with parallel execution altogether and just run a serial query.
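
The bottleneck can be pictured with a deliberately crude Python model. The two rates below are invented numbers, not measurements, and the model ignores the cost of passing rows between the workers and the QC, so it only shows that extra workers stop helping once the QC is the slower stage; it does not show a serial query winning.

```python
def px_elapsed_seconds(rows_scanned, rows_returned, workers,
                       scan_rate_per_worker=1_000_000, qc_rate=200_000):
    """Toy model: workers scan in parallel, a single QC returns rows.

    Both rates are rows per second and are assumptions for illustration.
    """
    scan_time = rows_scanned / (scan_rate_per_worker * workers)
    qc_time = rows_returned / qc_rate
    # The two stages overlap, so the slower one sets the elapsed time.
    return max(scan_time, qc_time)


scanned = 14_000_000_000  # table size mentioned in the thread
for returned in (1_000, 1_000_000_000):
    for workers in (1, 16, 64):
        t = px_elapsed_seconds(scanned, returned, workers)
        print(f"returned={returned:>13,} workers={workers:>2} elapsed={t:>9,.0f}s")
```

Under these assumed rates, going from 16 to 64 workers shortens the query that returns a thousand rows, but makes no difference to the one that returns a billion rows, because the single QC sets the pace.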

So, if your proposed query is returning a large number (i.e. millions or billions) of rows, then you're just plain doomed. PX won't help. Indexes and partitioning won't help. At that point, it is probable that it isn't your query that needs to be tuned, but your application logic. After all, what use is millions or billions of rows unless they're all going to another table, in which case you should use INSERT ... SELECT. If you're just SELECTing them and then displaying them on a screen, nobody is ever going to look at them all, so what's the point? Likewise if you're going to print them in a report.

However, if your proposed query is returning a small number of rows after scanning a huge number of rows, then you're either filtering or aggregating or both. If you're filtering, then enabling partition-pruning or indexing can be your best bet. If you end up using indexing for filtering, then parallel execution is not likely to work well. If you enable partition-pruning (either by the RANGE partition-key or the HASH sub-partition-key), then you can still do a FULL table scan with parallel execution, but now against a smaller volume of data, which will be faster.
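
As a rough picture of what pruning buys on a RANGE-HASH table, the Python sketch below counts the segments left to scan for different predicates. The twelve monthly partitions, the sixteen hash buckets, and the modulo standing in for Oracle's internal hash function are assumptions chosen for illustration only.

```python
from datetime import date
from itertools import product

# Hypothetical layout: monthly RANGE partitions, each with HASH sub-partitions.
MONTHS = [date(2014, m, 1) for m in range(1, 13)]
HASH_BUCKETS = 16


def bucket_of(abc_value):
    # Stand-in for Oracle's internal hash function.
    return abc_value % HASH_BUCKETS


def segments_to_scan(load_month=None, abc_value=None):
    """Return the (month, bucket) segments that remain after pruning."""
    months = [load_month] if load_month is not None else MONTHS
    buckets = ([bucket_of(abc_value)] if abc_value is not None
               else range(HASH_BUCKETS))
    return list(product(months, buckets))


print("no predicate     :", len(segments_to_scan()))
print("load month only  :", len(segments_to_scan(load_month=date(2014, 6, 1))))
print("ABC equality only:", len(segments_to_scan(abc_value=42)))
print("both predicates  :",
      len(segments_to_scan(load_month=date(2014, 6, 1), abc_value=42)))
```

Whatever survives pruning can still be scanned in full by the PX workers; there is simply less of it. Note that the sketch only models equality on the hash key, since a range predicate on a hashed column gives the optimizer nothing to prune with.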


Hopefully that helps?

Thanks!

-Tim


On 2/12/15 19:07, Deepak Sharma wrote:

It's a DW environment close to 150TB with almost 90% of tables partitioned, generating 6TB archive logs a day (24x7 loading with no batch window for loads).
On Thursday, February 12, 2015 8:02 PM, Deepak Sharma <dmarc-noreply@xxxxxxxxxxxxx> wrote:
Sure Tim (btw, I attended your DW seminar several years ago in Minneapolis and have implemented a few ideas that have really helped).
600gb table with 14 Billion rows. It's currently partitioned on a loading date column by range, and has a rolling window retention. Someone suggested changing it to HASH Partition on another column (let's say ABC) which is more frequently used, but may not have even distribution (so Range would have skewed data). My guess is that in order for the queries to use that new ABC HASH partitioned column (it could be equi-join, In-List etc.), PX is needed.
Let me know if you need more details.

-Deepak


On Thursday, February 12, 2015 3:19 PM, Tim Gorman <tim@xxxxxxxxx> wrote:


C'mon, you need to give us more than that, please?
Query for one row? Query for 10 million rows? Aggregating? Not aggregating? Exadata or laptop or something in between? Oracle7? Oracle8i? Windows? Linux? Android?
On 2/12/15 13:55, Deepak Sharma (Redacted sender sharmakdeep_oracle@xxxxxxxxx for DMARC) wrote:
Is it true that if a table (say 1 billion rows) is HASH Partitioned, then the most efficient way to query it needs to use Oracle parallel threads?
