Re: HASH Partitioning question

  • From: Tim Gorman <tim@xxxxxxxxx>
  • To: Deepak Sharma <sharmakdeep_oracle@xxxxxxxxx>, "dmarc-noreply@xxxxxxxxxxxxx" <dmarc-noreply@xxxxxxxxxxxxx>, ORACLE-L <oracle-l@xxxxxxxxxxxxx>
  • Date: Thu, 12 Feb 2015 21:09:29 -0700

Deepak,

One more question: when you say "/changing it to HASH partition/" from the present RANGE partitioning, do you mean changing it from RANGE to RANGE-HASH sub-partitioned, or just from RANGE to HASH, no sub-partitioning.

I think that the latter choice, changing from RANGE to HASH no sub-partitioning, is going to ruin your data loading scheme, if you're using the EXCHANGE PARTITION load technique. If you're not using the EXCHANGE PARTITION load technique and I suspect that might be the case after you state "/no batch window for loads/", then that may explain the 6 TB of archive logs per day because you're unable to use direct-path loads, but that's a whole other matter, isn't it?

Anyway, hopefully you're planning to change from RANGE to RANGE-HASH sub-partitioned, so that you can continue to load by date range and purge by date range.

At any rate, the architecture of PX is relevant to this question. There is a single query-coordinator (QC) session and "N" parallel-execution worker sessions. The QC session serves as a collation point for all the data returned from the worker sessions. So, parallel-execution queries can work well when you're scanning a large volume of data, but only returning a small number of rows. That is their primary use-case - throwing lots of CPU and memory at the problem at once.

Now what will happen if you have a parallel query that is scanning a large volume of data and returning a large number of rows? The single QC session will be overwhelmed with huge volumes of data being returned from all of the PX worker sessions, and so queuing will result. In other words, the QC becomes the bottleneck, and the total elapsed time of the parallel query drops to match the capability of that single QC session to return rows. In other words, not very fast. In this situation, it is probably faster to dispense with parallel execution altogether and just run a serial query.

So, if your proposed query is returning a large number (i.e. millions or billions) of rows, then you're just plain doomed. PX won't help. Indexes and partitioning won't help. At that point, it is probable that it isn't your query that needs to be tuned, but your application logic. After all, what use is millions or billions of rows unless they're all going to another table, in which case you should use INSERT ... SELECT. If you're just SELECTing them and then displaying them on a screen, nobody is ever going to look at them all, so what's the point? Likewise if you're going to print them in a report.

However, if your proposed query is returning a small number of rows after scanning a huge number of rows, then you're either filtering or aggregating or both. If you're filtering, then enabling partition-pruning or indexing can be your best bet. If you end up using indexing for filtering, then parallel execution is not likely to work well. If you enable partition-pruning (either by the RANGE partition-key or the HASH sub-partition-key), then you can still do FULL table scan with parallel execution, but now against a smaller volume of data, which will be faster.

Hopefully that helps?

Thanks!

-Tim


On 2/12/15 19:07, Deepak Sharma wrote:
It's a DW environment close to 150TB with almost 90% of tables partitioned, generating 6TB archive logs a day (24x7 loading with no batch window for loads).


On Thursday, February 12, 2015 8:02 PM, Deepak Sharma <dmarc-noreply@xxxxxxxxxxxxx> wrote:


Sure Tim (btw, I attended your DW seminar several years ago in Minneapolis and have implemented a few ideas that have really helped).

600gb table with 14 Billion rows. It's currently partitioned on a loading date column by range, and has a rolling window retention. Someone suggested changing it to HASH Partition on another column (lets say ABC) which is more frequently used, but may not have even distribution (so Range would have skewed data). My guess is that in order for the queries to use that new ABC HASH partitioned column (it could be equi-join, In-List etc.), PX is needed.

Let me know if you need more details.

-Deepak


On Thursday, February 12, 2015 3:19 PM, Tim Gorman <tim@xxxxxxxxx> wrote:


C'mon, you need to give us more than that, please?

Query for one row? Query for 10 million rows? Aggregating? Not aggregating? Exadata or laptop or something in between? Oracle7? Oracle8i? Windows? Linux? Android?



On 2/12/15 13:55, Deepak Sharma (Redacted sender sharmakdeep_oracle@xxxxxxxxx <mailto:sharmakdeep_oracle@xxxxxxxxx> for DMARC) wrote:
Is it true that if a table (say 1 billion rows) is HASH Partitioned, then the most efficient way to query it needs to use Oracle parallel threads ?






Other related posts: