Re: Tuning large aggregation query

From: Thomas Kellerer <thomas.kellerer@xxxxxxxxxx>
To: "oracle-l@xxxxxxxxxxxxx" <oracle-l@xxxxxxxxxxxxx>
Date: Mon, 07 Apr 2014 10:24:00 +0200

Jonathan,

> How many regions, and how even is the distribution of sales data by region.

There are about 200 regions. The distribution is somewhat skewed. About 20 
regions make up 40% of the data. 
The rest is more or less evenly distributed

> Is the region part of the primary key of base_table

Yes and no. We don't have a primary key constraint defined, only a unique key 
with the region id as the leading column and the real PK as the other columns.
We found that this kind of index is in general better utilized by Oracle than a 
"real" PK.

Playing around with indexing and partitioning it seems that if I do _not_ 
partition BASE_TABLE things seem to be faster. 
But I have yet to confirm that. 

> and is there a foreign key defined between the base_table and the sales data

No, there is no FK between those tables. Each column of the PK references a 
different table (products, stores and the marketing campaign to which that 
product belongs)

> does it make sense to think in terms of a (partial) partition-wise join on 
> region, or does data-skew or 
> the effect of a primary key value relating to sales data that crosses regions 
> make this inappropriate ?

The sales data will never cross a region. Because essentially it is reported on 
store level and a store will 
always be in exactly one region - think of it as a kind of hierarchy: country 
-> region -> store


> Are all of the "other 100 columns" aggregates, or are some of them 
> non-aggregated columns that also appear in the group by clause ?

All of them are aggregates. Most of them aren't "straight" sums but the sum is 
done conditionally based on the sales date and 
other parameters that appear in base_table and the target table of the merge, 
using sum(case when ... end)

> Can you show us an execution plan that includes the distribution columns for 
> the parallel query; 
> and a report of v$pq_tqstat after running the query.

I have uploaded the output of dbms_sqltune.report_sql_monitor and the 
v$pq_tqstat to pastebin:

The plan from report_sql_monitor: http://pastebin.com/2UXhusyk
The contents of pg_tqstat: http://pastebin.com/Uwiu9QCC

Regards
Thomas


Jonathan Lewis, 04.04.2014 15:42:
> 
> 
> How many regions, and how even is the distribution of sales data by region.
> Is the region part of the primary key of base_table, and is there a foreign 
> key defined between the base_table and the sales data; 
> does it make sense to think in terms of a (partial) partition-wise join on 
> region, or does data-skew or the effect of a primary key value 
> relating to sales data that crosses regions make this inappropriate ?
> Are all of the "other 100 columns" aggregates, or are some of them 
> non-aggregated columns that also appear in the group by clause ?
> Can you show us an execution plan that includes the distribution columns for 
> the parallel query; and a report of v$pq_tqstat after running the query.
> 
--
//www.freelists.org/webpage/oracle-l

References:
- Tuning large aggregation query
  - From: Thomas Kellerer
- RE: Tuning large aggregation query
  - From: Jonathan Lewis

Re: Tuning large aggregation query

Other related posts: