Re: Tuning large aggregation query

  • From: Sidney <huanshengchen@xxxxxxxxx>
  • To: "rgravens@xxxxxxxxx" <rgravens@xxxxxxxxx>
  • Date: Fri, 4 Apr 2014 21:31:30 +0800

Have you tried setting workarea_size_policy to manual and increasing 
hash_area_size/sort_area_size? 
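
A minimal sketch of what that session-level change might look like (the
sizes below are placeholders, not recommendations - they have to be sized
against the PGA actually available):

  -- switch the session from automatic PGA management to manual work areas
  alter session set workarea_size_policy = manual;
  -- larger per-operation work areas for the hash join and the group by
  alter session set hash_area_size = 1073741824;  -- 1 GB, illustrative only
  alter session set sort_area_size = 1073741824;  -- 1 GB, illustrative only

Keep in mind that with the manual policy every parallel slave allocates its
own work area, so the effective memory use multiplies with the degree of
parallelism.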

--
Sidney

> On 4 Apr 2014, at 9:17 PM, Rumpi Gravenstein <rgravens@xxxxxxxxx> wrote:
> 
> Looks like a good candidate to run in parallel.
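> A rough sketch of what that could look like (the DOP of 8 is only a
> placeholder, the tgt/src aliases and the MATCHED branches are my guesses at
> the real statement, and the elided column lists stay elided):
> 
>   -- parallel DML has to be enabled at the session level before it is used
>   alter session enable parallel dml;
> 
>   merge /*+ parallel(8) */ into target_table tgt
>   using (select /*+ parallel(8) */
>                 base_table.pk_col_1, base_table.pk_col_2, base_table.pk_col_3,
>                 sum(...), ...
>          from base_table
>            join sales_data on ...
>          group by base_table.pk_col_1, base_table.pk_col_2,
>                   base_table.pk_col_3) src
>     on (tgt.pk_col_1 = src.pk_col_1 and ...)
>   when matched then update set ...
>   when not matched then insert (...) values (...);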
> 
> 
>> On Fri, Apr 4, 2014 at 9:12 AM, Thomas Kellerer <thomas.kellerer@xxxxxxxxxx> 
>> wrote:
>> Hi,
>> 
>> for reporting purposes we are maintaining some denormalized, pre-aggregated 
>> tables in our OLTP database.
>> 
>> These queries are aggregating sales and other related data from several 
>> source tables into one aggregation table.
>> 
>> Our problem is that the aggregation of some 70 million rows into approx. 4 
>> million rows is quite slow.
>> 
>> As far as I can tell, most of the time is lost when doing the actual 
>> aggregation.
>> The join itself is quite fast, but aggregating 70 million rows needs a huge 
>> amount of memory (or temp space).
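>> 
>> (As a pointer for reproducing that observation: the active work areas -
>> memory used, number of passes, temp segment size - can be watched while the
>> statement runs with a query like this against v$sql_workarea_active:
>> 
>>   select sid, operation_type, policy,
>>          round(actual_mem_used / 1024 / 1024) as mem_mb,
>>          number_passes,
>>          round(nvl(tempseg_size, 0) / 1024 / 1024) as temp_mb
>>   from   v$sql_workarea_active
>>   order  by actual_mem_used desc;
>> 
>> A non-zero number_passes or tempseg_size for the group-by work area means
>> the aggregation is spilling to temp.)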
>> 
>> I uploaded the execution plan to pastebin to preserve the formatting: 
>> http://pastebin.com/XfEvB3z9
>> (You need to disable word-wrapping though)
>> 
>> The basic structure of the statement looks like this:
>> 
>> select base_table.pk_col_1,
>>        base_table.pk_col_2,
>>        base_table.pk_col_3,
>>        sum(....),
>>        sum(...),
>>        min(...),
>>        .... and another approx. 100 columns ...
>> from base_table
>>   join sales_data on ...
>> group by base_table.pk_col_1,
>>          base_table.pk_col_2,
>>          base_table.pk_col_3
>> 
>> There is a 1:N relation between base_table and sales_data.
>> 
>> Sales_data essentially stores sales information for products at a per-day, 
>> per-sale, and per-store level.
>> The goal of the aggregation is to sum up that daily information to a 
>> product/store level.
>> 
>> sales_data contains about 70 million rows, base_table contains about 4 
>> million rows, so the result of the aggregation is again 4 million rows.
>> sales_data is partitioned by the region in which the stores are located (and 
>> that partitioning cannot be changed).
>> 
>> We applied the same partitioning scheme to base_table and target_table in 
>> the hope that this will make the parallel execution more efficient, but 
>> changing from unpartitioned to partitioned didn't improve the queries (but 
>> didn't make them worse either). We tried various indexes (in addition to 
>> the PK index that is already used for the JOIN), but apparently for this 
>> type of query there isn't much an index can do.
>> 
>> Most of the time is spent doing the aggregation.
>> I confirmed that by running only the SELECT statement that's inside the 
>> MERGE; it takes nearly as long as the MERGE itself.
>> 
>> We were able to improve the whole process a bit by using parallel execution, 
>> but the runtimes are still way too high for the nightly "window" we have to 
>> prepare the data. The target is for a single merge like that to run in about 
>> 10 minutes; currently it's around 60 minutes.
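>> 
>> (For what it's worth, one way to verify that the requested degree of
>> parallelism is actually granted at run time - rather than silently
>> downgraded - is to compare req_degree and degree in v$px_session while the
>> merge is running, for example:
>> 
>>   select qcsid, req_degree, degree, count(*) as px_servers
>>   from   v$px_session
>>   where  req_degree is not null
>>   group  by qcsid, req_degree, degree;
>> 
>> If degree is lower than req_degree, the statement was downgraded.)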
>> 
>> The test server runs Oracle 11.2.0.4 on a Windows 2008 R2 server with 64GB 
>> of RAM, an 8-spindle RAID 10 array attached, and 6 cores (12 CPUs).
>> 
>> I can provide more Oracle configuration parameters if needed.
>> 
>> The production server also runs Oracle 11.2.0.4 on Windows Server 2008 R2 
>> and has 32 cores (64 CPUs) and 192GB of RAM.
>> But we will not be able to test this on the real production hardware - and 
>> the number of rows in production will probably be 4-5 times bigger.
>> 
>> 
>> I am looking for ideas on how to improve the runtime of these statements - 
>> just some pointers on what I could try in order to either optimize the 
>> query itself or tune the Oracle configuration.
>> 
>> Thanks
>> Thomas
>> 
>> 
>> 
>> --
>> //www.freelists.org/webpage/oracle-l
> 
> 
> 
> -- 
> Rumpi Gravenstein
