RE: Tuning large aggregation query

  • From: Jonathan Lewis <jonathan@xxxxxxxxxxxxxxxxxx>
  • To: "mohamed.houri@xxxxxxxxx" <mohamed.houri@xxxxxxxxx>
  • Date: Fri, 4 Apr 2014 14:17:29 +0000


That might be a side effect of using the "allstats last" option for 
dbms_xplan.display_cursor() - it's part of why the v$pq_tqstats and 
v$sql_monitor information would be more helpful.
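
For what it's worth, something along these lines (untested here, and the
v$pq_tqstats column list is from memory) shows how many rows each parallel
server actually handled, and the SQL Monitor report (Tuning Pack licence
required) breaks the time down per plan line across all the slaves:

select dfo_number, tq_id, server_type, process, num_rows, bytes
from   v$pq_tqstats
order  by dfo_number, tq_id, server_type desc, process;

select dbms_sqltune.report_sql_monitor(sql_id => '&sql_id', type => 'TEXT')
from   dual;

Run the v$pq_tqstats query from the same session immediately after the
statement completes - its contents relate to the last parallel statement
run by that session.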


Regards
Jonathan Lewis
http://jonathanlewis.wordpress.com
@jloracle
________________________________
From: oracle-l-bounce@xxxxxxxxxxxxx [oracle-l-bounce@xxxxxxxxxxxxx] on behalf 
of Mohamed Houri [mohamed.houri@xxxxxxxxx]
Sent: 04 April 2014 15:13
To: kennethnaim@xxxxxxxxx
Cc: thomas.kellerer@xxxxxxxxxx; ORACLE-L
Subject: Re: Tuning large aggregation query

I can see in the plan that there are several operations that have not been 
started at all:

----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                         | Name                   | Starts | E-Rows | A-Rows |   A-Time   |
----------------------------------------------------------------------------------------------------------------------------
|   0 | MERGE STATEMENT                                   |                        |      0 |        |      0 |00:00:00.01 |
|   1 |  MERGE                                            | TARGET_TABLE           |      0 |        |      0 |00:00:00.01 |
|   2 |   PX COORDINATOR                                  |                        |      0 |        |      0 |00:00:00.01 |
|   3 |    PX SEND QC (RANDOM)                            | :TQ10002               |      0 |   4061K|      0 |00:00:00.01 |
|   4 |     VIEW                                          |                        |      0 |        |      0 |00:00:00.01 |
|   5 |      NESTED LOOPS                                 |                        |      0 |   4061K|      0 |00:00:00.01 |
|   6 |       NESTED LOOPS                                |                        |      0 |   4061K|      0 |00:00:00.01 |
|   7 |        VIEW                                       |                        |      0 |   3900K|      0 |00:00:00.01 |
|   8 |         SORT GROUP BY                             |                        |      0 |   3900K|      0 |00:00:00.01 |
|   9 |          PX RECEIVE                               |                        |      0 |   3900K|      0 |00:00:00.01 |
|  10 |           PX SEND HASH                            | :TQ10001               |      0 |   3900K|      0 |00:00:00.01 |
|  11 |            NESTED LOOPS                           |                        |      1 |   3900K|   4135K|00:05:09.10 |
|  12 |             NESTED LOOPS                          |                        |      1 |   3900K|   4135K|00:02:26.62 |
|  13 |              BUFFER SORT                          |                        |      1 |        |    229K|00:01:29.17 |
|  14 |               PX RECEIVE                          |                        |      1 |        |    229K|00:01:28.21 |
|  15 |                PX SEND ROUND-ROBIN                | :TQ10000               |      0 |        |      0 |00:00:00.01 |
|  16 |                 TABLE ACCESS BY GLOBAL INDEX ROWID| BASE_TABLE             |      0 |   3900K|      0 |00:00:00.01 |
|  17 |                  INDEX FULL SCAN                  | PK_BASE_TABLE          |      0 |   3900K|      0 |00:00:00.01 |
|  18 |              PARTITION LIST ITERATOR              |                        |    229K|      1 |   4135K|00:00:48.82 |
|* 19 |               INDEX RANGE SCAN                    | IDX_SALES_DATA         |    229K|      1 |   4135K|00:00:36.71 |
|  20 |             TABLE ACCESS BY LOCAL INDEX ROWID     | SALES_DATA             |   4135K|      1 |   4135K|00:02:21.94 |
|  21 |        PARTITION LIST ITERATOR                    |                        |      0 |      1 |      0 |00:00:00.01 |
|* 22 |         INDEX RANGE SCAN                          | IDX_T1704_ADVSTOREITEM |      0 |      1 |      0 |00:00:00.01 |
|* 23 |       TABLE ACCESS BY LOCAL INDEX ROWID           | TARGET_TABLE           |      0 |      1 |      0 |00:00:00.01 |
----------------------------------------------------------------------------------------------------------------------------


All the operations shown with Starts = 0 have not been started at all. For the 
remaining operations, operation 20 seems to be the most time-consuming one.

Do you have a version of the plan that includes the predicate section?

What is the definition of the index IDX_SALES_DATA? The CBO is making a badly 
wrong estimation when accessing this index (E-Rows = 1 versus A-Rows = 4135K), 
which might lead to a wrong plan, since it looks like the driving operation.
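
Something like this should give both, I think (the sql_id is a placeholder, 
and add an owner filter if the index name is not unique):

select *
from   table(dbms_xplan.display_cursor('&sql_id', null,
             'ALLSTATS LAST +PREDICATE +PARTITION'));

select index_name, column_position, column_name
from   dba_ind_columns
where  index_name = 'IDX_SALES_DATA'
order  by column_position;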

Best regards
Mohamed Houri
www.hourim.wordpress.com



2014-04-04 16:02 GMT+02:00 Kenneth Naim <kennethnaim@xxxxxxxxx>:
What wait events are you seeing when the query is running? How many columns are 
in each of the tables?
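
If ASH is licensed (Diagnostics Pack), a rough breakdown like this, taken while 
the merge is running, would also help (the sql_id is a placeholder):

select session_id, event, count(*) as samples
from   v$active_session_history
where  sql_id = '&sql_id'
and    sample_time > sysdate - 1/24
group  by session_id, event
order  by samples desc;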

-----Original Message-----
From: oracle-l-bounce@xxxxxxxxxxxxx 
[mailto:oracle-l-bounce@xxxxxxxxxxxxx] On 
Behalf Of Thomas Kellerer
Sent: Friday, April 04, 2014 9:12 AM
To: oracle-l@xxxxxxxxxxxxx
Subject: Tuning large aggregation query

Hi,

for reporting purposes we are maintaining some denormalized, pre-aggregated 
tables in our OLTP database.

These queries are aggregating sales and other related data from several source 
tables into one aggregation table.

Our problem is that the aggregation of some 70 million rows into approx. 4 
million rows is quite slow.

As far as I can tell, most of the time is lost when doing the actual 
aggregation.
The join itself is quite fast, but aggregating 70 million rows needs a huge 
amount of memory (or temp space).

I uploaded the execution plan to pastebin to preserve formatting: 
http://pastebin.com/XfEvB3z9 (you need to disable word-wrapping though).

The basic structure of the statement looks like this:

select base_table.pk_col_1,
       base_table.pk_col_2,
       base_table.pk_col_3,
       sum(....),
       sum(...),
       min(...),
       .... and another approx. 100 columns ...
from base_table
  join sales_data on ...
group by base_table.pk_col_1,
         base_table.pk_col_2,
         base_table.pk_col_3

There is a 1:N relation between base_table and sales_data.

Sales_data essentially stores sales information for products on a per-day, sale 
and store level.
The goal of the aggregation is to sum up that daily information to a 
product/store level.

sales_data contains about 70 million rows, base_table contains about 4 million 
rows, so the result of the aggregation is again 4 million rows.
sales_data is partitioned by the region in which the stores are located (and 
that partitioning cannot be changed).

We applied the same partitioning scheme to base_table and target_table in the 
hope that this will make the parallel execution more efficient, but changing 
from unpartitioned to partitioned didn't improve the queries (but didn't make 
them worse either). We tried various indexes (in addition to the PK index that 
is already used for the JOIN), but apparently for this type of query there isn't 
much an index can do.

The main time is spent doing the aggregation.
I confirmed that by only running the select statement that's inside the merge 
and that takes nearly as long as the merge itself.
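
To make that concrete, the merge is shaped roughly like this - heavily 
simplified; sum_amount and the join condition are placeholders, and the real 
statement carries the ~100 aggregate columns mentioned above:

merge into target_table t
using (
  select base_table.pk_col_1,
         base_table.pk_col_2,
         base_table.pk_col_3,
         sum(sales_data.amount) as sum_amount          -- placeholder aggregate
  from   base_table
         join sales_data
           on sales_data.base_id = base_table.pk_col_1 -- join condition simplified
  group  by base_table.pk_col_1,
            base_table.pk_col_2,
            base_table.pk_col_3
      ) src
on (    t.pk_col_1 = src.pk_col_1
    and t.pk_col_2 = src.pk_col_2
    and t.pk_col_3 = src.pk_col_3 )
when matched then
  update set t.sum_amount = src.sum_amount
when not matched then
  insert (pk_col_1, pk_col_2, pk_col_3, sum_amount)
  values (src.pk_col_1, src.pk_col_2, src.pk_col_3, src.sum_amount);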

We were able to improve the whole process a bit by using parallel execution, 
but the runtimes are still way too high for the "window" we have in the night 
to prepare the data. The target would be that a single merge like that runs in 
about 10 minutes; currently it's around 60 minutes.
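
In case it's relevant, parallel execution for the merge is enabled at session 
level roughly like this (the degree of 8 is just an example, not our real DOP):

alter session enable parallel dml;
alter session force parallel dml parallel 8;
alter session force parallel query parallel 8;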

The test server runs Oracle 11.2.0.4 on a Windows 2008R2 server with 64GB RAM, 
an 8-spindle RAID 10 attached, and 6 cores (12 CPUs).

I can provide more Oracle configuration parameters if needed.
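
For example, the memory and parallel settings (my guess at the ones that matter 
here) could be pulled with:

select name, value
from   v$parameter
where  name in ('memory_target', 'sga_target', 'pga_aggregate_target',
                'workarea_size_policy', 'parallel_max_servers',
                'parallel_degree_policy')
order  by name;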

The production server also runs Oracle 11.2.0.4 on Windows Server 2008R2 and has 
32 cores (64 CPUs) and 192GB RAM. But we will not be able to test this on the 
real production hardware - and the number of rows in production will probably be 
4-5 times bigger.


I am looking for ideas on how to improve the runtime of those statements.
Just some pointers on what I could try out to either optimize the query itself 
or tune the Oracle configuration would be very welcome.

Thanks
Thomas



--
//www.freelists.org/webpage/oracle-l







--
Respectfully
Mohamed Houri
