Re: SQL problem

  • From: Kim Berg Hansen <kibeha@xxxxxxxxx>
  • To: thomas@xxxxxxxxxxx
  • Date: Tue, 6 Jan 2015 11:32:56 +0100

Hi, Thomas

Why are you not happy with your solution?
I've been doing similar things and my solutions very much looks like your
solution - I think your solution looks good.

Is it the performance of the query when executed on 20-100 million rows
that is the problem?
If so, what's the access plan for the query on the big table?

I'm guessing a couple full table scans, hash join, and some window sort,
window buffer and window nosort.
If so, then the problem may be a lot of work on TEMP as it might not be
able to hold it all in memory?

If so, then perhaps an idea could be to split the job into smaller jobs on
intervals of k1, so that each sub-job the analytics can be performed all in
memory?

But I'm just guessing ;-)
Let us know your actual reasons for not being happy with the solution, then
it is easier to help...



Regards


Kim Berg Hansen

http://dspsd.blogspot.com
kibeha@xxxxxxxxx
@kibeha


On Tue, Jan 6, 2015 at 10:58 AM, Thomas Ranchon <thomas@xxxxxxxxxxx> wrote:

> Hi list,
>
> I'm trying to solve a SQL problem on a 11.2.0.4 DB, I've found a solution
> but I'm not very happy with it.
>
> Here is a simplified example of my problem.
>
> I have a table with the following description :
>
> CREATE TABLE etrn_tst(
>     k1         NUMBER(2)
>   , ens        NUMBER(5)
>   , start_date DATE
>   , end_date   DATE
>   , priority   NUMBER(2)
>   , PRIMARY KEY(k1, ens, start_date));
>
> (In production the table hold between 20 million rows and 100 million rows)
>
> With some test data to insert into the table :
>
> INSERT
>   INTO etrn_tst(k1, ens, start_date, end_date, priority)
> SELECT 1, 123, TO_DATE('01/01/2014', 'MM/DD/YYYY'), TO_DATE('12/31/9999',
> 'MM/DD/YYYY'), 9 FROM dual
>  UNION ALL
> SELECT 1, 456, TO_DATE('01/01/2014', 'MM/DD/YYYY'), TO_DATE('06/01/2014',
> 'MM/DD/YYYY'), 7 FROM dual
>  UNION ALL
> SELECT 1, 456, TO_DATE('08/01/2014', 'MM/DD/YYYY'), TO_DATE('01/01/2015',
> 'MM/DD/YYYY'), 7 FROM dual
>  UNION ALL
> SELECT 1, 987, TO_DATE('04/01/2014', 'MM/DD/YYYY'), TO_DATE('09/01/2014',
> 'MM/DD/YYYY'), 4 FROM dual
>  UNION ALL
> SELECT 1, 212, TO_DATE('05/01/2014', 'MM/DD/YYYY'), TO_DATE('07/01/2014',
> 'MM/DD/YYYY'), 1 FROM dual
>  UNION ALL
> SELECT 2, 987, TO_DATE('01/01/2014', 'MM/DD/YYYY'), TO_DATE('09/01/2014',
> 'MM/DD/YYYY'), 4 FROM dual
>  UNION ALL
> SELECT 2, 212, TO_DATE('05/01/2014', 'MM/DD/YYYY'), TO_DATE('12/31/9999',
> 'MM/DD/YYYY'), 1 FROM dual;
>
> I need to transform the first table to insert it into a new table with the
> following description :
> CREATE TABLE etrn_tst_result(
>     k1         NUMBER(2)
>   , start_date DATE
>   , end_date   DATE
>   , ens        NUMBER(5)
>   , PRIMARY KEY(k1, start_date));
>
> I need to build date period for k1, for each period find 'ens' with the
> minimum priority and if consecutive period have the same 'ens' then I have
> to merge them.
> Period should not overlap for a given k1.
> The problem would be quite easy if for a given k1 each ens had the same
> period but that's not the case.
>
> For my test data the result should be :
>
> k1 | start_date |  end_date  | ens
> 1  | 01/01/2014 | 03/31/2014 | 456
> 1  | 04/01/2014 | 04/30/2014 | 987
> 1  | 05/01/2014 | 07/01/2014 | 212
> 1  | 07/02/2014 | 09/01/2014 | 987
> 1  | 09/02/2014 | 01/01/2015 | 456
> 1  | 01/02/2015 | 12/31/9999 | 123
> 2  | 01/01/2014 | 04/30/2014 | 987
> 2  | 05/01/2014 | 12/31/9999 | 212
>
> Here is the query I used to find this result :
> INSERT INTO etrn_tst_result
> WITH
>   tmp_perim_dt AS (
>     SELECT k1
>          , start_date
>          , LEAD(start_date - 1, 1, TO_DATE('31/12/9999', 'DD/MM/YYYY'))
>              OVER(PARTITION BY k1 ORDER BY start_date) AS end_date
>       FROM (SELECT k1, start_date
>               FROM etrn_tst
>              UNION
>             SELECT k1, end_date + 1
>               FROM etrn_tst
>              WHERE end_date != TO_DATE('31/12/9999', 'DD/MM/YYYY')))
> SELECT k1
>      , start_date
>      , end_date
>      , ens
>   FROM (SELECT k1
>              , MIN(start_date) AS start_date
>              , MAX(end_date) AS end_date
>              , ens
>              , grp
>           FROM (SELECT k1
>                      , start_date
>                      , end_date
>                      , ens
>                      , RANK()
>                          OVER(PARTITION BY k1 ORDER BY start_date ASC)
>                      - SUM(ind_same)
>                          OVER(PARTITION BY k1 ORDER BY start_date ASC
>                               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT
> ROW) AS grp
>                   FROM (SELECT k1
>                              , start_date
>                              , end_date
>                              , ens
>                              , DECODE(DECODE(LAG(ens, 1) OVER(PARTITION BY
> k1 ORDER BY start_date), ens, 0, 1) +
>                                       DECODE(LAG(end_date, 1)
> OVER(PARTITION BY k1 ORDER BY start_date), start_date - 1, 0, 1), 0, 1, 0)
> AS ind_same
>                           FROM (SELECT x.k1
>                                      , x.start_date
>                                      , x.end_date
>                                      , MIN(y.ens)
>                                          KEEP(DENSE_RANK FIRST ORDER BY
> y.priority ASC) AS ens
>                                   FROM tmp_perim_dt x
>                                     INNER JOIN etrn_tst y ON y.k1 = x.k1
>                                                             AND
> y.start_date <= x.start_date
>                                                             AND y.end_date
> >= x.start_date
>                                  GROUP
>                                     BY x.k1
>                                      , x.start_date
>                                      , x.end_date)))
>          GROUP
>             BY k1
>              , ens
>              , grp)
>  ORDER
>     BY k1
>      , start_date;
>
> I'm trying to find a better solution to my problem.
>
> Thanks,
> Thomas
>

Other related posts: