SQL problem

  • From: Thomas Ranchon <thomas@xxxxxxxxxxx>
  • To: oracle-l@xxxxxxxxxxxxx
  • Date: Tue, 6 Jan 2015 10:58:45 +0100

Hi list,

I'm trying to solve a SQL problem on a 11.2.0.4 DB, I've found a solution
but I'm not very happy with it.

Here is a simplified example of my problem.

I have a table with the following description :

CREATE TABLE etrn_tst(
    k1         NUMBER(2)
  , ens        NUMBER(5)
  , start_date DATE
  , end_date   DATE
  , priority   NUMBER(2)
  , PRIMARY KEY(k1, ens, start_date));

(In production the table hold between 20 million rows and 100 million rows)

With some test data to insert into the table :

INSERT
  INTO etrn_tst(k1, ens, start_date, end_date, priority)
SELECT 1, 123, TO_DATE('01/01/2014', 'MM/DD/YYYY'), TO_DATE('12/31/9999',
'MM/DD/YYYY'), 9 FROM dual
 UNION ALL
SELECT 1, 456, TO_DATE('01/01/2014', 'MM/DD/YYYY'), TO_DATE('06/01/2014',
'MM/DD/YYYY'), 7 FROM dual
 UNION ALL
SELECT 1, 456, TO_DATE('08/01/2014', 'MM/DD/YYYY'), TO_DATE('01/01/2015',
'MM/DD/YYYY'), 7 FROM dual
 UNION ALL
SELECT 1, 987, TO_DATE('04/01/2014', 'MM/DD/YYYY'), TO_DATE('09/01/2014',
'MM/DD/YYYY'), 4 FROM dual
 UNION ALL
SELECT 1, 212, TO_DATE('05/01/2014', 'MM/DD/YYYY'), TO_DATE('07/01/2014',
'MM/DD/YYYY'), 1 FROM dual
 UNION ALL
SELECT 2, 987, TO_DATE('01/01/2014', 'MM/DD/YYYY'), TO_DATE('09/01/2014',
'MM/DD/YYYY'), 4 FROM dual
 UNION ALL
SELECT 2, 212, TO_DATE('05/01/2014', 'MM/DD/YYYY'), TO_DATE('12/31/9999',
'MM/DD/YYYY'), 1 FROM dual;

I need to transform the first table to insert it into a new table with the
following description :
CREATE TABLE etrn_tst_result(
    k1         NUMBER(2)
  , start_date DATE
  , end_date   DATE
  , ens        NUMBER(5)
  , PRIMARY KEY(k1, start_date));

I need to build date period for k1, for each period find 'ens' with the
minimum priority and if consecutive period have the same 'ens' then I have
to merge them.
Period should not overlap for a given k1.
The problem would be quite easy if for a given k1 each ens had the same
period but that's not the case.

For my test data the result should be :

k1 | start_date |  end_date  | ens
1  | 01/01/2014 | 03/31/2014 | 456
1  | 04/01/2014 | 04/30/2014 | 987
1  | 05/01/2014 | 07/01/2014 | 212
1  | 07/02/2014 | 09/01/2014 | 987
1  | 09/02/2014 | 01/01/2015 | 456
1  | 01/02/2015 | 12/31/9999 | 123
2  | 01/01/2014 | 04/30/2014 | 987
2  | 05/01/2014 | 12/31/9999 | 212

Here is the query I used to find this result :
INSERT INTO etrn_tst_result
WITH
  tmp_perim_dt AS (
    SELECT k1
         , start_date
         , LEAD(start_date - 1, 1, TO_DATE('31/12/9999', 'DD/MM/YYYY'))
             OVER(PARTITION BY k1 ORDER BY start_date) AS end_date
      FROM (SELECT k1, start_date
              FROM etrn_tst
             UNION
            SELECT k1, end_date + 1
              FROM etrn_tst
             WHERE end_date != TO_DATE('31/12/9999', 'DD/MM/YYYY')))
SELECT k1
     , start_date
     , end_date
     , ens
  FROM (SELECT k1
             , MIN(start_date) AS start_date
             , MAX(end_date) AS end_date
             , ens
             , grp
          FROM (SELECT k1
                     , start_date
                     , end_date
                     , ens
                     , RANK()
                         OVER(PARTITION BY k1 ORDER BY start_date ASC)
                     - SUM(ind_same)
                         OVER(PARTITION BY k1 ORDER BY start_date ASC
                              ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT
ROW) AS grp
                  FROM (SELECT k1
                             , start_date
                             , end_date
                             , ens
                             , DECODE(DECODE(LAG(ens, 1) OVER(PARTITION BY
k1 ORDER BY start_date), ens, 0, 1) +
                                      DECODE(LAG(end_date, 1)
OVER(PARTITION BY k1 ORDER BY start_date), start_date - 1, 0, 1), 0, 1, 0)
AS ind_same
                          FROM (SELECT x.k1
                                     , x.start_date
                                     , x.end_date
                                     , MIN(y.ens)
                                         KEEP(DENSE_RANK FIRST ORDER BY
y.priority ASC) AS ens
                                  FROM tmp_perim_dt x
                                    INNER JOIN etrn_tst y ON y.k1 = x.k1
                                                            AND
y.start_date <= x.start_date
                                                            AND y.end_date
>= x.start_date
                                 GROUP
                                    BY x.k1
                                     , x.start_date
                                     , x.end_date)))
         GROUP
            BY k1
             , ens
             , grp)
 ORDER
    BY k1
     , start_date;

I'm trying to find a better solution to my problem.

Thanks,
Thomas

Other related posts: