Re: theory of rewriting a query

  • From: Stephane Faroult <sfaroult@xxxxxxxxxxxx>
  • To: Joel.Patterson@xxxxxxxxxxx
  • Date: Tue, 12 May 2009 22:03:11 +0200

Joel,

   What really kills in this type of quey is the fact that the
subqueries are dependent on each other  - for instance the second
subquery depends on d.effdt that is determined by the first subquery.
You have the same type of unsound relationship between subqueries 3 and
4. In fact, you want, so to speak, a "greater value of greater value "
(in that case the greatest effseq for the greatest effdt), which usually
calls for analytic functions as Ken Naim suggested.

If you take the subqueries in isolation:

 (SELECT max(d_ed.effdt)
               FROM ps_job d_ed
               WHERE d.emplid = d_ed.emplid
                 AND d.empl_rcd = d_ed.empl_rcd
                 AND d_ed.effdt <= a.pay_end_dt)

(SELECT max(d_es.effseq)
               FROM ps_job d_es
               WHERE d.emplid = d_es.emplid
                 AND d.empl_rcd = d_es.empl_rcd
                 AND d.effdt = d_es.effdt)

(SELECT max(e_ed.effdt)
               FROM ps_job e_ed
               WHERE e.emplid = e_ed.emplid
                 AND e.empl_rcd = e_ed.empl_rcd
                 AND e_ed.effdt <= a.pay_end_dt)

(SELECT max(e_es.effseq)
               FROM ps_job e_es
               WHERE e.emplid = e_es.emplid
                 AND e.empl_rcd = e_es.empl_rcd
                 AND e.effdt = e_es.effdt)

You see that they all refer to the same table, that emplid and empl_rcd
are the major link to the outer world, but that we are correlated to
different entities. You can easily condense two queries into one, for
instance:

 (SELECT max(d_ed.effdt)
               FROM ps_job d_ed
               WHERE d.emplid = d_ed.emplid
                 AND d.empl_rcd = d_ed.empl_rcd
                 AND d_ed.effdt <= a.pay_end_dt)

(SELECT max(d_es.effseq)
               FROM ps_job d_es
               WHERE d.emplid = d_es.emplid
                 AND d.empl_rcd = d_es.empl_rcd
                 AND d.effdt = d_es.effdt)

can become

select effdt, effseq
from (select j.effdt, j.effseq, rank() over (order by j.effdt desc,
j.effseq desc) rnk
         from ps_job j
         where j.emplid = d.emplid
             and j.empl_rcd = j.empl_rcd
             and j.effdt <= a.pay_end_dt)
where rnk = 1

(you can also get something similar with an order by inside the inline
view and a condition on rownum at the higher level, but an analytic
function will probably be better downstream - for instance, perhaps what
you'll really want ultimately is

select emplid, empl_rcd, effdt, effseq
from (select j.emplid, j.empl_rcd,
                    j.effdt, j.effseq,
                    rank() over (partition by j.eplid, j.empl_rcd,
                                      order by j.effdt desc, j.effseq
desc) rnk
         from ps_job j
         where j.effdt <= a.pay_end_dt)
where rnk = 1

)


At this stage you get two subqueries instead of four. Then you should
ask yourself whether you couldn't, at the outer level, refer to each
table once instead of twice, use an OR, multiple CASE ... END constructs
and a GROUP BY to bring everything in a single pass - at which point you
can perhaps reduce your two subqueries to one, partition by emplid and
empl_rcd and so on ...

A few hours of work if you want to do it properly, but I hope you get
the idea.

HTH

Stéphane Faroult



Joel.Patterson@xxxxxxxxxxx wrote:
> Just off the top of anyones head going more for the theoretical side,
> can anyone suggest a better way to write the below peoplesoft query.   I
> believe this uses a lot of temp tablespace, and something tells me the
> subqueries are an issue.
>
> Maybe not, but would you in general unravel the subqueries and do it
> another way?   Same type of querie I see a lot uses union all half a
> dozen times.
>
> Just wondering
>
>
>
> SELECT a.company, a.union_cd, d.location, a.emplid, a.name, to_char(
>        c.orig_hire_dt, 'YYYY-MM-DD'), f.name, a.erncd, sum(a.hours_crw)
>     FROM ps_gl_transact_det a, ps_person c, ps_pers_srch_qry c1, ps_job
> d, 
>          ps_emplmt_srch_qry d1, ps_job e, ps_emplmt_srch_qry e1, 
>          ps_person_name f
>     WHERE c.emplid = c1.emplid
>       AND c1.rowsecclass = 'HRSALLU'
>       AND d.emplid = d1.emplid
>       AND d.empl_rcd = d1.empl_rcd
>       AND d1.rowsecclass = 'HRSALLU'
>       AND e.emplid = e1.emplid
>       AND e.empl_rcd = e1.empl_rcd
>       AND e1.rowsecclass = 'HRSALLU'
>       AND a.emplid = c.emplid
>       AND a.emplid = d.emplid
>       AND d.empl_rcd = a.empl_rcd
>       AND a.cmc_trans_type = 'C'
>       AND a.debit_credit = 'DB'
>       AND a.union_cd IN ('ADT', 'AD3', 'ADO')
>       AND a.erncd IN ('REG', 'OVR')
>       AND a.gl_acctng_period BETWEEN :1 AND :2
>       AND d.effdt = (SELECT max(d_ed.effdt)
>                FROM ps_job d_ed
>                WHERE d.emplid = d_ed.emplid
>                  AND d.empl_rcd = d_ed.empl_rcd
>                  AND d_ed.effdt <= a.pay_end_dt)
>       AND d.effseq = (SELECT max(d_es.effseq)
>                FROM ps_job d_es
>                WHERE d.emplid = d_es.emplid
>                  AND d.empl_rcd = d_es.empl_rcd
>                  AND d.effdt = d_es.effdt)
>       AND d.reports_to = e.position_nbr
>       AND e.effdt = (SELECT max(e_ed.effdt)
>                FROM ps_job e_ed
>                WHERE e.emplid = e_ed.emplid
>                  AND e.empl_rcd = e_ed.empl_rcd
>                  AND e_ed.effdt <= a.pay_end_dt)
>       AND e.effseq = (SELECT max(e_es.effseq)
>                FROM ps_job e_es
>                WHERE e.emplid = e_es.emplid
>                  AND e.empl_rcd = e_es.empl_rcd
>                  AND e.effdt = e_es.effdt)
>       AND e.emplid = f.emplid
>       AND e.empl_status = 'A'
>       AND f.name_type = 'PRI'
>     GROUP BY a.company, a.union_cd, d.location, a.emplid, a.name,
> to_char(
>              c.orig_hire_dt, 'YYYY-MM-DD'), f.name, a.erncd
>     ORDER BY 2, 3, 7, 5
>
>  
>
> --
> //www.freelists.org/webpage/oracle-l
>
>
>
>   


--
//www.freelists.org/webpage/oracle-l


Other related posts: