RE: SQL question - most recent value for each group of data

  • From: <krish.hariharan@xxxxxxxxxxxx>
  • To: <Bruce.Reardon@xxxxxxxxxxxx>, <oracle-l@xxxxxxxxxxxxx>
  • Date: Thu, 28 Feb 2008 22:52:37 -0700

With the assertion that dts is not indexed I wonder if we can do it more
efficiently procedurally rather than relationally (SQL) to limit it to one
full scan of the table.

Regards,
-Krish
President/Executive Architect, Quasar Database Technologies, LLC
http://www.linkedin.com/in/quasardb

-----Original Message-----
From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx]
On Behalf Of Reardon, Bruce (RTABBAY)
Sent: Thursday, February 28, 2008 9:28 PM
To: oracle-l@xxxxxxxxxxxxx
Subject: SQL question - most recent value for each group of data

I have a case where I need to return the most recent record for each
group of data and I'm interested in how to do most efficiently.

Versions - Oracle 8.1.7.4 on Windows 2003R2 SP2

Example data:
id      name    value   dts
1       Al      2.3     28-Feb-2008 8:33
1       Al      2.5     28-Feb-2008 7:21
1       Fe      3.2     27-Feb-2008 4:55
2       Al      3.1     27-Feb-2008 3:22

The required results are:
id      name    val_at_max_dts  res_count
1       Al      2.3                     2
1       Fe      3.2                     1
2       Al      3.1                     1

That is, for id=1, the most recent date is the 8:33 so the value
returned is the 2.3.
The res_count is how many result records there were for that id and
name.


Below I've got 2 queries that give the correct result - I'm interested
in other options that may perform better.
To help, the vast majority will have res_count = 1 and the max value
I've come across is 9.
I've found the analytical query below often performs better, but in
testing with extreme numbers of duplicates, the query with the derived
table performed better.

The SQL I've got to reproduce this is below:

drop table str;
create table str ( id integer , name varchar2(40) , val real , dts date
) ;
create index str_idx1 ON str ( id , name ) ;

-- add ~100,000 samples
insert into str select rownum , 'Al' , o1.object_id , sysdate-2 
from dba_objects o1 , dba_objects o2 
where o1.object_id < 100 and o2.object_id < 1000;

-- add 2nd + 3rd result to each
insert into str select rownum , 'Fe' , o1.object_id , sysdate 
from dba_objects o1 , dba_objects o2 
where o1.object_id < 100 and o2.object_id < 1000;

insert into str select rownum , 'pH' , 7 , sysdate 
from dba_objects o1 , dba_objects o2 
where o1.object_id < 100 and o2.object_id < 1000;

-- every 100th to have a second Al
insert into str select rownum * 100 , 'Al' , o1.object_id , sysdate -3 
from dba_objects o1 
where o1.object_id < 1000 ;

-- every 1000th to have third Al
insert into str select rownum * 1000 , 'Al' , o1.object_id , sysdate -2 
from dba_objects o1 
where o1.object_id < 100 ;

commit;

exec dbms_stats.gather_table_stats ( ownname => USER , tabname => 'STR'
, cascade => TRUE) ;

set autotrace traceonly

SELECT str1.id , str1.name , str1.val AS val_at_max_dts , smmry.max_dts
, smmry.res_count AS 
FROM str str1 , 
     ( SELECT str2.id , str2.name , MAX(str2.dts) AS max_dts , COUNT(*)
as res_count
       FROM str str2
       GROUP BY str2.id , str2.name
     ) smmry
WHERE str1.id = smmry.id
  AND str1.name = smmry.name
  AND str1.dts = smmry.max_dts
--  AND str1.id BETWEEN 1000 and 2100
ORDER BY
   str1.id , str1.name
;

SELECT * 
FROM
(SELECT str1.id , str1.name , str1.val AS val_at_max_dts , str1.dts , 
        RANK() OVER (PARTITION BY str1.id , str1.name ORDER BY str1.dts
DESC) AS rank_dts ,
        ROW_NUMBER() OVER (PARTITION BY str1.id , str1.name ORDER BY
str1.dts ASC) AS res_count
 FROM str str1
-- WHERE str1.id BETWEEN 1000 and 2100
 ORDER BY
   str1.id , str1.name
)
WHERE rank_dts = 1
;




In practise, there are 2 sets of tables 
Recent tables - table "s" with field id, table "t" as a child and the
final child "r" with the value and date field.
Repeated for another 3 tables with older data.
The dts field is not indexed in the "r" tables.


Thanks,
Bruce Reardon


NOTICE
This e-mail and any attachments are private and confidential and may contain
privileged information. If you are not an authorised recipient, the copying
or distribution of this e-mail and any attachments is prohibited and you
must not read, print or act in reliance on this e-mail or attachments.
This notice should not be removed.
--
//www.freelists.org/webpage/oracle-l



--
//www.freelists.org/webpage/oracle-l


Other related posts: