Re: cheapest way to access every 15th row in table ordered by timestamp column - solution without sort

From: "Jaromir D.B. Nemec" <jaromir@xxxxxxxxxxxx>
To: <oracle-l@xxxxxxxxxxxxx>, <Chris.Stephens@xxxxxxx>
Date: Sun, 23 Jan 2011 20:58:32 +0100

Hello Chris,


Can anyone think of a more efficient way to do it than this:

SELECT x.col,
             x.t_stamp
     FROM ( SELECT col,
                                   t_stamp
                                  rownum rn
                        FROM t
                       WHERE t_stamp >= TRUNC( SYSDATE - 30 )) x
    WHERE MOD( x.rn, 15 ) = 0;


It feels like there should be a better way to do it.

One possibility to save cost is to avoid the sort at all. This will beobvious solution in case that the data would contain a sequence number. Youcould simple filter the rows in the where predicate:


Select * from T
where mod(seq_id,15) = 0;

Unfortunately this is not the case. But it should be possible to reconstructsuch sequence id. The idea is to divide the time space in buckets of 2seconds and to assign each timestamp the bucket number. This number is usedto calculate the modulo.


Here a possible solution:

select
col, t_stamp,
((t_stamp - trunc(sysdate-30)) * (24*60*30) ) biseconds_from_start,
 mod(1 + round( (t_stamp - trunc(sysdate-30)) * (24*60*30) ),15) mod2
from t order by col
;

In the first step a number of bi-seconds from the starting point iscalculated. This number is used to calculate the modulo. I added one toalign with row_number starting with 1 too.

There are two caveats associated with this approach. First possibility isthat a timestamp is assigned to a wrong bucket due to calculationinaccuracy - example: number of bi-seconds from start calculated is7,99999999999999999999999999999999999999. I workarounded this using round inthe example above. Other possibility is to use an offset to shift thetimestamp in the middle of the interval:

mod(1 + trunc((t_stamp - (trunc(sysdate-30) - 1/(24*60*60))) *(24*60*30) ),15) mod2

I shifted the intervals by one second - this works fine with my syntheticdata with timestamps on 0,2,4,.. seconds.

The second potential problem is a slow shift of the timestamp caused by theclock inaccuracy - similar to a DBMS_JOB started each full hour withinterval 60 minutes; after some time it starts minutes after the full hour.This could be corrected - if systematically occurring - with adjusting thecoefficient of the number of intervals per day; e.g. (24*60*30 + 1) forwatch running too fast.

Anyway the first thing to do is to check the concept with your data -something like the following select:


select * from (
select
 col, t_stamp,
 mod(row_number() over(order by t_stamp),15) mod1,

/* offset 1 second - adjust ifrequired in interval <0,2) */mod(1 + trunc((t_stamp - (trunc(sysdate-30) - 1/(24*60*60))) *(24*60*30) ),15) mod2

from t order by col
)
where mod1 != mod2;

If the select returns no or acceptable small number of rows you may use thissolution:


SELECT col, t_stamp
FROM t
WHERE t_stamp >= TRUNC( SYSDATE - 30 ) and
/* filter every 15th row using 2 seconds buckets with 1 sec. offset */

mod(1 + trunc((t_stamp - (trunc(sysdate-30) - 1/(24*60*60))) *(24*60*30) ),15) = 1

The difference in performance depends of the volume of the data to besorted.

See also http://www.db-nemec.com/SamplingTimeSeriesData.html

HTH

Jaromir D.B. Nemec

--
//www.freelists.org/webpage/oracle-l

Re: cheapest way to access every 15th row in table ordered by timestamp column - solution without sort

Other related posts: