Re: cheapest way to access every 15th row in table ordered by timestamp column - solution without sort

  • From: "Jaromir D.B. Nemec" <jaromir@xxxxxxxxxxxx>
  • To: <oracle-l@xxxxxxxxxxxxx>, <Chris.Stephens@xxxxxxx>
  • Date: Sun, 23 Jan 2011 20:58:32 +0100

Hello Chris,


Can anyone think of a more efficient way to do it than this:

SELECT x.col,
             x.t_stamp
     FROM ( SELECT col,
                                   t_stamp
                                  rownum rn
                        FROM t
                       WHERE t_stamp >= TRUNC( SYSDATE - 30 )) x
    WHERE MOD( x.rn, 15 ) = 0;


It feels like there should be a better way to do it.

One possibility to save cost is to avoid the sort at all. This will be obvious solution in case that the data would contain a sequence number. You could simple filter the rows in the where predicate:

Select * from T
where mod(seq_id,15) = 0;

Unfortunately this is not the case. But it should be possible to reconstruct such sequence id. The idea is to divide the time space in buckets of 2 seconds and to assign each timestamp the bucket number. This number is used to calculate the modulo.

Here a possible solution:

select
col, t_stamp,
((t_stamp - trunc(sysdate-30)) * (24*60*30) ) biseconds_from_start,
 mod(1 + round( (t_stamp - trunc(sysdate-30)) * (24*60*30) ),15) mod2
from t order by col
;

In the first step a number of bi-seconds from the starting point is calculated. This number is used to calculate the modulo. I added one to align with row_number starting with 1 too.

There are two caveats associated with this approach. First possibility is that a timestamp is assigned to a wrong bucket due to calculation inaccuracy - example: number of bi-seconds from start calculated is 7,99999999999999999999999999999999999999. I workarounded this using round in the example above. Other possibility is to use an offset to shift the timestamp in the middle of the interval:

mod(1 + trunc((t_stamp - (trunc(sysdate-30) - 1/(24*60*60))) * (24*60*30) ),15) mod2

I shifted the intervals by one second - this works fine with my synthetic data with timestamps on 0,2,4,.. seconds.

The second potential problem is a slow shift of the timestamp caused by the clock inaccuracy - similar to a DBMS_JOB started each full hour with interval 60 minutes; after some time it starts minutes after the full hour. This could be corrected - if systematically occurring - with adjusting the coefficient of the number of intervals per day; e.g. (24*60*30 + 1) for watch running too fast.

Anyway the first thing to do is to check the concept with your data - something like the following select:

select * from (
select
 col, t_stamp,
 mod(row_number() over(order by t_stamp),15) mod1,
/* offset 1 second - adjust if required in interval <0,2) */ mod(1 + trunc((t_stamp - (trunc(sysdate-30) - 1/(24*60*60))) * (24*60*30) ),15) mod2
from t order by col
)
where mod1 != mod2;

If the select returns no or acceptable small number of rows you may use this solution:

SELECT col, t_stamp
FROM t
WHERE t_stamp >= TRUNC( SYSDATE - 30 ) and
/* filter every 15th row using 2 seconds buckets with 1 sec. offset */
mod(1 + trunc((t_stamp - (trunc(sysdate-30) - 1/(24*60*60))) * (24*60*30) ),15) = 1
;


The difference in performance depends of the volume of the data to be sorted.
See also http://www.db-nemec.com/SamplingTimeSeriesData.html

HTH

Jaromir D.B. Nemec

--
//www.freelists.org/webpage/oracle-l


Other related posts:

  • » Re: cheapest way to access every 15th row in table ordered by timestamp column - solution without sort - Jaromir D.B. Nemec