Hello Chris,
Can anyone think of a more efficient way to do it than this: SELECT x.col, x.t_stamp FROM ( SELECT col, t_stamp rownum rn FROM t WHERE t_stamp >= TRUNC( SYSDATE - 30 )) x WHERE MOD( x.rn, 15 ) = 0; It feels like there should be a better way to do it.
One possibility to save cost is to avoid the sort at all. This will be obvious solution in case that the data would contain a sequence number. You could simple filter the rows in the where predicate:
Select * from T where mod(seq_id,15) = 0;Unfortunately this is not the case. But it should be possible to reconstruct such sequence id. The idea is to divide the time space in buckets of 2 seconds and to assign each timestamp the bucket number. This number is used to calculate the modulo.
Here a possible solution: select col, t_stamp, ((t_stamp - trunc(sysdate-30)) * (24*60*30) ) biseconds_from_start, mod(1 + round( (t_stamp - trunc(sysdate-30)) * (24*60*30) ),15) mod2 from t order by col ;In the first step a number of bi-seconds from the starting point is calculated. This number is used to calculate the modulo. I added one to align with row_number starting with 1 too.
There are two caveats associated with this approach. First possibility is that a timestamp is assigned to a wrong bucket due to calculation inaccuracy - example: number of bi-seconds from start calculated is 7,99999999999999999999999999999999999999. I workarounded this using round in the example above. Other possibility is to use an offset to shift the timestamp in the middle of the interval:
mod(1 + trunc((t_stamp - (trunc(sysdate-30) - 1/(24*60*60))) * (24*60*30) ),15) mod2
I shifted the intervals by one second - this works fine with my synthetic data with timestamps on 0,2,4,.. seconds.
The second potential problem is a slow shift of the timestamp caused by the clock inaccuracy - similar to a DBMS_JOB started each full hour with interval 60 minutes; after some time it starts minutes after the full hour. This could be corrected - if systematically occurring - with adjusting the coefficient of the number of intervals per day; e.g. (24*60*30 + 1) for watch running too fast.
Anyway the first thing to do is to check the concept with your data - something like the following select:
select * from ( select col, t_stamp, mod(row_number() over(order by t_stamp),15) mod1,/* offset 1 second - adjust if required in interval <0,2) */ mod(1 + trunc((t_stamp - (trunc(sysdate-30) - 1/(24*60*60))) * (24*60*30) ),15) mod2
from t order by col ) where mod1 != mod2;If the select returns no or acceptable small number of rows you may use this solution:
SELECT col, t_stamp FROM t WHERE t_stamp >= TRUNC( SYSDATE - 30 ) and /* filter every 15th row using 2 seconds buckets with 1 sec. offset */mod(1 + trunc((t_stamp - (trunc(sysdate-30) - 1/(24*60*60))) * (24*60*30) ),15) = 1
;The difference in performance depends of the volume of the data to be sorted.
See also http://www.db-nemec.com/SamplingTimeSeriesData.html HTH Jaromir D.B. Nemec -- //www.freelists.org/webpage/oracle-l