Mark,
I understood from what Jonathan pointed out that the best strategy might be to
flatten the required data into one table, fetch the pk or rowid with minimal
sorting overhead (using an index that fully covers all the ordering columns in
the same sort order), and then fetch the table rows for those selected rowids.
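Something like the following is what I have in mind; the table and column
names are just made up for illustration:

-- inner query: order by the indexed columns and keep only the rowids, so
-- the sort can be satisfied from the index rather than the full rows
-- outer query: visit the table only for the 10 selected rowids
select t.*
from   t,
       (select rid
        from   (select rowid rid
                from   t
                order  by col1 desc, col2 desc)
        where  rownum <= 10) topn
where  t.rowid = topn.rid;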
The strategy below, which you pointed to, went right over my head. Do you mean
adding some additional predicate to the query to minimize the amount of
sorting? What are 'x' and 'y' here? Could you please explain a bit more?
If a small set of columns (sometimes just one column) can distinctly provide
the top n ordering, it may be useful to provide that additional predicate, as
an "and x >= y", while gathering up the data to present.
For slowly changing data and some “n”, it may even be useful to maintain a
table containing the top “m” rows, where “m” is at least as big as the biggest
“n” your users will use, and to update that table whenever a new combination
should be in it.
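To make that second idea concrete first, here is a rough sketch; every name in
it is invented:

-- a small table holding the top 100 (= m) keys by amount
create table top_m_sales as
select id, amount
from   (select id, amount from sales order by amount desc)
where  rownum <= 100;

-- any top-n report with n <= 100 can now be served from the small table
select *
from   (select id, amount from top_m_sales order by amount desc)
where  rownum <= 10;

-- when a transaction inserts or updates a sale, compare its amount to the
-- current minimum in top_m_sales and swap the row in if it qualifies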
Two situations:
case 1: IF your users are running top n reports repeatedly, with selections, on
less slowly changing data.
In addition to what JL and others already suggested, it MAY be cheaper to yank
out just the ordering columns at the source, take the lowest value of each of
those columns across the top n rows, and filter the source on greater than or
equal to that lowest value for each column. Notice you cannot just take the
last (lowest) row of the top n and use all of its column values as filters:
with two ordering columns, a first row of Z A would be filtered out by the
second column's filter if the last row of the top n was Y Y. You CAN safely
filter on greater than or equal to the first column of the ORDER BY alone.
(But I will repeat: this is not valid for hierarchical queries.)
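A rough illustration of case 1; the names, the top 10, and the ordering on
col1 and col2 are all just placeholders:

-- step 1: cheap pass over just the ordering columns to find the gating
-- value, i.e. the lowest col1 seen in the top 10
select min(col1) low_col1
from   (select col1
        from   big_detail
        order  by col1 desc, col2 desc)
where  rownum <= 10;

-- step 2: add that value as the extra predicate (the "and x >= y") so the
-- expensive joins only ever see candidate rows, then take the top 10;
-- only the first ordering column is filtered, per the Z A / Y Y caution
select *
from   (select d.col1, d.col2, d.description, o.extra_info
        from   big_detail d, other_stuff o
        where  o.detail_id = d.id
        and    d.col1 >= :low_col1
        order  by d.col1 desc, d.col2 desc)
where  rownum <= 10;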
IF the top N is ordered by the value of a GROUP BY aggregation, this also
won't help, because you can't throw individual rows away for having a value
less than the aggregate they become a part of.
case 2: IF your users are running top n reports repeatedly, with selections, on
slowly changing data where n may vary but will never exceed m.
With the same caveats as case 1, it may actually be useful in some cases to
keep a single-row table holding the lowest value that should be included, and
to update that table whenever a transaction changes the low value to be
included.
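A sketch of case 2, again with invented names and an invented m of 100:

-- single-row table holding the current gating value
create table top_n_gate (low_col1 number);

insert into top_n_gate
select min(col1)
from   (select col1 from big_detail order by col1 desc)
where  rownum <= 100;

-- every top-n report (n <= 100) reuses the stored gate as its extra predicate
select *
from   (select d.*
        from   big_detail d, top_n_gate g
        where  d.col1 >= g.low_col1
        order  by d.col1 desc)
where  rownum <= 10;

-- transactions that add or change rows near the top would also refresh
-- top_n_gate, e.g. by re-running the population query above as an update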
Lots of top-n reports are either hierarchical or aggregations and are unsuited
to these approaches. But quite a few top-n determinations are also quite
simple, where as few as one ordering column could serve as the gating
predicate, yet huge relational assemblies are tacked together only to be pruned
down at the end. IF that gating value (or set of values) can be cheaply looked
up or kept up to date, then expensive reports can become quick.
IF you are operating on a frozen standby clone (such as when a daily, weekly,
monthly, quarterly, or annual trigger event is used to keep analysis on a
static “as was” basis), aggregations and filtering aggregations are much more
frequently useful to instantiate and use. Denormalizations are quite often
perishable, but they are not when you are operating on a frozen set of
underlying data.
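On a frozen copy such summaries can simply be instantiated once and indexed; a
sketch, with all object names invented:

-- built once after the clone is frozen; it cannot go stale because the
-- underlying detail no longer changes
create table weekly_ar_summary as
select customer_id,
       trunc(invoice_date, 'IW')  invoice_week,
       sum(amount)                total_amount,
       count(*)                   invoice_count
from   receivables_detail
group  by customer_id, trunc(invoice_date, 'IW');

create index weekly_ar_summary_ix
    on weekly_ar_summary (customer_id, invoice_week);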
One of my favorite customers froze a copy of a standby clone weekly, after the
“generate receivables” process completed Friday night. Many aggregations were
then performed on that frozen set of detail data over the rest of the weekend
and used for analysis the following week.
Good luck,
mwf
<snip>