Re: Processing new rows in staging table quickly...

From: Jonathan Lewis <jonathan@xxxxxxxxxxxxxxxxxx>
To: 'ORACLE-L' <oracle-l@xxxxxxxxxxxxx>
Date: Thu, 18 Apr 2019 10:38:22 +0000

===========
where b.access_key=v_access_key
       and b.bucket=v_bucket
       and b.stats_processed_time is null
===========

Assming v_access and v_bucket are bind variables then it would be perfectly
reasonable to query like this, using an index
(bucket,  access_key, stats_processed_time), the order of the first two columns
could be switched depending on other access requirements or (all other things
being equal) to put the most repetitive column first to maximise the scope for
index compression (the name "bucket" suggests many rows per value while the
name "access_key" suggests uniqueness - which is why I've chosen the order I
showed, and why I would also use "compress 1" as a starting guess.

If you don't want to support an index that is 100 times the size it needs to be
but do want the precision of selecting on bucket and access key for
non-processed rows then , a two-column function-based index (or index on two
virtual columns if Postgres supports that) would be best, e.g.

create index stg_fbi_1 on stage_table(
    case when stats_processed_time is null then bucket end,
    case when stats_processed_time is null then access_key end
);

query with the where clase:

where
    case when stats_processed_time is null then bucket end = v_bucket
and case when stats_processed_time is null then access_key end = v_access_key

you could also order by bucket and access key and avoid a sort by using

order by
    case when stats_processed_time is null then bucket end,
    case when stats_processed_time is null then access_key end

The SQL looks messy - which is why creating virtual columns is the more
desirable solution:
e.g.  alter table staging_table add (
        unprocessed_bucket generated always as case when stats_processed_time
is null then bucket end,
        unprocessed_access_key generated always as case when
stats_processed_time is null then access_key end
);

create index stg_fbi2 on staging_table(unprocssed_bucket,,
unprocessed_access_key);   -- compress 1 ?

Regards
Jonathan Lewis

________________________________________
From: oracle-l-bounce@xxxxxxxxxxxxx <oracle-l-bounce@xxxxxxxxxxxxx> on behalf
of post.ethan@xxxxxxxxx <post.ethan@xxxxxxxxx>
Sent: 18 April 2019 00:31:37
To: 'ORACLE-L'
Subject: Processing new rows in staging table quickly...

I have a staging table in which 99% of the rows are going to have a date for
stats_processed_time.

I will need to only select for rows that have not been processed.

What is the best strategy here? Should I allow nulls and select like below? Or
should I put a date, way out in the future, say 2099, and select for that
instead and then update to correct data once things are processed? Perhaps
allow nulls and a bitmap or FBI solution (I would like to avoid solutions which
can’t be ported to Postgres.)

…
where b.access_key=v_access_key
       and b.bucket=v_bucket
       and b.stats_processed_time is null
--
//www.freelists.org/webpage/oracle-l

References:
- Processing new rows in staging table quickly...
  - From: post.ethan

Re: Processing new rows in staging table quickly...

Other related posts: