Re: Need design suggestion , whether index organized table is good choice here

From: Lothar Flatz <l.flatz@xxxxxxxxxx>
To: oracle-l@xxxxxxxxxxxxx
Date: Mon, 31 Jan 2022 10:15:47 +0100

Guys,

seriously. I copy the Monitor of the that includes the FTS on REF_TAB in again. Maybe you see something that I don't.
In line 1 we see 52.94 of the total activity. We have 700 samples Cpu and 11+46 temp space samples.
It would seem to me that we have an CPU issue much more than a temp issue.
That is anything but a surprise. Hash join Cpu is often the weak spot of an Exa execution.
So how can we improve that plan? The point is that 115 M rows coming in from the intermediate result. That is not that much. But I have 2G rows from REF_TAB.
When combining big tables Hash Join is generally much more efficient than Nested Loop join. There are corner cases where it looks different, but we already tried Nested Loop and it proved much slower.

It it were possible to get a bloom filter on REF_TAB the storage cells could pre filter REF_TAB potentially retrieving a lot less that 2G rows which would shrink the join in Line 1.
I can not go into a research now about what stops and what provokes bloom filter from happening. Jonathan I believe has much more knowledge about that.

I would make this execution parallel. If it takes 25 minutes now, it can take less than 5 minutes parallel 16. Is that not good enough? Not sure what time you need.
Maybe we get a bloom filter when the execution is parallel too.
The other thing I could think of is materialize the Result of step 2 - 11. I would use a WITH .. /*+ Materialize */ for that. Maybe there is a bigger chance to get a bloom filter build when we enforce a scan over the intermediate result.
No guarantee for that, it is a mere guess. It could even be slower since writing 115M rows will take its time.

Thanks

Lothar

Global Information
------------------------------
Status                                 :  DONE (ALL ROWS)
Instance ID                            : 4
SQL Execution ID                       : 67108975
Execution Started                      :  01/27/2022 11:30:51
First Refresh Time                     :  01/27/2022 11:30:54
Last Refresh Time                      :  01/27/2022 11:55:18
Duration                               : 1467s
Fetch Calls                            : 22993

Global Stats
============================================================================================================================================
| Elapsed |   Cpu   |    IO    | Application | Concurrency | Cluster | Other   | Fetch | Buffer | Read | Read | Write | Write | Cell   |
| Time(s) | Time(s) | Waits(s) | Waits(s)   | Waits(s)   | Waits(s) | Waits(s) | Calls | Gets | Reqs | Bytes | Reqs | Bytes | Offload |
============================================================================================================================================
|    1097 |     983 |      108 |        0.30 |        0.00 | 0.64 |     5.69 | 22993 |    41M |   1M | 373GB | 837K | 96GB | 15.97% |
============================================================================================================================================

SQL Plan Monitoring Details (Plan Hash Value=2504035735)
===========================================================================================================================================================================================================================================
| Id |             Operation             | Name           | Rows   | Cost |   Time    | Start | Execs |   Rows   | Read | Read | Write | Write | Cell   | Mem | Temp | Activity |           Activity Detail            |
|    | |                          | (Estim) |       | Active(s) | Active |       | (Actual) | Reqs | Bytes | Reqs | Bytes | Offload | (Max) | (Max) |   (%)    |             (# samples)              |
===========================================================================================================================================================================================================================================
| 0 | SELECT STATEMENT |                          |         |       |      1357 | +111 |     1 |     115M |       |       |       | |         |       |       |     8.46 | Cpu (121)                            |
| 1 |   HASH JOIN OUTER |                          |      1M |    8M |      1357 | +111 |     1 |     115M | 772K | 88GB | 772K | 88GB |         | 795M | 101G |    52.94 | Cpu (700)                            |
|    | |                          |         |       | |        |       |          |       |       |       | |         |       |       |          | direct path read temp (11)           |
|    | |                          |         |       | |        |       |          |       |       |       | |         |       |       |          | direct path write temp (46)          |
| 2 |    VIEW |                          |      1M |    2M |       129 | +111 |     1 |     115M |       |       |       | |         |       |       |     1.05 | Cpu (15)                             |
| 3 |     HASH JOIN OUTER |                          |      1M |    2M |       237 | +3 |     1 |     115M | 65196 |   7GB | 65196 |   7GB | | 821M |    9G |     8.88 | Cpu (78)                             |
|    | |                          |         |       | |        |       |          |       |       |       | |         |       |       |          | direct path read temp (3)            |
|    | |                          |         |       | |        |       |          |       |       |       | |         |       |       |          | direct path write temp (46)          |
| 4 |      HASH JOIN |                          |      1M | 765K |       101 | +3 |     1 |     115M |       |       |       |       | |    1M |       |     0.35 | Cpu (5)                              |
| 5 |       JOIN FILTER CREATE          | :BF0000                  |    5662 | 66073 |         1 |     +3 |     1 |     3339 |       |       |       |       | |       |       | |                                      |
| 6 |        TABLE ACCESS STORAGE FULL | STAGE_TAB                |    5662 | 66073 |         1 |     +3 |     1 |     3339 |       |       |       |       | |       |       | |                                      |
| 7 |       JOIN FILTER USE             | :BF0000                  |    307M | 698K |       101 |     +3 |     1 |     115M |       |       |       |       | |       |       | |                                      |
| 8 |        PARTITION RANGE SINGLE |                          |    307M | 698K |       101 | +3 |     1 |     115M |       |       |       |       | |       |       | |                                      |
| 9 |         TABLE ACCESS STORAGE FULL | TRAN_TAB                 |    307M | 698K |       102 |     +2 |     1 |     115M | 41288 | 34GB |       |       | 81.27% |    7M |       |     0.35 | Cpu (4)                              |
|    | |                          |         |       | |        |       |          |       |       |       | |         |       |       |          | cell smart table scan (1)            |
| 10 |      PARTITION RANGE SINGLE |                          |     18M | 933K |         9 | +103 |     1 |      19M |       |       |       | |         |       |       | |                                      |
| 11 |       TABLE ACCESS STORAGE FULL   | TRAN_TAB2                |     18M | 933K |         9 |   +103 |     1 |      19M | 72012 | 69GB |       |       | 98.54% |    7M |       |     0.07 | Cpu (1)                              |
| 12 |    TABLE ACCESS STORAGE FULL      | REF_TAB                  |      1G |    3M |       481 |   +238 |     1 |       2G | 261K | 158GB |       |       | 38.27% |    7M |       |     3.15 | enq: KO - fast object checkpoint (1) |
|    | |                          |         |       | |        |       |          |       |       |       | |         |       |       |          | Cpu (40)                             |
|    | |                          |         |       | |        |       |          |       |       |       | |         |       |       |          | cell smart table scan (4)            |
===========================================================================================================================================================================================================================================

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("TRAN_TAB2"."ANBR"="REF_TAB"."COL1"(+))
   3 - access("TRAN_TAB"."PART_DATE1"="TRAN_TAB2"."PART_DATE1"(+) AND "TRAN_TAB"."TX_ID"="TRAN_TAB2"."TX_ID"(+))
   4 - access("TRAN_TAB"."SBID"="SFID")
   6 - storage("ASP_NM"=:B2 AND "PART_DATE"=TO_DATE(:B1,'DD-MON-YYYY'))   filter("ASP_NM"=:B2 AND "PART_DATE"=TO_DATE(:B1,'DD-MON-YYYY'))
   9 - storage("TRAN_TAB"."PART_DATE1"=TO_DATE(:B1,'DD-MON-YYYY') AND        SYS_OP_BLOOM_FILTER(:BF0000,"TRAN_TAB"."SBID"))
       filter("TRAN_TAB"."PART_DATE1"=TO_DATE(:B1,'DD-MON-YYYY') AND       SYS_OP_BLOOM_FILTER(:BF0000,"TRAN_TAB"."SBID"))
11 - storage("TRAN_TAB2"."P_CD"(+)='XX' AND "TRAN_TAB2"."PART_DATE1"(+)=TO_DATE(:B1,'DD-MON-YYYY') ) filter("TRAN_TAB2"."P_CD"(+)='XX' AND "TRAN_TAB2"."PART_DATE1"(+)=TO_DATE(:B1,'DD-MON-YYYY'))
Am 31.01.2022 um 09:25 schrieb Laurentiu Oprea:

Hello Pap,

Starting from your initial question: " how the access path to this table can be made faster? ", the questions are: how fast can you make the acces and by how much the query duration will improve.

-> if you look into the second sql monitor report accessing the table takes 46 seconds and represents 3.15% from the total activity. So do you really need to improve the access to this table? My opinion this is not the first thing you need to look into. Speaking of partitioning, a range partition strategy will determine the optimizer to use a partition range join-filter (based on a join filter create on tran_tab2 table) ...but I`m not sure you can have range partitioning for the data of the joined column .. This is something you might need to figure it out.

Taking a step back your biggest problem is the "HASH JOIN OUTER" from the line 1, and mainly due to swapping on temp. In my opinion the problem here is that the view is not merged thus the left side of the HASH join will contain all columns from projection perspective, making the hash table very big.

So question number 1: why the view is not merged? Maybe it is related to miss estimates (1M est vs 115M actual).
If you manage to obtain a plan where instead of doing the "HASH JOIN OUTER" from line 3 with the table TRAN_TAB2 do a hash join with the result of the HASH join between TRAN_TAB2 and REF_TAB you should no longer have the big temp swapping from line 1.

Thank you,
Laurentiu.

În dum., 30 ian. 2022 la 20:30, Pap <oracle.developer35@xxxxxxxxx> a scris:

    Actually it's an ETL job query and I believe it's moving those
    records to some downstream system based on the validation which
    it's doing on that ref_tab table (which holds the masked values of
    the COL1).

    Alo i tried with px_join_filter hints on ref_tab, the table still
    not showing bloom pruning in its plan. And yes we want to minimize
    the execution time so as to ensure the data is reaching the
    downstream system as fast as possible.

    So it seems , if we stick with the requirement to move those 50-60
    Million rows, the only better option would be to go by the full
    scan+hash join path on this table and make use of the necessary
    memory/PGA such that the tempspill can be minimized. Please
    correct me if my understanding is wrong here. Or should we try the
    covering index option as suggested by Sayan initially, maybe in
    addition with hash partitioning etc?

    On Sat, Jan 29, 2022 at 8:43 PM Andy Sayer <andysayer@xxxxxxxxx>
    wrote:

        Hi Pap,

        What are you planning on doing with all these rows? To put it
        bluntly, what’s the point in getting all these rows back as
        fast as possible, no one is going to read that many that fast.
        If it’s just a one off then you just need it to complete, right?

        Thanks,
        Andy

        On Sat, 29 Jan 2022 at 14:22, Pap
        <oracle.developer35@xxxxxxxxx> wrote:

            Thank You Lothar.
            So if I understand your point correctly, , do you mean to
            say the HASH join plan with full scan on table REF_TAB
            which I posted earlier, is the one it should go for?

            But as a majority of time it's spent on hash outer join
            and that to it spills to temp ~100GB+ , So as you
            mentioned, i was trying to run it parallel-2 but still it
            was consuming ~50GB+ temp and i killed it then. So I'm
            still wondering, what will be the ideal way to cater this
            key value lookup kind of table design situation,
            considering the table ref_tab can hold ~5billion rows in
            future i.e. ~4times the current volume(which is
            ~1.5billion rows). Or like Lok was saying any hash
            partitioning strategy will make the design or data fetch
            better here?

            On Sat, Jan 29, 2022 at 6:56 PM Lothar Flatz
            <l.flatz@xxxxxxxxxx> wrote:

                Hi Pap,

                I do discourage IOT.
                Andrew, my former boss at RWPG said me once that a
                index organized table which is bigger than half of the
                buffer cache can lead into serious trouble.
                Apart from that the nested loop join is a bad decision
                in that plan anyway.
                The estimate for line 4 is 166k rows, but the actual
                is 14M. That is an error factor > 80 times and that is
                serious.
                I am pretty sure if that estimate would be correct, we
                would not see an nested loop join.
                Correct would a hash join and you the query should be
                parallel too

                Thanks

                Lothar

                Am 28.01.2022 um 20:21 schrieb Pap:

                Hello Listers, It's a 11.2.0.4 oracle database. We
                have one table (say REF_TAB) which has four columns,
                two of them(i.e. COL1, masked_COL1) are actually
                holding business attributes and the other two are
                created_date and created_by_user columns. The length
                of those two business columns are varchar2(34 bytes)
                and varchar2(36 bytes) respectively. And both of
                these columns hold all unique/distinct values. Col1
                is the one on which the primary key is defined. The
                table currently has ~1.5billion rows in it and its
                size is ~160GB. It is estimated to grow to hold
                ~5-8billion rows.

                The table is always being queried on a filter/join on
                COL1 and that too as an outer join and will fetch the
                value of masked_col1. So in short the requirement is
                to scan the full table data based on outer join on
                column col1 like ( TRAN_TAB2.ANBR=REF_TAB.COL1 (+) ).
                And below is a sample query sql monitor showing even
                that table is getting access using primary key index
                but still is consuming all the time(sql_plan_line_id
                13 and 14 below). As sql monitor shows , Is those
                1hr+ DB time for the 14 million times index
                unique scan justified? And making it to go for full
                using hints is chewing up ~100gb+ temp space too. I
                can't not think of any partitioning strategy which
                can help us here to get the table access faster, as
                we have to look up whole column data here for
                incoming value of COL1.

                So want to understand how the access path to this
                table can be made faster? Will index organized tables
                be suggested in such a scenario and will help us in
                this kind of requirement? Or any other design
                strategy required here?

                Global Information
                ------------------------------
                 Status     :  EXECUTING
                 Instance ID    :  4
                 SQL Execution ID     :  67108973
                 Execution Started    :  01/27/2022 07:43:01
                 First Refresh Time     :  01/27/2022 07:43:05
                 Last Refresh Time    :  01/27/2022 09:04:19
                 Duration     :  4879s
                 Fetch Calls    :  2782

                Global Stats

=================================================================================================================
                | Elapsed |   Cpu   |    IO    | Application |
                Concurrency | Cluster  | Fetch | Buffer | Read | Read
                 |  Cell   |
                | Time(s) | Time(s) | Waits(s) |  Waits(s)   |
                 Waits(s)   | Waits(s) | Calls |  Gets  | Reqs |
                Bytes | Offload |

=================================================================================================================
                |    5469 |    2065 |     2464 |    0.01 |
                 0.00 |      940 |  2782 |   124M |  24M | 202GB |
                9.09% |

=================================================================================================================


==============================================================================================================================================================================================================================
                | Id    |                Operation              |
                      Name     |  Rows   | Cost  |   Time    | Start
                 | Execs |   Rows   | Read  | Read  |  Cell   | Mem |
                Activity |           Activity Detail    |
                |       |               |      | (Estim) |       |
                Active(s) | Active |       | (Actual) | Reqs  | Bytes
                | Offload |     |   (%)    |             (# samples)    |

==============================================================================================================================================================================================================================
                |     0 | SELECT STATEMENT              |    |
                  |       |      4875 |   +4 |     1 |      14M |
                  |     |         |     |     1.11 | Cpu (53) |
                |     1 |   NESTED LOOPS OUTER              |    |
                 166K |    1M |      4875 |   +4 |     1 |      14M |
                      |     |         |     |     0.04 | Cpu (2)  |
                |     2 |    VIEW               |      |    166K |
                 901K |      4875 |     +4 |     1 |      14M |
                |       |         |     |     0.40 | Cpu (19)     |
                |     3 |     NESTED LOOPS OUTER              |    |
                   166K |  901K |      4875 |   +4 |     1 |      14M
                |       |     |         |     |     0.04 | Cpu (2)  |
                |     4 |      HASH JOIN              |    |    166K
                |  402K |      4875 |   +4 |     1 |      14M |
                |     |         |  1M |     0.29 | Cpu (14) |
                |     5 |       JOIN FILTER CREATE              |
                :BF0000    |    1836 | 66073 |         1 |   +4 |
                1 |     1890 |       |     |         |     |
                 |  |
                |     6 |        TABLE ACCESS STORAGE FULL        |
                STAGE_TAB            |    1836 | 66073 |   5 |     +0
                |     1 |     1890 |     |       |         |     |
                0.02 | gc cr multi block request (1)            |
                |     7 |       JOIN FILTER USE               |
                :BF0000      |     48M |  336K |      4875 |     +4 |
                    1 |      15M |       |       |         |     |
                  0.13 | Cpu (6)      |
                |     8 |        PARTITION RANGE SINGLE           |
                     |     48M |  336K |      4875 |     +4 |     1 |
                     15M | |       |         |     |     0.04 | Cpu
                (2)      |
                |     9 |         TABLE ACCESS STORAGE FULL       |
                TRAN_TAB           |     48M |  336K |  4878 |     +1
                |     1 |      15M | 11748 |  11GB |  92.40% |  7M |
                0.25 | Cpu (11)             |
                |       |               |      |         |       |
                        |        |       |          |       |       |
                        |     |          | cell smart table scan (1)
                     |
                |    10 |      PARTITION RANGE SINGLE             |
                       |       1 |     4 |  4875 |     +4 |   14M |
                    2M |       |       |         |     | 0.53 | Cpu
                (25)             |
                |    11 |       TABLE ACCESS BY LOCAL INDEX ROWID |
                TRAN_TAB2          |       1 |     4 |  4875 |     +4
                |   14M |       2M |  433K |   3GB |         |     |
                2.52 | gc cr grant 2-way (26)             |
                |       |               |      |         |       |
                        |        |       |          |       |       |
                        |     |          | Cpu (57)     |
                |       |               |      |         |       |
                        |        |       |          |       |       |
                        |     |          | gcs drm freeze in enter
                server mode (2)  |
                |       |               |      |         |       |
                        |        |       |          |       |       |
                        |     |          | cell single block physical
                read (35)     |
                |    12 |        INDEX RANGE SCAN               |
                TRAN_TAB2_IX1      |       1 |     3 |      4875 |
                  +4 |   14M |       2M |  271K |   2GB |         |
                  |     2.40 | gc cr grant 2-way (18)     |
                |       |               |      |         |       |
                        |        |       |          |       |       |
                        |     |          | Cpu (63)     |
                |       |               |      |         |       |
                        |        |       |          |       |       |
                        |     |          | cell single block physical
                read (33)     |
                |    13 |    TABLE ACCESS BY INDEX ROWID          |
                REF_TAB      |       1 |     3 |      4877 |     +2 |
                  14M |      13M |   11M |  88GB |         |     |
                 34.01 | gc cr block 2-way (1)      |
                |       |               |      |         |       |
                        |        |       |          |       |       |
                        |     |          | gc cr disk read (6)      |
                |       |               |      |         |       |
                        |        |       |          |       |       |
                        |     |          | Cpu (397)      |
                |       |               |      |         |       |
                        |        |       |          |       |       |
                        |     |          | cell single block physical
                read (1214)   |
                | -> 14 |     INDEX UNIQUE SCAN                 |
                REF_TAB_PK       |       1 |     2 |      4876 |
                +4 |   14M |      13M |   12M |  89GB |         |
                |    57.09 | gc cr block 2-way (154)      |
                |       |               |      |         |       |
                        |        |       |          |       |       |
                        |     |          | gc cr block 3-way (14)     |
                |       |               |      |         |       |
                        |        |       |          |       |       |
                        |     |          | gc cr block busy (520)     |
                |       |               |      |         |       |
                        |        |       |          |       |       |
                        |     |          | gc cr disk read (30)     |
                |       |               |      |         |       |
                        |        |       |          |       |       |
                        |     |          | gc cr failure (1)      |
                |       |               |      |         |       |
                        |        |       |          |       |       |
                        |     |          | gc cr grant 2-way (106)      |
                |       |               |      |         |       |
                        |        |       |          |       |       |
                        |     |          | gc current block 2-way (4)
                    |
                |       |               |      |         |       |
                        |        |       |          |       |       |
                        |     |          | gc current grant 2-way (3)
                    |
                |       |               |      |         |       |
                        |        |       |          |       |       |
                        |     |          | gc remaster (2)      |
                |       |               |      |         |       |
                        |        |       |          |       |       |
                        |     |          | Cpu (754)      |
                |       |               |      |         |       |
                        |        |       |          |       |       |
                        |     |          | gcs drm freeze in enter
                server mode (6)  |
                |       |               |      |         |       |
                        |        |       |          |       |       |
                        |     |          | latch: object queue header
                operation (1) |
                |       |               |      |         |       |
                        |        |       |          |       |       |
                        |     |          | cell single block physical
                read (1121)   |

==============================================================================================================================================================================================================================
                Predicate Information (identified by operation id):
                ---------------------------------------------------
                   4 - access("TRAN_TAB"."SBID"="SFID")
                   6 - storage(("PART_DATE"=:B1 AND "ASP_NM"=:B2))
                filter(("PART_DATE"=:B1 AND "ASP_NM"=:B2))
                   9 - storage(("TRAN_TAB"."PART_DATE1"=:B1 AND
                SYS_OP_BLOOM_FILTER(:BF0000,"TRAN_TAB"."SBID")))
                 filter(("TRAN_TAB"."PART_DATE1"=:B1 AND
                SYS_OP_BLOOM_FILTER(:BF0000,"TRAN_TAB"."SBID")))
                  11 - filter("TRAN_TAB2"."PART_DATE1"=:B1)
                  12 - access("TRAN_TAB"."TX_ID"="TRAN_TAB2"."TX_ID"
                AND "TRAN_TAB2"."P_CD"='XX')
                  14 - access("TRAN_TAB2"."ANBR"="REF_TAB"."COL1")

References:
- Need design suggestion , whether index organized table is good choice here
  - From: Pap
- Re: Need design suggestion , whether index organized table is good choice here
  - From: Lothar Flatz
- Re: Need design suggestion , whether index organized table is good choice here
  - From: Pap
- Re: Need design suggestion , whether index organized table is good choice here
  - From: Andy Sayer
- Re: Need design suggestion , whether index organized table is good choice here
  - From: Pap
- Re: Need design suggestion , whether index organized table is good choice here
  - From: Laurentiu Oprea

Re: Need design suggestion , whether index organized table is good choice here

Other related posts: