On the previous estimate (NL one) I would check the high and low values
for the join key PERSON_SK.
Is this a growing value? What is the meaning behind table
REQ_27333_DX_ENCNTR?
Wonder if the time difference for the stats make a difference on the
join estimate.
Regards
Lothar
Am 11.07.2020 um 09:24 schrieb Jonathan Lewis:
Ram,
I see the plan did a broadcast - which was necessary for the best performance - but I was a little surprised to see that it didn't do a partition iterator for the parallelism on the big table, given that the degree was 8 and the number of partitions was 32 (viz: a multiple of 8). However, that probably wouldn't have made very much difference. The minimal improvement at the higher degree was probably because DOP 32 probably pushed you into queueing at the disc, it's counter-intuitive but you might find 16 (e.g.) is better than 32.
It looks as if someone, or something, may have gathered some stats since the nested loop run. The cardinality estimate for the hash join is now 15M when the estimate for the nested loop was 2.8M: if the stats hadn't changed the cardinality estimate wouldn't change; and a factor of 5 on the NL cost might have been enough to push it into a hash join automatically.
Regards
Jonathan Lewis
On Sat, Jul 11, 2020 at 2:14 AM Ram Raman <veeeraman@xxxxxxxxx <mailto:veeeraman@xxxxxxxxx>> wrote:
Jonathan, I hinted to use HASH join, and as you said it was lot
faster - returns in15 mins overall. Tried increasing the
parallelism upto 32, but I only gain 2 mins with higher degrees.
Here is the same document with the new statement towards the end:
https://drive.google.com/file/d/1hIUHbVfBxsvJz5zC3g5ZZGRg99MPz6p7/view?usp=sharing
Now I need to find out why it was not using the HASH in the first
place without hint and also see if I can make it even faster.
On Fri, Jul 10, 2020 at 4:57 PM Ram Raman <veeeraman@xxxxxxxxx
<mailto:veeeraman@xxxxxxxxx>> wrote:
Thanks Jonathan. There is no compression, but many
columns dont have values in them. I am bit puzzled by the
naming convention, but that is a different topic
What you said makes good sense. It doesnt make sense to access
them by nested loops. Here are the big table stats:
NUM_ROWS BLOCKS LAST_ANAL AVG_ROW_LEN
CHAIN_CNT
------------------------ ------------ --------- -----------
----------
1,738,075,641 40,228,648 24-JUN-20 371 0
Only one block size for the whole DB:
SQL> sho parameter db_block_size
NAME TYPE VALUE
------------------------------------
-------------------------------- ------------------------------
db_block_size integer 8192
Partitions stats for the big table:
PARTITION SUBPARTIT PARTITION_COUNT STATUS
--------- --------- --------------- --------
HASH NONE 32 VALID
I did some calculation myself. If the query has to retrieve
14M rows from the big table, given the average row length is
371 and hence rows per block is about 22, it has to visit 634K
blocks. Does not make sense to do nested loop join for that.
Let me try a HASH join hint
Thanks
Ram.