PQ - Can a set of slave processes be prevented from starting a rowsource scan if their result is bound to be discarded?

  • From: Jure Bratina <jure.bratina@xxxxxxxxx>
  • To: ORACLE-L <oracle-l@xxxxxxxxxxxxx>
  • Date: Tue, 24 Nov 2015 12:58:39 +0100

Hi,

This scenario pertains to an 11.2.0.4 single-instance database (it's not on
Exadata, but the behavior is the same when I tested it on Exadata). To keep
the question as short as possible, I create the necessary tables using the
CREATE TABLE commands from this blog post by Jonathan Lewis:
https://jonathanlewis.wordpress.com/2013/10/14/parallel-execution-2-2/ .
The testcase is below, but I'll first try to summarize it and state the
question.
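
For anybody who doesn't want to open the link: the tables are roughly of the
shape sketched below. This is only a paraphrase (column names and the
343,000 row count are inferred from the query and plan that follow), so
please refer to the post for the exact DDL.

-- Approximate shape only (my reconstruction); see the linked blog post for the real DDL.
-- t1 is the small build rowsource, t4 the large probe rowsource (70^3 = 343,000 rows).
create table t1 as
select  rownum           id,
        to_char(rownum)  small_vc,
        rpad('x',100)    padding
from    dual
connect by level <= 70;

create table t4 as
select  a.id             id1,
        to_char(rownum)  small_vc,
        rpad('x',100)    padding
from    t1 a, t1 b, t1 c;
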
I have two tables, t1 (used as the build rowsource) and t4 (used as the
probe rowsource), and I join them with a parallel HASH JOIN where the slave
process set that scans t4 uses a HASH distribution. To illustrate the case,
I deliberately disabled Bloom filtering. The first set of slave processes
scans table t1 and, because of the filtering condition in the WHERE clause,
finds 0 rows. If any rows were returned from t1, they would be distributed
to the second set of slave processes, which would build an in-memory hash
table from them. After that, the first slave set scans table t4, gets
343,000 rows and distributes them to the second slave set, which buffers
them and, once it has read all of the data, performs the hash join.

My question is: does anybody know why (i.e. what might be the reason) table
t4 gets scanned even though it's clear that, after t1 is scanned, the hash
join will return no rows regardless of what is returned from t4? I have come
across this scenario on an actual 11.2.0.4 production database where t4 was
a large table partition, and I'm curious whether anybody could explain this
behavior. If the two table scans ran simultaneously it would be
understandable that the scan of t4 couldn't be stopped, but since at most
two sets of parallel slaves can be working at the same time, and in this
case the processes which scan t1 also scan t4, that's clearly not the
reason. It seems that when the HASH distribution is used, the second
rowsource gets scanned unconditionally and the query coordinator doesn't
stop the (unnecessary) scan of t4.

If instead of a HASH distribution I use a BROADCAST distribution for t4
(a sketch of that variant is shown below), the first slave set scans t1 and
broadcasts its results to the second slave set, which then has to scan and
probe t4 on its own; in that case, since no rows arrive from t1, the second
slave set skips the scan of t4 altogether. The scan of t4 is also avoided in
a serial (non-parallel) execution plan.
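
For completeness, the broadcast test is essentially the same statement with
only the distribution hint changed, along these lines (the
pq_distribute(t4 broadcast none) hint shown here is my reconstruction of
that change: t1's rows get broadcast and t4 is not redistributed):

-- Sketch of the broadcast variant (reconstructed hint).
select /*+
         parallel(t1 2)
         parallel(t4 2)
         leading(t1 t4)
         pq_distribute(t4 broadcast none)
         no_px_join_filter(t4)
         monitor
       */
       t1.*
from   t1, t4
where  t1.id = t4.id1
and    t1.small_vc = 'X';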


The testcase:

SQL> select /*+
            parallel(t1 2)
            parallel(t4 2)
            leading(t1 t4)
            pq_distribute(t4 hash hash)
            no_px_join_filter(t4)
            monitor
            */
            t1.*
     from   t1, t4
     where  t1.id = t4.id1
     and    t1.small_vc = 'X';

no rows selected


The execution plan (with misleading values for runtime statistics):

----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name     | Starts | E-Rows |   TQ   |IN-OUT| PQ Distrib | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |          |      1 |        |        |      |            |      0 |00:00:01.16 |       8 |
|   1 |  PX COORDINATOR          |          |      1 |        |        |      |            |      0 |00:00:01.16 |       8 |
|   2 |   PX SEND QC (RANDOM)    | :TQ10002 |      0 |   2450 |  Q1,02 | P->S | QC (RAND)  |      0 |00:00:00.01 |       0 |
|*  3 |    HASH JOIN BUFFERED    |          |      0 |   2450 |  Q1,02 | PCWP |            |      0 |00:00:00.01 |       0 |
|   4 |     PX RECEIVE           |          |      0 |      1 |  Q1,02 | PCWP |            |      0 |00:00:00.01 |       0 |
|   5 |      PX SEND HASH        | :TQ10000 |      0 |      1 |  Q1,00 | P->P | HASH       |      0 |00:00:00.01 |       0 |
|   6 |       PX BLOCK ITERATOR  |          |      0 |      1 |  Q1,00 | PCWC |            |      0 |00:00:00.01 |       0 |
|*  7 |        TABLE ACCESS FULL | T1       |      0 |      1 |  Q1,00 | PCWP |            |      0 |00:00:00.01 |       0 |
|   8 |     PX RECEIVE           |          |      0 |   343K |  Q1,02 | PCWP |            |      0 |00:00:00.01 |       0 |
|   9 |      PX SEND HASH        | :TQ10001 |      0 |   343K |  Q1,01 | P->P | HASH       |      0 |00:00:00.01 |       0 |
|  10 |       PX BLOCK ITERATOR  |          |      0 |   343K |  Q1,01 | PCWC |            |      0 |00:00:00.01 |       0 |
|* 11 |        TABLE ACCESS FULL | T4       |      0 |   343K |  Q1,01 | PCWP |            |      0 |00:00:00.01 |       0 |
----------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("T1"."ID"="T4"."ID1")
7 - access(:Z>=:Z AND :Z<=:Z)
filter("T1"."SMALL_VC"='X')
11 - access(:Z>=:Z AND :Z<=:Z)
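
A note on why I call the runtime statistics misleading: I pulled the plan
with something like the call below, and as far as I understand, with a
parallel query the ALLSTATS LAST format reports only the query coordinator's
own rowsource statistics, so Starts, A-Rows and Buffers show 0 for the
operations that were actually executed by the slaves.

-- Something like this was used to pull the plan above.
select *
from   table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));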

The output from v$pq_tqstat confirms that the first slave set (processes
P002 and P003) produces no rows when writing to table queue 0 (i.e. the scan
of T1), but produces 343,000 rows when writing to table queue 1 (i.e. the
scan of T4), and that the second slave set consumes all of those rows:

DFO_NUMBER      TQ_ID SERVER_TYP PROCES   NUM_ROWS      BYTES      WAITS   TIMEOUTS
---------- ---------- ---------- ------ ---------- ---------- ---------- ----------
         1          0 Producer   P003            0         48          0          0
                                 P002            0         48          0          0

                      Consumer   P001            0         48         88         35
                                 P000            0         48         76         34

                    1 Producer   P003       171912     688704          9          3
                                 P002       171088     685408          8          2

                      Consumer   P001       196000     785200         39         35
                                 P000       147000     588912         39         35

                    2 Producer   P000            0         24          0          0
                                 P001            0         24          0          0

                      Consumer   QC              0         48          2          0
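
The figures above come from a query roughly like the following, run in the
same session immediately after the parallel statement completes (the column
list is trimmed to what is shown above):

-- Roughly the query used for the v$pq_tqstat output above.
select dfo_number, tq_id, server_type, process, num_rows, bytes, waits, timeouts
from   v$pq_tqstat
order by dfo_number, tq_id, server_type desc, process;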

I also generated a SQL Monitor report, but for brevity I won't include it
here; I hope the output from v$pq_tqstat is enough to illustrate the
testcase.
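
(In case anybody wants to reproduce the SQL Monitor report, it can be pulled
with something like the statement below; &sql_id is just a placeholder for
the statement's actual sql_id, and TYPE can also be 'HTML' or 'ACTIVE'.)

-- One way to generate the SQL Monitor report (sql_id is a placeholder).
select dbms_sqltune.report_sql_monitor(
         sql_id => '&sql_id',
         type   => 'TEXT'
       )  report
from   dual;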

Thank you in advance for any comments.

Regards,
Jure Bratina
