RE: Curiosity: single-column index on sparse data cannot be built in parallel

  • From: <rajendra.pande@xxxxxxx>
  • To: <sacrophyte@xxxxxxxxx>, <jonathan@xxxxxxxxxxxxxxxxxx>
  • Date: Wed, 15 Jul 2015 13:31:43 -0400

Just a quick response based on another experience - check whether Resource
Manager is in use and whether an active plan is affecting what happens.

Your 9-to-4 comment is what triggered that thought.
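For example, something along these lines would show whether a plan is active
and whether it caps parallel degree (standard data dictionary views, but
verify the column names against your version):

```sql
-- Is a resource manager plan currently active?
SELECT name, is_top_plan FROM v$rsrc_plan;

-- Does any directive in the plans limit parallel degree?
SELECT plan, group_or_subplan, parallel_degree_limit_p1
FROM   dba_rsrc_plan_directives
WHERE  parallel_degree_limit_p1 IS NOT NULL;
```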

From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx] On
Behalf Of Charles Schultz
Sent: Wednesday, July 15, 2015 1:28 PM
To: Jonathan Lewis
Cc: ORACLE-L
Subject: Re: Curiosity: single-column index on sparse data cannot be built in
parallel



Jonathan, your blog post looks quite lucid and helpful - I'll start to apply
some of your observations in my ongoing tests. Thanks for wrapping it up in a
way for the rest of us to see.



Just an update on the little mystery. It seems when I create the index during
"off hours", the kernel allocates plenty of parallel slaves, but when I run
after 9:00 am and before 4:00 pm local time, I see "Parallel operations
downgraded 75 to 99 pct" (according to the AWR report). Correlating this to
v$pq_tqstat, I see 302 rows during "off hours" (runs in 5 minutes) but only 8
rows during working hours (4 Producers, 2 Consumers, 1 Ranger and 1 Final QC
Consumer), taking 90 minutes this morning.
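For what it's worth, the two numbers came from queries along these lines
(v$pq_tqstat has to be read from the same session, immediately after the
build; the v$sysstat counters are cumulative, so it's the deltas between runs
that matter):

```sql
-- one row per slave per table queue; server_type shows Producer/Consumer/Ranger
SELECT dfo_number, tq_id, server_type, process, num_rows
FROM   v$pq_tqstat
ORDER  BY dfo_number, tq_id, server_type, process;

-- the downgrade counters that feed the AWR "Parallel operations" lines
SELECT name, value
FROM   v$sysstat
WHERE  name LIKE 'Parallel operations%downgraded%';
```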



It is possible this is pure coincidence. I will be doing more tests to see if
this pattern holds up. One other factor is that this database is currently on
ZFS (Solaris) with a snapshot (copy on write), and the back-end SAN is shared
across our enterprise for all development work. I just find it exceptionally
odd that this one index seems to be the sticking point.



Thanks for helping me think this out loud. :)



On Wed, Jul 15, 2015 at 6:29 AM, Jonathan Lewis <jonathan@xxxxxxxxxxxxxxxxxx>
wrote:




Charles

302 rows in v$pq_tqstat is right for your parallel 100. If you check the blog
that Mark referenced
(https://jonathanlewis.wordpress.com/2015/07/15/pq-index-anomaly/) you'll see I
ran parallel 4 but got 14 lines in the activity report: that's (2 x 4) +
(1 x 4) + 1 (ranger) + 1 (final QC step).
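For anyone following along, that arithmetic can be checked from dual (just a
sketch of the formula described above, applied to both degrees mentioned in
the thread):

```sql
-- rows expected in v$pq_tqstat for a two-slave-set parallel index build:
-- (2 * DOP) + (1 * DOP) + 1 (ranger) + 1 (final QC step)
SELECT (2 * 4)   + (1 * 4)   + 1 + 1 AS rows_at_dop_4,
       (2 * 100) + (1 * 100) + 1 + 1 AS rows_at_dop_100
FROM   dual;
-- rows_at_dop_4 = 14, rows_at_dop_100 = 302
```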

It's v$pq_slave where I saw too many rows - but that may not be visible in
your case because you could have had a history of parallel execution which
left an arbitrary number of slaves from other activities still available for
use.

I wonder if your comment about sparsity of data is on point. Perhaps there's a
special case where the RANGER (which is the query coordinator) finds no data
(or such a small amount of data) that it decides the query doesn't need to run
parallel. (This seems unlikely, since the size of the table should still
encourage it to do a parallel tablescan, but it's another possibility to bear
in mind.)


Regards
Jonathan Lewis
http://jonathanlewis.wordpress.com
@jloracle

________________________________

From: Charles Schultz [sacrophyte@xxxxxxxxx]
Sent: 15 July 2015 12:03


To: Jonathan Lewis
Cc: ORACLE-L
Subject: Re: Curiosity: single-column index on sparse data cannot be built in
parallel



Jonathan,



Thanks for the continued feedback, this is excellent information and insight. I
will be running more tests today, since my test late last night proved that
sometimes the index can indeed be built using parallel slaves. As you pointed
out, sometimes my query against v$pq_tqstat returns way "too many" rows - even
though I only ever request parallel 100, sometimes I get 302 rows returned. I
have not yet analyzed the rows (but I did save them in an output file).



And yes, we are licensed for AWR, and I have started to take snapshots between
each test to better isolate the stats. Once I reproduce the problem (that's
always fun, trying to get a bad run), I will take a closer look at those
"parallel downgraded" messages. Your observation about most slaves having
nothing to do does make sense, but I have a hard time explaining why it runs
in 4 minutes with parallel slaves and 50+ minutes with no apparent parallel
slaves running.



More to follow later in the day. Thanks again.



On Tue, Jul 14, 2015 at 11:48 PM, Jonathan Lewis <jonathan@xxxxxxxxxxxxxxxxxx>
wrote:


Charles,

Apologies for the delay - I had to shut down for the day before your reply came
through.

I've written a blog note about the anomaly - planned to re-read it and polish
it a little this morning.

Your non-null data is 3% of your index, and a parallel build would mean RANGE
distribution of the data, so if your build ran at anything less than parallel
33 (approx) the point would come where only one slave (in the second set)
would be doing anything, because it would be the one that got all the real
data and had to sort and build. That's why (I believe) you might see a point
where only one process seemed to be doing any work. There is still benefit in
running parallel, of course, since you have to scan and discard the 97% of
the data that is NULL and can't go into the index.
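Plugging the counts from Charles's first post into that argument gives a
rough check of the 3% figure and of the degree at which one RANGE partition
would stop holding all the real keys (back-of-envelope only):

```sql
-- 51464 'Y' + 17486819 'N' non-null values; 538937561 nulls
SELECT ROUND(100 * (51464 + 17486819)
           / (538937561 + 51464 + 17486819), 1) AS pct_not_null,
       ROUND((538937561 + 51464 + 17486819)
           / (51464 + 17486819))                AS approx_dop_threshold
FROM   dual;
-- about 3.2% not null; threshold around DOP 32, consistent with "approx 33"
```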

The anomaly is that when I examined v$pq_tqstat after creating my 310,000 row
index from my 10,000,000 row table (running parallel 4) the first set of
slaves had passed 10 million rows to the second set of slaves, which means
they must have been passing all the (null, rowid) entries rather than
discarding them.


Are you licensed to dig back into the ASH history? Can you find out if the one
running process was a Pnnn process or a normal shadow process? If it was a
normal shadow the problem might simply be that at the moment the create index
started all the PX slaves were involved in building other indexes. If you can
run an AWR report for the time around the start of the build you might get a
clue from the statistics about "Parallel operations downgraded to ..."
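If the diagnostics licence is there, something along these lines might answer
the Pnnn-or-shadow question (the sql_id substitution variable is a
placeholder, not a real value - substitute the create index statement's
sql_id):

```sql
-- PX slaves appear with "(Pnnn)" in the program name
SELECT program, session_type, COUNT(*) AS samples
FROM   dba_hist_active_sess_history
WHERE  sql_id = '&create_index_sql_id'   -- placeholder sql_id
GROUP  BY program, session_type
ORDER  BY samples DESC;
```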


Regards
Jonathan Lewis
http://jonathanlewis.wordpress.com
@jloracle

________________________________

From: Charles Schultz [sacrophyte@xxxxxxxxx]
Sent: 15 July 2015 01:05
To: Jonathan Lewis
Cc: ORACLE-L
Subject: Re: Curiosity: single-column index on sparse data cannot be built in
parallel

I didn't check v$pq_tqstat, but I'll show results when I next run the test. I
say it was not running in parallel because only one session was active via
Enterprise Manager (yeah, I was being lazy, I know) and the creation took over
1 hour in the first test and 51 minutes in the next, whereas all my other
indexes (a mix of local and non-partitioned) on the same table were taking
less than 20 minutes (some less than 10).



Jonathan, can you elaborate a little more on the "interesting anomaly"? :) It
is entirely possible I am hitting that, but would like to learn more.



As far as parallel 100, I was just trying to open the doors to see where my
bottlenecks are. The extent sizes are included in the index DDL (5MB); again,
I don't have issues with the other indexes, but there is something about this
being on a single column, so you may be on the right track with large extents
and small volumes of data per slave. You are making me curious now. :) I love
the anthropomorphic angle on "refusing to cooperate". *grin*



PS - in case it is not obvious, this is a vendor-supplied table and index.

PS2 - tests currently executing on a Sun T5440, for those who care about such
things.

PS3 - as to "why": I've got a lot of indexes to build in a short time, so I
am looking for the bounds. :)





On Tue, Jul 14, 2015 at 4:12 PM, Jonathan Lewis <jonathan@xxxxxxxxxxxxxxxxxx>
wrote:




I'm not going to try for 500M rows, but I just built a table with 10M rows with
your pattern of data and ran a parallel create index.

There is an interesting anomaly to the create index (11.2.0.4) that might make
the build look like a serial build - how are you showing that the build isn't
parallel ?
I queried v$pq_tqstat after the build.

Parallel 100 seems a little optimistic for such a "small" index - 17M rows at
about 13 bytes per row gives about 220MB or 2MB per slave to build. What's your
default extent size - perhaps there's something about large extents and small
volumes of data per slave that makes Oracle refuse to co-operate.
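That sizing estimate can be sketched from dual too, using the 17.5M non-null
rows and the 5MB initial extent visible in the DDL further down the thread
(rough numbers only):

```sql
-- ~17.5M index entries at ~13 bytes each, spread over 100 slaves
SELECT ROUND(17500000 * 13 / 1024 / 1024)          AS approx_index_mb,
       ROUND(17500000 * 13 / 1024 / 1024 / 100, 1) AS mb_per_slave
FROM   dual;
-- about 217MB in total, ~2.2MB per slave -- less than one 5MB extent
```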


Regards
Jonathan Lewis
http://jonathanlewis.wordpress.com
@jloracle

________________________________

From: oracle-l-bounce@xxxxxxxxxxxxx [oracle-l-bounce@xxxxxxxxxxxxx] on behalf
of Charles Schultz [sacrophyte@xxxxxxxxx]
Sent: 14 July 2015 21:59
To: ORACLE-L
Subject: Curiosity: single-column index on sparse data cannot be built in
parallel

Good day,



I am trying to find the technical reason for why Oracle cannot use parallel
slaves to build a single-column index on a sparse column with few distinct
values:





F    COUNT(*)
-  ----------
    538937561
Y       51464
N    17486819





Just by playing around, I discovered that if I put this column as the leading
edge on an index with many columns, it can be built in parallel.



According to the relevant documentation
<http://docs.oracle.com/cd/E11882_01/server.112/e25523/parallel007.htm#i1009131>
, I would expect that the base table is sampled, but perhaps the random
sampling returns 0 keys due to the sparse nature of the column?





Facts:

Oracle Enterprise Edition 11.2.0.4

table is partitioned, but not on this key (lol)

parallel_max_servers=3600



ddl extracted via datapump:

CREATE INDEX "FIMSMGR"."FGBTRND_ENCD_INDEX" ON "FIMSMGR"."FGBTRND"
    ("FGBTRND_DEFER_GRANT_IND")
  PCTFREE 10 INITRANS 2 MAXTRANS 255
  STORAGE(INITIAL 5242880 NEXT 5242880 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "FIN_LARGE_INDX" PARALLEL 100;



--

Charles Schultz