Re: parallel recovery slaves waiting on undo reads

From: Frits Hoogland <frits.hoogland@xxxxxxxxx>
To: Noveljic Nenad <nenad.noveljic@xxxxxxxxxxxx>
Date: Sat, 29 Feb 2020 12:35:29 +0100

Sadly, wait event timing can be influenced by database parameters, and recently
I found that multi-tenancy changes the way the timing is done too.

Wait events typically (but not always!) time system call(s), for which the wait
event time sometimes is an indication of performance in another layer in the
application stack.
I often use wait events to talk about performance with, for example, storage
admins. It’s therefore critical that the wait event timings correlate with the
timings the admins of the other layer measure, so we can work together. This is
one of the most important reasons I study wait events to the level that I do:
I want to understand what the timing incorporates, so that I can explain it to,
for example, the storage admin.

Last time I checked, the db file parallel read wait event timing for
asynchronous IO looked like this:
1. io_submit (multiple IOs via an array of iocb structs, see:
http://man7.org/linux/man-pages/man2/io_submit.2.html)
2. start wait event
3. io_getevents (blocking; wait for all IOs to finish)
4. end wait event
So the total IO time is not timed, because submission happens before the wait
event starts, although the part that isn’t timed is very small. If you disregard
that small part, this wait event shows the time of the slowest of all the IOs.
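
The four steps above can be modelled with a small simulation. This is only a
sketch under stated assumptions: the function name, the fixed submit cost and
the per-IO latencies are made up for illustration, and all IOs are assumed to
start when io_submit is called and to run fully in parallel.

```python
def parallel_read_wait(latencies_ms, submit_cost_ms=0.05):
    """Model the async 'db file parallel read' timing described above.

    Assumptions (illustrative, not taken from Oracle): every IO starts at
    t=0 when io_submit is called, io_submit itself takes submit_cost_ms,
    and the IOs proceed fully in parallel.
    """
    # Step 1: io_submit runs; the IOs are in flight but nothing is timed yet.
    wait_start = submit_cost_ms
    # Step 3: the blocking io_getevents returns when the slowest IO is done.
    all_done = max(max(latencies_ms), wait_start)
    # Steps 2 and 4: the wait event only covers the time in io_getevents.
    timed = all_done - wait_start
    untimed = wait_start
    return timed, untimed


timed, untimed = parallel_read_wait([0.4, 0.6, 2.5])
print(f"wait event: {timed:.2f} ms, untimed submit: {untimed:.2f} ms")
```

With these made-up numbers the wait event comes out at about 2.45 ms: the
slowest IO (2.5 ms) minus the small untimed submit part.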

For synchronous IO, preadv() performs the same function, but with submission
and waiting combined. All the IOs are submitted via an iovec
(https://linux.die.net/man/2/preadv) using a single system call. The timing of
this system call is obvious:
1. start wait event
2. preadv
3. end wait event
I can’t find a definitive source that tells me how preadv is implemented on
linux. I would assume that linux is prepared for modern IO and does not assume
it’s operating on a single disk and therefore performing the different IOs
serially, but as I said, I would love to be pointed to the kernel source where
the vector read is performed to validate it being serial or parallel.
So for preadv, I hope the time is that of the slowest IO (parallel), but it
could be the sum of all individual IO times (serial).
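
To make the two possibilities concrete, here is a minimal sketch that computes
both estimates for one timed preadv call. The function name and the 1 ms
latency are assumptions for illustration; which of the two the kernel actually
comes closer to is exactly the open question above.

```python
def preadv_wait_bounds(latencies_ms):
    """Return (parallel, serial) estimates for one timed preadv call.

    Hypothetical model: 'parallel' assumes the kernel services all IOs
    concurrently, 'serial' assumes it services them one after another.
    """
    parallel = max(latencies_ms)  # wait time equals the slowest IO
    serial = sum(latencies_ms)    # wait time equals the sum of all IOs
    return parallel, serial


parallel, serial = preadv_wait_bounds([1.0] * 70)
print(f"70 reads of 1 ms: {parallel:.0f} ms if parallel, {serial:.0f} ms if serial")
```

The gap between the two estimates grows with the number of requests, which is
why the same wait time can mean very different things for different batches.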

I recently studied log file parallel write (again) for a conference in Poland.
Much to my surprise I found that the log file parallel write timing was done in
the following way:
1. io_submit
2. io_getevents (non-blocking; if all IOs are found goto 6)
3. start wait event
4. io_getevents (blocking)
5. end wait event
6. done
In other words: if the IO subsystem is fast enough, the wait event does not
occur at all.
This is consistent with what I found years ago with Oracle’s asynchronous
direct path read implementation.
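
The poll-first sequence can be sketched as a small simulation. Everything here
is a hypothetical model: the function name and the cost constants are made up,
and all IOs are assumed to start at io_submit and run in parallel.

```python
def log_write_wait(latencies_ms, submit_cost_ms=0.05, poll_cost_ms=0.01):
    """Model the poll-first timing of 'log file parallel write'.

    Returns the timed wait in ms, or None if no wait event is registered.
    Assumptions (illustrative only): all IOs start at t=0 with io_submit
    and run in parallel; the call costs are invented constants.
    """
    slowest = max(latencies_ms)
    # Steps 1 and 2: io_submit, then one non-blocking io_getevents.
    poll_time = submit_cost_ms
    if slowest <= poll_time:
        return None  # all IOs already reaped: jump straight to "done"
    # Steps 3 to 5: the wait event wraps only the blocking io_getevents.
    wait_start = poll_time + poll_cost_ms
    return max(slowest - wait_start, 0.0)


print(log_write_wait([0.02, 0.03]))  # fast device: no wait event at all
print(log_write_wait([0.30, 0.80]))  # slower device: a wait event is timed
```

In this model a sufficiently fast device never shows up in the wait interface,
even though the writes did take place.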

However, this was with multi-tenancy turned on. With it turned off, the timing
became:
1. start wait event
2. io_submit
3. io_getevents
4. end wait event
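
For contrast, a sketch of this second variant, in which the wait event wraps
both calls. The function name and the submit cost are assumptions for
illustration, and all IOs are again assumed to start when io_submit is called.

```python
def log_write_wait_wrapped(latencies_ms, submit_cost_ms=0.05):
    """Model the timing where the wait event wraps io_submit and io_getevents.

    Assumptions (illustrative only): all IOs start at t=0, when io_submit
    is called, and run in parallel; submit_cost_ms is an invented constant.
    """
    # The wait event starts before io_submit, so the submit time is included
    # and a wait is registered no matter how fast the IOs complete.
    return max(max(latencies_ms), submit_cost_ms)


print(log_write_wait_wrapped([0.02, 0.03]))  # fast device: still a wait event
print(log_write_wait_wrapped([0.30, 0.80]))  # slower device: the slowest IO
```

In this model every write registers a wait event, and its duration is never
shorter than the submit call itself.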

I am surprised that multi-tenancy causes such a massive change in the timing
implementation. Of course the wait event timing (the latency) does not change
that much, and both variants essentially give the time of the slowest IO.
In the light of new technologies like persistent writable memory (“pmem”) I can
see this making sense: if the IO is nearly instantaneous, assume it is, and
only start the time accounting (that is, the wait event) if it turns out it
isn’t.

Frits Hoogland

http://fritshoogland.wordpress.com
frits.hoogland@xxxxxxxxx
Mobile: +31 6 14180860

On 28 Feb 2020, at 12:00, Noveljic Nenad <nenad.noveljic@xxxxxxxxxxxx> wrote:

From a performance perspective, the problem with this wait event is that
the timing of the wait event has no absolute meaning: waiting for a single
IO is something different from waiting for, let’s say, 70 IO requests
submitted at the same time. p2 tells you the number of Oracle blocks, p3
the number of requests.

Assuming that ‘db file parallel read’ for multiple blocks starts measuring
just before submitting IO operations and stops after the last wait completed,
do you find the following interpretation of "db file parallel read" wait time
correct when multiple blocks are involved:
Async IO: “db file parallel read” wait time = max(IO time)
Sync IO: “db file parallel read” wait time = sum(all IO times); (sum because
the reads are executed sequentially, I think)

Best regards,

Nenad

https://nenadnoveljic.com/blog

From: Frits Hoogland <frits.hoogland@xxxxxxxxx>
Sent: Friday, 28 February 2020 09:53
To: Noveljic Nenad <nenad.noveljic@xxxxxxxxxxxx>
Cc: Jonathan Lewis <jonathan@xxxxxxxxxxxxxxxxxx>; oracle-l@xxxxxxxxxxxxx
Subject: Re: parallel recovery slaves waiting on undo reads

Not sure how relevant this is, because I assume you are looking for the reason
your recovery worker processes do a lot of IO.

Below is a description of what happens when you see db file parallel read:

With newer Oracle versions (12+), you’ll see plan lines containing the word
‘BATCHED’. I believe it’s at these points, where Oracle knows it has to read
multiple non-adjacent blocks, that it fetches them all at once. However, I have
read indications that it might also be happening outside of the ‘BATCHED’
lines, and is done any time Oracle knows it needs multiple non-adjacent blocks,
which would have been read serially in the past.

This can be implemented at the OS level as asynchronous IO requests via the
regular asynchronous IO mechanism (io_submit-io_getevents), or with a
synchronous call that submits a batch of IO requests: preadv. The mechanism
of requesting multiple non-adjacent blocks has its own wait event: db file
parallel read, which is a reasonably accurate description of what it actually
does: it wants to read data from multiple places at the same time.

From a performance perspective, the problem with this wait event is that the
timing of the wait event has no absolute meaning: waiting for a single IO is
something different from waiting for, let’s say, 70 IO requests submitted at
the same time. p2 tells you the number of Oracle blocks, p3 the number of
requests.

Frits Hoogland


