Think of it this way: *this* is the audience who will pull you or your
story to pieces, so you've already cleared that hurdle.
I didn't intend to cause stress, just letting you know that stories like
this are *valuable*. Anyone can find problems, fewer can present a
solution or two, fewer still can honestly describe what it takes to find
a solution.
I enjoyed your summary email, so I know I'd enjoy a longer treatise,
that's all.
On 9/18/18 10:02, Chris Taylor wrote:
Well, that's not intimating. Not at all :)
Talking to Oracle-L is one thing, putting it out for a wider audience? I'm not so sure about all that .
Chris
On Tue, Sep 18, 2018 at 9:41 AM Tim Gorman <tim.evdbt@xxxxxxxxx <mailto:tim.evdbt@xxxxxxxxx>> wrote:
Chris,
You have a gift for telling the story. This could make an amazing
newsletter article or a fantastic presentation that you'll enjoy
creating and presenting. The fun part of such an article or
presentation would be explaining the underlying technology, so
that the reader/audience enjoys the punchline as much as we have.
For an article in a newsletter, please consider RMOUG SQL>Update
<http://www.rmoug.org/newsletter/>, the NoCOUG Journal
<http://nocoug.org/newsletter.html>, IOUG Select
<http://select.ioug.org/>, or UKOUG Oracle Scene
<http://www.ukoug.org/membership-new/member-activities/oracle-scene/>?
There might be others I have overlooked, for which I apologize.
Hope this helps!
-Tim
On 9/18/18 08:22, Chris Taylor wrote:
So, I need to own up to something.
My whole approach to this problem was wrong. I had been told
that the problem with this table was the CLOB and the number of
sessions updating the table at the same time. So, my whole
approach was based on that information being TRUE.
I was blinded to the 'real' problem and wasn't actively looking
for an alternative answer until I posted this thread. Due to the
discussion here, I took a few steps back and actually asked
myself "why would this be row locks/blocks?" and WHY don't I see
internal locking on the clob if the clob is the problem?
At that point I was in 'question everything' mode and started
really looking at the Event that the blocking session was sitting
in (instead of just assuming it was a problem with design).
The blocking sessions were ALWAYS sitting in EVENT: SQL*Net MORE
DATA FROM CLIENT and WAIT_CLASS: NETWORK and the wait times were
reaching upward of 2 minutes before the wait_time_micro restarted
counting.
When I saw that (and really looked at it), I was like that makes
no sense. Why would a session be getting latency from the PHP
server? (Why is it taking so long to send an object from the PHP
server to the database?)
Running strace on the sessions on both the db server and the php
server showed the sessions just sitting in either read() or
write() to the network socket and never completing.
Large file transfers INTO the main db server would stall.
(Transfers OUT of the db server were fine) - this affected
SQLNet, SCP, ncat etc. Outgoing was fine, Inbound was terrible.
Ultimately we discovered these 2 settings had been DISABLED on
this box at some point (as its been around a while):
/proc/sys/net/ipv4/tcp_sack
/proc/sys/net/ipv4/tcp_timestamps
Both of those were "0" whereas all our other db servers those
were "1" and weren't experiencing any problem.
After setting both of those to "1" , our sessions now complete
and no longer block.
I know some of you may be thinking "How could he miss all the
network waits?" - And that's a good question. All I can say is
sometimes you get "blinded" by assumptions and aren't really
seeing the problem for what it is.
I've known this a long, long time. So, remember kiddos you may
have to question 'what you know versus what you think you know' ;)
Also, the very fact of sharing this problem with the you on the
list gave me the ability to step back and look at it with a new
mindset.
Anyway, I wanted to share that as a testament (and a huge thank
you) to the community we have here.
Thanks,
Chris
On Fri, Aug 31, 2018 at 11:48 AM Jonathan Lewis
<jonathan@xxxxxxxxxxxxxxxxxx
<mailto:jonathan@xxxxxxxxxxxxxxxxxx>> wrote:
If the event is "enq TX: row lock wait" the p1/p2/p3 values
will be about the TX lock in v$lock. p2 = id1, p3 = id2, and
p1 will encode "TX" and the lock mode.
Select p1raw (if its available) or
to_char(p1,'XXXXXXXXXXXXXXXX') and you'll get
00000000054580006 - or possibly a 4 on the end. If it's a
mode 4 lock then that's a clue that it's not really about a
table row and more likely to be about an index/IOT or some
internal anomaly.
If you've got the ASH data then current_obj# may give you the
object_id of the object being accessed, but it's not entirely
reliable.
Regards
Jonathan Lewis
________________________________________
From: Stefan Knecht <knecht.stefan@xxxxxxxxx
<mailto:knecht.stefan@xxxxxxxxx>>
Sent: 31 August 2018 16:04:20
To: Jonathan Lewis
Cc: christopherdtaylor1994@xxxxxxxxx
<mailto:christopherdtaylor1994@xxxxxxxxx>; ORACLE-L
Subject: Re: Looking for ideas on blocked sessions updating
partitioned table with CLOB
You can also look at the p1/p2/p3 values of the sessions that
are blocked by the TX contention - and see which object the
contention is on. If it's the lobindex, that may lead to
further clues about some of the possibilities raised by Jonathan
Stefan
On Fri, Aug 31, 2018 at 8:55 PM, Jonathan Lewis
<jonathan@xxxxxxxxxxxxxxxxxx
<mailto:jonathan@xxxxxxxxxxxxxxxxxx><mailto:jonathan@xxxxxxxxxxxxxxxxxx
<mailto:jonathan@xxxxxxxxxxxxxxxxxx>>> wrote:
There is a major problem with basicfile LOBs (with similar,
but not so drastic symptoms appearing for securefile LOBs),
so as a first strategic step you should probably be planning
to find a painless way to move to securefile LOBs.
I've written an entire system about a problem with basicfile
LOBs when you do inserts and deletes - but the problem is the
same if you do lots of updates because for a LOB an update is
a delete followed by an insert (with the old LOB left in the
LOB segment rather than being copied to the UNDO segment).
The way LOBs handle deletes is that the LOB index is a
two-part index of which the first part indexes in time order
the chunks of LOBs that have been deleted and the second part
indexes by lobid the current LOB chunks.
A problem appears when you try to insert a LOB and there
isn't any free space in the segment, but there is plenty of
freeable space. Your session will free up all the freeable
space (deleting LOB index entries) as it goes. This can take
a very long time and while it's happening any other session
that want to insert a LOB will wait for your session to
finish freeing up the space.
Unfortunately I've only see HW enqueues appearing as a result
of this action, I don't think I've seen TX enqueues (and I
wouldn't really expect to see the "enq TX - row lock
contention" enqueue). THe series start at this URL:
http://jonathanlewis.wordpress.com/2017/01/26/basicfile-lobs/
Different possibility - is the LOB declared with multiple
freepools ? This is the default and it means the LOBindex has
only the two parts I describe above. If you have N freepools
then the index consistents of N pairs of parts. This doesn't
avoid the problem I've described above (your session will try
to free ALL the free space from ALL the pools if it needs
space), but having a single freepool may explain your
observations. If I insert a very large LOB PERHAPS you can't
insert one until I finish insert LOBIndex entries into my bit
of index, in which case maybe you'd show a row wait of some
sort - though, again I'd expect a different wait to appear
(maybe buffer busy).
Another thought - your LOBs enable in-row storage: how many
of them are short enough to fit in the row (a few, lots,
most) and do they get updated many times in situ, and are
many of them likely to start short and then grow a few times
before getting too long ? How long are the rest of the
columns in the row ? I'm thinking about the possibility that
your smaller LOBs spend some time growing in-row and causing
row migration before they get long enough to move into the
LOB segment - and row migraion does funny things with ITLs.
Maybe something odd happens as a row that's migrated moves
its LOB to the LOB segment and the row migrates back to it's
original block.
---
In the absence of answers - a plan for the future might be to
range partition by sess_id and hash subpartiiton by sess_id:
I'm assuming that sess_id is a value that's going to increase
with time when I say that. If that is the case then the
benefit of doing this composite partitioning is that on a
regular basis you get a new partition and all the action
moves into it and any garbage that's happened in the earlier
partition can be cleared up (or dropped).
If you stick with basicfiles then look at freepools just in
case it's relevant - securefile lobs automatically have a
better strategy for concurrency but there is a parameter you
can set of increase concurrency. (Can't remember which it is
at present).
Since I've mentioned the series - one feature of the
mechanism is that the LOB segment can become much larger than
it needs to be, so if you can sum() the sizes of the lobs
that exceed the in-linesize and it's much smaller than
segment size you'll know that your update mechanism is
introducing some sort of problem.
Regards
Jonathan Lewis
Regards
Jonathan Lewis
________________________________________
From: oracle-l-bounce@xxxxxxxxxxxxx
<mailto:oracle-l-bounce@xxxxxxxxxxxxx><mailto:oracle-l-bounce@xxxxxxxxxxxxx
<mailto:oracle-l-bounce@xxxxxxxxxxxxx>>
<oracle-l-bounce@xxxxxxxxxxxxx
<mailto:oracle-l-bounce@xxxxxxxxxxxxx><mailto:oracle-l-bounce@xxxxxxxxxxxxx
<mailto:oracle-l-bounce@xxxxxxxxxxxxx>>> on behalf of Chris
Taylor <christopherdtaylor1994@xxxxxxxxx
<mailto:christopherdtaylor1994@xxxxxxxxx><mailto:christopherdtaylor1994@xxxxxxxxx
<mailto:christopherdtaylor1994@xxxxxxxxx>>>
Sent: 31 August 2018 14:23:46
To: knecht.stefan@xxxxxxxxx
<mailto:knecht.stefan@xxxxxxxxx><mailto:knecht.stefan@xxxxxxxxx
<mailto:knecht.stefan@xxxxxxxxx>>
Cc: ORACLE-L
Subject: Re: Looking for ideas on blocked sessions updating
partitioned table with CLOB
Ah, very good questions.
For the blocking issue:
1. Blocking Type = enq: TX - row lock contention, yet the
SESS_IDs (PK) is different.
2. Basicfile LOBs (I assume these were carried over from an
upgrade from a prev version)
For the PHP object transfer being slow issue:
1. TNS settings:
db server:
sqlnet.ora
---------------------------------
send_buf_size=2097152
recv_buf_size=2097152
default_sdu_size=32768
default_tdu_size=32768
tnsnames.ora
-----------------------------
Nothing on the tnsnames.ora
web/app server:
tnsnames.ora
-------------------------------
Nothing specified here either for SDU etc
2. OS = Red hat Linux 6.8 64-bit , kernel 2.6.32-642.el6.x86_64
3. Block Size = 8192
Thanks,
Chris
On Thu, Aug 30, 2018 at 10:02 PM Stefan Knecht
<knecht.stefan@xxxxxxxxx
<mailto:knecht.stefan@xxxxxxxxx><mailto:knecht.stefan@xxxxxxxxx
<mailto:knecht.stefan@xxxxxxxxx>><mailto:knecht.stefan@xxxxxxxxx
<mailto:knecht.stefan@xxxxxxxxx><mailto:knecht.stefan@xxxxxxxxx
<mailto:knecht.stefan@xxxxxxxxx>>>> wrote:
When you say "blocking" - what event are the blocked sessions
waiting on?
Also, what's your TNS config - particularly SDU sizes between
the mid-tier and the database (TNS connection string and the
receiving listener)?
Are you using securefile or basicfile LOBs?
What OS is the database on and what block size are you using
in the tablespace where the lobs are stored?
On Fri, Aug 31, 2018 at 6:21 AM, Chris Taylor
<christopherdtaylor1994@xxxxxxxxx
<mailto:christopherdtaylor1994@xxxxxxxxx><mailto:christopherdtaylor1994@xxxxxxxxx
<mailto:christopherdtaylor1994@xxxxxxxxx>><mailto:christopherdtaylor1994@xxxxxxxxx
<mailto:christopherdtaylor1994@xxxxxxxxx><mailto:christopherdtaylor1994@xxxxxxxxx
<mailto:christopherdtaylor1994@xxxxxxxxx>>>> wrote:
Env: 12.1.0.2
We have a table that stores session data (base64 encoded)
from web sessions.
The table has 32 HASH partitions and contains a CLOB.
The table is hash partitioned by SESSION_ID which is a
64-char string.
We have this code that executed in PHP into the Oracle database:
UPDATE sessions_table
SET session_data = :session_data // (CLOB)
WHERE session_id = :session_id;
For 95% of the sessions this runs very,very fast.
For 5% of the sessions that LOB being passed from PHP is 15MB
and for some reason PHP is really slow about transmitting
those session OBJECTS over to the database. (Verified
through strace etc).
While those sessions are waiting to complete the update to
the CLOB, they end up BLOCKING other sessions that aren't
trying to update the same session id but are in the same
partition.
I believe they're blocked on the CLOB on the same partition
that the long running session has open as I've confirmed that
the SESS_ID being updated is different in most cases.
So, here's my question, what performance strategy should I be
investigating for CLOBS. Should I just add more partitions
and spread out the likelihood that a session will end up in
the same partition? Seems logical but I'm not sure how CLOBS
play into this.
The CLOBS are stored like this:
Segment Name = SYS_LOB0051640773C00002$$
Index Name = SYS_LOB0051640773C00002$$
Chunk = 8192
PCTVERSION = 10
CACHE = YES
LOGGING = None
Encrypt = None
Compression = None
IN_ROW = Yes
Partition = Yes
Retention = Yes
Looking for any ideas on how to prevent a session from
blocking other non-related sessions that are updating the CLOB.
I'm trying to mitigate the blocking at the db layer while we
stand up a product such as Redis to handle the session
caching at the server level (and remove it from the db).
Chris
--
//
zztat - The Next-Gen Oracle Performance Monitoring and
Reaction Framework!
Visit us at zztat.net
<http://zztat.net><http://zztat.net><http://zztat.net/> |
@zztat_oracle | fb.me/zztat
<http://fb.me/zztat><http://fb.me/zztat><http://fb.me/zztat>
| zztat.net/blog/
<http://zztat.net/blog/><http://zztat.net/blog/><http://zztat.net/blog/>
--
//
zztat - The Next-Gen Oracle Performance Monitoring and
Reaction Framework!
Visit us at zztat.net <http://zztat.net><http://zztat.net/> |
@zztat_oracle | fb.me/zztat
<http://fb.me/zztat><http://fb.me/zztat> | zztat.net/blog/
<http://zztat.net/blog/><http://zztat.net/blog/>