RE: Question on concurrency waits

From: "Mark W. Farnham" <mwf@xxxxxxxx>
To: "'Pap'" <oracle.developer35@xxxxxxxxx>, "'Andy Sayer'" <andysayer@xxxxxxxxx>
Date: Tue, 26 Oct 2021 12:01:07 -0400

A guess that satisfies the symptoms is you hit a calendar expired password
cycle and issue a change your password instruction interactively to each of
many users who have start of workshift collisions, so they are having
concurrency issues with each other.

This is similar to an hilarious problem with some badly written timecard punch
the clock applications.

A common solution is to divide such monthly calendar collisions by
approximately 30 by using the person’s day of the month as the monthly cycle,
but you may need to spread out the days that fall on weekends lest Mondays
become a bump.

Or it could be something completely different. Answers that merely fit the
symptoms are worth checking if the cost is low to check, but they are not
always correct. That is the difference between “consistent with” and “cause of.”

Good luck. And upgrade, as per Andy. You might still have a poor underlying
situation, but concurrency issues CAN be ameliorated by having each of the
colliders take a smaller time slice so there is an improved chance they take
place when another is not in flight.

From: Pap [mailto:oracle.developer35@xxxxxxxxx] ;
Sent: Tuesday, October 26, 2021 4:03 AM
To: Andy Sayer
Cc: Mark W. Farnham; Oracle L
Subject: Re: Question on concurrency waits

Thank You Andy.

In ASH the column sql_opname is showing 'alter user' and the module and program
is passchng.exe for user_id as the same user. The sql_id is not captured either
in gv$sql or dba_hist_sqltext so dont know the exact DDL text. And the irony is
even so many failed login attempts were happening and the concurrency issue was
happening , but it was self correcting itself after sometime, not sure how. Or
is it that only a handful of sessions were logged in with the wrong pwd and
thus failed. But this Alter user has been logged in ASH during the start of
each issue period which happened 2-3 times.  And we see the password last
change date(PTIME in User$) for that same user is showing as a month past date,
so not sure if it means the password has not successfully altered for that user
even attempted.

On Tue, Oct 26, 2021 at 1:13 PM Andy Sayer <andysayer@xxxxxxxxx> wrote:

To clarify, where exactly are you seeing it is an alter user statement being
executed? Do you have a sql_id or is this just using the sql command type
column in ASH?

Thanks,

Andy

On Tue, 26 Oct 2021 at 04:49, Pap <oracle.developer35@xxxxxxxxx> wrote:

Thank You Mark and Andy. Yes it's mostly the invalid login attempts that have
exposed the concurrency bug and have caused the impact. But yes the cause must
be the place from where exactly those 'alter user..' (which is mostly password
change request only) is coming up.  We are yet to get it from the application
side.

On Tue, Oct 26, 2021 at 3:22 AM Mark W. Farnham <mwf@xxxxxxxx> wrote:

Yes to both of Andy’s remarks, but you still also have the issue with “program
'passchng.exe'” and what is causing someone to issue Alter User commands (which
from the program name the suggestion is that the password is being changed).

From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx] On ;
Behalf Of Andy Sayer
Sent: Monday, October 25, 2021 4:46 PM
To: Pap
Cc: Mark W. Farnham; Oracle L
Subject: Re: Question on concurrency waits

This sounds like the 11.2 issue with bad concurrency when incorrect password
attempts are made, this was fixed in 12.1 (both better concurrency and
instrumented as a more obvious wait).. I suggest you check for ORA-1017 in your
application and listener logs.

Also, it's about time you upgrade ;)

Thanks,

Andy

On Mon, 25 Oct 2021 at 21:29, Pap <oracle.developer35@xxxxxxxxx> wrote:

I tried the Dash Top script and below result from that clearly showing the top
consumer as 'oauth' session. I am not able to see any rows from
v$db_object_cache when passing the P1 as filter to hash_value column, maybe
they must have been flushed away. Can this be some kind of buggy behavior?

TotalSeconds

AAS

%This

EVENT

TOP_LEVEL_CALL_NAME

SQL_OPNAME

P1TEXT

P1

FIRST_SEEN

LAST_SEEN

92480

77.1

72%

library cache lock

OAUTH

handle address

1.28E+11

24-10-2021 22:25

24-10-2021 22:39

14560

12.1

11%

SQL*Net message from dblink

V8 Bundled Exec

SELECT

driver id

6.76E+08

24-10-2021 22:20

24-10-2021 22:39

4390

3.7

3%

V8 Bundled Exec

SELECT

driver id

6.76E+08

24-10-2021 22:20

24-10-2021 22:39

820

0.7

1%

OAUTH

handle address

1.28E+11

24-10-2021 22:25

24-10-2021 22:39

780

0.6

1%

VERSION2

SELECT

cellhash#

1.61E+09

24-10-2021 22:20

24-10-2021 22:26

On Tue, Oct 26, 2021 at 1:18 AM Pap <oracle.developer35@xxxxxxxxx> wrote:

Thank You Mark. got it now.

Now as we are still trying to find what exact changes were made by the dev
team. I tried capturing the Dash Wait chain using tanels scripts. And attached
are the results. The top sessions seem to be from 'jdbc thin client' but from
SYS , so I'm wondering if these are just from the OEM tool. And this might be
the victim but not the culprit. And the application queries which were blocked
with event = 'library cache lock' were showing in_parse and in_hardparse column
as 'N' , so means they were not getting parsed while experiencing 'library
cache lock'.

However, while I was manually trying to travel through the wait chain, I saw
its ending at some session with top_level_call_name as 'OAUTH'. and it was
executing the below query and then might be some 'alter user' from that
session. And because of sampling , it has not captured all statements though.
But we see the number of sessions with top_level_call_name as 'OAUTH' has been
increased from <10 per hour to ~40K+ in an hour during the issue period. So can
it be the cause or is it the victim of some other ? How should we proceed to
get to the cause? And if it's a good idea to bump the shard_pool size here?

select exptime, ltime, astatus, lcount from user$ where user#=:1

On Tue, Oct 26, 2021 at 12:19 AM Mark W. Farnham <mwf@xxxxxxxx> wrote:

dc_users is being modified by alter user. any parse needs to check dc_users for
what that user is allowed and where to put things like temp. consistency is
protected by a latch.

Is that the answer to your question about how it is logically related?

mwf

From: Pap [mailto:oracle.developer35@xxxxxxxxx] ;
Sent: Monday, October 25, 2021 12:22 PM
To: Mark W. Farnham
Cc: Oracle L
Subject: Re: Question on concurrency waits

Thank You Mark. So we are trying to reach out to the Dev team because it seems
to be fired from the application user as we can see from the user_id column in
v$active_session_history. But unfortunately the exact sql_text we are not able
to see from the database side.

And regarding the relation of library cache lock+cache cache lock with the
alter user, Would you explain a bit , how logically this is related. Is this
because each application query must be getting hard parsed , if the underlying
user from which it's fired is getting altered or anything else?

On Mon, Oct 25, 2021 at 9:30 PM Mark W. Farnham <mwf@xxxxxxxx> wrote:

yes, and the immediate question is to find out why alter user is being issued.

From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx] On ;
Behalf Of Pap
Sent: Monday, October 25, 2021 6:31 AM
To: Oracle L
Subject: Re: Question on concurrency waits

My mistake, one correction:- the cache_id for all those 'row cache lock' is
pointing to dc_users.

On Mon, Oct 25, 2021 at 3:12 PM Pap <oracle.developer35@xxxxxxxxx> wrote:

Hi,  It's 11.2.0.4 Oracle database. We are suddenly seeing many application sql
queries running slow and are showing 'library cache lock'. And checking the ASH
for the exact time period when the issue started and the wait event appeared,
we found few SYS sessions were doing 'ALTER USER' from program 'passchng.exe'
and we are not able to see exact statement from sql_text for this (which may be
because of its DDL and for DDL the AWR doesn't capture the text as of
11.2.0.4). So I wanted to understand if the ALTER USER command can cause such
locking issues?

And also I see in the initial few minutes this session(Alter user session) was
showing 'library cache lock' and I don't know how to get more information from
the value of "handle address" but after some time, that session was showing the
'row cache lock' with cache_id pointing to the below cache objects. Are these
can cause concurrency / "library cache lock" for other application queries?

kqlsubheap_object

extensible security user and rol

extensible security principal pa

extensible security UID to princ

extensible security principal na

extensible security principal ne

XS security class privilege

qmtmrctp_cache_entries

qmtmrciq_cache_entries

References:
- Question on concurrency waits
  - From: Pap
- Re: Question on concurrency waits
  - From: Pap
- RE: Question on concurrency waits
  - From: Mark W. Farnham
- Re: Question on concurrency waits
  - From: Pap
- RE: Question on concurrency waits
  - From: Mark W. Farnham
- Re: Question on concurrency waits
  - From: Pap
- Re: Question on concurrency waits
  - From: Pap
- Re: Question on concurrency waits
  - From: Andy Sayer
- RE: Question on concurrency waits
  - From: Mark W. Farnham
- Re: Question on concurrency waits
  - From: Pap
- Re: Question on concurrency waits
  - From: Andy Sayer
- Re: Question on concurrency waits
  - From: Pap

RE: Question on concurrency waits

Other related posts: