Oracle 11i Forms-based connections hang in RAC

  • From: "Hameed, Amir" <Amir.Hameed@xxxxxxxxx>
  • To: <racdba@xxxxxxxxxxxxx>, <oracle-l@xxxxxxxxxxxxx>
  • Date: Fri, 26 Aug 2005 17:01:17 -0400

Hi All,
I need some assistance/suggestions with an issue that we are facing
during our test runs of a two-node 11i-RAC environment. Following is the
system configuration:

* Hardware Configuration:
** Servers:
   RAC Node#1: Sun 15K domain with 12 CPUs, 24 GB RAM
   RAC Node#2: Sun 15K domain with 12 CPUs, 24 GB RAM
  These are dedicated nodes for this RAC environment and do not run any
other instance/application

* Interconnect:
   Three private gigabit interconnects between the two servers connected
via a switch

* Network:
   The network interface between the middle-tier and the backend tier
goes over a gigabit private network.

* Storage:
   EMC DMX 3000

* Software Configuration:
  Veritas DBE/AC for 9iRAC 3.5/MP3
  Oracle applications 11.5.9
  Oracle RDBMS 9.2.0.6 (64-bit) configured with ODM
  Shared Pool size: 1GB
  Buffer cache size: 6GB
  Total SGA size: ~ 7.5 GB

The concurrent managers are configured with PCP and are running fine.

Problem Description:
This environment is a copy of production and is ~750GB in size. We have
a process in place where we run baseline tests using WinRunner for our
production releases to assess how the new code or any new/changed
configuration might have an impact once rolled into production. For this
environment, using the same test process, we gathered baseline
transaction timings on a single-instance non-RAC'd configuration. This
was done so that we could compare the impact of RAC on the apps. We ran
the next test with a two-node RAC configuration. The test did not go
well. The forms-based connections, run by WinRunner, were timing out
and losing connections. While this was happening, I tried to open a
forms-based connection to the core apps and it took me over a minute to
finally log in. It normally takes around five to ten seconds.
At the same time, I ran a script from the middle tier that repeatedly
connected to the database tier over SQL*Net; those connections were
quick and showed no latency.
I re-ran the same test, this time setting event 10046 at level 12 to
capture trace files from the forms connections and scheduling StatsPack
to run every five minutes. The information captured from the RAW trace
files and from StatsPack is consistent for all the forms server
sessions that were trying to establish a connection to the database.
Following is a section from the tkprof'd file:

begin :v1 := fnd_bes_proc.process_event(:v2, :v3); end;


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        3      0.00       0.00          0          0          0           0
Execute      3      4.98     145.05          0      13027      24410           3
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        6      4.98     145.05          0      13027      24410           3

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 173     (recursive depth: 1)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  global cache null to x                        393        0.17          9.47
  global cache open s                            16        0.07          0.42
  global cache open x                           109        0.07          0.19
  global cache s to x                            61        0.07          0.34
  log file sync                                  30        0.00          0.14
  buffer busy waits                            1802        0.91         18.51
  buffer busy global cache                     2540        0.17         87.80
  buffer deadlock                              2078        0.00          0.00
  latch free                                    195        0.02          0.36
  enqueue                                       443        0.49         19.37
  global cache null to s                         10        0.09          0.31
  KJC: Wait for msg sends to complete            28        0.00          0.00
  global cache cr request                      1618        0.12          2.05
  library cache pin                               1        0.00          0.00
  buffer busy global CR                         305        0.00          0.39
  global cache busy                              38        0.07          0.99
  lock escalate retry                             3        0.00          0.00
********************************************************************************

The "buffer busy global cache" event is the largest contributor among
all the events in most of the connections that either took a long time
to connect or failed to connect due to a timeout. The file# and block#
information from the RAW trace file points to an IOT that is part of a
standard Oracle Workflow queue table. What I fail to understand is why
a simple connection to the core apps would call the workflow queue. I
can also send the StatsPack output if anyone is interested.
Any help/suggestion will be appreciated.
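
For anyone who wants to rank the waits from the tkprof section above,
here is a small Python sketch. It is only an illustration: the event
names, counts, and totals are copied from the trace excerpt, while the
names WAITS, ELAPSED_SECONDS, and rank_waits are made up for this
example. It sorts the events by total time waited and expresses each as
a share of the 145.05 seconds of Execute elapsed time.

```python
# Wait events copied from the tkprof excerpt:
# (event name, times waited, total seconds waited)
WAITS = [
    ("global cache null to x", 393, 9.47),
    ("global cache open s", 16, 0.42),
    ("global cache open x", 109, 0.19),
    ("global cache s to x", 61, 0.34),
    ("log file sync", 30, 0.14),
    ("buffer busy waits", 1802, 18.51),
    ("buffer busy global cache", 2540, 87.80),
    ("buffer deadlock", 2078, 0.00),
    ("latch free", 195, 0.36),
    ("enqueue", 443, 19.37),
    ("global cache null to s", 10, 0.31),
    ("KJC: Wait for msg sends to complete", 28, 0.00),
    ("global cache cr request", 1618, 2.05),
    ("library cache pin", 1, 0.00),
    ("buffer busy global CR", 305, 0.39),
    ("global cache busy", 38, 0.99),
    ("lock escalate retry", 3, 0.00),
]

# Execute elapsed time for fnd_bes_proc.process_event, from the same excerpt.
ELAPSED_SECONDS = 145.05


def rank_waits(waits, elapsed):
    """Return (event, total_waited, share_of_elapsed), largest total first."""
    ranked = sorted(waits, key=lambda w: w[2], reverse=True)
    return [(name, total, total / elapsed) for name, _count, total in ranked]


if __name__ == "__main__":
    # Print the five largest contributors with their share of elapsed time.
    for name, total, share in rank_waits(WAITS, ELAPSED_SECONDS)[:5]:
        print(f"{name:<36} {total:8.2f}s {share:6.1%}")
```

With the figures above, buffer busy global cache accounts for roughly
60% of the elapsed time, followed by enqueue and buffer busy waits.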

Thanks
Amir
