Runaway pl/sql in the db from orphaned oas connections causes complete memory saturation

  • From: Charles Schultz <sacrophyte@xxxxxxxxx>
  • To: ORACLE-L <oracle-l@xxxxxxxxxxxxx>
  • Date: Thu, 6 Dec 2012 21:08:46 -0600

Good day,
I am hoping that some kind soul who is uber awesome at tracking down pl/sql
and PGA issues will be able to help us out on this one. :)

*Basic Environment:*

   - Oracle HTTP Server (Fusion Middleware) 11.1.1.4 running on Red Hat
   Linux 5.8
   - Oracle OSEE 11.2.0.2 (and 11.1.0.7) running on Solaris 64-bit 10
   - LoadRunner and RationalRose on various Windows desktops


*Basic Problem:*
During testing of a particular application, we noticed that memory on the
db server (solaris box) was quickly consumed and eventually caused the
server to tailspin (hang). After lots of testing, we have been able to
paint a much more detailed picture of what is going on behind the scenes.

*Detailed Problem:*
For reasons yet unknown, a small handful of connection pool processes
(between 1 and 5, typically) from the oas will report a timeout to the
end-user (simulated via the batch job; LoadRunner/RationalRose). However,
the associated database process(es) will suddenly allocate massive amounts
of pga and uga memory in a looping fashion until all system memory is
completely consumed. At first we thought the application was at fault, but
we have since duplicated the issue using a simple "hello world" procedure
called by HTP (which is called from the simulated user sessions).
Terminating either the oas process or the db session has the desired
effects of stopping the memory leak.

*Interesting symptoms and observations:*

   - At first we thought the problem was caused by something in 11.2.0.2
   (recently upgraded, hence the suspicion). However we determined the same
   memory leak actually does happen in 11.1.0.7 yet at a significantly less
   rate, enough to be almost unnoticeable in normal operations.
   - Via the application, adding an index to speed up various queries
   actually made the memory leak worse. Slowing down the application (removing
   the index, enabling event 10046) seemed to slow down the rate of memory
   leaking.
   - Similarly, turning on plsql debugging (via event 10938) or reducing
   the number of plsql requests (PlsqlMaxRequestsPerSession) to 1 also slows
   the processes down so much that memory leaking is not evident.
   - We are using prstat (sorted by rss) and 'top' to determine "leaks";
   typically we will see a process on the db server start by grabbing a
   gigabyte of extra memory over and above the sga allocation, and then
   growing at about 1gb per minute per session.



We managed to accumulate 3 open SRs on this problem:

   - SR 3-6456891351 : Hemorrhaging Memory
   - SR 3-6496145406 : KV: OHS/mod_plsql Connection Keeps Growing and
   Causes DB to Become Unresponsive  /* this one has the most detail */
   - SR 3-6532544331 : Customer getting ORA-2730x errors




-- 
Charles Schultz


--
//www.freelists.org/webpage/oracle-l


Other related posts: