Re: Weird database hanging

  • From: "Don Seiler" <don@xxxxxxxxx>
  • To: "Alex Gorbachev" <ag@xxxxxxxxxxxx>
  • Date: Tue, 18 Sep 2007 21:02:36 -0500

1.  I assume Support mentioned open_cursors and session_cached_cursors
because the ADDM report mentioned them.

2.  You are right that I couldn't log in during the hang.  However
Support has clued me into the "sqlplus -prelim / as sysdba" that does
let me log in.  I was able to issue the oradebug commands then to try
to get the level 10 system state dump.  However the session gets
booted before the dump finishes, although by then there is quite a lot
of info in the dump already (a 7+ MB text file).

3. No it isn't RAC nor is it ASM.

4. Per Oracle Support I have generated ASH, AWR and ADDM reports for
the time period of one of the "hangs" yesterday, in addition to the
RDA tool.  However the ashrpt file doesn't have the string "enq" in it
anywhere.  Perhaps I'm being dense as to what you meant.

5. We are not using huge pages.  My SA says he'll set it up, since it
can't hurt now.  I'm reading the Puschitz tuning page about it now.  I
suppose shrinking the SGA or shared pool is always an option.

One other difference between old and new servers is that I have
enabled the default degree of parallelism on our "warehouse" tables
and indexes, and set NOPARALLEL on our frontend objects.  On our old
hardware it was kind of random which tables or indexes had it enabled
and what degree was set.  Also our parallel_max_servers is 32, where
before it was 8.  I don't think it would hurt to bring it down again.

Don.

On 9/18/07, Alex Gorbachev <ag@xxxxxxxxxxxx> wrote:
> Don,
>
> Kudos to Jeremiah for directing you away from guesswork path (thanks from
> BAAG) but Oracle Support normally does exactly that so don't wait for them -
> rather try to guide them to proper analysis. Did they give you explanation
> on how they wanted to avoid these hungs increasing OPEN_CURSORS and
> SESSION_CACHED_CURSORS?
>
> So far there seems to be a clear indication that new session creation is
> part of the problem so maybe this can give us some hints.
> Few things that come up to my mind to help with investigation:
> - Can you pre-establish SQL*Plus connection as SYSDBA and have it handy?
> During the hang run system state dump and haganlyze dump. This would provide
> some food for thoughts. Oracle support might request it anyway.
> - Do your application/bulk load process reconnect often? Maybe it's
> visibility of hung whereas you simply can't connect? If that's connection
> issue - audit enabled? AFTER LOGON trigger exist?
> - You haven't mentioned it explicitly or I missed but I assume it's not RAC.
> Is it?
> - Do you think you can generate ASH report and figure out object id for
> "enq: SQ - contention". This might give us the hot sequence. Same report can
> provide some hints (P1,P2,P3) for other waits. If instance was really bad,
> we might not have any ASH info if instance was really bad, though.
>
> Since you mentioned large SGA, do you use huge pages? Without them, there
> might be tremendous overhead of large SGA with many processes and especially
> overhead when process starts and starts accessing shared memory.
>
> Hope this helps a bit,
> Alex
>
>
>
> On 9/18/07, Don Seiler <don@xxxxxxxxx> wrote:
> >
> > I have opened an SR with Oracle, as it has hung 3 times today and
> > actually crashed once.
> >
> > When the database hang, Ignite is showing "latch: shared pool" and
> > "latch: library cache" waits.  Otherwise I don't see these at all.
> >
> > Oracle has had me up OPEN_CURSORS and SESSION_CACHED_CURSORS, but I
> > did that last night (with instance restart) and, as I said, it has
> > hung 3 times and crashed once since then.  Oracle's also telling me
> > that this is largely due to application coding.  My problem with that
> > is that the application code has been in place for a while.
> >
> > To recap: we migrated to the 64-bit machine on Sep 1.  Hanging has
> > occurred since Sep 13, seemingly during bulk load activity.  Our SGA
> > is quite a bit larger (16 GB) than on our 32 bit box (1.5 GB).
> >
> > db_cache_size                        big integer 12G
> > shared_pool_size                     big integer 2G
> >
> > I haven't seen the "block change tracking buffer space" wait since
> > yesterday morning, thankfully.
> >
> > Any tips would be appreciated.
> >
> > Thanks,
> > Don.
> >
> > --
> > Don Seiler
> > oracle: http://ora.seiler.us
> > ultimate: http://www.mufc.us
> > --
> > //www.freelists.org/webpage/oracle-l
> >
> >
> >
>
>
>
> --
> Alex Gorbachev, Oracle DBA Brewer, The Pythian Group
> http://www.pythian.com/blogs/author/alex
> http://www.oracloid.com
> BAAG party - www.BattleAgainstAnyGuess.com


-- 
Don Seiler
oracle: http://ora.seiler.us
ultimate: http://www.mufc.us
--
//www.freelists.org/webpage/oracle-l


Other related posts: