UNDO question? 4031 bug??

  • From: Wolfson Larry - lwolfs <lawrence.wolfson@xxxxxxxxxx>
  • To: Oracle-L@xxxxxxxxxxxxx
  • Date: Mon, 6 Dec 2004 15:53:20 -0600

Hi, guys!

        I've got a client running 9.2.0.4 who's system wouldn't allow new
connections for about 45 minutes last week.  Most of the processes (240 out
of 340-360) are from an app using Web Logic which never disconnect.  I was
monitoring the system with TOAD and about the same time TOAD showed the CPU
and I/O activity dropped waay down, but the TOAD connection wasn't lost.
And it seemed no one could log on during this time.  We tried connect as
sysdba and got we were connected to an "idle" instance.

        All of a sudden we could connect.  One of the DBAs noticed at the
same time we could get in again a bunch of UNDO segments were created.  She
went on METALINK and found a couple of potential bugs with UNDO allocation.
One of the notes talks about setting an EVENT but didn't give the EVENT
number and we're still waiting for them to research her TAR.

The problem seemed to go from 8:45 to 9:30 or 14:45 - 15:30 GMT

From alert log, about when we could get in again:
ARC0: Completed archiving  log 19 thread 1 sequence 31755  
Wed Dec 1 15:24:33 2004  Created Undo Segment _SYSSMU50$  
Wed Dec 1 15:24:33 2004  Created Undo Segment _SYSSMU51$  
Wed Dec 1 15:24:33 2004  Undo Segment 50 Onlined  
Wed Dec 1 15:24:33 2004  Undo Segment 51 Onlined  
Wed Dec 1 15:24:33 2004  Created Undo Segment _SYSSMU52$  Undo Segment 52
Onlined 
Wed Dec 1 15:24:33 2004  Created Undo Segment _SYSSMU53$  Undo Segment 53
Onlined  
Wed Dec 1 15:25:30 2004  Created Undo Segment _SYSSMU54$  
Wed Dec 1 15:25:30 2004  Undo Segment 54 Onlined  
Wed Dec 1 15:25:30 2004  Created Undo Segment _SYSSMU55$

From V$UNDOSTAT:
                                                        MAX
S T A R T  T I M E UNDOBLKS     TXNCOUNT  MAXQUERYLEN   CON UNXPSTEALCNT
-------------------- ------ ------------ ------------ ----- ------------ 
02-DEC-2004 14 01 54  3,378    5,113,239           84    12            0
02-DEC-2004 14 11 54  3,489    5,119,315          759     3            0
02-DEC-2004 14 21 54  2,741    5,124,347          595     8            0
02-DEC-2004 14 31 54  3,763    5,131,233          282     6            0
02-DEC-2004 14 41 54  3,461    5,137,022          274     6            0
02-DEC-2004 14 51 54  3,691    5,143,683          292     8            0
02-DEC-2004 15 01 54  3,162    5,149,672          138    13            0
02-DEC-2004 15 11 54  4,420    5,157,174          203     4            0
02-DEC-2004 15 21 54  3,683    5,163,799          316     5            0
02-DEC-2004 15 31 54  4,416    5,171,223          156     8            0
02-DEC-2004 15 41 54  4,010    5,179,075           47     8            0
02-DEC-2004 15 51 54  4,776    5,187,425          130     8            0
02-DEC-2004 16 01 54  4,211    5,195,450          413     6            0


She said someone told her that something happens to UNDO every 12 hours and
There may be a performance bug.
  Anyone familiar with anything like this?

        TIA


  
Same system - 3 times in the last 4 months they've been incurring 4031
errors that at first appeared to coincide with UNDO segment creation, but
now we think is a separate issue.  The problem stems from one of 3 Web logic
application servers that I don't have access to.
  The 4031s range from 26-52K in size.  Last spring they weren't having any
problems like this and the Shared_pool_reserve always showed totally empty.
  I must say we really weren't monitoring that and it was brought to our
attention that the activity had changed.  The SP showed 70 to 120M free, but
we couldn't get a 52K allocation.  We adviced them to make some application
changes and haven't gotten a response.  That is, sometime next year.  The
last 2 times we increased the SP size, but I really didn't want to do that.
Was 240M then 300 now 420M.  The application really isn't that big, but they
have lots of cloned code that gets around cursorSharing=force, because of
the use of nulls.  This is Java app and they don't use bind variables.

     They use lots of public synonyms and I was wondering if that could be
part of the SP fragmentation.  We suggested they change the code and that's
going nowhere.  Anyone have an idea on how to deal with the PSs?  Since I'm
not getting much assistance from the application group.

One of the HOTSOS guys suggested we flush the shared pool once a day and pin
everything, but we were already doing that.  The pinned objects take around
70M.

TIA again.

        Larry


**********************************************************************
The information contained in this communication is
confidential, is intended only for the use of the recipient
named above, and may be legally privileged.
If the reader of this message is not the intended
recipient, you are hereby notified that any dissemination, 
distribution, or copying of this communication is strictly
prohibited.
If you have received this communication in error,
please re-send this communication to the sender and
delete the original message or any copy of it from your
computer system. Thank You.

--
//www.freelists.org/webpage/oracle-l

Other related posts:

  • » UNDO question? 4031 bug??