RE: Server failures

  • From: "Freeman, Donald" <dofreeman@xxxxxxxxxxx>
  • To: "Freeman, Donald" <dofreeman@xxxxxxxxxxx>, "'Chris.Taylor@xxxxxxxxxxxxxxx'" <Chris.Taylor@xxxxxxxxxxxxxxx>, ORACLE-L <oracle-l@xxxxxxxxxxxxx>
  • Date: Tue, 30 Sep 2008 09:55:40 -0400

Just to follow up, responsibility for problems is hard to assign at my 
location.  The application owners pay for the servers and manage the users, 
the server team operates and manages the servers, and the database team 
operates the database.  I couldn't tell you how often we endure an outage 
because nobody is willing to step up and at least say something when 
something goes wrong.  The servers are aging out and failing, and everybody 
waits for everybody else to take action.  Everybody is paralyzed into inaction 
by fear of the response, "That's not your job, mind your own business."  I have 
a six-year-old production DB server down right now that previously failed back 
in June.  We have servers or VMs that we could have moved it to, but everybody 
is pretending that it's not their problem.

My DBAs also get testy when I ask them to look into something that is not 
strictly their responsibility.  All of us get nervous when we are clearly on 
somebody else's turf.  When they find something, I can take it up the chain and 
get something done for the benefit of all of us.  I point out to my team that 
when things reach their logical conclusion and a system fails, they will be 
the ones working around the clock to move and restore a database on Christmas Eve.

Donald Freeman
Database Administrator II
Commonwealth of Pennsylvania
Department of Health
Bureau of Information Technology
2150 Herr Street
Harrisburg, PA 17103
dofreeman@xxxxxxxxxxx



________________________________
From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx] On 
Behalf Of Freeman, Donald
Sent: Tuesday, September 30, 2008 9:34 AM
To: 'Chris.Taylor@xxxxxxxxxxxxxxx'; ORACLE-L
Subject: RE: Server failures

I'm sure it depends, but I have access to all our database servers and review 
the server logs when something happens.  If I find something, I open a ticket.  
I'm sure lines of authority vary widely in the field.

Donald Freeman



________________________________
From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx] On 
Behalf Of Taylor, Chris David
Sent: Tuesday, September 30, 2008 9:19 AM
To: ORACLE-L
Subject: Server failures

So how many of you are responsible for examining your database servers for 
hardware/software faults when they crash?  Not the database, but the actual 
machine?

We recently had a server crash, and it reported problems when it came back up.  It 
also saved a dump file to be examined and reported problems during the 
POST routine.

Now I get this email from my DBA manager: (paraphrased)

"Chris,

John [pc/lan mgr] requested that we try to put our finger on what caused 
MachineA to fail over on Saturday.  I looked through the logs extensively today 
[uh huh] and couldn't find anything - can you look around too and see if you 
find anything?

-Bob"

(Obviously names changed)

Maybe I'm just in a bad mood this morning....grrrr



Chris Taylor
Sr. Oracle DBA
Ingram Barge Company
Nashville, TN 37205
Office: 615-517-3355
Cell: 615-354-4799
Email: chris.taylor@xxxxxxxxxxxxxxx
