I think there's often a tendency to blame the outsourced team whenever
these kinds of issues crop up and there are contractors or remote teams or
offshored folks involved. But in a failure like this, as I think I said
upthread, there's plenty of blame to go around:

- If the offshore people were unqualified, why was management allowing
  them to do this upgrade?
- If engineering "got to work on a fix" Wednesday morning, why weren't
  they involved in the planning to ensure sufficient safeguards?
- If this system was so critical, why was the vendor not already involved
  in the upgrade process?
- If the vendor was involved, why the heck did it take days to get a fix
  for a major international bank?

I work with software far less operationally critical to normal business
execution, and I *still* get direct calls from customers that say, "We're
planning to upgrade to 8.2 of your software, and I was wondering if you can
take a look at our plan and make sure we're not doing anything wrong?" I
can only imagine the planning they would do if a failure of my software
could prevent people from accessing their money.

Matt

PS - Full disclosure: I work for BMC Software, which makes a job scheduling
product that competes with CA's, though I don't work with it, have never
used it, or even seen a demo - totally different side of the company. So I
have no axe to grind against CA, I wish them all the best, and my views are
definitely not those of BMC.

On Mon, Jun 25, 2012 at 1:15 PM, Powell, Mark <mark.powell2@xxxxxx> wrote:
> The problem is not the CA-7 software in my opinion but the failure of the
> out-sourced staff to properly use the software.
>
>
> -----Original Message-----
> From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx] On
> Behalf Of Matthew Zito
> Sent: Monday, June 25, 2012 7:53 AM
> To: Øyvind Isene
> Cc: howard.latham@xxxxxxxxx; niall.litchfield@xxxxxxxxx; oracle-l
> Subject: Re: Bank Databases
>
> Doh - resending as I got dinged for overquoting:
>
> Timely enough, the Register is reporting that CA's job scheduler software may
> be responsible:
>
> http://www.theregister.co.uk/2012/06/25/rbs_natwest_what_went_wrong/
>
> Could certainly mean that Oracle was still involved (or Sybase, or some other
> database), but the inability to schedule jobs was the root issue.
>
> Matt
>
>
>>>>
>>>> I'm particularly interested as we test our failover every 3 months,
>>>> and last time we did so there was a power outage on the standby
>>>> (which was running temporarily as primary) which we hadn't
>>>> anticipated. The startup script tried to bring up what was currently
>>>> a primary db as a standby. I'm trying to automate this and, yuk,
>>>> without DG Broker (which has its own set of problems) I'm a bit
>>>> stymied! I'm not suggesting NatWest hadn't tested their failover, but
>>>> I imagine it's difficult due to volumes.
>>>> On 25 June 2012 12:08, Matthew Zito <matt@xxxxxxxxxxxxxxxxx> wrote:
>>>> > Yes, though I doubt it's anything as simple as an "Oracle issue".
>>>> > From my experience watching large organizations deal with complex
>>>> > crises like this, typically it's a series of cascading failures -
>>>> > so perhaps an Oracle database was involved, but many separate
>>>> > pieces had to fail in order to get to this point.
>>>> >
>>>> > For example, I once saw a major global company's firmwide email
>>>> > system go down for over a day due to a cascading series of:
>>>> > - storage array failure
>>>> > - misconfigured hardware
>>>> > - engineer typo
>>>> > - misunderstood recovery architecture
>>>> >
>>>> > I'm trying to keep it vague intentionally, but if any one of those
>>>> > things hadn't happened, they would have had an hour of downtime on
>>>> > their email instead of a 30-hour downtime. I suspect the NatWest
>>>> > issue is similar, *though* I do expect that we'll get more info in
>>>> > the coming days/weeks, so maybe we can get some more details then.
>>>> >
>>>> > Matt
>>>> >
>>>> > On Mon, Jun 25, 2012 at 7:01 AM, Howard Latham
>>>> > <howard.latham@xxxxxxxxx> wrote:
>>>> > >
>>>> > > So NatWest being unable to process transactions for 5 days due
>>>> > > to a change in backup software and failover could well be an
>>>> > > Oracle issue.
>>>> > >
>>>> > > --
>>>> > > Howard A. Latham
> --
> //www.freelists.org/webpage/oracle-l
>
>
> --
> //www.freelists.org/webpage/oracle-l
>
--
//www.freelists.org/webpage/oracle-l
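PPS - For the standby role-detection problem Howard describes upthread,
here is a minimal sketch of the kind of pre-flight check a startup script
could run before trying to bring an instance up as a standby. It assumes
the python-oracledb driver and a hypothetical monitoring account with
SELECT access on v$database; the connection details are placeholders, and
a mounted-but-not-open standby would need a SYSDBA connection instead:

# Sketch only: refuse to run the standby bring-up steps if the database
# is currently acting as a primary. All names/credentials below are made up.
import sys
import oracledb  # python-oracledb driver, assumed to be installed

def database_role(user, password, dsn):
    # v$database.database_role is e.g. 'PRIMARY' or 'PHYSICAL STANDBY'
    with oracledb.connect(user=user, password=password, dsn=dsn) as conn:
        with conn.cursor() as cur:
            cur.execute("SELECT database_role FROM v$database")
            return cur.fetchone()[0]

if __name__ == "__main__":
    role = database_role("monitor", "secret", "dbhost/ORCL")
    if role != "PHYSICAL STANDBY":
        print("Refusing standby startup: database_role is %s" % role)
        sys.exit(1)
    print("Role check passed; safe to continue with standby startup steps.")

The same check could just as easily be a few lines of sqlplus inside the
startup script itself; the point is simply to gate the standby steps on the
role the database actually reports rather than on what the script assumes.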