I think there's often a tendency to blame the outsourced team whenever
these kinds of issues crop up and there are contractors or remote teams or
offshored folks involved. But in a failure like this, as I think I said
upthread, there's plenty of blame to go around:

- If the offshore people were unqualified, why was management allowing
  them to do this upgrade?
- If engineering "got to work on a fix" Wednesday morning, why weren't
  they involved in the planning to ensure sufficient safeguards?
- If this system was so critical, why was the vendor not already involved
  in the upgrade process?
- If the vendor was involved, why the heck did it take days to get a fix
  for a major international bank?

I work with software far less operationally critical to normal business
execution, and I *still* get direct calls from customers that say, "We're
planning to upgrade to 8.2 of your software, and I was wondering if you can
take a look at our plan and make sure we're not doing anything wrong?" I
can only imagine the planning they would do if a failure of my software
could prevent people from accessing their money.

Matt

PS - Full disclosure: I work for BMC Software, which makes a job scheduling
product that competes with CA's, though I don't work with it, have never
used it, or even seen a demo - totally different side of the company. So I
have no axe to grind against CA, I wish them all the best, and my views are
definitely not those of BMC.

On Mon, Jun 25, 2012 at 1:15 PM, Powell, Mark <mark.powell2@xxxxxx> wrote:
> The problem is not the CA-7 software in my opinion but the failure of the
> out-sourced staff to properly use the software.
>
>
> -----Original Message-----
> From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx] On
> Behalf Of Matthew Zito
> Sent: Monday, June 25, 2012 7:53 AM
> To: Øyvind Isene
> Cc: howard.latham@xxxxxxxxx; niall.litchfield@xxxxxxxxx; oracle-l
> Subject: Re: Bank Databases
>
> Doh - resending as I got dinged for overquoting:
>
> Timely enough, the Register is reporting that CA's job scheduler software may
> be responsible:
>
> http://www.theregister.co.uk/2012/06/25/rbs_natwest_what_went_wrong/
>
> Could certainly mean that Oracle was still involved (or Sybase, or some other
> database), but the inability to schedule jobs was the root issue.
>
> Matt
>
>
>>>>
>>>> I'm particularly interested as we test our failover every 3 months,
>>>> and last time we did so there was a power outage on the standby
>>>> (which was running temporarily as primary) which we hadn't
>>>> anticipated. The startup script tried to bring up what was currently
>>>> a primary db as a standby. I'm trying to automate this and, yuk,
>>>> without DG Broker (which has its own set of problems) I'm a bit
>>>> stymied! I'm not suggesting NatWest hadn't tested their failover, but
>>>> I imagine it's difficult due to volumes.
>>>> On 25 June 2012 12:08, Matthew Zito <matt@xxxxxxxxxxxxxxxxx> wrote:
>>>> > Yes, though I doubt it's anything as simple as an "Oracle issue".
>>>> > From my experience watching large organizations deal with complex
>>>> > crises like this, typically it's a series of cascading failures -
>>>> > so perhaps an Oracle database was involved, but many separate
>>>> > pieces had to fail in order to get to this point.
>>>> >
>>>> > For example, I once saw a major global company's firmwide email
>>>> > system go down for over a day due to a cascading series of:
>>>> > - storage array failure
>>>> > - misconfigured hardware
>>>> > - engineer typo
>>>> > - misunderstood recovery architecture
>>>> >
>>>> > I'm trying to keep it vague intentionally, but if any one of those
>>>> > things hadn't happened, they would have had an hour of downtime on
>>>> > their email instead of a 30-hour downtime. I suspect the NatWest
>>>> > issue is similar, *though* I do expect that we'll get more info in
>>>> > the coming days/weeks, so maybe we can get some more details then.
>>>> >
>>>> > Matt
>>>> >
>>>> > On Mon, Jun 25, 2012 at 7:01 AM, Howard Latham
>>>> > <howard.latham@xxxxxxxxx> wrote:
>>>> > >
>>>> > > So NatWest being unable to process transactions for 5 days due
>>>> > > to a change in backup software and failover could well be an
>>>> > > Oracle issue.
>>>> > >
>>>> > > --
>>>> > > Howard A. Latham
> --
> //www.freelists.org/webpage/oracle-l
>
>
> --
> //www.freelists.org/webpage/oracle-l
>
--
//www.freelists.org/webpage/oracle-l
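PPS - For the standby role-detection problem Howard describes upthread,
here is a minimal sketch of the kind of pre-flight check a startup script
could run before trying to bring an instance up as a standby. It assumes
the python-oracledb driver and a hypothetical monitoring account with
SELECT access on v$database; the connection details are placeholders, and
a mounted-but-not-open standby would need a SYSDBA connection instead:

# Sketch only: refuse to run the standby bring-up steps if the database
# is currently acting as a primary. All names/credentials below are made up.
import sys
import oracledb  # python-oracledb driver, assumed to be installed

def database_role(user, password, dsn):
    # v$database.database_role is e.g. 'PRIMARY' or 'PHYSICAL STANDBY'
    with oracledb.connect(user=user, password=password, dsn=dsn) as conn:
        with conn.cursor() as cur:
            cur.execute("SELECT database_role FROM v$database")
            return cur.fetchone()[0]

if __name__ == "__main__":
    role = database_role("monitor", "secret", "dbhost/ORCL")
    if role != "PHYSICAL STANDBY":
        print("Refusing standby startup: database_role is %s" % role)
        sys.exit(1)
    print("Role check passed; safe to continue with standby startup steps.")

The same check could just as easily be a few lines of sqlplus inside the
startup script itself; the point is simply to gate the standby steps on the
role the database actually reports rather than on what the script assumes.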