RE: Dell-Oracle-Linux: Anyone else run this...because its not working for us!

  • From: "Marquez, Chris" <cmarquez@xxxxxxxxxxxxxxxx>
  • To: <oracle-l@xxxxxxxxxxxxx>
  • Date: Thu, 8 Dec 2005 16:19:58 -0500

The resolution...and this is no joke...directly from Dell...I was on the
conference call;

Dell (and I assume its hardware partners) could not resolve our/their
hardware problem, nor will they try any further to resolve it, and we
should begin working with our Dell sales rep to negotiate some form of
hardware exchange/rebate.

Yes, I was speechless too.
We had been running on hardware (in a config) that had been tormenting
us for a year and was never going to work!
Oh it gets better or more crazy...depending how you look at it.

Turns out (I asked directly) that this config:
        - Dell PowerVault 220 External SCSI Storage
        - PERC "n" DC (RAID) Controller Card
        - Dell PowerEdge 2*50 Servers
is still sold every day!!!
"We sell these for Windows [servers] all day long"...what!?
Turns out that this config is not "recommended" and/or has been
de-certified for Oracle!...what!?
They stopped just short of blaming Oracle.

Remember, in my original post I indicated that we did not see the same
issues on the same hardware when we ran *WITHOUT* RAID implemented on
the PERC controller cards, nor with the PowerVault in "cluster
mode"...although I believe the RAID and PERC cards to be the problem.

So reading between the lines what I gather from all of this is either;
Do NOT use RAID (and Oracle) with this config _or_ run low IO apps on
this config.
What I got out of this painful ordeal is that this hardware works just
fine as long as you do not "push" it hard!
 :o|

This one will stick with me for a while.
 
Chris Marquez
Oracle DBA

PS It has taken me so long to post because I have had to rebuild our
entire 2 db servers and 2 backup db servers off of this config.
We were 100% RAID and now we have no RAID at all...but very solid backup
and failover operations.
Nothing like moving, or rather "juggling", your production servers on a
day's notice.

PPS I should mention that this is for another client...not from where I
send this email.
I would not want to misrepresent somebody else's server environment.


  


------------------------------------------------------------------------
From: Glenn Stauffer [mailto:alaxsxaq@xxxxxxxxx] 
Sent: Thursday, November 17, 2005 6:29 PM
To: Marquez, Chris
Cc: oracle-l@xxxxxxxxxxxxx
Subject: Re: Dell-Oracle-Linux: Anyone else run this...because its not
working for us!


We run Dell 2650's with Redhat Linux for our OAS systems.  These boxes
work very well for commodity boxes but the storage systems are fairly
basic with internal drives in a mirrored configuration.  We've had many
problems with two Dell SANs that we use for non-Oracle applications
(email; central file storage).  Dell has been quick to get in to replace
failed hardware, but we've had problems with the automated support stuff
not working and Dell has been just plain terrible helping troubleshoot
performance problems with one of our SANs.  I'm about to buy new
database servers with shared storage and Dell doesn't make the cut.  You
will pay (a little/lots) more for HP, IBM, or Sun hardware, but Dell's
hardware and support don't compete as far as I'm concerned.

--Glenn


-----Original Message-----
From: oracle-l-bounce@xxxxxxxxxxxxx
[mailto:oracle-l-bounce@xxxxxxxxxxxxx] On Behalf Of Paul Drake
Sent: Thursday, November 17, 2005 9:16 AM
To: tomday2@xxxxxxxxx
Cc: oracle-l@xxxxxxxxxxxxx
Subject: Re: Dell-Oracle-Linux: Anyone else run this...because its not
working for us!

On 11/17/05, Thomas Day <tomday2@xxxxxxxxx> wrote:
> Just about to replace a hard drive on our PowerVault just as soon as
> the Dell tech gets here.  Doesn't sound like a very reliable machine.
> I'm not sure if it's the hard drive or the controller.  But the price
> is right and the gumment loves them.  Our web farm people report that
> they have to replace a hard drive once a month on the average (30 web
> servers).
>
> In my case I have to play with the cards I'm dealt, but the only other
> time that I've had hard drives fail was 10 years ago when I had to work
> with some NCR pieces of junk.  There, with just one machine, the
> controller would report a hard drive bad about once a week.  Nothing
> really wrong with the drive, just the controller had decided to make
> life interesting.
>
> The tech would put in the new drive, test the "bad" drive, and return
> it to storage.  I really got to practice my recovery techniques.
>
> Looks like the good old days are back again.

A week ago last Monday, a single failed hard drive in a hardware RAID 10
configuration took down a server running Oracle 10g R1 on a Dell PE
2650. Yes, that RAID volume supported the OS mount points, but isn't
the point of (hardware) RAID 10 to handle the fault and not propagate the
failure to the OS?

A replacement of the failed drive and a system restart kicked in the
auto-rebuild of the volume, but still I was unhappy that the unit didn't
take the hit and keep ticking. I haven't had a window yet to upgrade the
firmware and drivers and see if that alleviates the problem.

Months ago, a refurbished Dell PE 2800 repeatedly threw errors when
running a large import job. The internal RAID 10 vols would simply go
offline. Replacing the PERC (poweredge raid controller) resolved the
issue.

At a client site, a Dell PE 6350 under load would occasionally lose all
connectivity with its pair of direct attached SCSI RAID PV 220S units,
across a pair of PERC cards. Fortunately it was only the test system and
not production, and a system restart would remount the volumes.

Across several installations, Dell + EMC CLARiiON units have been stable
and solid.

Paul

#/etc/init.d/init.cssd stop
-- play a Sony CD, install a rootkit today
--
//www.freelists.org/webpage/oracle-l



 

-----Original Message-----
From: Paul Drake [mailto:bdbafh@xxxxxxxxx] 
Sent: Wednesday, November 16, 2005 4:15 PM
To: janine@xxxxxxxxxx
Cc: mkb125@xxxxxxxxx; Marquez, Chris
Subject: Re: Dell-Oracle-Linux: Anyone else run this...because its not
working for us!


Hi Janine.

We have a Dell PE 2800 that threw errors during testing ... importing
data into a new database. After multiple go-rounds with Tech Support ...
I was lucky enough to get ahold of a competent Tech Support engineer in
the server support group. He authorized a replacement of the RAID
controller and I haven't had a storage issue on that box since then.
That server is running w2k3 svr 32 bit, but had run RHEL 3 update 5 in
testing.
That box has 10 drives, 2 PERC cards as the removable drive bay was
populated.
When it would hit an error, the 8 internal drives not in the drive bay
would go bye-bye.

The same box threw memory errors.
We've gone thru 3 separate iterations of attempting to replace the
failed module.
As it was in pairs ... they've sent the wrong parts, sent one module
(unpaired).
We're still awaiting replacement parts and have been limping along on
only 2 GB in that box.

Their support is spotty - some great techs, some bad - kind of like
Oracle or any other company.

Paul


-----Original Message-----
From: Janine Sisk [mailto:janine@xxxxxxxxxx] 
Sent: Wednesday, November 16, 2005 3:09 PM
To: mkb125@xxxxxxxxx
Cc: Marquez, Chris; oracle-l@xxxxxxxxxxxxx
Subject: Re: Dell-Oracle-Linux: Anyone else run this...because its not
working for us!

On Nov 16, 2005, at 11:52 AM, mkb wrote:

> Hehehehe...I just finished a call with Dell support - memory issue for
> probably the 3rd time this year.

We are a much smaller shop than the rest of you and we run dinky  
little servers by comparison, but even I have a Dell horror story.   
We had one server we bought a few years ago, I think it was a 2550 but
I'm not sure about that, that had hardware problems from day one.  
I was trying to load a multi-GB Oracle export and the system kept
restarting itself halfway through.  Dell very reluctantly sent
replacement parts several times but we were never able to get it fully
working.  It has exhibited a multitude of symptoms over the years.

My sys admin is both busy and lazy and he didn't follow up very well
with Dell, plus they moved as slowly as humanly possible, with the
result that the machine finally went out of warranty and still was not
working right.  We have never been able to use it in production.

The sys admin finally figured out what the root cause is some time ago;
he read somewhere that this particular hardware has disk controller
problems when you have two CPUs in the box and are running Linux.  It
would probably work fine if we could take one of the CPUs  
out, but you can no longer buy the appropriate blanks from Dell.   
They know about the problem, but have never managed to fix it (not that
they are admitting to, anyway).  I should say for the record that this
info is third-hand or worse and should not be relied upon as the gospel
truth, even though I have no reason to doubt it based on our experience.

We have continued to buy servers from Dell (holding our noses each
time) because they have been the most cost-effective choice, even with
the hassle factor.  But I have been reading lately that they are telling
analysts they are going to bump up their profit margins and do less
discounting.  I almost hope that happens, just so I have an excuse to
buy from someone else!

janine




-----Original Message-----
From: mkb [mailto:mkb125@xxxxxxxxx] 
Sent: Wednesday, November 16, 2005 2:52 PM
To: Marquez, Chris; oracle-l@xxxxxxxxxxxxx
Subject: Re: Dell-Oracle-Linux: Anyone else run this...because its not
working for us!

--- "Marquez, Chris" <cmarquez@xxxxxxxxxxxxxxxx>
wrote:
> For the 4th time in 12 months our hardware has let us down and we are 
> running on a backup db server.
> Disk errors, controller (module?) failure.
> ..............
> ..............
> Chris Marquez
> Oracle DBA
> 

Hehehehe...I just finished a call with Dell support -
memory issue for probably the 3rd time this year.  

We're running 6650s on RHAS 3.0 with EMC/Dell/Clariion
CX-200 as our db storage.  Don't know what version of
PERC or RAID s/w but we do seem to have quite a bit of
hardware errors on our servers.

Usually memory and disk issues.  Our SAN is pretty
stable with no outages there.  But yeah, the server HW
seems a bit prone to failures.

--
mohammed




-----Original Message-----
From: Paul Drake [mailto:bdbafh@xxxxxxxxx] 
Sent: Wednesday, November 16, 2005 1:59 PM
To: Marquez, Chris
Subject: Re: Dell-Oracle-Linux: Anyone else run this...because its not
working for us!


Chris,

We're running RHEL 3 ES update 5 with 10.1.0.4 on a single Dell PE 2650
with a single PV220S unit (split backplane). I haven't run clustered
anything.

Other than one drive failure, we've had no issues.

This unit did run 9.2.0.5 on RHEL 3 ES (was probably update 2 at the
time) without issue.

Let me know if there is anything in particular you're looking for.

Paul






------------------------------------------------------------------------
From: oracle-l-bounce@xxxxxxxxxxxxx
[mailto:oracle-l-bounce@xxxxxxxxxxxxx] On Behalf Of Marquez, Chris
Sent: Wednesday, November 16, 2005 1:34 PM
To: oracle-l@xxxxxxxxxxxxx
Subject: Dell-Oracle-Linux: Anyone else run this...because its not
working for us!




*********
SPECS
*********
Dell 2650
PERC 4/DC (Dual Channel) RAID Controller for *external* storage (on 2
servers)
PERC 3/DC (Dual Channel) RAID Controller for *external* storage (on 2
other servers)
PowerVault 220

Oracle EE 9205
Oracle Cluster Manager 9205 (oracm, version[ 9.2.0.5.0.51 ]) 
Oracle OCFS-Oracle Cluster FileSystem 1.0.13-PROD1 (on 2 servers)
EXT3 (on 2 other servers)

Red Hat Enterprise Linux ES release 3 (Taroon)
kernel 2.4.21-15 (recently upgraded at Dell's request)
Linux SCSI MegaRAID Driver, Version 2.10.9.0 - Release Date: 10/25/2004
- Products Supported: MegaRAID Controllers
*********
SPECS
*********

For the 4th time in 12 months our hardware has let us down and we are
running on a backup db server.
Disk errors, controller (module?) failure.

We have a ticket open with Dell.  Today Dell Support tells us that this
config is *now* not supported for Oracle (RAC?)!!!

My SA tells me that on some Dell forum he sees lots of pleading for help
from those running Dell-PowerVault and PCI PERC RAID Controller for
*external* storage and MegaRAID Driver.  He says most pleas go
unanswered.

We run this in a RAID 1 config and previously ran it in a RAID 5
config...Dell is telling us that only RAID 10 works for the hardware
(for Oracle)!?

Here is the really sad part.
One of our 2 Dell PowerVault 220's we bought over two years ago with
PERC 3/DC Controller for *external* storage.
We ran this for Oracle 817 on SuSE 7.3 (desktop version, pro?...not
server) with the out of the box MegaRAID Driver from SuSE.  Also, we ran
"naked" drives...no raid at all.  And guess what, not a single Disk,
Controller, Driver failure I can remember.

Now that *same* hardware in the config described above,
9i-RHEL3-EXT3-MegaRAID Driver, has failed us over and over.
Between the two like hardware systems (one RAC, one non-RAC) we have had
probably 6 total failovers and many, many short crash outages.

Seems to me that this software and RAID just doesn't work.

Anyone have experience with this hardware?

Thanks,

Chris Marquez
Oracle DBA 



