Re: Hardware / OS recommendation

  • From: Carel-Jan Engel <cjpengel.dbalert@xxxxxxxxx>
  • To: oracle-l@xxxxxxxxxxxxx
  • Date: Sat, 20 Mar 2004 16:16:00 +0200

OK Mogens, you're right. I agree totally with you. Commit. Respect. But.

Do I read you correctly that you are putting forward a thesis that, once proved true, makes it unavoidable to apply magic bullets without proper testing? So we have to abandon the scientific approach? I cannot read your post any other way, but that may be my tiredness, coming straight from a full day of final rehearsals for a full-evening performance with my choir. [Jared and list members, I'm not trying to reopen a closed thread!]

So we end up in the oxymoron that more complexity requires more testing. Testing more complexity requires more time. Time to market becomes shorter and shorter, so there is less time to test (and to optimize! Let's face it: many teams of huge numbers of cheap junior Java programmers think optimization is no longer needed. Just add more hardware). We need more time to test. So: don't test, but increase complexity by adding more hardware. This rat race must end somewhere, when we all run over the edge like lemmings.

However, it's not only in our business that I see this happen. In Rotterdam new streetcars were put into service. Many, many problems came up: the delivered ones were sent back, new ones were temporarily refused. Now everything seems to be fine, but this was 'testing in production' by renowned engineering companies. Another example: bridges (London, Rotterdam) swaying in the wind during storms. Bridge building is a pretty mature engineering discipline, isn't it? Still 'not tested enough'.

This is no excuse for not testing. One should test. But when systems grow too complex, full testing becomes impossible. How do we cope with that? I think sufficient knowledge of the system must be available, at least in the first days/weeks/months after a system goes live. What is sufficient? Sufficient equals the maturity and experience of those who have tested many isolated components: enough to be able to pinpoint a possible cause when the system fails.

And then again, it's all a matter of money, trade-offs, and common sense. A couple of years ago Mercedes-Benz decided that what they wanted was not the BEST product, but the product that would best last the life of the car for the lowest price. Not every site expects top-of-the-bill performance, availability, or consulting. We should deliver the best we can, given the price the customer is willing to pay. Quality is: the expected result for the price agreed. Nothing more, nothing less. If we don't charge enough, or, to put it another way, promise too much, that's our problem, not the customer's. We're the experts (we think), so we have to guide the customer through the process. But we should also guide the customer when he moves the targets (which customers always do). Too often, just to save the relationship, we let the train derail and plunge into the river of sadness together with the (unsatisfied) customer.

So, testing everything is most likely impossible. Continuing to test new features, in isolation, to gain knowledge still makes sense, and that testing can still be done in a scientific setting. This knowledge can help solve problems in a quick (and dirty?) way: not by magic bullets, but by recognizing a symptom and knowing the cure. Life is pretty much an adventure. It is not possible to know everything in advance. When there is no time, calculate the risk, share your thoughts with the customer, and give it a try. Or not.

I don't know whether this post makes any sense anymore; the bottom line is: sky-high complexity makes applying magic bullets inevitable. Although a weapons factory doesn't test the actual bullet that's sold to you, it does test copies of it in isolation, so its behaviour is pretty predictable. Every now and then one will fail. I didn't serve in the army, but I've heard that a gun might blow up in your face when a bullet gets stuck. Nothing different here.

Regards, Carel-Jan

===
If you think education is expensive, try ignorance. (Derek Bok)
===


At 09:05 AM 3/20/2004, you wrote:
You're the man...

Agreed - that one is easy. But what about real test scenarios? The last sentence in Dave Ensor's Oracle Design book from 1997 says:

"It really is absurdly difficult to design a piece of code that will completely meet a totally unknown requirement."

Funny as it is, this is increasingly what we're asked to do. Maybe not "totally unknown", but trying to translate business requirements into scientific technical stuff is not easy, to say the least. Impossible? That depends on your definition of impossible, of course.

But think about implementing Oracle Apps (or SAP or Siebel or JDEdwards or Axapta or whatever). 42 modules have been bought by the customer. Some of them will be accessed by a known number of users in an unknown way. Some of them by an unknown number of users in a known way. And so on.

I'm not saying it's impossible to test realistically for performance on such a system. I have just never seen it done. I have seen many tests, though. But they always end up testing a well-defined workload - and how many systems have well-defined workloads? And for what usage intervals? End-of-month? A typical Tuesday? The possibilities are many :-).
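To make that concrete: most home-grown harnesses I've seen boil down to something like this little Python sketch (the connect string, credentials and query list are all made up, and it assumes the cx_Oracle driver). It exercises exactly the workload written into it, and nothing else:

    # A minimal fixed-workload driver: ten threads replaying a scripted
    # list of queries and timing each one. It tests exactly the workload
    # below -- no ad-hoc users, no surprise batch jobs.
    import threading
    import time

    import cx_Oracle  # assumed driver; any DB-API module would do

    DSN = "testbox/orcl"  # hypothetical connect string
    WORKLOAD = [          # the "well-defined workload"
        "SELECT COUNT(*) FROM orders WHERE status = 'OPEN'",
        "SELECT SUM(amount) FROM invoices WHERE paid = 'N'",
    ]

    def worker(results):
        con = cx_Oracle.connect("scott", "tiger", DSN)
        cur = con.cursor()
        for sql in WORKLOAD:
            start = time.time()
            cur.execute(sql)
            cur.fetchall()
            results.append((sql, time.time() - start))
        con.close()

    results = []
    threads = [threading.Thread(target=worker, args=(results,))
               for _ in range(10)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    for sql, elapsed in results:
        print("%7.3fs  %s" % (elapsed, sql))

Run it and you get numbers - for those two statements, at that concurrency, at that moment. Whether Tuesday's ad-hoc crowd behaves anything like that is anybody's guess.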

Which explains why some of the World's biggest (and medium-sized) systems still get into trouble. Not because they didn't have the right people allocated to size and test the system. Not for lack of test harnesses or licenses to LoadRunner and other excellent products. But because you can't test complex dependencies and load combinations.

So many systems end up being over-sized. In case of doubt, we'd better buy the biggest machine we can get our hands on. Let's also throw in 128 GB of RAM just to be on the safe side. You never know.

I'm not arguing against testing for performance, availability or functionality. I'm just arguing that many times when, e.g., support people (I was one myself for 10 years at Oracle, remember :-)) say "Ah, but this situation could have been avoided if you had tested this more carefully", they are of course right. Water runs downhill. But could they have done it better themselves? In most cases I think not.

OK, so you put the system into production after careful testing. It runs OK for a while. Then you need to upgrade from 9.2.0.3 to 9.2.0.5, so the heuristics (constants) of the optimizer change a little bit, because somebody in Development has fixed a bug or enhanced some functionality. Then you should test again, right? I mean, you shouldn't upgrade unless you have tested...

Then you hit some bug, and the fix will be put into 10g Release 2. Until then you're advised to set _many_complex_joins_forever=42, unset timed_statistics and change to AUM to work around the problem. Well, now you need to test again. You can't just put these changes into init.ora without testing them...

Imagine the amount of time it could take to fully test the environment in these cases. It's not feasible. And what about emergency situations (ORA-600s, security patches, etc.)? Do you wait a week or two before applying patches, or do you just do it?
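In practice, the most many shops manage is a smoke test: time the same canned statements before and after the change and look for anything grossly slower. A rough Python sketch (the baseline file name, statements and connect details are all hypothetical, again assuming cx_Oracle):

    # Crude upgrade tripwire: time a canned statement list and compare
    # against timings saved before the patch. It doesn't prove the upgrade
    # is safe; it only catches gross regressions in the statements chosen.
    import json
    import time

    import cx_Oracle  # assumed driver

    BASELINE_FILE = "baseline_9203.json"  # hypothetical pre-upgrade baseline
    STATEMENTS = [                        # whatever canned set you trust
        "SELECT /* smoke1 */ COUNT(*) FROM orders",
        "SELECT /* smoke2 */ MAX(order_date) FROM orders",
    ]

    def time_statements():
        con = cx_Oracle.connect("scott", "tiger", "testbox/orcl")
        cur = con.cursor()
        timings = {}
        for sql in STATEMENTS:
            start = time.time()
            cur.execute(sql)
            cur.fetchall()
            timings[sql] = time.time() - start
        con.close()
        return timings

    now = time_statements()
    try:
        with open(BASELINE_FILE) as f:
            before = json.load(f)
    except IOError:  # first run: just record the baseline
        with open(BASELINE_FILE, "w") as f:
            json.dump(now, f)
    else:
        for sql, t in now.items():
            old = before.get(sql)
            if old and t > 2 * old:  # arbitrary "twice as slow" alarm
                print("check this: %5.2fs -> %5.2fs  %s" % (old, t, sql))

Two statements prove nothing about the other ten thousand in the application - which is rather the point.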

And any patch is a potential destabilizer of your system.

My contention is this: sooner or later you're forced to just do it out there in the real world. Remember how we used to be extremely careful about applying patches to our Windows laptops? Now it says: click here and all sorts of changes will be made to your system. So you click, because you don't even want to think about going through it all, testing it, backing up the system before you click, etc.

It's what Oracle is doing in 10g, by the way: "You should apply this monster patch set." OK, done. Oracle cannot possibly know how it will impact your system, and you're not going to test it before clicking.

Writing this, and thinking about the projects we're getting into, and seeing what customers are asking, I think we will increasingly end up with exactly the situation Dave's sentence describes.

That's not because the people that need the systems are more stupid or more evil than their equivalents 10 years ago. It's because the complexity is increasing so fast (exponentially) that it has become impossible to do the isolation thing anymore, except in some absurdly simplistic systems for very, very specific and repeated usage.

Yes, a dynamic system with "on-demand" resources ad libitum (yeah! and without new licensing models this will ROCK for software and hardware vendors) could address some performance needs, thus pretty much removing the need for that part of testing (in the perfect world of infinite resources). But there are still the interdependency, serialisation, functional, and other testing requirements that won't be addressed by infinite hardware resources.

So is the grid, RAC and all that really about admitting that you cannot test anymore? :-)

Mogens

Pete Sharman wrote:

Ooohh I like it when you're rough! :)
Actually, given what John wrote in a separate email as an example of why there were complaints about lack of testing, I totally agree with you. It's impossible to test for things that are out of scope. In this case, extending testing to cover a script that a user wrote him/herself, and that you know nothing about, just ain't viable. What's needed here are scope boundaries. Like any good project, the testing should define up front what's in scope and what's out of scope. User-written scripts that sit in the user's login directory and are not under change control for the app are clearly out of scope. And if you need ad hoc query capability, then maybe you need the ability to dynamically grow and shrink the computing resources available to you. Heck, maybe you need a grid! :)
Pete


"Controlling developers is like herding cats."
Kevin Loney, Oracle DBA Handbook

"Oh no, it's not. It's much harder than that!"
Bruce Pihlamae, long-term Oracle DBA
-----Original Message-----
From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx] On Behalf Of Mogens Nørgaard
Sent: Saturday, 20 March 2004 9:34 AM
To: oracle-l@xxxxxxxxxxxxx
Subject: Re: Hardware / OS recommendation
And I think there never, ever can be enough testing. If anything goes
wrong, or if anything behaves worse than what we want or expect, we can
always - always - say: "Ah, they should have tested it (more)". But this
is NOT the case, in my opinion. It's just an easy way out for all of us.
A way to blame someone else, when we don't know what to do ourselves anyway.
[This is not a cheap shot at you, Pete. But I've been wondering about
tests for a while. I think they're not worth much, to put it mildly :)].
If a test is to be of any value, it should prove something. But can it
ever prove that your environment - your combination of online ad-hoc,
planned batch and ad-hoc batch - can run on a given combination of
thingies? No, it can't.
You can test and measure and judge and guess that your system can sustain the expected IO workload. You can pray that serialisation (latches, locks, enqueues, etc.) won't get in the way. But can you control batch in a Unix/Windows world? No, you can't. Can you direct certain services/stuff to dedicated CPUs? Yes, but with great, great difficulty.
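The "measure" part is the easy bit. Something like this little Python sketch (the scratch path and size are made up) gives you a raw sequential-write number for the filesystem under test - and says nothing about latches, locks, or what batch will do to you:

    # Raw sequential-write probe: about as far as "measure and judge" gets.
    # One number for one access pattern; serialisation and mixed load are
    # exactly what it cannot tell you about.
    import os
    import time

    SCRATCH = "/tmp/io_probe.dat"  # hypothetical scratch file
    CHUNK = b"x" * (1024 * 1024)   # 1 MB
    MEGABYTES = 256

    start = time.time()
    with open(SCRATCH, "wb") as f:
        for _ in range(MEGABYTES):
            f.write(CHUNK)
        f.flush()
        os.fsync(f.fileno())  # push it to disk, not just the page cache
    elapsed = time.time() - start
    print("sequential write: %.1f MB/s" % (MEGABYTES / elapsed))
    os.remove(SCRATCH)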
In the words of my old friend Ole (sorry, that's his name. So Ole' Ole
sounds pretty cool...): "Benchmarks are always in-conclusive."
They are. They might serve the purpose of making the bosses happy and
giving them a good feeling in their stomachs. But they will never be
able to mimic the real load on the system.
In the managed mainframe world they can usually predict fairly precisely
what will happen to application A if X happens and what will happen to
app B if Y happens.
No way to do that in our world. Or to be more provocative: If there
really was a systematic way of doing this, I would have thought it would
have been standardized a long time ago.
So lean back, Pete, and tell me: what would you test before putting a
mixed online/batch environment into production? How the Hell are you
going to emulate an ad-hoc environment without "just" doing the "Yes!
We've done this, we've done that" routine in the benchmark?
I'm being a bit rough on you right now, and I don't mean to be. You're
a rather cool man who knows his stuff.
Mogens
Pete Sharman wrote:


Well, he did say "have to field user complaints for weeks after each move, despite testing." That immediately implies to me that there hasn't been sufficient testing. :)


Pete


"Controlling developers is like herding cats."
Kevin Loney, Oracle DBA Handbook

"Oh no, it's not.  It's much harder than that!"
Bruce Pihlamae, long-term Oracle DBA

-----Original Message-----
From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx] On Behalf Of Mogens Nørgaard
Sent: Saturday, 20 March 2004 6:01 AM
To: oracle-l@xxxxxxxxxxxxx
Subject: Re: Hardware / OS recommendation


Of what?

Pete Sharman wrote:



More testing? :)


Pete


"Controlling developers is like herding cats."
Kevin Loney, Oracle DBA Handbook

"Oh no, it's not.  It's much harder than that!"
Bruce Pihlamae, long-term Oracle DBA

-----Original Message-----
From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx] On Behalf Of John Flack
Sent: Saturday, 20 March 2004 12:54 AM
To: oracle-l@xxxxxxxxxxxxx
Subject: Hardware / OS recommendation


We are currently running 8.1.7 databases on a 4-year-old Dell Pentium
machine, under SCO UnixWare.  This is the last supported version of
Oracle under UnixWare, and since the hardware is getting old in internet
years, we're thinking of getting new hardware running a supported OS for
Oracle 9i R2 or 10g.  I'm the official DBA, but my system administrator
has been wearing an Asst. DBA hat, doing much of the day-to-day work.

The SA wants to get a low-end Sun SPARC machine running Solaris, since
their price has come down to around that of the sort of high-end Intel
or AMD machine that we would normally use as a server.  I would normally
vote for the Intel/AMD solution running Red Hat or SUSE Linux, since we
already run several of those.  And maybe there are some low-end machines
from HP or IBM (or someone else) that we should consider.

One thing I'd definitely like is an OS that Oracle will support for a
long time.  We started on old SCO Unix, moved to SCO Openserver when
Oracle stopped supporting it, moved to UnixWare when Oracle stopped
supporting Openserver, and now have to move again.  Oracle is Oracle,
and we've never had much of a problem with the database stuff - an
export and an import, and we've been good to go.  But the shell scripts,
COBOL and C programs have required tweaking every time we moved.
Nothing major, but just enough to have to field user complaints for
weeks after each move, despite testing.

Suggestions, anyone?
