Can anybody offer a plausible (even if it is hypothetical) explanation for this behaviour? Or maybe direct me to some resources that will help me better understand *how* or *why* such things can happen?
I mean, really! 50% IDLE + 50% WIO is awfully strange! My Linux skills are maybe a bit "lightweight", but I've been working with UNIX for decades and have never seen (or at least never *noticed*) anything remotely like this...
We had a similar situation, but in our case it was caused by a lot (and I mean a *lot*) of random, indexed IO. If you don't have enough IO controllers and disk access paths, that kind of load can easily thrash the Linux 2.4 kernel IO system.
How did we fix it? Three lines of attack. First: increase the number of disk controllers and access paths to the disks. We had everything going through one FC controller; two FC controllers made a world of difference. Second: set the IO elevators (elvtune) to very low values, typically 24. Third, and the best solution: run kernel 2.6 with the new IO scheduler. The IO wait did not disappear, nor did we expect it to, but it certainly went down dramatically.
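If you do move to 2.6, it is worth confirming which IO scheduler a device is actually using. A minimal sketch in Python, assuming the kernel exposes /sys/block/<device>/queue/scheduler (2.6 kernels that support run-time scheduler selection do; the active scheduler is the name in square brackets). The function names are my own invention for illustration, not part of any tool:

```python
from pathlib import Path
from typing import List, Tuple


def parse_scheduler_line(line: str) -> Tuple[str, List[str]]:
    """Parse the contents of a queue/scheduler file.

    The kernel lists the available schedulers on one line and wraps the
    active one in square brackets, e.g. "noop anticipatory deadline [cfq]".
    Returns (active, available); active is "" if nothing is bracketed.
    """
    active = ""
    available = []
    for name in line.split():
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]  # strip the brackets around the active entry
            active = name
        available.append(name)
    return active, available


def read_scheduler(device: str) -> Tuple[str, List[str]]:
    """Read the scheduler setting for a block device such as "sda".

    Assumes the sysfs path below exists on the running kernel.
    """
    path = Path("/sys/block") / device / "queue" / "scheduler"
    return parse_scheduler_line(path.read_text())
```

Calling read_scheduler("sda") on the database host would then tell you whether the scheduler you think you are testing is really the one in effect for that disk.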
A few other things to watch out for: use AIO (asynchronous IO) if at all possible. And when running iostat, ALWAYS do it in a cycle, never as a one-off, with -d and -x <device> for each device you want, and logged on as root or equivalent. Anything else, I've found, can subvert the results and give you a false indication of what is really happening with your IO.
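One reason the one-off run misleads: the first report iostat prints covers the whole time since boot, and only the later reports cover the interval you asked for. A rough sketch in Python of running it in a cycle and throwing that first report away. The function names, the 5-second interval and the count of 12 are arbitrary choices of mine, and the parsing assumes each report starts with a header line beginning with "Device", which can vary between sysstat versions:

```python
import subprocess
from typing import List


def build_iostat_command(device: str, interval: int = 5, count: int = 12) -> List[str]:
    """Build an 'iostat -d -x <device> <interval> <count>' command line."""
    return ["iostat", "-d", "-x", device, str(interval), str(count)]


def split_reports(output: str) -> List[List[str]]:
    """Group iostat output into reports.

    A new report is assumed to start at each line beginning with "Device";
    the non-blank lines that follow are that report's per-device rows.
    """
    reports: List[List[str]] = []
    current = None
    for line in output.splitlines():
        if line.startswith("Device"):
            current = []
            reports.append(current)
        elif current is not None and line.strip():
            current.append(line)
    return reports


def interval_reports(output: str) -> List[List[str]]:
    """Drop the first report, which averages over the time since boot."""
    return split_reports(output)[1:]


def run_iostat(device: str, interval: int = 5, count: int = 12) -> List[List[str]]:
    """Run iostat in a cycle for one device and return the interval reports."""
    cmd = build_iostat_command(device, interval, count)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return interval_reports(result.stdout)
```

Run it as root (or equivalent), once per device you care about, and compare the interval reports against each other rather than trusting any single sample.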
Again: our experience, YMMV, etc.
--
Cheers
Nuno Souto
in sunny Sydney, Australia
dbvision@xxxxxxxxxxxx
--
http://www.freelists.org/webpage/oracle-l