RE: i know cary millsap is super smart but...

  • From: "Mark W. Farnham" <mwf@xxxxxxxx>
  • To: <Josh.Collier@xxxxxxxxxxxx>, "'Cary Millsap'" <cary.millsap@xxxxxxxxxxxx>, <ahmusch@xxxxxxxxx>
  • Date: Tue, 27 Aug 2013 20:37:02 -0400

The underlying principle remains the same: you can run more jobs in parallel
than you have cpus without slowing things down by waiting for cpu, to the
extent that the jobs intermittently have to do things other than computation
on the cpu.

Cary's observation at the time was that i/o operations for batch (meaning
mostly jobs that do not have to wait for user input or think time) took up
about half the real elapsed time of most jobs when run against no
competition. So you could indeed bump the number of running batch jobs up to
about 2 times the number of available cpus without causing any new cpu wait.
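
A minimal sketch of that arithmetic, assuming each job spends a known
fraction of its unimpeded elapsed time waiting on i/o. The function name and
parameters are hypothetical, and the model works on averages only; it ignores
queueing variance and says nothing about i/o contention between jobs.

```python
import math


def max_batch_jobs(cpu_count: int, io_wait_fraction: float) -> int:
    """Estimate how many batch jobs fit before average cpu demand exceeds supply.

    io_wait_fraction is the share of a job's unimpeded elapsed time spent
    waiting on i/o (an assumed input; about 0.5 in the observation above).
    """
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    if not 0.0 <= io_wait_fraction < 1.0:
        raise ValueError("io_wait_fraction must be in [0, 1)")
    # Average share of one cpu that each running job actually needs.
    cpu_fraction = 1.0 - io_wait_fraction
    return math.floor(cpu_count / cpu_fraction)


if __name__ == "__main__":
    print(max_batch_jobs(8, 0.5))   # 16
    print(max_batch_jobs(32, 0.5))  # 64
```

With half the elapsed time spent on i/o, the result is the familiar 2 times
the cpu count; a job mix that never waits on i/o gets no headroom at all.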

All things being equal, and with cpus being the most expensive element of the
systems of the time, eliminating cpu slack time without increasing cpu waits
for any other jobs lets you hit something near maximum throughput AND
efficiency for a job set.

If, instead, you more than marginally exceed this threshold, you start to
rack up waste and inefficiencies of context switches and possibly shuttling
more program data on and off chip cache.

Remember to leave available CPU if interactive users will be intruding on
the batch window.

Remember this is very different from trying to apply all resources of a
machine (or machines) to get the answer to a single question as quickly as
possible (the original design goal of Oracle Parallel Query). When that is
the goal, various resources probably will have slack time when you achieve
the fastest solution (but you don't care).

While the underlying principle remains the same, the fraction of time spent
waiting for i/o operations to complete on either a 100% memory system or a
well pipelined SSD (especially non-flash) is likely a lot less than when
yanking the data from spinning rust. So that militates toward lowering the
magic number of 2. Depending on whether you're counting cpus or cores and how
the threading works for a given combination of hardware and software, you may
observe that your job mix stalls on cpu significantly more if you're
counting cores and use 2.
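
To see how faster i/o pulls the number down, here is a small sketch that
derives the multiplier from one job's cpu time and i/o wait time when run
with no competition. The timings are made-up illustrations, not measurements,
and the function name is hypothetical.

```python
def magic_multiplier(cpu_seconds: float, io_wait_seconds: float) -> float:
    """Unimpeded elapsed time divided by cpu time for a single job.

    This is the number of such jobs per cpu that keeps the cpu busy, on
    average, without adding cpu wait.
    """
    if cpu_seconds <= 0:
        raise ValueError("cpu_seconds must be positive")
    if io_wait_seconds < 0:
        raise ValueError("io_wait_seconds must not be negative")
    return (cpu_seconds + io_wait_seconds) / cpu_seconds


if __name__ == "__main__":
    # Hypothetical timings for the same job on different storage.
    samples = [
        ("spinning disk", 30.0, 30.0),
        ("fast SSD", 40.0, 10.0),
        ("all in memory", 30.0, 0.0),
    ]
    for label, cpu, io_wait in samples:
        print(label, magic_multiplier(cpu, io_wait))
```

Measuring the two inputs for your own job mix is the only way to know where
between 1 and 2 (or above) your number actually sits.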

2 is still a doggone good starting point.

And you're right: Cary is super smart. More than that, Cary is a methodical
scientist. (He's also a good friend and an outstanding parent, but that is
another story.)

mwf

-----Original Message-----
From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx]
On Behalf Of Josh Collier
Sent: Tuesday, August 27, 2013 5:55 PM
To: Cary Millsap; ahmusch@xxxxxxxxx
Cc: oracle-l@xxxxxxxxxxxxx; lomasky@xxxxxxxxxxxxxxxxxxx
Subject: RE: i know cary millsap is super smart but...

I would like to check my understanding:
This paper appears to apply very specifically to the Oracle Applications
Concurrent Manager. It states that one should not run more batch jobs than
2 x the number of CPUs on an Oracle Applications system (E-Business Suite).

Do the ideas in this paper not extend to ETL batch jobs and their parallel
processes? For example, on a 32 core system, would I be limited to never
using more than 64 parallel processes simultaneously?

Thanks for your time,

Josh C.

From: Cary Millsap [mailto:cary.millsap@xxxxxxxxxxxx]
Sent: Friday, June 14, 2013 6:04 PM
To: ahmusch@xxxxxxxxx
Cc: Josh Collier; oracle-l@xxxxxxxxxxxxx
Subject: Re: i know cary millsap is super smart but...

Josh and Adam,

I was just discussing that this week with a client. I've asked the same
question, and I just haven't done the tests yet.

My expectation would be that for a two-quad-core system, the number of
"effective CPUs" (let's call it) would be something less than 2 x 4 = 8 but
more than just 2. Probably 6-ish, I would expect. ...Meaning that on a 2x
quad-core system, you could apply the idea behind the paper as if the actual
number of CPUs were something like 6.

I'd love to learn what you find out if you test it.


Cary Millsap
Method R Corporation

On Fri, Jun 14, 2013 at 2:37 PM, Adam Musch <ahmusch@xxxxxxxxx> wrote:
I would think so, from a certain point of view.  Each core is reported to
the OS as a CPU, and that's what you should use as it pertains to the rule
of 2.  So if you have two cpus each with four cores, your rule of 2 number
would be 8.
The underlying mathematics of queuing theory still remains the same.


On Fri, Jun 14, 2013 at 2:08 PM, Josh Collier <Josh.Collier@xxxxxxxxxxxx> wrote:

> This paper is 13 years old; is it still valid in the era of quad-core
> processors?
> Batch Queue Management and the Magic of '2'
> Cary Millsap/Hotsos Enterprises, Ltd.
>
> --
> //www.freelists.org/webpage/oracle-l
>
>
>

--
Adam Musch
ahmusch@xxxxxxxxx

