Re: cache buffer chains/where in code

  • From: Greg Rahn <greg@xxxxxxxxxxxxxxxxxx>
  • To: Christo Kutrovsky <kutrovsky.oracle@xxxxxxxxx>
  • Date: Sat, 28 Nov 2009 09:05:46 -0800

Given that config, I'd say that system has over 4X the number of db
connections it probably should have (or needs in order to work well) -
I'd back it down to 64 as a starting point and make sure the connection pool
does not grow.  Set the initial and max connections to be the same
number.  One might think that you need more sessions to keep the CPUs
busy (and you may need more than 1 per CPU thread) but the reality is
this:  With a high number of sessions, the queue is longer for
everything.  The chance of a session getting scheduled when it needs to
goes down, and if there is a fairly steady medium to high load, any
"blip" will cause a massive queue for a resource.  Consider what
happens when calls are taking milliseconds and for a split second,
some session holds a shared resource - it may take the system tens of
minutes to recover from that backlog.  This is why most high
throughput OLTP systems only want to run at a max of 65% (or so) CPU
utilization with very short run queues - so that if there is any
slowdown, there is enough resource headroom to recover.  Otherwise the
system will likely be in an unrecoverable flat spin at Mach 5.
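
To put rough numbers on the headroom argument, here is a minimal sketch in Python. It assumes a deliberately simple model: calls arrive at a constant rate, nothing completes during the stall, and afterwards the backlog drains at the spare capacity (capacity minus arrival rate). The function name and the 10,000 calls/second figure are made up for illustration, not measurements from the system under discussion.

```python
def recovery_seconds(stall_s, arrival_rate, service_capacity):
    """Time to drain the backlog that builds up during a stall.

    Assumed model: arrivals are constant, no call completes while the
    shared resource is held, and the backlog then shrinks at
    (service_capacity - arrival_rate) calls per second.
    """
    if arrival_rate >= service_capacity:
        # No spare capacity: the backlog never drains.
        return float("inf")
    backlog = arrival_rate * stall_s
    return backlog / (service_capacity - arrival_rate)


capacity = 10_000.0  # calls/second the system can complete (hypothetical)
for util in (0.65, 0.90, 0.99):
    rate = util * capacity
    drain = recovery_seconds(1.0, rate, capacity)
    print(f"{util:.0%} busy: a 1 s stall takes {drain:.1f} s to drain")
```

Under these assumptions a one-second stall drains in roughly 2 seconds at 65% utilization, 9 seconds at 90%, and 99 seconds at 99%. Real workloads are bursty and sessions pile up on the same resource while waiting, so actual recovery is likely worse than this straight-line estimate.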

On Sat, Nov 28, 2009 at 12:13 AM, Christo Kutrovsky
<kutrovsky.oracle@xxxxxxxxx> wrote:
> Greg,
>
> It's a single UltraSparc T2 CPU, which is 8 cores, 8 threads. Note that each
> core has 2 integer pipelines. So you could assume 16 CPUs and 64 threads.
>
> There are many things that are wrong with this setup, and reducing the
> number of connections is something I am considering. However it's not that
> simple. Imagine that instead of CPU those were doing IO. You want to have a
> relatively deep IO queue to allow the raid array to deliver.
>
> One thing that puzzles me: given that the suspicion is that a deep CPU run
> queue is the problem, why is only one very specific latch causing trouble?
> There are several different types of queries running at the same time, so why
> is only one specific query causing latch contention and not the others?

-- 
Regards,
Greg Rahn
http://structureddata.org
--
//www.freelists.org/webpage/oracle-l

