Re: cache buffer chains/where in code

  • From: Christo Kutrovsky <kutrovsky.oracle@xxxxxxxxx>
  • To: Greg Rahn <greg@xxxxxxxxxxxxxxxxxx>
  • Date: Sat, 28 Nov 2009 03:13:18 -0500

Greg,

It's a single UltraSPARC T2 CPU: 8 cores with 8 hardware threads each. Note
that each core has 2 integer pipelines, so you could think of it as 16 CPUs
and 64 threads.
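
To Greg's cpu_count question: it is easy to check what a defaulted value
resolves to. Solaris presents each hardware thread as a CPU, so on a T2 it
would typically come out as 64. A trivial sketch:

    -- What the instance sees as cpu_count (defaults to the number of
    -- CPUs the OS reports, i.e. hardware threads on a T2).
    SELECT value FROM v$parameter WHERE name = 'cpu_count';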

There are many things wrong with this setup, and reducing the number of
connections is something I am considering. However, it's not that simple.
Imagine that instead of burning CPU, those sessions were doing IO: you want
a relatively deep IO queue so the RAID array can deliver its full
throughput.

One thing that puzzles me: if the suspicion is that a deep CPU run queue is
the problem, why is only one very specific latch involved? There are several
different types of queries running at the same time, so why does only one
specific query cause latch contention, and not the others?
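
For reference, this is the kind of breakdown one could run against ASH to
check that (a sketch only, not the exact queries I used: the one-hour
window is a placeholder, x$bh needs SYS access, and column names are as of
10g):

    -- Count ASH samples per latch event and latch address (P1) to see
    -- whether a single child latch dominates the waits.
    SELECT event, p1 AS latch_addr, COUNT(*) AS samples
    FROM   v$active_session_history
    WHERE  event LIKE 'latch%'
    AND    sample_time > SYSDATE - 1/24
    GROUP  BY event, p1
    ORDER  BY samples DESC;

    -- Map the hottest cache buffers chains child latches to the buffers
    -- (and objects) hashing to them, to see which blocks are hot.
    SELECT o.object_name, bh.dbarfil, bh.dbablk, bh.tch
    FROM   x$bh bh, dba_objects o
    WHERE  bh.obj = o.data_object_id
    AND    bh.hladdr IN (SELECT addr
                         FROM (SELECT addr FROM v$latch_children
                               WHERE name = 'cache buffers chains'
                               ORDER BY sleeps DESC)
                         WHERE ROWNUM <= 10)
    ORDER  BY bh.tch DESC;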

On Fri, Nov 27, 2009 at 11:37 PM, Greg Rahn <greg@xxxxxxxxxxxxxxxxxx> wrote:

> 400 sessions seems very excessive for this hardware (how many CPUs, and
> what model? what does cpu_count show if it is defaulted?).  I've
> seen numerous systems that run significantly better when they reduce
> the number of connections/sessions significantly.  Most think that
> more == better, and that is usually not the case.  Generally I refer
> to this scenario as being "over processed".
>
> I'd be interested to know if the issue still appears with a reduced
> number of sessions.  I'd suggest experimenting to find the minimal
> number of sessions required to keep response times acceptable, and
> seeing how that impacts CPU usage and the run queue.  As a starting
> point I'd use 1 session per CPU core (thread in the case of the CMT
> processors).
>
> On Fri, Nov 27, 2009 at 11:18 AM, Christo Kutrovsky
> <kutrovsky.oracle@xxxxxxxxx> wrote:
> > I've analyzed ASH data for the problem period; usually there are 10-20
> > sessions for each sample. When this happens, there are near 400
> > sessions, with 250 of them waiting on the same latch/latch address,
> > and 170 "ON CPU".
> >
> > So that drives me towards Greg's suggestion that it could be a deep
> > CPU run-queue issue. This can be confirmed with your suggestions of
> > capturing vmstat/prstat information.
> >
> > I wonder what the correct approach is here to prevent deep CPU
> > run-queues from causing latch contention, considering UltraSPARC T2
> > CMT CPUs. Reduce the number of sessions? Implement Resource Manager?
>
>
> --
> Regards,
> Greg Rahn
> http://structureddata.org
>



-- 
Christo Kutrovsky
Senior Consultant
Pythian.com
I blog at http://www.pythian.com/blogs/
