Re: RAMSAN Experience

  • From: Greg Rahn <greg@xxxxxxxxxxxxxxxxxx>
  • To: Rob Dempsey <Rob.Dempsey@xxxxxxxxxx>
  • Date: Fri, 11 Sep 2009 17:58:38 -0700

Given this host has 2 x quad core CPUs, the default DOP would be 16.
Parallel "default" (2 * cpu_count) is generally able to leverage the
CPUs to their fullest with one query, given modest filtering and
joins.  I would comment that 30GB for your db_cache is probably a
waste with PX.  The "Buffer Pool Advisory" section of your AWR report
would likely confirm this.  I would generally say to add more to your
pga_aggregate_target, but I don't think you can leverage much more
than what you have with only 16 or 32 PX servers.  Thus my comment
would be that this host is not well balanced for a PX workload; it has
too much memory and not enough CPU (usually 4GB per CPU core is a good
ratio).  As a comparison point, the nodes for the DB Machine are
DL360 G5s with 2 x quad core CPUs and 32GB RAM, and we generally use
pga_aggregate_target=16GB and sga_target=8GB.
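
If you want to sanity-check those numbers yourself, the settings and the
advisory data are exposed in standard v$ views; nothing below is specific
to your system, so treat it as a rough sketch:

    -- cpu_count and the parallel settings that drive the default DOP
    SELECT name, value
      FROM v$parameter
     WHERE name IN ('cpu_count',
                    'parallel_threads_per_cpu',
                    'parallel_max_servers');

    -- Buffer pool advisory: estimated change in physical reads at
    -- smaller/larger cache sizes (same data the AWR section reports)
    SELECT size_for_estimate AS cache_mb,
           size_factor,
           estd_physical_read_factor
      FROM v$db_cache_advice
     WHERE name = 'DEFAULT'
     ORDER BY size_for_estimate;

If estd_physical_read_factor barely moves as the estimated cache size
shrinks, that 30GB db_cache really is doing very little for you.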

On the storage side, 2 x 4Gbps ports will do 2 x ~400MB/s = 800MB/s
max.  Can your storage deliver this to the host?  You can use Oracle
Orion or try a simple count(*) at the default DOP and use
vmstat/iostat or similar to validate this (sketch below).  It is very
important to know what the physical capabilities of your hardware are.
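
A back-of-the-envelope check needs nothing fancier than a parallel full
scan while you watch iostat/vmstat on the host.  A rough sketch -- the
table name and DOP here are only placeholders for one of your large tables:

    -- Force a parallel full table scan; watch iostat/vmstat while it runs
    SELECT /*+ full(t) parallel(t, 16) */ COUNT(*)
      FROM big_table t;

    -- Or snapshot this before and after the scan and divide the delta
    -- in bytes by the elapsed seconds to get MB/s from the database side
    SELECT name, value
      FROM v$sysstat
     WHERE name = 'physical read total bytes';

If the scan tops out well below ~800MB/s, the bottleneck is downstream of
the HBA ports and no amount of parameter tuning will change it.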

I hate to sound like a schoolmaster, but messing with underscore
parameters is really a waste of time, even under constraints.  Root
cause analysis is what is really required.

I would list your triage tasks as such:
1) Validate the execution plan for your query or queries: are the
cardinality estimates accurate and is the join order optimal?  (See the
first sketch after this list.)  Troubleshooting excess temp writes from
a suboptimal execution plan caused by non-representative stats is simply
a waste of time when you may be able to eliminate them altogether with
an optimal plan.
2) Understand how much I/O scan capacity your system has.  Do some
micro-benchmarks with Orion or a count(*) FTS, as above.  This will give
you a sense of how long your table scans should take and how many of
them can run simultaneously before you exhaust your I/O bandwidth.
3) Understand the "tipping point" for when your HJ spills to temp for
a given DOP, given an optimal plan.  Is it 1M rows?  10M rows? 100M
rows?,etc.  At each of those, how much PGA memory is being used?   I
would suspect you can fit quite a number of keys for a HJ in 8GB or
16GB of RAM.
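
To make (1) and (3) concrete, two sketches.  The self-join below is only
a placeholder for the shape of your query; the views are standard.

For (1), run the statement once with rowsource statistics and compare
E-Rows to A-Rows in the plan:

    -- Placeholder self-join; run it once with rowsource stats enabled
    SELECT /*+ gather_plan_statistics parallel(t1,8) parallel(t2,8) */ COUNT(*)
      FROM big_table t1
      JOIN big_table t2 ON t2.parent_id = t1.id;

    -- E-Rows vs. A-Rows per plan step; big mismatches point at
    -- stale or non-representative stats
    SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));

For (3), watch the hash join work areas while the query runs; tempseg_size
going non-null is the moment a work area spills, and number_passes > 0
means one-pass or multipass execution:

    -- Active work areas: PGA in use and any temp segment they have allocated
    SELECT sql_id, operation_type,
           ROUND(actual_mem_used/1024/1024) AS mem_mb,
           ROUND(tempseg_size/1024/1024)    AS temp_mb,
           number_passes
      FROM v$sql_workarea_active
     ORDER BY actual_mem_used DESC;

Repeat at increasing row counts and you will see exactly where the
optimal-to-onepass-to-multipass transitions happen for your DOP.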

A systematic approach will put you light years ahead of mucking with
parameters, especially hidden/underscore parameters.


On Fri, Sep 11, 2009 at 8:50 AM, Rob Dempsey <Rob.Dempsey@xxxxxxxxxx> wrote:
> Hi
>
> Thanks for all the feedback from everyone. I will try and answer all the
> points in this mail.
>
>        Oracle Version 10.2.0.1 going to 10.2.0.4
>        db_cache_size=30720M
>        pga_aggregate_target=28672M
>
>        DL580G5
>        2 * Intel® Xeon® E7330 processor (2.4 GHz, 2x3M cache, 80 Watts)
>        1 card, dual port 4gbit connection
>        64G
>
> Concurrent users: 5-10 at the moment
>
> We use parallel query on the table with a value of 8 (/*+ parallel(t,8) */). I
> realise that parallel query will make use of 'direct path reads' to read the
> data into the process's PGA; however, I have found that if we set the
> small table threshold underscore parameter to a high value, the query will use
> the data in the cache instead. I am making an assumption that this parameter
> may work at a segment level, and yes, I do realise that there will be a high
> usage of latches (I have already used Tom Kyte's runstats package).
>
> Regarding the queries, the 26G table is roughly around 900 million rows, a
> number of which will self-join the table, so the query may be a 100
> million to 100 million row join.  This is why certain PGA steps are spilling
> to TEMP.  I know that tuning using underscore parameters is not the best
> course of action; however, the reason I used the parallel clause in the first
> place was to open up more PGA memory for the SQL queries.  Another 'option'
> was to look at adding more in order to push the 'global memory bound' beyond
> 1G (a whole research project in itself).
>
> From my original email I did say we had resource problems and constraints.
>  Indeed I am an Oracle Developer / DBA.  Storage is not my core strength and
> we do not have much experience with it in the department.  This is the root
> problem, but you have to work with what you've got.

-- 
Regards,
Greg Rahn
http://structureddata.org