RE: Oracle Performance on Sunfire T2000

Hi Martin,
 
It's about how long latches are held. On a T2, latches are held longer
because it takes more CPU cycles to complete the work that needs to be done
under the protection of a latch. This is so even if there is enough CPU
capacity available in the system (no waiting for CPU needed).
 
Yes, the CMT processors scale better; in other words, the RELATIVE
performance drops less when you go from 1 to 128 parallel threads. But this
is RELATIVE performance, not the real performance (the number of
instructions executed or the number of business transactions completed).
 
You may need a few hundred parallel threads to get X transactions per minute
on a T2, but you may only need 16 parallel threads on an Opteron/Xeon.
That's why the CMTs aren't advertised as performance monsters, but as
giving you scalability (rather than raw performance) and a good
power/cooling footprint...
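
To put rough numbers on the thread-count point, here is a minimal sketch
using Little's law (concurrency = throughput x time per transaction). The
target rate and the per-transaction times are made-up figures chosen only
for illustration, not measurements from a T2 or an Opteron/Xeon, and the
sketch assumes each transaction keeps one thread busy for its whole
duration with no contention.

```python
def threads_needed(target_tx_per_min: int, service_ms_per_tx: int) -> int:
    """Little's law: busy threads = throughput * time per transaction.

    Integer ceiling division is used so the result does not depend on
    floating-point rounding.
    """
    busy_ms_per_min = target_tx_per_min * service_ms_per_tx
    return -(-busy_ms_per_min // 60_000)  # 60,000 ms in one minute


# Hypothetical figures, for illustration only.
TARGET_TX_PER_MIN = 48_000
FAST_THREAD_MS = 20    # assumed time per transaction on a fast thread
SLOW_THREAD_MS = 250   # assumed time per transaction on a slow thread

if __name__ == "__main__":
    for label, ms in (("fast", FAST_THREAD_MS), ("slow", SLOW_THREAD_MS)):
        n = threads_needed(TARGET_TX_PER_MIN, ms)
        print(f"{label} threads needed: {n}")
```

With these assumed inputs the same target needs 16 busy threads in the fast
case and 200 in the slow case.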
 
Btw, you can roughly measure how long latches are held using my LatchProfX
script (written in plain SQL :) or with DTrace by tracing the
pid$target:oracle:kslgetl:entry and return probes.
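
If you capture the entry and return probe firings with timestamps, pairing
them up per thread is simple. Below is a minimal post-processing sketch in
Python. The record format (timestamp in nanoseconds, thread id, probe kind)
is made up for illustration and is not what DTrace prints by default, and
the sketch assumes the traced function is not re-entered within a thread.

```python
def call_durations(events):
    """Pair entry/return events per thread and return elapsed times.

    `events` is an iterable of (timestamp_ns, thread_id, kind) tuples in
    time order, where kind is "entry" or "return" (hypothetical format).
    A return with no matching entry is ignored.
    """
    open_calls = {}   # thread_id -> timestamp of the pending entry
    durations = []
    for timestamp_ns, thread_id, kind in events:
        if kind == "entry":
            open_calls[thread_id] = timestamp_ns
        elif kind == "return" and thread_id in open_calls:
            durations.append(timestamp_ns - open_calls.pop(thread_id))
    return durations


if __name__ == "__main__":
    # Invented sample data, only to show the shape of the input.
    sample = [
        (100, 1, "entry"),
        (150, 2, "entry"),
        (400, 1, "return"),
        (900, 2, "return"),
    ]
    for elapsed in call_durations(sample):
        print(f"{elapsed} ns between entry and return")
```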
 
One more factor is the cache coherency architecture: whether all your
threads are running on a single CPU core (or chip), and whether the L2/L3
cache is per core or shared by all cores in a socket. If the cache line
that holds the latch structure is currently owned/cached by a different CPU
(different socket), then the latch getter needs to snoop the other CPU's
cache to see what the latch value is right now. On some architectures the
snooping is done by sending a request to the memory controller, which goes
through the memory bus at the memory bus base clock rate (which is slower
than the CPU clock), but on some (like AMD Opteron) the snooping is done at
the HyperTransport clock rate, which is faster.
 
 
Tanel.
 


  _____  

From: Martin Berger [mailto:martin.a.berger@xxxxxxxxx] 
Sent: 13 February 2009 13:34
To: tanel@xxxxxxxxxx
Cc: mzito@xxxxxxxxxxx; jifjif@xxxxxxxxx; oracle-l@xxxxxxxxxxxxx
Subject: Re: Oracle Performance on Sunfire T2000


It's just about the run queue (I guess). If the run queue on your 4 fast
CPUs is 'long', you will be happy if any of the 128 'slow' threads processes
the task and releases the latch.
Of course, if you do not utilize the 4 CPUs to their limits, you will not
need 128 threads at all.

But I'm still just talking pure theory; in summer I will have my new T2+s
and will have to prove it. Until then, it's pure theory.

br
 Martin 

--
Martin Berger    http://berxblog.blogspot.com





There's one more catch with slow single-thread execution combined with high
parallelism in Oracle. If you migrate from 4 fast CPUs to 128 slow threads,
you will have much heavier latch contention on busy latches. Whatever work
is done under the protection of a latch will probably take longer, thus the
latch is held for longer. And instead of 3-4 concurrent threads trying to
get the latch at the same time, you'll potentially have a few hundred...
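
A crude way to see why longer hold times hurt: treat the latch as a single
server in a queue. The sketch below is only an illustration under strong
assumptions (random arrivals, M/M/1 behaviour, no spinning or sleeping
logic), and the get rate and hold times are invented, not measured. The
fraction of time the latch is busy is the get rate times the hold time, and
the average number of waiters grows much faster than that fraction.

```python
def latch_busy_fraction(gets_per_sec: float, hold_sec: float) -> float:
    """Fraction of time the latch is held: arrival rate * hold time."""
    return gets_per_sec * hold_sec


def mean_waiters(busy_fraction: float) -> float:
    """Average number of queued getters in an M/M/1 model (assumption)."""
    if not 0 <= busy_fraction < 1:
        raise ValueError("busy fraction must be in [0, 1)")
    return busy_fraction ** 2 / (1 - busy_fraction)


if __name__ == "__main__":
    GETS_PER_SEC = 50_000            # hypothetical latch get rate
    for hold_us in (2, 10, 18):      # hypothetical hold times, microseconds
        rho = latch_busy_fraction(GETS_PER_SEC, hold_us / 1_000_000)
        print(f"hold={hold_us}us busy={rho:.2f} waiters={mean_waiters(rho):.2f}")
```

In this model, making the hold time 5x longer (2 to 10 microseconds) raises
the busy fraction from 0.1 to 0.5, but the average number of waiters goes up
by far more than 5x.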
 
Glenn Fawcett has quite a few useful blog entries about Oracle performance
on Sun CMT processors: http://blogs.sun.com/glennf/tags/throughput
 
Tanel.
 



  _____  

From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx]
On Behalf Of Matthew Zito
Sent: 12 February 2009 19:02
To: jifjif@xxxxxxxxx; oracle-l@xxxxxxxxxxxxx
Subject: RE: Oracle Performance on Sunfire T2000





We have a couple of t1000s, and while our workload is a little odd (we're an
automation company, so all our several hundred databases do is get
installed, patched, upgraded, uninstalled, etc.), anything involving data
dictionary activities (running catupgd.sql, etc. - high-CPU, single-threaded
activities) is slower on the t1000s than on our ancient v210s.

Supposedly the t1000/2000 are perfect for J2EE apps - lots of threads, not a
lot of heavy lifting, where parallelization of execution is the most
critical piece.

Thanks,
Matt

--
Matthew Zito
Chief Scientist
GridApp Systems
P: 646-452-4090
mzito@xxxxxxxxxxx
http://www.gridapp.com



