Re: Rough order of magnitude for rows/second
- From: "Mark J. Bobak" <mark@xxxxxxxxx>
- To: sol.beach@xxxxxxxxx
- Date: Fri, 31 Dec 2004 00:37:42 -0500
Sol,
It sounds like you have been saddled with a nasty (really, unanswerable)
requirement
by some damagement, er, management type.
I will say that I've heard Jonathan Lewis quote a (VERY ROUGH) number of
APPROXIMATELY 10,000 buffer gets/second/100MHz of CPU. This of course
assumes a few things:
- Everything is cached. As soon as you start reading from disk, it's
all over.
- The buffer gets are all that's happening. I had a case, which I
approached
Jonathan with a few years ago, where I was trying to apply this rule.
It really
didn't work for me at all. I had everything cached, was reporting
relatively few
consistent gets, and it was taking a (relatively) long time. A closer
look at "buffer
is pinned" count showed that Oracle was doing a huge number of this
get. So, I
wasn't seeing the entire picture.
- Consider also, depending on other activity on the system, a consistent
get may
be more or less costly. Specifically, look at stats like:
- no work consistent gets
- consistent gets, cleanout only
- consistent gets, rollback only
- consistent gets, cleanouts and rollbacks
If you've got lots of concurrent DML on the same tables you're querying,
you may be doing lots of cleanouts and/or rollbacks.
- Consider overall CPU load. If the CPU is pegged, or, not even
pegged, but past the
"knee of the hockeystick" (if you read Cary's book, you know what I
mean) you may
get inconsistent results.
- Different CPU types and OSes will vary. It's my experience that
Solaris on Sparc
does much better than Windows on x86. Linux on x86 seems to do better
than
Windows, but not as well as Solaris. Your mileage may vary.
Finally, if you can get past all that and make the rule work for you,
guess what?
That only estimates buffer gets/sec, NOT rows/sec. Why? Well, how do
buffer
gets relate to rows? Well, it varies GREATLY, depending on access
path. Are you
selecting by primary key from a hash cluster? You may only expend 1
buffer get/row,
ideally. What about index driven access by primary key? Depends on the
blevel of
the index. Also, are the selected column(s) contained within the
index? If not, that
table access by rowid is another buffer get. Now, range scans, wow!
They can
really vary depending on the type of query and the suitability of the
index. Full table
scans, again can vary wildly in their efficiency, depending on how many
blocks are
below the HWM and how many rows are in the table. Also, are you doing
array
fetches? How large is the array? If it's only 10 elements, and you're
fetching
100,000 rows, you'll incur significant extra buffer gets there.
There's probably tons of other factors too, but I've listed some of the
most common
ones. At best, this will serve as a strong argument to your damager
that coming up
with rows/sec is simply not a practical exercise. Selling buffer
gets/sec to a non-
technical manager is probably tough, cause they're not *real*. They are
not the
end product that he sees. If you're really stuck, and have no choice,
you may first
try the 10k buffer gets/sec rule and see how well that works for you.
(Don't be
surprised if your results vary significantly from the 10k, as long as
they are repeatable
and consistent.) Once you've got a handle on that, look at your
application. Look
at the most common queries, and analyze them for frequency of execution,
rows
returned and buffer gets. You *may* be able to back into a rough number
of
rows/sec, from there. Maybe....
Hope that helps,
-Mark
On Thu, 2004-12-30 at 21:40, sol beach wrote:
> Yes, I realize "it depends" on many factors.
> Assume sufficient CPU, RAM & I/O exist; a lightly loaded OLTP system
> For the query below what is the rough order of magnitude for
> rows/second being read
> 1
> 10
> 100
> 1000
> 10000
> 100000
> 1000000
> 10000000
>
> SELECT d.product_id id,
>
> d.name data,
>
> NULL url,
>
> COUNT(DISTINCT li.customer_id) customers,
>
> COUNT(DISTINCT li.order_id) orders,
>
> SUM(li.price) revenue,
>
> SUM(li.shipping) shipping,
>
> SUM(li.qty) products,
>
> DECODE(COUNT(DISTINCT li.order_id), 0, 0,
> SUM(li.price)/COUNT(DISTINCT li.order_id)) avg_selling_price,
>
> SUM(li.discount) discount
>
> FROM DW_2000000010525.line_item li,
>
> DW_2000000010525.product d
>
> WHERE li.date_created >= TO_DATE('2004-12-23','YYYY-MM-DD')
>
> AND li.date_created < TO_DATE('2004-12-24','YYYY-MM-DD')
>
> AND li.product_id = d.product_id
>
> AND li.prod_category_id = 3611326867412
>
> GROUP BY d.name,
>
> d.product_id,
>
> NULL
>
> ORDER BY ORDERS DESC,
>
> d.product_id
> --
> http://www.freelists.org/webpage/oracle-l
--
Mark J. Bobak
mark@xxxxxxxxx
"Science is the belief in the ignorance of experts." --Richard P.
Feynman
--
http://www.freelists.org/webpage/oracle-l
- References:
- Rough order of magnitude for rows/second
- From: sol beach
Other related posts:
- » Rough order of magnitude for rows/second
- » RE: Rough order of magnitude for rows/second
- » RE: Rough order of magnitude for rows/second
- » Re: Rough order of magnitude for rows/second
- » RE: Rough order of magnitude for rows/second
- » RE: Rough order of magnitude for rows/second
- » RE: Rough order of magnitude for rows/second
- Rough order of magnitude for rows/second
- From: sol beach