RE: new sql tuning book - Author's response

  • From: Dan Tow <dantow@xxxxxxxxxxxxxx>
  • To: oracle-l@xxxxxxxxxxxxx
  • Date: Mon, 9 Feb 2004 23:02:58 -0800

Hi, this is Dan Tow, the author of SQL Tuning, wading in with a few remarks:

One of the list members (no, not my editor) let me know of this thread, so I 
thought I might clear up a few things with a response. There's been some 
speculation regarding the origin of the method I describe in the book. I came 
up with it on my own about ten years ago, without a literature search, but 
later learned that there was some fairly similar stuff out there in the 
academic literature that long predated my entry into the field, so I make no 
particular claim to the method being hugely original. However, I believe that 
the book is at least original as a *non-academic*, readable, detailed 
description of the approach.

I'd also like to add a few remarks regarding Mike Thomas's well-thought-out 
review of the book. I'd like to begin by thanking Mike for his review. Where he 
liked the book, I heartily agree;-), and where he listed minuses, I find his 
remarks well-considered. Even where I mildly disagree, he points up choices I 
made that I could have explained better in the book. Since late is better than 
never, I'm taking this opportunity to explain my choices, here, since each of 
the concerns Mike expressed maps to a conscious choice I made in writing the 
book. My responses follow his remarks, below:

Thanks,

Dan Tow
dantow@xxxxxxxxxxxxxx
650-858-1557
www.singingsql.com

> --------------------------------------------------------------------
> 
> I read it cover-to-cover, and its great. It covers
> many database types, not just Oracle.
> 
> Replaces 'trial-and-error' approach with ... well lets
> call it a trial (e.g. hard) ... a manual approach to

Yes, it *is* hard at first. I encourage my readers to keep at it, though. Most 
new tools just slow you down, until you master them, and the tools this book 
teaches are no exception. I learned pottery long ago, and the potter's wheel 
just got in the way, at first, but few potters would do without one!

> tuning SQL. The method it teaches is a very manual
> procedure for analyzing and tuning SQL. Its nothing
> like any other book I've read.
> 
> I'm undecided how to recommend it. So, here are some
> plus and minus points in my opinion:
> 
> PLUS:
> ----------
> 1) Cool tricks I've never seen before to drive SQL
> paths. I'm talking 'way-beyond' concat null or +0.
> 
> 2) Manual techniques to analyze and derive an
> 'optimal' and 'robust' SQL access plan. After you
> learn these techniques you should know if CBO is
> helping or not.
> 
> 3) Diagramming techniques for SQL that are wonderful.
> And, mathematical techniques which go deep into
> selecting a robust SQL plan.
> 
> 4) Discussion supports almost any SQL database, and
> especially good with general purpose databases.
> 
> 5) Good descriptions that eliminate misunderstanding
> of SQL syntax and side-effects on tuning SQL.
> 
> 
> MINUS:
> ----------
> 1) Does not emphasize how to work with Oracle's CBO.
> Following the 80/20 rule, if you accept CBO and
> statistics as good on 80% of SQL, then what do you do
> with the 20% problem cases?
> Options might include a) figure out how to tell/hint
> the CBO to succeed, b) try to manually force the SQL
> plan, or c) try something else e.g. partitions,
> parallel, analytical fns, stored outlines, 10g new
> stuff, etc.. Hmmmm?

Having friends on Oracle's CBO team, I'd have to say that *more* than 80% of 
all SQL is fine without manual tuning. However, it is astonishing how much of  
the *problem* SQL can be improved, and how many times faster it can usually be 
made. Problem SQL is *almost invariably* long-running because it has something 
wrong with it. If you use the right methodologies to find the SQL to tune, you 
will almost never find that you just have to live with how it already performs. 
(However, you will sometimes require outside-the-box solutions such as I 
describe in Chapter 10.) Given all that, I find that the approach to take is to 

0) Find the right SQL to tune. If you do this well, you won't need to tune 
much, and quality-of-result is more important than how long it takes to find 
the result.

1) Decide, using the method in the book, what the right execution plan is, 
without paying any attention to what the CBO is already doing. (Thinking about 
the current plan just prejudices you to repeat its mistake. This is similar to 
why it is hard to debug your own code, because you already know what it is 
*supposed* to do, which just gets in the way of seeing what it *is* doing.) 
Where even the best plan is too slow, figure out how to modify the application 
using the outside-the-box approaches of Chapter 10. 

2) Modify the SQL, or the application, as needed. Once you have step one (which 
is the subject of 80% of the book) behind you, this is straightforward, and 
covered reasonably well, I think, for all three databases, in the 20% of the 
book that is left to this step. There are certainly details, as Mike describes, 
such as stored outlines, that I considered outside the scope of the book, but I 
think these details are never the *only* way to solve the generally easy step 2.

> 
> 2) Heavy focus on nested loops with few and
> complicated exceptions for anything else. I think
> (guess) if you are doing a data warehouse the
> techniques might not help.

It is true that I have spent little of my career on data-warehouse 
applications, but almost all the applications I've worked on had major long-
running batch components, which call for the same tuning techniques that apply 
to data warehouses. I could have explained my emphasis on nested loops better, 
in retrospect:

Even for the biggest, ugliest data-warehouse query, you should *first* find the 
best (or very nearly best) possible all-nested-loops plan. It turns out to 
require 2 long chapters to thoroughly describe how to do that, not even 
counting the chapter that describes how to even define the problem, 
mathematically, in the form of a query diagram. *After* you've found the best 
nested-loops plan, you will very often find that it is so good that it is not 
worth the bother to tune it further! Even where you ought to tune it further, 
you need to have the join order and running rowcounts of that best nested-loops 
plan to perform the relatively simple calculations I describe at the end of 
Chapter 6 to correctly choose where you need hash joins. 

If, instead, you try to decide on join methods before you know the best nested-
loops join order, you will frequently choose hash joins to tables being reached 
at the wrong point (with far too many joining rows), where a nested-loops join 
at the *right* point in the join order would be cheaper. Much of the impression 
of the usefulness of hash joins, I believe, comes about where a *bad* join 
order was made much-less-bad with one or more hash joins, where a *better* 
solution would have been a corrected join order. Correct join orders rarely 
benefit *much* from hash joins, except where poorly designed applications read 
more rows than they should. (Hash joins to small tables are frequently a 
*little* better, and I don't suggest overriding the CBO's join-method choice in 
these cases, but the improvement is most often very minor when the join order 
is correct.)

> 
> 3) Does not discuss Oracle's nor the others (MS SQL
> Server and DB2/2) analytical functions. Personally,
> we've used these to improve our report SQL a lot.

Generally, the analytical functions still require that you read the raw rows - 
I focus on getting to those rows fast - if you do that, the rest is generally 
trouble-free.

> 
> 4) Technique based on row counts, and I'm told it
> would be better with block counts. I have not tested
> yet, but I'll let you know after OP101.

The technique *really* emphasizes row counts to the *largest* tables. (The 
heuristics are set up to reach these tables with as few rows as possible, even 
if that means reading more rows in the smaller tables. Roughly, this in turn is 
a good way to minimize physical I/Os and runtime, since you'll generally reach 
one big-table block per big-table row, and the big tables are going to be by 
far the most-poorly-cached objects in the queries (including the better-cached, 
more-compact big-table indexes that you use to reach the big tables).) That 
said, the technique *also* works to do quite a good job of minimizing rowcounts 
to all the tables, which in turn does quite a good job of minimizing block 
counts. I've always been acutely aware of the true goals and the true cost 
function we need to minimize, but it is astonishing, really, how rarely you 
need to adjust the answer you get by simply minimizing rowcounts, with an 
emphasis on the largest-table rowcounts. I probably could have added another 50 
pages or so of complexity to handle the extremely rare cases where the 
distinction matters enough to bother, but, remember, you (and everyone else) 
*already* find the method hard, especially at first - I'd rather get the main 
points across well, and leave it to the top-top experts (and the automated 
optimizers) to hyper-optimize the very rare corner cases.
> 
> The reason I think the book is great is because I want
> to know how close I'm getting on 'problem' SQL, and
> when I need to try something else. Some of Tow's
> techniques are not discussed anywhere else. The
> diagramming and mathematical techniques are gold. The
> analysis and tuning methods work on MySQL databases.
> 
> In light of recent thread discussions, Tow's book is
> not an arsenal of weapons to fight the Oracle SQL
> tuning war. Rather, its more like GPS and a sniper
> rifle (please don't put this on your cube wall).
> 
> Regards,
> 
> Mike Thomas
Dan Tow
dantow@xxxxxxxxxxxxxx
650-858-1557
www.singingsql.com
We make SQL sing!




----------------------------------------------------------------
Please see the official ORACLE-L FAQ: http://www.orafaq.com
----------------------------------------------------------------
To unsubscribe send email to:  oracle-l-request@xxxxxxxxxxxxx
put 'unsubscribe' in the subject line.
--
Archives are at //www.freelists.org/archives/oracle-l/
FAQ is at //www.freelists.org/help/fom-serve/cache/1.html
-----------------------------------------------------------------

Other related posts: