With a helpful nod towards Wolfgang Breitling's and Jonathan Lewis's works, I am trying to analyze a particular scenario where a "select distinct" is being outfitted with a Full Table Scan (FTS) and takes much longer than its "select /* no distinct */" cousin. I understand that conceptually the statement with a distinct will have to group and sort the resulting rows, but I am having a hard time understanding relevant parts of the 10053 trace. One specific aspect of the trace file has me boggled, but let me start with a little background from the trace file.
First, the "First K Rows: Setup begin/end" section. Here are the last few lines from each query: WITH DISTINCT GROUP BY cardinality: 403084.00, TABLE cardinality: 403084.00 SORT resource Sort statistics Sort width: 238 Area size: 208896 Max Area size: 41943040 Degree: 1 Blocks to Sort: 2567 Row size: 52 Total Rows: 403084 Initial runs: 2 Merge passes: 1 IO Cost / pass: 1424 Total IO sort cost: 3991 Total CPU sort cost: 402383422 Total Temp space used: 38872000 Final - All Rows Plan: Best join order: 1 Cost: 341814.3073 Degree: 1 Card: 403084.0000 Bytes: 15317192 Resc: 341814.3073 Resc_io: 274398.0000 Resc_cpu: 67534690293 Resp: 341814.3073 Resp_io: 274398.0000 Resc_cpu: 67534690293
WITHOUT DISTINCT Final - All Rows Plan: Best join order: 1 Cost: 337421.6292 Degree: 1 Card: 403084.0000 Bytes: 15317192 Resc: 337421.6292 Resc_io: 270407.0000 Resc_cpu: 67132306871 Resp: 337421.6292 Resp_io: 270407.0000 Resc_cpu: 67132306871
So far, so good - it makes sense that it takes a little extra effort to sort and group the DISTINCT. But in the rest of the trace for the DISTINCT query, there are absolutely no calculations for "First K Rows", whereas the nondistinct query has a cost comparison for every conceivable "SINGLE TABLE ACCESS PATH (First K Rows)". Why is this? In the join order that was eventually chosen (happened to be Join Order 10 in both cases), the trace for both queries are exactly the same, up until the nondistinct query starts to work on the "First K Rows" part. Well, actually, now that I look at it a second time, the distinct trace has 2 extra lines, each a "Grouping column Cardinality" cost, which I am having a hard time finding relevance for (the raw number is not used anywhere else). Perhaps that is key to understanding and unraveling this little mystery, but I am still confused as to how.
This is more a curiosity on my own part than anything else. I am more than happy to forward the trace files to anyone who wants to look at them, but I am not expecting anyone to "solve" the problem. I am just trying to learn the concepts.
PS - I have read "Under the Hood of the 10053 trace" and JL's "Cost-Based Oracle Fundamentals" - if you know of a particular section that would help me understand, please mention a page number.
-- Charles Schultz