Re: Temporary space needed to create a constraint

  • From: Peter Hitchman <pjhoraclel@xxxxxxxxx>
  • To: oracle-l <oracle-l@xxxxxxxxxxxxx>
  • Date: Thu, 12 Jul 2012 14:10:42 +0100

Hi,
When I came in this morning, I enabled the constraint novalidate and
then ran this:

SELECT DOCUMENT_ID, PATENT_AP_PERMID, LOGICAL_UNIT, REFID, NAME_TYPE,
       ORGANIZATIONPERMID, WCAID
FROM   OL_PATENT_ORGANIZATION
GROUP BY DOCUMENT_ID, PATENT_AP_PERMID, LOGICAL_UNIT, REFID,
       NAME_TYPE, ORGANIZATIONPERMID, WCAID
HAVING COUNT(*) > 1

It took 1.5 hours to find one duplicate row, using parallel slaves
to do a full table scan. Not quick, but a lot better than what was
happening before.
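For reference, the novalidate step looked like this (a sketch only: the
constraint name is my assumption, it isn't given in the thread):

```sql
-- Sketch: constraint name assumed, not taken from the thread.
-- ENABLE NOVALIDATE stops new duplicates arriving without checking
-- the existing rows, so the duplicate-hunting query can run afterwards.
ALTER TABLE OL_PATENT_ORGANIZATION
  ENABLE NOVALIDATE CONSTRAINT OL_PATENT_ORGANIZATION_PK;
```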

It used a plan like this (numbers from a test system):


------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name                   | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |                        |  1098 | 65880 |    32   (4)| 00:00:01 |        |      |            |
|   1 |  PX COORDINATOR          |                        |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)    | :TQ10001               |  1098 | 65880 |    32   (4)| 00:00:01 |  Q1,01 | P->S | QC (RAND)  |
|*  3 |    FILTER                |                        |       |       |            |          |  Q1,01 | PCWC |            |
|   4 |     HASH GROUP BY        |                        |  1098 | 65880 |    32   (4)| 00:00:01 |  Q1,01 | PCWP |            |
|   5 |      PX RECEIVE          |                        | 21948 |  1286K|    31   (0)| 00:00:01 |  Q1,01 | PCWP |            |
|   6 |       PX SEND HASH       | :TQ10000               | 21948 |  1286K|    31   (0)| 00:00:01 |  Q1,00 | P->P | HASH       |
|   7 |        PX BLOCK ITERATOR |                        | 21948 |  1286K|    31   (0)| 00:00:01 |  Q1,00 | PCWC |            |
|   8 |         TABLE ACCESS FULL| OL_PATENT_ORGANIZATION | 21948 |  1286K|    31   (0)| 00:00:01 |  Q1,00 | PCWP |            |
------------------------------------------------------------------------------------------------------------------------------------------

A query like this:

SELECT /*+ INDEX(A OL_PATENT_ORGANIZATION_PK) */
       DOCUMENT_ID, PATENT_AP_PERMID, LOGICAL_UNIT, REFID, NAME_TYPE,
       ORGANIZATIONPERMID, WCAID
FROM   OL_PATENT_ORGANIZATION A
GROUP BY DOCUMENT_ID, PATENT_AP_PERMID, LOGICAL_UNIT, REFID,
       NAME_TYPE, ORGANIZATIONPERMID, WCAID
HAVING COUNT(1) > 1

produces this explain plan:

----------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name                      | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |                           |    85M|  5807M|    19M  (1)| 32:31:45 |
|*  1 |  FILTER                 |                           |       |       |            |          |
|   2 |   SORT GROUP BY NOSORT  |                           |    85M|  5807M|    19M  (1)| 32:31:45 |
|   3 |    INDEX FULL SCAN      | OL_PATENT_ORGANIZATION_PK |  1715M|   113G|    19M  (1)| 32:31:45 |
----------------------------------------------------------------------------------------------------------

On this system it looks like using the parallel slaves is quicker
than scanning the index, because the hinted query is going to take
about 2.5 hours to complete.
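(For completeness, once the duplicate rows have been removed, the
constraint can be validated in place. This is a sketch; the constraint
name is assumed from the index name, it isn't confirmed in the thread.)

```sql
-- Sketch: validate the already-enabled constraint after the duplicates
-- are gone.  MODIFY CONSTRAINT ... VALIDATE checks the existing rows
-- while the constraint stays enabled, so new DML is still being checked.
ALTER TABLE OL_PATENT_ORGANIZATION
  MODIFY CONSTRAINT OL_PATENT_ORGANIZATION_PK VALIDATE;
```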

Thanks for your help.

Regards
Pete

On Thu, Jul 12, 2012 at 10:51 AM, Jonathan Lewis
<jonathan@xxxxxxxxxxxxxxxxxx> wrote:
> Peter,
>
> If  "half as much" is a reasonably accurate measure then it makes sense.
> I'm going to do a blog about this (using a small table of my own) to point
> out a few idiocies, but the arithmetic is as follows:
>
> The major cost is:
>     hash join
>         fast full scan of index (to build hash table)
>         aggregate of fast full scan (to probe hash table)
>
> The hash table probably spills to disc because it has to hold ALL the data
> from the index before the next step can happen.
> The aggregate of the index probably spills to disc because (fed by a fast
> full scan) it has to hold all the groups before it completes.
>
> The "ordered" hint in the SQL seems particularly silly, as we could assume
> that we expect virtually no duplicates (relatively) so want to aggregate
> first and then nested loop to find the few matches.  If we aggregate first
> we could choose to do a serial index full scan and aggregate without
> sorting or hashing - the cost is then the cost of serially reading the
> index (which wouldn't be quick, but could be a better bet than all the I/O
> you're currently doing).
>
> Regards
>
> Jonathan Lewis
> http://jonathanlewis.wordpress.com/all_postings
>
> Author: Oracle Core (Apress 2011)
> http://www.apress.com/9781430239543
>
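Jonathan's "aggregate first, then nested loop" suggestion could be
sketched like this (my interpretation only; the NO_MERGE hint and the
join back to the table are illustrative, not from the thread):

```sql
-- Sketch of the "aggregate first, then nested loop back" shape:
-- a serial index full scan feeds a SORT GROUP BY NOSORT, the HAVING
-- clause leaves only the (few) duplicated key values, and we then
-- loop back to the table to fetch the matching rows.
SELECT B.*
FROM   (SELECT /*+ NO_MERGE */
               DOCUMENT_ID, PATENT_AP_PERMID, LOGICAL_UNIT, REFID,
               NAME_TYPE, ORGANIZATIONPERMID, WCAID
        FROM   OL_PATENT_ORGANIZATION
        GROUP BY DOCUMENT_ID, PATENT_AP_PERMID, LOGICAL_UNIT, REFID,
               NAME_TYPE, ORGANIZATIONPERMID, WCAID
        HAVING COUNT(*) > 1) A,
       OL_PATENT_ORGANIZATION B
WHERE  B.DOCUMENT_ID        = A.DOCUMENT_ID
AND    B.PATENT_AP_PERMID   = A.PATENT_AP_PERMID
AND    B.LOGICAL_UNIT       = A.LOGICAL_UNIT
AND    B.REFID              = A.REFID
AND    B.NAME_TYPE          = A.NAME_TYPE
AND    B.ORGANIZATIONPERMID = A.ORGANIZATIONPERMID
AND    B.WCAID              = A.WCAID;
```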
--
//www.freelists.org/webpage/oracle-l
