Re: row cache lock contention parallel insert

  • From: LS Cheng <exriscer@xxxxxxxxx>
  • To: "Mark W. Farnham" <mwf@xxxxxxxx>
  • Date: Wed, 23 Dec 2009 16:51:44 +0100

Hi!

The dynamic extent allocation (10MB extent size) is around 400 extents per
subpartition. I just noticed yesterday that the subpartition key is WRONG:
there is only one value for the subpartition key, so all data goes to the same
subpartition. It seems the developer didn't check the values and got this
wrong. This will be fixed after the holidays.
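
For what it's worth, a quick check like this shows the skew (the table and
column names here are placeholders for the real ones):

   -- count rows per subpartition key value; one dominating value
   -- confirms the skew
   SELECT sub_key_col, COUNT(*) AS row_count
     FROM etl_owner.staging_tab
    GROUP BY sub_key_col
    ORDER BY COUNT(*) DESC;
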
I think the waits you are referring to, where the Parallel Slaves pile up,
show up as enq: HV - contention.
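
The pile-up can be watched live while the loads run (gv$ because of the 6
nodes):

   -- sessions currently waiting on the highwater mark enqueue
   SELECT inst_id, event, COUNT(*) AS sessions
     FROM gv$session
    WHERE event LIKE 'enq: HV%'
    GROUP BY inst_id, event;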

In the 5-minute frame there could be around 12000 (400 x 3 tables x 10
processes) extent allocations spread across 10 application processes (hence
the Parallel Slaves). Is that a lot? Maybe, no doubt, but should it cause
hanging in a 6-node RAC? No, I don't think so; at most I would expect slow
performance, but not hanging.
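
To put numbers on that, the extent counts can be snapshotted at the start and
end of a cycle with something like this (owner and table names are
placeholders):

   -- total extents per destination table; run before and after a cycle
   SELECT segment_name, SUM(extents) AS total_extents
     FROM dba_segments
    WHERE owner = 'ETL_OWNER'
      AND segment_name IN ('ETL_TAB1', 'ETL_TAB2', 'ETL_TAB3')
    GROUP BY segment_name;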

There is a total of 64 slaves; parallel_max_servers is set to 64 and there
are 32 cores.
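
How busy the slave pool actually is can be checked with:

   -- PX servers allocated vs. available across the cluster
   SELECT inst_id, status, COUNT(*) AS slaves
     FROM gv$px_process
    GROUP BY inst_id, status;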

The jobs are controlled by some metadata status tables; if a job has not
finished, the following one does not kick in. For example, if process1 runs
for more than 5 minutes, the following cycle does not start: it checks the
status table, and if process1.status = 'RUNNING' then it waits.
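
The gate is nothing fancy, roughly like this (the status table and process
names are made up):

   -- poll the status table until the previous run is no longer RUNNING
   DECLARE
     l_status process_status.status%TYPE;
   BEGIN
     LOOP
       SELECT status
         INTO l_status
         FROM process_status
        WHERE process_name = 'PROCESS1';
       EXIT WHEN l_status <> 'RUNNING';
       DBMS_LOCK.SLEEP(10);  -- re-check every 10 seconds
     END LOOP;
   END;
   /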

The ETL process goes like this:


   1. Staging text files
   2. Read these text files as an External Table and load into a Staging
   Subpartitioned Table (TEMP Table)
   3. Read the Staging Table and load into two more subpartitioned tables
   concurrently (a sketch of the parallel insert is below)
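
Stripped down, step 3's insert looks more or less like this (the hints, the
DOP of 8 and the table names are illustrative, not the real code):

   -- direct-path parallel load from the staging table
   ALTER SESSION ENABLE PARALLEL DML;

   INSERT /*+ APPEND PARALLEL(t, 8) */ INTO etl_tab1 t
   SELECT /*+ PARALLEL(s, 8) */ *
     FROM staging_tab s;

   COMMIT;  -- a direct-path insert must commit before the table is read again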


Thanks!

--
LSC



On Wed, Dec 23, 2009 at 1:10 PM, Mark W. Farnham <mwf@xxxxxxxx> wrote:

>  So how much dynamic extension is going on? Is this the bug where, when one
> thread needs to allocate an extent, all the parallel inserters to the same
> object pile up and also add an extent?
>
>
>
> So, if you count the number of extents on all the objects getting inserted
> into at 00:04:59 of your 5 minute cycle (presuming the previous cycle
> finished without exhausting the time slice), and you do it again when the
> work for the 5 minute period is over, are we routinely talking about a lot
> of extents being added or not?
>
>
>
> When you say 10 application processes but with 64 slaves, is that 64 total
> or 640 total? What is your runqueue or the equivalent for whatever OS you’re
> on?
>
>
>
> End of diagnostic data questions.
>
>
>
> Random suggestion leading question: What is driving the five minute window?
> Is this controlled such that a set of processes does not start on the five
> minute boundary if the previous set has not finished? What sort of input is
> being consumed by the insert jobs? Are the three tables the input or the
> destination? If input, are they new sets of input or rows added to
> previously processed data? Is there a way to stagger the starts
> productively?
>
>
>
> mwf
>
>
>  ------------------------------
>
> From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx]
> On Behalf Of LS Cheng
> Sent: Wednesday, December 23, 2009 3:46 AM
> To: Greg Rahn
> Cc: info@xxxxxxxxxxxxxxxxxxxxx; oracle-l@xxxxxxxxxxxxx
> Subject: Re: row cache lock contention parallel insert
>
>
>
> Hi
>
>
> Thanks for the info
>
> The workload is more or less this: every 5 minutes 10 data load processes
> kick in; 3 of them need to load around 12 million rows each (going through
> an ETL of 3 tables per process), the rest from 300,000 to 3 million rows.
>
> All inserts are append with parallel DML and PQ for the SELECT
> (insert...select). Basically there are only 10 application processes, but
> with 64 slaves running almost all the time. All ETL tables have more or
> less 4000 to 5000 subpartitions.
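>
> A quick sanity check that parallel DML is really active in the loading
> session (it stays disabled unless the session enables it explicitly):
>
>    -- ENABLED or FORCED means parallel DML can actually kick in
>    SELECT pdml_status, pq_status
>      FROM v$session
>     WHERE sid = SYS_CONTEXT('USERENV', 'SID');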
>
> I also noticed compatible is set to 10.2.0.1 instead of 10.2.0.4; might
> this be some old bug? Probably, because when I was testing with a test
> database, also 10.2.0.4 but with compatible set to 10.2.0.3, I managed to
> reproduce massive row cache contention; when I changed compatible to
> 10.2.0.4, row cache requests dropped by a factor of 10. I will change
> compatible to 10.2.0.4 after the holidays, because they don't allow any
> more system changes until then.
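>
> The change itself is a one-liner plus a restart of all instances (compatible
> can be raised but never lowered afterwards):
>
>    -- set it for every instance in the RAC; takes effect after restart
>    ALTER SYSTEM SET compatible = '10.2.0.4' SCOPE = SPFILE SID = '*';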
>
> Thanks!
>
> --
> LSC
>
> On Tue, Dec 22, 2009 at 4:30 PM, Greg Rahn <greg@xxxxxxxxxxxxxxxxxx>
> wrote:
>
> That query is used for the row cache read call back for SEG$ when the
> segment entry is not available in the row cache.  Once the row cache
> entry is available, all the reads should go through row cache rather
> than from disk. It seems that something may be wrong with the row
> cache layer.
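>
> One way to see how that layer behaves is the dc_segments entry in
> v$rowcache; a high miss count there lines up with the physical SEG$ reads:
>
>    -- gets vs. misses for the segment entries in the row cache
>    SELECT parameter, gets, getmisses
>      FROM v$rowcache
>     WHERE parameter = 'dc_segments';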
>
> Can you describe the workload?
> What are the other top SQLs?
>
>
> On Mon, Dec 21, 2009 at 3:47 PM, LS Cheng <exriscer@xxxxxxxxx> wrote:
> > Yes, there are many executions of these two queries:
> >
> > 2ym6hhaq30r73 - around 11 million executions per hour
> > SELECT type#, blocks, extents, minexts, maxexts, extsize, extpct, user#,
> >        iniexts, NVL (lists, 65535), NVL (GROUPS, 65535), cachehint,
> hwmincr,
> >        NVL (spare1, 0), NVL (scanhint, 0)
> >   FROM seg$
> >  WHERE ts# = :1 AND file# = :2 AND block# = :3
>
> --
>
> Regards,
> Greg Rahn
> http://structureddata.org
>
>
>
