Re: external table parallel issue

  • From: LS Cheng <exriscer@xxxxxxxxx>
  • To: Greg Rahn <greg@xxxxxxxxxxxxxxxxxx>
  • Date: Mon, 17 May 2010 07:35:34 +0200

Hi

The source comes from 20 (but many files not just 20) different places and
each leave their file in a different directory and several external tables
were created for each source file.

The issue is that when the query is run against many external tables glued
together with union all some of PX slaves does no job at all, for example if
16 slaves are spawned probably only a couple of slaves are performing 95% of
tasks and the others just idles, this can be observed from 10046 traces for
all PX processes. I am thinking the cause might be that not all external
tables has data, only 25% of all external tables and the rest are empty
files (but not always sometimes they do contain data) so the working slaves
get assigned the tables with data and the idles one got assigned tables with
empty files.

This des not happen with single external table, all slaves works and
uniformly.

This is reproduceble with select count(*)

I have not tested with heap tables, I will see if I can carry mote tests
today.



TIA


On Mon, May 17, 2010 at 3:32 AM, Greg Rahn <greg@xxxxxxxxxxxxxxxxxx> wrote:

> First off - why not just make a single external table that contains
> all the files of the other 64?  E.g if you have 64 external tables
> with 1 file each, make a single, additional, external table with 64
> files.  This would seem to eliminate the need for a 64 branch union
> (which seems like bad design to me).
> Second - does the issue (which I'm unclear on exactly what it is)
> reproduce if you use heap tables?  E.g make N heap tables as select *
> from external_table and try your 64 branch union.
>
> If your concern is why some PX servers dont seem to be getting work
> when accessing the external table - does this reproduce with a simple
> select count(*) from the external table as well?
>
>
> On Sun, May 16, 2010 at 9:26 AM, LS Cheng <exriscer@xxxxxxxxx> wrote:
> > The size varies from a few MB to a couple hundreds of MB.
> >
> > With 1 external table the query is pretty fast, we have other process
> which
> > run against only a single external tables larger than the problematic one
> > and the timing is much better.
> >
> > The problem with this I am showing is that it is formed by many external
> > tables and to simply the query we created several union all views on top
> of
> > these external tables. I will test without views and see what happens.
>
> --
> Regards,
> Greg Rahn
> http://structureddata.org
>

Other related posts: