Re: external table parallel issue

  • From: Greg Rahn <greg@xxxxxxxxxxxxxxxxxx>
  • To: LS Cheng <exriscer@xxxxxxxxx>
  • Date: Mon, 17 May 2010 06:44:32 -0700

The unit of work for a PX server (for external tables also) is a
granule, not a file (see oracle docs on granules for more info).
Given that you are observing the PX workload skew (some PX servers
have granules/rows, some do not), the likely reason is there simply is
not enough granules to give out to each of them.  This is why I asked
how big the files were.  For external tables the granules are 10MB so
if you have a 100MB file and using a DOP of 16, there will be 6 PX
servers with no work (you would need at least 160MB to give each PX
server a 10MB granule at DOP=16).

The UNION executes each branch on its own which is much less efficient
and desirable than having a single table because the "right side"
slaves (ones that get work first) continually get the granules because
the files are too small.  This is why I asked if the issue was
observed with a single table of all the files.

I dont believe dynamic sampling behaves this way with external tables:
at execution time the PX coordinator parses the list of files and then
gives out the granules.

Bottom line: it's a much better solution to use a single external
table of many files (than to use the UNION) so there is enough data to
give out to all the PX servers.

On Mon, May 17, 2010 at 1:33 AM, LS Cheng <exriscer@xxxxxxxxx> wrote:
> That is eactly the issue and I understand what is going, a bit tricky to
> deal with may be instead of union all in the database I should try merge the
> file into a single file.
>
> However I thought during the parsing phase the optimizer can dynamic
> sampling (I have optimizer_dynamic_sampling set to 4) and can know which
> external tables has more data and which has cero rows and then distribute PX
> slaves based on that but doesnt seem the case.

-- 
Regards,
Greg Rahn
http://structureddata.org
--
//www.freelists.org/webpage/oracle-l


Other related posts: