Re: Fastest Way to Unload Data - Straight SQL query vs. PL/SQL and dbms_output vs. PL/SQL and utl_file

  • From: Mark Strickland <strickland.mark@xxxxxxxxx>
  • To: Nigel Thomas <nigel.cl.thomas@xxxxxxxxxxxxxx>
  • Date: Wed, 29 Sep 2010 11:11:20 -0700

Thanks Nigel, I will certainly be trying utl_file.  Also, I will try loading
the query results into a GTT then selecting from that.
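
For reference, a minimal UTL_FILE unload along those lines might look like the sketch below (the directory object, file name, and table/column names are illustrative, not from the thread; DATA_DIR must be an Oracle DIRECTORY the instance can write to, and 32767 is the maximum line size FOPEN accepts):

```sql
DECLARE
  f UTL_FILE.FILE_TYPE;
BEGIN
  -- open for writing with the largest allowed line buffer
  f := UTL_FILE.FOPEN('DATA_DIR', 'transform01.dat', 'w', 32767);
  -- concatenate the pipe delimiters in the query itself
  FOR r IN (SELECT col1 || '|' || col2 || '|' || col3 AS line
              FROM some_transform_view) LOOP
    UTL_FILE.PUT_LINE(f, r.line);
  END LOOP;
  UTL_FILE.FCLOSE(f);
END;
/
```

Unlike dbms_output, this writes (buffered) as the cursor is fetched, so output should start as soon as rows come back rather than after the whole result set is materialized.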


On Wed, Sep 29, 2010 at 11:05 AM, Nigel Thomas <
nigel.cl.thomas@xxxxxxxxxxxxxx> wrote:

> Mark
>
> It's a long time ago (12+ years) but we used utl_file very successfully as
> part of our ETL product (Constellar Hub). As far as I recall, UTL_FILE was
> never the bottleneck (the writes might be, but that's unavoidable).
>
> UTL_FILE works like normal buffered file output (ie a block at a time, not
> the entire output). Of course it can only write to a directory that Oracle
> can see (ie on or attached to the server). On *nix you can also write to a
> pipe - so you can pipe into another process that is gzipping the output, for
> example, or loading into another system. So you can have a straight through
> ETL process. Incidentally UTL_FILE can also read from a pipe... I haven't
> tried that under cygwin though.
>
> DBMS_OUTPUT is really intended for handing back relatively small quantities
> of data to a client; so with SQL*Plus the data would be written on a
> directory into a spool file accessible from the client. But it is buffered
> per client call, as you discovered.
>
> Hope that helps
>
> Regards Nigel
>
>
> On 29 September 2010 17:49, Mark Strickland <strickland.mark@xxxxxxxxx> wrote:
>
>> Oracle 10gR2 on Windows 2008R2 with Cygwin and Bash Scripts, 4 CPUs and
>> 4-GB RAM
>>
>> I've written a horrible set of ETL bash scripts for a customer project and
>> the final script does the "transforms".  It's a set of 14 queries that each
>> join 10-12 tables and write the output in pipe-delimited format.  Lines are
>> up to 1000 bytes wide.  I've set DOP on the tables to 16 in order to get
>> more PGA for hash joins.  The queries themselves seem to run reasonably
>> fast (I've tuned them as well as I can, I think) but the output is the
>> problem.  I have to concatenate '|' between the fields because the next
>> step in the process requires that.  Straight SQL output can write about
>> 775-KB per second to disk.  I got a PL/SQL procedure from Laurent
>> Schneider's blog that uses dbms_output and, once it starts writing, it
>> writes about 3.5-MB per second, a vast improvement.  However between the
>> time when the query heavy lifting "appears" to have completed (buffer gets
>> stop increasing, TEMP tablespace writes drop to nothing, USERS tablespace
>> reads drop to nothing) and when the output starts writing, several hours
>> pass.  I'm assuming that it is buffering all the output (about 8-GB) before
>> it starts writing.  I see a lot of writes to the Windows pagefile, but
>> little or no writes to the TEMP tablespace.  There certainly isn't enough
>> memory to buffer 8-GB.  I will also try the utl_file approach.  So, I'm
>> looking for at least anecdotal evidence for the best approach here.  I'm
>> under a very tight deadline and am doing as much of my own testing as I can
>> get done, but I'm hoping that others in the list will know right off how to
>> approach this the best way.  Thanks in advance.
>>
>> Mark Strickland
>> Seattle, WA
>>
>
>
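
The named-pipe trick Nigel mentions (having another process gzip the UTL_FILE output as it is written) can be sketched roughly as follows; the paths are made up, and a printf stands in for the database session that would actually write through UTL_FILE via a DIRECTORY object mapped to the pipe's directory:

```shell
d=$(mktemp -d)
mkfifo "$d/unload.pipe"                        # named pipe UTL_FILE would write to
gzip -c < "$d/unload.pipe" > "$d/unload.gz" &  # reader compresses rows as they arrive
# the PL/SQL session would FOPEN the pipe and PUT_LINE each row;
# here we simulate one pipe-delimited line:
printf 'col1|col2|col3\n' > "$d/unload.pipe"
wait                                           # gzip exits once the writer closes the pipe
gunzip -c "$d/unload.gz"
```

Because the reader consumes data as it arrives, the full 8-GB result never has to land uncompressed on disk.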

Other related posts: