Re: Fastest Way to Unload Data - Straight SQL query vs. PL/SQL and dbms_output vs. PL/SQL and utl_file

  • From: Nigel Thomas <nigel.cl.thomas@xxxxxxxxxxxxxx>
  • To: strickland.mark@xxxxxxxxx
  • Date: Wed, 29 Sep 2010 19:05:58 +0100

Mark

It's a long time ago (12+ years) but we used utl_file very successfully as
part of our ETL product (Constellar Hub). As far as I recall, UTL_FILE was
never the bottleneck (the writes might be, but that's unavoidable).

UTL_FILE works like normal buffered file output (i.e. it writes a block at a
time, rather than accumulating the entire output first). Of course it can only
write to a directory that Oracle can see (i.e. on, or attached to, the server).
On *nix you can also write to a named pipe - so you can pipe into another
process that is gzipping the output, for example, or loading into another
system, giving you a straight-through ETL process. Incidentally, UTL_FILE can
also read from a pipe... I haven't tried that under Cygwin though.
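The pipe arrangement above can be sketched on the *nix side like this (paths and names are illustrative; in the real setup the producer writing into the pipe would be the Oracle session via UTL_FILE, which we stand in for here with a plain echo):

```shell
# Work in a scratch directory; create a named pipe and attach a
# gzip consumer to it in the background.
d=$(mktemp -d)
mkfifo "$d/unload.pipe"
gzip -c < "$d/unload.pipe" > "$d/unload.dat.gz" &

# The database session would open unload.pipe with UTL_FILE and write
# rows into it; here a single echo plays that role.
echo 'col1|col2|col3' > "$d/unload.pipe"
wait

# The compressed file now holds whatever was written into the pipe.
gunzip -c "$d/unload.dat.gz"
```

The writer blocks until a reader has the pipe open (and vice versa), so the gzip process naturally keeps pace with the unload - no 8 GB of intermediate file ever lands on disk uncompressed.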

DBMS_OUTPUT is really intended for handing back relatively small quantities
of data to a client; with SQL*Plus, the data ends up in a spool file in a
directory accessible from the client. But the output is buffered per client
call, as you discovered - nothing reaches the client until the PL/SQL call
returns.
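To make the contrast concrete, a minimal UTL_FILE unload loop looks something like this (a sketch only - the DATA_DIR directory object, table, and column names are placeholders for your own):

```sql
DECLARE
  f utl_file.file_type;
BEGIN
  -- DATA_DIR must be an Oracle DIRECTORY object the server can write to;
  -- on *nix it can point at a directory containing a named pipe.
  f := utl_file.fopen('DATA_DIR', 'unload.dat', 'w', 32767);
  FOR r IN (SELECT col1 || '|' || col2 || '|' || col3 AS line
              FROM some_table) LOOP
    -- Rows stream out as they are fetched, instead of accumulating
    -- the whole result set in memory as DBMS_OUTPUT would.
    utl_file.put_line(f, r.line);
  END LOOP;
  utl_file.fclose(f);
END;
/
```

Because each put_line goes to the server-side file as the cursor is fetched, the hours-long gap you saw between "query finished" and "output starts" should disappear.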

Hope that helps

Regards Nigel

On 29 September 2010 17:49, Mark Strickland <strickland.mark@xxxxxxxxx> wrote:

> Oracle 10gR2 on Windows 2008R2 with Cygwin and Bash Scripts, 4 CPUs and
> 4-GB RAM
>
> I've written a horrible set of ETL bash scripts for a customer project and
> the final script does the "transforms".  It's a set of 14 queries that each
> join 10-12 tables and write the output in pipe-delimited format.  Lines are
> up to 1000 bytes wide.  I've set DOP on the tables to 16 in order to get
> more PGA for hash joins.  The queries themselves seem to run reasonably
> fast (I've tuned them as well as I can, I think) but the output is the
> problem.  I have to concatenate '|' between the fields because the next
> step in the process requires that.  Straight SQL output can write about
> 775-KB per second to disk.  I got a PL/SQL procedure from Laurent
> Schneider's blog that uses dbms_output and, once it starts writing, it
> writes about 3.5-MB per second, a vast improvement.  However between the
> time when the query heavy lifting "appears" to have completed (buffer gets
> stop increasing, TEMP tablespace writes drop to nothing, USERS tablespace
> reads drop to nothing) and when the output starts writing, several hours
> pass.  I'm assuming that it is buffering all the output (about 8-GB) before
> it starts writing.  I see a lot of writes to the Windows pagefile, but
> little or no writes to the TEMP tablespace.  There certainly isn't enough
> memory to buffer 8-GB.  I will also try the utl_file approach.  So, I'm
> looking for at least anecdotal evidence for the best approach here.  I'm
> under a very tight deadline and am doing as much of my own testing as I can
> get done, but I'm hoping that others in the list will know right off how to
> approach this the best way.  Thanks in advance.
>
> Mark Strickland
> Seattle, WA
>
