Re: How to improve performance on dataload process

  • From: tboss@xxxxxxxxxxxxxxxxxx
  • To: roon987@xxxxxxxxxxx
  • Date: Wed, 28 Sep 2005 09:14:19 -0400 (EDT)

We're an Informatica DW shop here, solaris 8 over v440s and v880s.  Here's
some of the lessons learned we've run into with Informatica:

- make full use of the caching feature in the Informatica server.  Informatica
can cache all the lookup and reference tables it uses to transform data 
(the "T" of ETL) and it makes the operations run very fast.  I had all these
tables bound to a keep pool in the database ... but it was a waste of memory,
since informatica uses its own caching process.  Consider making informatica's
caching location a ram-disk or use tempfs ... might make it go more quickly,
since it defaults to reading from disk.

- Avoid doing major sql operations from maplets ... we had essentially pl/sql
developers working with the tool, and instead of using informatica programming
they simply called cursors from teh maplets ... and we were plagued with

- You can use hints with informatica-generated sql ... take a look at the
sql it passes in (try to grab it out of v$sqlarea or just work w/ the 
informatica develoeprs.

- Make sure the sql*net client informatica is using to connect to the database
is also ... if you just installed the client from Oracle's download
sites or from the original oracle install disks, its and buggy.  We
ran into big-time memory leaks in the client.

All your database-related fixes are good.  Do your indexing in parallel.  We've
got far too much data to be able to drop indexes, load data, then re-index and
re-gather stats in a maintenance window (overnight or a weekend).  Thus we
depend on staging all the data and pre-creating indexes before using a 
tablespace and an alter table exchange partition process to get new data into
our database.  You may get to that point as well.


> on Solaris 9
> Am loading data into a staging database using a ETL tool (Informatica) 
> and have done the following to improve the performance of the dataload 
> process
> 1) Database in no-archive log mode
> 2) Indexes  being built after the dataload
> 3) Load being done in parallel
> 4) Nologging being used
> 5) Tables not being analysed during the dataload process
> What else can I do to improve the performance of this load process?

Other related posts: