Re: Building fact tables in a data warehouse
- From: "jaromir nemec" <jaromir@xxxxxxxxxxxx>
- To: <tomdaytwo@xxxxxxxxx>, <oracle-l@xxxxxxxxxxxxx>
- Date: Sun, 23 Dec 2007 21:49:25 +0100
Hi Thomas,
> This worked OK for our ETL developers when we were dealing with 10,000 rows
> in the development database.
> Now we're working with a much larger source set and we're talking
> 100,000,000 rows. It doesn't work that well.
> Does anybody have experience or even a theoretical insight into a better way
> to do this?
One important thing in an ETL process is to know where is the bottleneck. From
the birds view this can be
a) in the data source
b) in the transformation
c) in the data target
Except for some cases such as inefficient selects in data source, indexes on
targets or contention with other processes in prevailing number of cases the
problem lies in the transformation. This is due to the row-by-row processing
logic of the majority of ETL tools.
A typical answer is to use parallelism; in principle to start more ETL sessions
working on partitioned sources and targets. This could indeed help, and with
your row count will be probably required.
But there is other possibility to shift some transformation logic in the data
source.
> To build our fact table, let's call it Z (as I understand the process) we
> grab the primary key of the row in dimension A plus information which
> we can use to find the associated row in dimension B. We go to dimension B,
> grab the primary key plus information which allows us to
> find the associated row in dimension C, and so on through 8 dimension
> tables.
In this case I'd try to join the fact records with the dimensions in the
database (e.g. using parallel, partition-wise hash join) possible with
pre-denormalizing the dimension hierarchy in a single table before the ETL
process starts (to avoid the cascading joins).
One starting point in troubles with ETL process is to rewrite the process
(possible simplified) in a CTAS statement. This gives you a quick orientation
where the problem could be. If the CTAS doesn't work as expected, it is
probably an a) or c) problem and should be solved in the database. Otherwise
you get estimation what can be an expected target throughput of the ETL
transformation.
Regards,
Jaromir D.B. Nemec
- References:
- Building fact tables in a data warehouse
- From: Thomas Day
Other related posts:
- » Building fact tables in a data warehouse
- » RE: Building fact tables in a data warehouse
- » Re: Building fact tables in a data warehouse
- » Re: Building fact tables in a data warehouse
- » Re: Building fact tables in a data warehouse
- Building fact tables in a data warehouse
- From: Thomas Day