Re: Building fact tables in a data warehouse

  • From: Job Miller <jobmiller@xxxxxxxxx>
  • To: tomdaytwo@xxxxxxxxx, oracle-l@xxxxxxxxxxxxx
  • Date: Fri, 21 Dec 2007 18:24:44 -0800 (PST)

Chapter 6, p.215 of the Data Warehouse ETL Toolkit talk about the common 
approach for the "Surrogate Key Pipeline" and presents a diagram basically what 
you describe.

The gist of it is to ensure that all records move through successive lookups in 
parallel, without going to disk, and are reading from cached lookup tables that 
contain the most recent dimension values.  Ideally, you don't have to use the 
entire dimension, because much of the dimension contains dated data..  

however, if you have late arriving data, that may invalidate your ability to do 
all lookups against smaller current lookup tables vs. doing the lookup against 
the full current dimensions.

it goes on to say that you can do it all in one big star join query, (with 
outer joins if you are concerned some dimensional values may not match), but 
that that approach isn't the most efficient..

Thomas Day <tomdaytwo@xxxxxxxxx> wrote: I'm just wondering if anyone has any 
insights on how to speed this up.
  
 Our fact tables consist of the primary keys of the involved dimension tables 
plus numeric count columns.
  
 To build our fact table, let's call it Z  (as I understand the process) we 
grab the primary key of the row in dimension A plus information which
 we can use to find the associated row in dimension B.  We go to dimension B, 
grab the primary key plus information which allows us to
 find the associated row in dimension C, and so on through 8 dimension tables.
  
 This gives us one row in the fact table.
  
 This worked OK for our ETL developers when we were dealing with 10,000 rows in 
the development database.
  
 Now we're working with a much larger source set and we're talking 100,000,000 
rows.  It doesn't work that well.  Basically, each row in the fact table 
requires full a index scan and a rowid fetch from each of the dimension tables. 
  
 Does anybody have experience or even a theoretical insight into a better way 
to do this?
  
 Thanks
 

       
---------------------------------
Never miss a thing.   Make Yahoo your homepage.

Other related posts: