Re: Timestamp dimensions in a data warehouse

  • From: Greg Rahn <greg@xxxxxxxxxxxxxxxxxx>
  • To: peter.schauss@xxxxxxx
  • Date: Wed, 20 Apr 2011 17:31:55 -0700

I think the usage of the data (questions/queries) should influence the
design here more than a text book, but there is truth to Kimball's
logic and design.  The challenge I see here is the granularity level.
If you have intervals of less than 1 minute, will you "round down" to
whole seconds?  If so, consider how much data you will have and how
large that time dimension will be - 31,556,926 rows per year as that
is how many seconds there are - that makes a pretty large time
dimension.

In this case, using an index isn't the modern, big data way, partition
elimination is.  So your fact table should be partitioned at
appropriate granule size on the time key column, whether that be an
integer date key or timestamp type.

On Wed, Apr 20, 2011 at 11:35 AM, Schauss, Peter (IT Solutions)
<peter.schauss@xxxxxxx> wrote:
> I am in the early stages of designing a data warehouse intended to store what 
> I will call metrics and diagnostic data.  The granularity is, most likely, 
> going to be at intervals of less than a minute.  If I follow what I 
> understand to be the classical data warehouse approach, I should have a 
> dimension table with one row for each time interval instead of simply adding 
> a timestamp column to the fact table.  In Ralph Kimball's <The Data Warehouse 
> Toolkit> the author justifies this approach by asserting that "most databases 
> don't index SQL date calculations, so queries constraining on an SQL 
> -calculated field wouldn't take advantage of an index".
>
> Is this statement still valid?
>
> Would there be other reasons for putting the date in a dimension table?

-- 
Regards,
Greg Rahn
http://structureddata.org
--
//www.freelists.org/webpage/oracle-l


Other related posts: