RE: Timestamp dimensions in a data warehouse

  • From: "Mark W. Farnham" <mwf@xxxxxxxx>
  • To: <greg@xxxxxxxxxxxxxxxxxx>, <peter.schauss@xxxxxxx>
  • Date: Thu, 21 Apr 2011 15:23:01 -0400

Point well made. Jared's as well in the slightly earlier post.

There remain things too small to justify buying partitioning. There are
things too big to do reasonably without partitioning. And there can be a
need for both partition pruning AND variable time granularity within a
reasonable partitioning granularity. The most productive route for a given
situation may be indexing alone, partitioning alone, or both partitioning
and indexing together. You can probably model a reasonable tradeoff guess
amongst those three possibilities that will be good for a few years.

But there is no magic size evaluation you can do on today's hardware that
will stand the test of time (yes, that is an intentional pun). But don't
worry too much about that: You'll have to reload your data into quantum dna
multidimensional nanotechnology media real soon now anyway.

mwf

-----Original Message-----
From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx]
On Behalf Of Greg Rahn
Sent: Wednesday, April 20, 2011 8:32 PM
To: peter.schauss@xxxxxxx
Cc: oracle-l@xxxxxxxxxxxxx
Subject: Re: Timestamp dimensions in a data warehouse

I think the usage of the data (questions/queries) should influence the
design here more than a text book, but there is truth to Kimball's logic and
design.  The challenge I see here is the granularity level.
If you have intervals of less than 1 minute, will you "round down" to whole
seconds?  If so, consider how much data you will have and how large that
time dimension will be - 31,556,926 rows per year as that is how many
seconds there are - that makes a pretty large time dimension.

In this case, using an index isn't the modern, big data way, partition
elimination is.  So your fact table should be partitioned at appropriate
granule size on the time key column, whether that be an integer date key or
timestamp type.

On Wed, Apr 20, 2011 at 11:35 AM, Schauss, Peter (IT Solutions)
<peter.schauss@xxxxxxx> wrote:
> I am in the early stages of designing a data warehouse intended to store
what I will call metrics and diagnostic data.  The granularity is, most
likely, going to be at intervals of less than a minute.  If I follow what I
understand to be the classical data warehouse approach, I should have a
dimension table with one row for each time interval instead of simply adding
a timestamp column to the fact table.  In Ralph Kimball's <The Data
Warehouse Toolkit> the author justifies this approach by asserting that
"most databases don't index SQL date calculations, so queries constraining
on an SQL -calculated field wouldn't take advantage of an index".
>
> Is this statement still valid?
>
> Would there be other reasons for putting the date in a dimension table?

--
Regards,
Greg Rahn
http://structureddata.org
--
//www.freelists.org/webpage/oracle-l


--
//www.freelists.org/webpage/oracle-l


Other related posts: