Re: Is it a good idea to have primary key on DW table

  • From: Tim Gorman <tim@xxxxxxxxx>
  • To: my.oralce@xxxxxxxxx
  • Date: Wed, 04 Aug 2010 08:19:50 -0600

Anupam,

Think about the purpose of a primary key constraint.  The question you need to answer is:  why do you think you need the constraint?

All relational tables must be uniquely identifiable, so that some column or combination of columns (even if it is all the columns) represents a unique identifier.  I don't wish to discuss why that is true with anyone;  go back to Codd and Date for that.

However, the presence of a unique identifier is one thing.  Implementing the mechanism of a primary key (or unique key or foreign key) constraint is quite another.  What the reasons for doing so?
  • Protecting against duplicate entries
  • Providing implicit "documentation" of the presence of the unique identifier
  • Providing part of the implicit "documentation" of a relationship with one or more other tables
There are certainly additional other reasons, feel free to specify them and we can discuss.

So, with the fact table in a data warehouse, do *any* of these reasons truly apply?

I'm sure that the first item (i.e. protecting against duplicate entries) seems the most important, but riddle me this:  is a unique index (which is the enforcement mechanism used by primary key and unique key constraints in Oracle) truly the only way to protect against duplicate entries?

Many people say yes, forgetting that there really are other ways, very low-tech and easy to overlook, but they exist.  The logic of ETL processes can ensure that duplicate entries don't occur.  Validation jobs running during and after data loading can detect whether any errors occurred during data loading, alerting data warehouse administrators of the problem.  The fact is, the inherent latency involved with the ETL or ELT process does not mandate the mechanism of a unique index to enforce uniqueness.

Even if someone were to argue that a unique index is indeed the very best mechanism for enforcing uniqueness, where is the need?  In a dimensional data model, what type of query is it that needs to uniquely identify a row in a fact table?  The answer is:  a mistake.  If you have a query that absolutely needs a unique identifier in a fact table, then it means that your data model has a relationship between two (or more) fact tables, and that is an inherent mistake in a dimensional data model.  When that occurs, it is time to go back to the design floor and rethink assumptions and redraw the model.

So, think it through.

The unique identifier for a fact table should be the combination of all the dimensional key columns.  There is no need for a surrogate key on the fact table, because all of the dimensional key columns are themselves surrogate keys, and to understand the reasoning for that, go back and remember the reasons surrogate keys are used in data warehouses.

The presence of a unique identifier on the fact table does not mean that it needs to be enforced by a primary key.  Uniqueness can be enforced in other ways, and any query logic that may seem to necessitate a unique identifier is either itself in error or has exposed an error in the data model.

Sorry for the brevity, but don't be afraid to question authority on this matter.  You've discovered only the first of many disadvantages of the use of unique indexes in DW environments on Oracle.  Oracle can do a magnificent job with data warehouse databases, but sometimes you have to remember that the mechanisms provided by Oracle (and other RDBMS's) aren't always the best implementation choice.  Keep an open mind, and keep in mind the purpose of the application.

Hope this helps...
Tim Gorman
consultant -> Evergreen Database Technologies, Inc.
postal     => P.O. Box 630791, Highlands Ranch CO  80163-0791
website    => http://www.EvDBT.com/
email      => Tim@xxxxxxxxx
mobile     => +1-303-885-4526
fax        => +1-303-484-3608
Lost Data? => http://www.ora600.be/ for info about DUDE...

On 8/3/2010 10:53 PM, Anupam Pandey wrote:
Hi Listers,
                I was trying to do a poc whether its good idea to have the primary key constraint on FACT table in DW envioronment or not .

Here is what i found :-

1. Once I loaded the table for few days and started the primary key creation process ,other select queries on that table was blocked.

   
  SID SQL_ID SQL_HASH_VALUE BLOCKING_INSTANCE BLOCKING_SESSION SQL_TEXT
1 322 5adg0zfm9zrat 2795494745     alter table FACT_TEST add constraint pk_FACT_TEST primary key (<COLUMN_LIST>) 
2 389 9n8jftdhvb5q0 1639290560 1 322 select date_key,count(*) from fact_test group by date_key
-- //www.freelists.org/webpage/oracle-l

Other related posts: