RE: Load and Query question

  • From: "Shreeni" <shreeni@xxxxxxxxxxxxx>
  • To: "'David Aldridge'" <david@xxxxxxxxxxxxxxxxxx>, "'Oracle Discussion List'" <oracle-l@xxxxxxxxxxxxx>
  • Date: Thu, 27 Apr 2006 11:47:23 -0500

Nope... we do not have any MVs or summary tables, though I wish we had
some. The largest table I have is the daily journal Tx table, which has
230 columns, of which 164 are descriptions that are 99% the same. The
other thing the design guys have thrown at us is that after every day's
ETL is complete, we have to do a row count between source and target,
which is fair enough, and then a "minus" compare on all 164 descriptive
columns between source and target, which is unfair on the database, don't
you think?
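
Roughly like this, every day (an illustrative sketch; the table name, the
db link, and the column names are placeholders, and the real list runs
through all 164 description columns):

   -- 1. Row count check between source and target:
   SELECT COUNT(*) FROM jrnl_tx@src_db;
   SELECT COUNT(*) FROM jrnl_tx;

   -- 2. "Minus" compare on every descriptive column:
   SELECT desc_col_001, desc_col_002 /* ... through desc_col_164 */
     FROM jrnl_tx@src_db
   MINUS
   SELECT desc_col_001, desc_col_002 /* ... through desc_col_164 */
     FROM jrnl_tx;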

-----Original Message-----
From: David Aldridge [mailto:david@xxxxxxxxxxxxxxxxxx] 
Sent: Thursday, April 27, 2006 11:37 AM
To: shreeni@xxxxxxxxxxxxx
Cc: gogala@xxxxxxxxxxxxx; 'Oracle Discussion List'
Subject: Re: Load and Query question

Having the description in the same table as the metrics does sound odd,
especially if you have to deal with changing descriptive text. It's not
as if Oracle won't partition prune with a predicate on a dim table
description column either. You can also index the fact table based on a
description in a dim table using a bitmap join index. These aren't
materialized views or summaries of fact-dim table joins, are they?
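
Something along these lines, say (illustrative only: fact_tx, currency_dim
and the column names are made-up placeholders, and the dimension join key
must have a primary key or unique constraint):

   -- Bitmap join index: indexes the fact table on a description
   -- column that physically lives in the dimension table.
   CREATE BITMAP INDEX fact_tx_curr_desc_bjx
      ON fact_tx (currency_dim.currency_desc)
      FROM fact_tx, currency_dim
      WHERE fact_tx.currency_code = currency_dim.currency_code;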

I have to say that if you are using table compression then the space
overhead of having a lot of entries of "United States Dollar" is probably
going to be very small.
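
For instance (a sketch; jrnl_tx and the partition name are placeholders,
and in 10g basic compression only kicks in for direct-path operations such
as direct-path loads, CTAS, and ALTER TABLE ... MOVE):

   -- Rebuild an existing table compressed, via direct path:
   ALTER TABLE jrnl_tx MOVE COMPRESS;

   -- Or do it one partition at a time:
   ALTER TABLE jrnl_tx MOVE PARTITION p_2006_04 COMPRESS;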

Speaking intuitively, that does sound like a very long time for a load of
8 million rows -- is the maintenance of those descriptions responsible
for a large proportion of that time? Can you break the load down into
separate operations with known durations?
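
Even something crude helps (a sketch; etl_step_log and the step name are
invented for illustration):

   -- Bracket each ETL step with timestamps so you can see
   -- where the 10+ hours actually go.
   CREATE TABLE etl_step_log (
      step_name  VARCHAR2(64),
      started_at TIMESTAMP,
      ended_at   TIMESTAMP
   );

   INSERT INTO etl_step_log VALUES ('load_jrnl_tx', SYSTIMESTAMP, NULL);
   -- ... run the step ...
   UPDATE etl_step_log
      SET ended_at = SYSTIMESTAMP
    WHERE step_name = 'load_jrnl_tx' AND ended_at IS NULL;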

Shreeni wrote:
> Mladen,
>
> I totally agree with you regarding constraints and their lesser
> predominance in a DWH environment. However, my issue with the current
> design I see and work with every day is that if there are 70 codes in a
> table and 8+ mil rows, why would I want to load 8+ mil codes and their
> associated descriptions into the same table, when I could have greatly
> reduced the load time if I had just split the table into fact/dim. The
> codes and descriptions I am talking about are 100% "conformed", and the
> design begs to have a conformed dimension. Every time an ad-hoc query
> dies on these tables, which have 20-25 mil rows, we go around the
> rigmarole of identifying the best possible indexes for them. For
> example, you could call international currency names a conformed
> dimension, right? The probability of the description changing overnight
> is very low, and the OLTP is sending me the same codes. In cases such as
> this, what is the advantage of loading my finance tables of 10+ mil rows
> with USD as the code and "United States Dollar" as the description every
> day?
>
> Ralph's DWH Toolkit has always been beside me, and this is what is
> causing me heartburn :) I have read Tim's book too, as well as Bill
> Inmon's dissertations on DWH principles and scores of others....
>
> Shreeni
>
> -----Original Message-----
> From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx]
> On Behalf Of Mladen Gogala
> Sent: Thursday, April 27, 2006 10:45 AM
> To: shreeni@xxxxxxxxxxxxx
> Cc: 'Oracle Discussion List'
> Subject: Re: Load and Query question
>
>
> On 04/27/2006 11:24:41 AM, Shreeni wrote:
>
>> Hi List,
>>
>> I have a semantic question in a DWH environment. At the current place I
>> work, the tables have been de-normalized to such an extent that they
>> may not even qualify as 2NF. Sure, there are PKs, but they are there
>> more to provide a security implementation than to follow relational
>> principles. Out of 200+ tables there is not one dimension, one fact, or
>> even one referential key. Each table has about 130+ columns, of which
>> equal quantities are codes and descriptions.
>>
>> The argument in favor of the current design is that joins are
>> expensive. However, from a DBA's perspective, would you rather have a
>> relational model which would handle really large load volumes and get
>> the data out there faster, or go with a totally denormalized structure
>> and not care about when the data is delivered?
>>
>> On an average day the load takes 10+ hours, and volumes are presently
>> at 8 mil+ rows every day. The env is Solaris 9, Oracle 10.2.0.1.0.
>>
>> Thanks
>>
>> Shreeni
>>
>
> Shreeni, the English word "constraint" can be translated roughly as
> "limitation" or "ban". Constraints are there to prevent non-conforming
> data from making its way into the database. Constraints are, primarily,
> logical entities which help keep your data logical. If somebody enters a
> date of birth of 4/1/2525, then you have to deal with either a prankster
> or Doctor Who. Both cases deserve special handling, so you're well
> advised to prevent them from entering your database. That is the primary
> purpose and the prime directive of database constraints.
> In a data warehouse situation, you don't have to prevent data from
> entering; the data is already checked and trusted. You are extracting
> data from your OLTP database, transforming it along the way, and loading
> it into the DW. That process is called ETL. In the DW you don't need
> constraints, but you do need physical entities which speed up retrieval
> and joins. Those physical entities are called "indexes" or "indices",
> depending on your garden variety of English spelling. You must index
> your DW properly. Indexing schemes are widely described in various
> books. The best one I can think of is "The Data Warehouse Toolkit" by
> Ralph Kimball, introduced to me by my dear friend Tim Gorman, also a
> member of this list. I can wholeheartedly recommend this book as well as
> Tim's own work, "Essential Oracle8i Data Warehousing". I'm afraid that
> he had less success with teaching me how to bowl.
>

--
//www.freelists.org/webpage/oracle-l
