Re: How to store 50 Terabytes per day?

  • From: Tim Gorman <tim@xxxxxxxxxxxxx>
  • To: <oracle-l@xxxxxxxxxxxxx>
  • Date: Thu, 09 Sep 2004 19:51:06 -0600

Jaromir,

> Hi Tom,
> 

Some call me.....Tim?  Are you looking for A GRR-R-R-R-RAIL?

>> ETL would have to consist of inserts only once, no updates or deletes EVER
>> Current-image data would have to be implemented as either views or
>> materialized views, if desired...
> 
> I completely agree with this strategy but in my opinion there are two
> problems to be solved:
> 
> a) the dealing with rolling window - e.g. the data are held for 7 years, but
> the "current image" should contain complete data.  This is not a trivial
> task , for example using materialized views a drop of an old partition
> should have no influence to the materialized view (may be not so difficult)
> and a complete refresh of a materialized view must remain possible (may be
> more difficult).

The most valuable feature of MVs is query rewrite.  In fact, I would argue
that there is no other reason to use MVs (please let's not argue though --
obviously there are other cool uses like DBMS_REDEFINITION).  Thus, other
features such as fast refresh are merely "nice-to-have" which help you
achieve query rewrite.  Query rewrite is the "sizzle" in this "steak"...

I have no hesitation in not using any of the MV refresh mechanisms provided
by Oracle.  If they work, fine.  If they do not, then hello to custom-built
fast refreshes...

> 
> b) the validation and cleaning make the "insert only once" more complicated.
> I mean the enforcing of business rules beyond of primary and foreign keys.
> Those actions can be moved in current-image views, but this is for sure not
> the desired solution (validation will be repeated for each query).
> 

Depends on how much validation and cleaning you have to do.  It also depends
on how you do it.  Some ETL tools and methodologies are quite wasteful from
an I/O perspective;  some are less so.

Such a high rate of insertion (50Tb/day implies an average of 600Mb/sec)
presses hard against the throughput limits of most storage units available
today.  Top of the line 15000rpm hard drives usually provide sustained
external throughput of about 100 Mb/sec, so where are we at when we are
talking averaging 600 Mb/sec, all day, every day?  Just for writes, never
mind reads?

So, one insert for each row only, one pass at indexing, and one chance to
"analyze".  Massive RAID-0 striping to maximize throughput across devices.
RAID-1 for redundancy -- no foolish talk about RAID-"F" or "caches"...

Even with the insertion of the data once, we're already past the throughput
capacity of most storage arrays, never mind additional like undo and redo,
so direct-path and nologging would be key.  And you really can't have
direct-path without using EXCHANGE PARTITION.  Set the tablespaces to READ
ONLY as soon as possible after load and get 'em backed up.

Whatever throughput is left over must be made available for queries by
end-users...

Hope this helps...

-Tim

--
To unsubscribe - mailto:oracle-l-request@xxxxxxxxxxxxx&subject=unsubscribe 
To search the archives - //www.freelists.org/archives/oracle-l/

Other related posts: