RE: Choosing data file size for a multi TB database?

  • From: "Mark W. Farnham" <mwf@xxxxxxxx>
  • To: <BranimirP@xxxxxxxx>, <oracle-l@xxxxxxxxxxxxx>
  • Date: Tue, 30 Aug 2005 09:11:47 -0400

First, you try to find a reasonable way to determine the i/o throughput
requirements. If you will have a significantly peaked load, you need your
requirements both in "burst mode" (achievable in some layer of cache) and in
sustained throughput (remembering to include getting the archived redo logs
to multiple stable destinations, preferably including one that is offsite,
and, depending on your availability requirements, possibly WAN bandwidth to
a remote standby location). Remember to look at your entire i/o pathway,
from spinning rust or SSD through the various cache levels. If you know in
advance that certain objects will be extremely "hot" in i/o compared with
the rest of the objects, consider provisioning hardware i/o capability at
two or more levels of performance, and be certain to keep these pools
separate when allocating your resources (see the note on SAME, below).
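
As a rough illustration of that arithmetic (every figure below is a made-up
placeholder, not from any particular system), a sketch might look like:

# Back-of-the-envelope i/o requirement sketch.  All figures are made-up
# placeholders.  Burst throughput must be achievable in some cache layer;
# sustained throughput must include shipping archived redo to every
# destination.
peak_read_mb_s     = 800   # burst read load during the busiest window
peak_write_mb_s    = 200   # burst write load during the busiest window
redo_rate_mb_s     = 40    # sustained redo generation
archive_dests      = 3     # local, second copy, offsite
wan_mb_s_available = 20    # bandwidth to a remote standby site

burst_mb_s     = peak_read_mb_s + peak_write_mb_s
sustained_mb_s = peak_write_mb_s + redo_rate_mb_s * archive_dests

print(f"burst requirement:     {burst_mb_s} MB/s (must fit in cache)")
print(f"sustained requirement: {sustained_mb_s} MB/s (disk + archive shipping)")
if redo_rate_mb_s > wan_mb_s_available:
    print("WARNING: redo generation exceeds WAN bandwidth to the standby")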

Second, you choose a hardware architecture that is capable of maintaining
a spread load as you grow. Once you determine the hardware architecture,
you will know the size of a piece of growth. With a very large database you
should probably plan to add large chunks of hardware, such as a pair of i/o
controllers and two new disk-filled "trays." Not only is this cost
effective in purchasing and labor, and low in disturbance risk during
installation, it blends well with the leading layout methodologies (SAME,
BORING, and dare I say ASM yet?).
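
For example (again with hypothetical figures), the size of one growth
increment falls straight out of the hardware choice:

# One growth increment: a pair of controllers plus two disk-filled trays.
# All figures are placeholders for your actual hardware.
disks_per_tray   = 14
disk_size_gb     = 300
trays_per_chunk  = 2      # one new tray behind each new controller
mirroring_factor = 2      # hardware duplexing halves usable capacity

raw_gb    = disks_per_tray * disk_size_gb * trays_per_chunk
usable_gb = raw_gb / mirroring_factor
print(f"raw per increment:    {raw_gb} GB")
print(f"usable per increment: {usable_gb:.0f} GB")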

Third, be committed to having nothing but Oracle on this part of your disk
farm. At sizes large compared to the number of disks that will saturate
controllers, it is hard to justify polluting the Oracle i/o signature with
anything else.

Okay, for each "speed" rating of hardware component, you should now be able
to compute the size of an expansion (like adding a pair of trays of disks).
For Oracle you will most likely be best off duplexing drives in the
hardware and striping across whole trays. (The note on SAME, Stripe And
Mirror Everything, per Juan Loaiza, indicates one stripe across everything.
Find his SAME paper for how to add storage to SAME. This does not dovetail
exactly with multiple pools of storage at different speeds.) For BORING
(Balanced Organization of Resources In Natural Groups) and ASM, you add
storage to a given speed group and assign needs to speeds.
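
A sketch of keeping the pools separate (group names and figures are
invented purely for illustration):

# Separate pools per "speed" of hardware: storage is added to a group and
# needs are assigned to groups.  Everything here is illustrative only.
speed_groups = {
    # name     : (trays, usable_gb_per_tray, iops_per_tray)
    "fast"     : (2, 1000, 4000),
    "capacity" : (6, 2100, 1800),
}

for name, (trays, gb_per_tray, iops_per_tray) in speed_groups.items():
    print(f"{name:9s}: {trays * gb_per_tray} GB usable, "
          f"{trays * iops_per_tray} IOPS, "
          f"expansion unit = {gb_per_tray} GB (one tray)")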

Anyway, once you've picked your hardware and your layout architecture, you
know the numbers to use to figure out the size of files.

For SAME, read the papers.

For ASM, create a LUN across the whole tray and give it to ASM. If OS limits
force you to make smaller LUNs than the entire tray, then you have to
compromise and lie a little bit to ASM. You'd like to give LUNs to ASM
within disk groups that are equal in i/o capability. Ideally that includes
the controller paths, so if you call a whole tray a LUN you're all set. If
that is too big, you have to pretend that you might not saturate
controllers, but you can minimize the lie by defining a consistent
horizontal stripe across a whole tray so you have the same number of LUNs
across each tray within a disk pool.
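
For instance (hypothetical limits), keeping the LUN count per tray
consistent is a small piece of arithmetic:

# Keep the same number of equal-sized LUNs on every tray in a disk group,
# even when an OS LUN-size limit keeps you from presenting one LUN per
# tray.  Numbers are hypothetical.
import math

tray_usable_gb  = 2048   # usable capacity of one tray after mirroring
os_lun_limit_gb = 500    # largest LUN the OS will let you present

luns_per_tray = math.ceil(tray_usable_gb / os_lun_limit_gb)
lun_size_gb   = tray_usable_gb / luns_per_tray
print(f"{luns_per_tray} LUNs of {lun_size_gb:.0f} GB per tray")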

For BORING you do pretty much the same thing. (In fact ASM completely
supports the BORING layout methodology.) Except of course in BORING you
don't necessarily entrust your storage to ASM yet. Instead of LUNs, the
whole trays are files. Or, in the alternative, several matching-sized
stripes across a whole tray become files, and all those files constitute a
"stripeset." If you have multiple trays per controller instead of multiple
controllers per tray, then you should sensibly arrange your "stripesets"
into "stripeset groups" so that controller capacity can be reasonably taken
into account.
If you have known-in-advance peaked load that varies in hours or calendar
periods, then by having multiple files per tray you can allocate tablespaces
to files and tables to tablespaces such that you give the maximum throughput
available to the particular application during its peak. Likewise, if you
have applications likely to have common or unpredictable peak loads, isolate
them from each other on different stripesets. There is no reason to let a
bad i/o signature from one application group destroy the performance of all
the application groups. Clearly, there is a planning exercise to the
tradeoff between using all the capability you've got in parallel for one
process versus guaranteeing an unintruded reserved level of service. In
BORING you plan ahead for those tradeoffs and have reasonably consistent
building blocks to rearrange yourself if you get it wrong or it changes. As
ASM matures, it might be possible that if the hardware is set up so that you
are not forced to lie to ASM, then ASM will dynamically rebalance the i/o
effectively.
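
One way to turn that into concrete file sizes (a sketch with made-up
numbers, not a recommendation):

# BORING-style sketch: a tray is carved into several matching-sized
# stripes, each stripe becomes a file, the files from one tray form a
# "stripeset", and the stripesets behind one controller form a "stripeset
# group".  Numbers are made up for illustration.
tray_usable_gb       = 2048
stripes_per_tray     = 4     # files per tray, i.e. files per stripeset
trays_per_controller = 2     # stripesets per stripeset group

datafile_gb = tray_usable_gb / stripes_per_tray
print(f"file size: {datafile_gb:.0f} GB")
print(f"stripeset: {stripes_per_tray} files of {datafile_gb:.0f} GB (one tray)")
print(f"stripeset group: {trays_per_controller} stripesets on one controller")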

Finally, though, you've got a size (or sizes, with multiple speed groups)
to work from if you're building a large disk farm.

One final note: for very large disk farms, folks often toss the idea of
non-volatile SSD out the window too quickly. If you know in advance that
parts of your database will be "hot" relative to the average on a continuous
basis, then a modest proportion of SSD apart from your large storage array
can "deheat" the cache for the disk farm, so that windows when your
sustained i/o forces you all the way to spinning rust are much less
frequent. Often online redo logs, rollback segments (aka UNDO), and
temporary tablespaces are quite hot, and many applications and application
suites have particular objects that are frequently or continuously "hot."
If you go this route, be certain to understand the MTBF and power-outage
survival characteristics of the SSD you use. There exist devices that
simply lose memory when you pull out the plug, some last for many days, and
some use onboard batteries to sweep memory to a pair of onboard drives.
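
As a crude illustration of the "deheating" effect (the fractions are
invented):

# Rough estimate of how much sustained load a small SSD tier removes from
# the disk farm when it absorbs the continuously hot objects (redo, undo,
# temp, and a few known-hot tables).  Fractions are invented.
total_sustained_iops = 20000
hot_object_fraction  = 0.35    # share of i/o going to the known-hot objects

disk_farm_iops = total_sustained_iops * (1 - hot_object_fraction)
print(f"i/o left for the spinning-rust farm: {disk_farm_iops:.0f} IOPS "
      f"(down from {total_sustained_iops})")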

Regards,

mwf

-----Original Message-----
From: oracle-l-bounce@xxxxxxxxxxxxx
[mailto:oracle-l-bounce@xxxxxxxxxxxxx] On Behalf Of Branimir Petrovic
Sent: Tuesday, August 30, 2005 7:33 AM
To: oracle-l@xxxxxxxxxxxxx
Subject: Choosing data file size for a multi TB database?


How would you approach the task of sizing data files for a project that
will start with a 1TB database but may relatively quickly grow to stabilize
at around the 10TB mark?

Obvious options are:

    - start with many smallish files (like 2GB each), then add some
      thousands more as the database grows,
                or
    - start with a number of largish data files (in the 10-100GB range
      each), then add more such files to accommodate growth.

Neither of the above options looks very desirable (to me at least). The
first option might be a bad choice with checkpointing in mind, but the
second option is not the winner if data files ever need to be moved around.
Anyway some initial choice must be made, and all I'd like at this moment is
not to give perilous initial advice...
(admission: once the "ball" starts rollin', this bastard ain't gonna be
mine :))

So from practical perspective - what would be the least troublesome choice?

Branimir



FYI I  - The OS platform is the darkest secret at this point, as are the
hardware specs (no one can tell; early signs of a "well communicated, well
managed" project are all there)


FYI II - I've never had to deal with DBs much bigger than 100GB, thus the
need for a "reality check"..
--
//www.freelists.org/webpage/oracle-l

