Re: Storing blobs in database vs filesystem

  • From: Carel-Jan Engel <cjpengel.dbalert@xxxxxxxxx>
  • To: strickland.mark@xxxxxxxxx
  • Date: Sun, 01 Oct 2006 23:07:08 +0200

First, apologies for not chiming in earlier. I spent some time at a
remote Danish Holiday Park, having fun with alotof people
discussing/listening/presenting Oracle stuff and having a little smash
of beer ;-)

At a CT site we're going right the opposite way. Their 25 million
documents (fast growing) Document System has an oracle database (The
Directory) with all authorization info etc. of the scans, and all scan
blobs are stored in files. The way they did it was using an 8-level deep
directory structure resembling 8 words of the unique key of every
document. This is COTS stuff, so changing that approach is out of the
question. This 8 level deep monster of a directory tree is now so big (>
100 million directories, most levels are used in a key just once), that
it takes over a week to rebuild the tree on its own, without the
documents. That is not an option, recovery-wise spoken.

For this reason we're now preparing to extent the system with a database
storage functionality. The system it is distributed, remote offices can
have copies of the documents, The Directory is centralized and (of
course) Data Guarded. Copying back some datafiles is likely to be
handled far more efficient by the filesystem than creating millions of
directories and files. At least we get rid of the overhead of creating
all the directories when restoring from tape. (This is NTFS sitting on
some HP storage, type unknown to me)

So, I'd take this into account very carefully when you decide to go for
the FS solution. Restoring millions of files to a FS is pretty mujch
akin re-inserting row by row in the database. The filesystem backup is a
'logical' backup, like exportfile, and 'logical' recovery resembles
import, without array processing features to speed things up.
Filesystems are not very efficient in creating tons of files, they're
mainly build for efficient I/O to the contents of the files (Kevin,
correct me if I got it wrong). So, unless you can 'dd' the disks as a
backup, the recovery secenario is a serious threat to your solution, if
not a showstopper, even when you have a complete flat directory
structure or can backup 'physical'. 

BTW, the documents are going to sit in a database that is separated from
'The Directory'

Best regards,

Carel-Jan Engel

If you think education is expensive, try ignorance. (Derek Bok)

On Thu, 2006-09-28 at 15:02 -0700, Mark Strickland wrote:

> We're seriously considering doing the same thing.  Out of 1.6-Tb, we
> have 1.3-Tb in LONGs, LONG RAWs and CLOBs that we're probably going to
> move out of Oracle.  We're reaching a tipping point where we'll soon
> have to buy more SAN storage, multiplied by three (physical standby
> and full copy for Staging environment).  I'm very interested in seeing
> other replies, ie, horror stories, gotchas, etc.
> We're also testing compressing the LONGs and CLOBs before they're
> inserted into the database and engineering the application to
> compress/uncompress in flight.
> Regards,
> Mark Strickland
> Seattle,  WA
> --
> //

Other related posts: