RE: No to SQL? Anti-database movement gains steam

  • From: "Matthew Zito" <mzito@xxxxxxxxxxx>
  • To: <dbvision@xxxxxxxxxxxx>
  • Date: Mon, 6 Jul 2009 11:51:05 -0400

See inline, and snipped as best as possible to keep things within the
quote limit, and a generally long email.  Back to work.

> -----Original Message-----
> > I'd be very careful making these kinds of statements.  In my
> > the folks working at companies like Google, Facebook, MySpace, Ning,
> > LiveJournal, etc. are easily as bright and experienced as the folks
> > work in tech at banks, pharmaceuticals, etc.
> "very careful" doesn't even make it into scope.  And before any veiled
> mentions
> of my "career" are brought up by anyone:
> I have had a career, don't need another one. That's why I can speak
> true
> independence, instead of "toeing lines". People seem to like that I do
> given
> they are willing to pay for it.
> Now: the simple fact here is that folks from Google, Facebook,
> Ning
> etcetc, and what they do as far as IT goes, are absolutely and totally
> irrelevant to the VAST majority of enterprise business.
> For starters most of them don't have prior baggage: they can afford to
> something totally new with no concerns whatsoever about existing
> data/code.
<snip with good commentary on how web companies have different IT needs>
> Google et all are a drop in the ocean in the IT market and what they
> does NOT
> define the general market, not by a long shot.
> Which is exactly the point Sunil made and I agreed to in my reply.

Well, so of course I'd never think about impugning your credentials -
you've contributed long enough and articulately enough to this forum
that it's clear you know what you're talking about.

However, my "very careful" comment points out how you're saying two
different things here.  If we go back to your original statement that I
was replying to, you said:

" Bingo.  IOW, a group of inexperienced and incompetent developers
decides to "write a web 2.0 site" and shazam, now "ALL enterprises
should do the same"."

My point was simply that calling them incompetent is a dangerous path.
It's the old, "Not Invented Here" syndrome  - i.e., the way we do things
has worked for us, so someone who does something different must clearly
be incompetent.

Now, however, you clarify your point to mean that the folks in the web
space have concerns that are totally different than those in more
traditional enterprise IT.  I agree largely with that statement, and I
assume that you're no longer calling these developers with different
needs and requirements "incompetent" and "inexperienced".

I also agree that they're starting fresh - they are, after all,
startups.  Existing enterprises have an existing codebase, internal
expertise, etc.

However, I believe that it's important to consider ways in which new
technologies can be leveraged to add efficiency, performance, etc.  For
example, if you look at some of my banking customers, while they have a
ton of traditional J2EE+Oracle+EMC storage infrastructure - a lot of
them also use proprietary C++ + Memcache + custom built non-SQL data
stores for things like algorithmic trading.  For them, their needs are
specific enough, and the upside large enough, that it's worth looking at
other options. So clearly even traditional Enterprise IT has areas where
standard relational data stores aren't an appropriate decision.

> > They've simply made a different determination - that the cost of using a
> > relational database in a scale-up or scale-out configuration is
> > than the cost of using one of these non-traditional data stores.
> Nevertheless, I'd like to see factual proof that non-traditional data
> stores can
> indeed provide that scalability whereas traditional ones can't.
> That proof better be a litle more than just "because it works at such
> such".
> Such is no proof whatsoever that:
> 1- it indeed is the *only* solution for that such and such.
> 2- it does apply to ALL others.
> Which is what the demented fringe of Web2.0 is trying to convince the
> world of.
> In good old web 1.0 lunatic fring style: history after all is cyclic.

Bear in mind that I think everyone on this thread agrees with the idea
that "Because it works for XXXX, it must be great for everything" is
silly.  If I had to judge, I'd say that's reporter mojo speaking there,
or punditry in action.  

Most of the folks I know at these web companies that are working with
this type of tech *also* are large consumers of MySQL, PostgreSQL, the
odd bit of Oracle or SQL Server here or there.  Even they don't think
that the non-relational model is appropriate for everything.  However,
they believe that operations with:
- large degrees of data independence
- very high concurrent query levels
- high levels of throughput
- very strong sensitivity to latency
- a need to scale linearly

simply don't work well with traditional relational databases, and hence
you have these non-traditional data stores as alternative options for
these types of workloads.

But just as one example, there's Facebook's Cassandra project, which I
picked because a good friend of mine works for Facebook, and I happened
to have just been reading about it a few weeks ago.  Cassandra is the
rapidly growing semi-structured data store for their user information.
It was started as a way to do full-text search for user inboxes, and is
being extended to support more and more operational data at Facebook.
Some notes from their configuration:
- Approximately 600+ cores as of late '08
- Approximately 120TB of disk space
- 25TB of indexes
- 4B worker threads
- Average ~12ms response time for a search
- Software-level features like automatic partitioning, distributed local
and remote replication, insert/append without read, and automated data
file collapse and aggregation
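
To make "automatic partitioning" and "replication" a little more
concrete, here is a minimal sketch of one common way such stores spread
keys over machines: a consistent-hash ring with N-way replication.  This
is only an illustration of the general technique, not Cassandra's actual
code; the class name HashRing, the node names, and the replicas/vnodes
parameters are all made up for the example.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a position on the ring (0 .. 2**32 - 1)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "big")


class HashRing:
    """Toy consistent-hash ring with N-way replication (illustrative only)."""

    def __init__(self, nodes, replicas=3, vnodes=64):
        unique = set(nodes)
        if not unique:
            raise ValueError("at least one node is required")
        # Never ask for more copies than there are physical nodes.
        self.replicas = min(replicas, len(unique))
        # Each physical node owns several virtual points on the ring, so keys
        # spread evenly and adding a node moves only a fraction of them.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in unique for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def nodes_for(self, key: str):
        """Return the distinct nodes that should hold copies of `key`."""
        # First ring point clockwise from the key's hash, wrapping at the end.
        i = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        owners = []
        while len(owners) < self.replicas:
            node = self._ring[i % len(self._ring)][1]
            if node not in owners:
                owners.append(node)
            i += 1
        return owners


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c", "node-d"], replicas=3)
    print(ring.nodes_for("inbox:user:42"))
```

The point of the ring is that adding a machine only takes over the keys
that land on its new points; everything else stays where it was, which
is what makes growing the cluster cheap compared to re-sharding by hand.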

Now certainly, you can build a >100TB Oracle instance, but the cost and
the complexity would be challenging.  In addition, presumably they only
see this data store growing, and how do you deal with a 200, 300, 400TB
Oracle instance?   Google, for example, in 2006 had approximately 1.2PB
of data in their structured data store.  Heaven knows what it is now.

> I don't know.  But we did not spend anywhere near as much as many
> we
> churn through 0.5TB per day, and it has trebled in one year.
> Our business is good old commercial property management.  Something
> is
> traditionally "low volume"
> Yet our Oracle DW db seems to manage quite well with the above, thank
> Of course:  we collapse data periodically as well. And aggressively

Right, something that is not an option for most of these organizations.
To use the Gmail/Facebook/my ad startup example, collapsing data means
you lose data.  In the case of the advertising startup, they
realistically can only collapse user persistence data they haven't seen
for a very long time.  Real-time analytics is critical for making ad
display decisions, ad placement optimization, spend analytics, etc.  

Aggregate data is death for some workloads.

> True.  But how many sites are there in general IT that can afford the
> of
> developing and maintaining their own apps from scratch as well as
> implementing
> an entirely new data store technology, incompatible with their
> one?
> I lost track of how many years ago I saw the last one, outside of the
> lunatic
> web fringe.
> The vast majority nowadays is running some form of third party app or
> that
> does most of what they need and refuse point blank to spend one cent
> inhouse development of replacements.
> It might surprise a lot of the web 2.0 folks, but the biggest cost in
> nowadays is inhouse development. Much more so than anything Oracle
> charge.

What you may not realize is that those stats include the cost of the
DBAs, as they get accounted for along with the development organization.

It's all about core competency.  If you're a property management
company, it makes zero sense to build your own email system and search
index.  It has nothing to do with your business.

If your business is vanilla enough, sure, go buy COTS, maybe do a little
tweaking and customization.  If you don't need to write an application,

But there are use cases that don't map to vanilla software packages or
COTS.  For example, when you look at our business, we're an automation
company, and hence we need the ability to have workflows - conditional
execution, branching, parallelism, etc.  Now, there are commercially
available workflow engines that we could have used to power our
automation software.  But a) they don't map properly to the dynamic ways
we need to generate workflows, at least not without enough gyrations
that it isn't worth it, b) that's a software cost that scales with the
product - every time we close a customer, we have to pay a certain
amount to the workflow vendor, and c) the workflow itself *is our core
competency*.  As a comparison, we use things like PostgreSQL, ACE,
OpenSSL, etc. in our product because they're simply convenient pieces of
software that are not core to our business.

So any sane business looks at what is or isn't its core competency, and
how closely COTS maps to that competency, and makes decisions as best
it can.

> Let me cite one small example of how costs can blow out with the web
> stuff.
<snip story about custom development vs. COTS vs. SaaS>
> Cost of re-training staff and users?  Nill!
> Performance and scalability?  It now copes faster with 20 times more
> than
> the original version did, 10 years ago.
> Cost of integration into existing infra-structure?  Nill!
> Try something like this with the new fangled non-traditional data
> and
> their necessarily custom apps and check how much it'll cost.  So much
> the
> "cost-effective web 2.0 cloud" nonsense.

With all due respect, you can hardly hold up one example where a project
was (what sounds to be) poorly managed from start to finish and tar an
entire option.

I have a counterexample.  I have a customer that was running on a
ridiculously old, custom-written, terminal-driven legacy ERP system that
everyone loved.  For a series of reasons I can't get into, they made the
very right decision that it was time to upgrade to something "this
decade" as you put it.

They don't manufacture ball bearings - they manufacture unbelievably
complex, very specialized pieces of equipment - to the tune of thousands
of individual parts per unit, and they produce a few units a month.

They looked at hiring developers to rewrite their app in something more
modern, and they looked at buying Oracle E-business suite.  They were
sold on the "off the shelf" nature of E-biz, and hired a consulting firm
to do the customizations for the reporting, etc. for their business.

The result?  The project was delayed by a year and a half, users hated
it, it screwed up manufacturing orders, and was overall a huge mess.

The mistake they made was that manufacturing management *is* a core
competency for them, given their business.  Trying to map a traditional
solution to their model created something that was half off-the-shelf,
half written from scratch, and all a mess.

> > Of course, the article is overblown and hyperbolic, because that
> > for a much better story.
> Exactly.  That seems to be a constant with the web 2.0 brigade. It
> help
> their cause one single bit: everyone still remembers the web 1.0 tech
> wreck,
> where the same was rampant.

I don't know - a lot of great stuff came out of the Web 1.0 "tech
wreck":
- Linux
- Commodity compute
- Distributed clusters
- Grid Computing
- MySQL/PostgreSQL
- Open Source
- Web-based applications
- Content Delivery Networks
- Datacenter Automation/Configuration Management

These are all things that either became powerhouses in their own right,
or fueled the next gen of technology.  

To be honest, I hear the same hype from traditional Enterprise IT, and
even from Oracle itself.  Let's sample the main link on

" With the launch of Oracle Fusion Middleware 11g, Oracle is
fundamentally transforming the way its customers develop, run and manage
their custom, packaged and composite business applications.
Unprecedented integration across the industry's most complete middleware
stack - including application server, SOA, BPM, BI and content management
technologies - will help Oracle customers build agile, adaptable
applications in ways that were not possible until now."

There's not a bit of hype there.

> Fact is: "not going anywhere" is tremendously cost-effective and
> if perfectly capable of coping with general purpose requirements.
> Storage models that purport to be "better" need to first define
> how general purpose they can be.
> Any fool can create a custom designed system, with custom designed
> and end up with a fast result.  Heck: I know quite a few folks who
> write a lot of
> apps in Assembler and make them lighting fast.  Still true today.
> Would anyone in the enterprise universe pay them to do so?  No way
> Are web 2.0 and these non-traditional data stores easily maintainable?
> No: it is custom code, any changes will involve costly recoding.
> it "refactoring" instead of "recoding" doesn't make it any less
> Change in requirements is a constant in modern IT. Ergo: these
> technologies are
> inappropriate and costly.

I think it's odd you'd assume that they're not "easily maintainable",
especially if we're comparing it to Oracle.  First of all, you have
access to the code, and if there were a critical issue, you could walk
over to the developers who wrote it, smack them on the head, and make
them fix it.

In addition, the levels of operational efficiency that have been
suggested by folks like Google, etc. are extraordinary.  While they
develop their own software in-house, they build it to be fault-tolerant
and self-healing, and hence numbers are frequently thrown around of tens
of thousands of servers per administrator.  I tried to find some hard
stats around this, but they keep it close to the vest.

Again, not to keep hammering this home, but it's about your core competency.
If your organization's core competency is IT in one way or another, then
it might make sense to build something rather than buy it.

The beauty of open-source today is, these companies are open sourcing
what they've created.  Now, if Cassandra looks like the right solution
for you - you don't need to build it.  Just download and install it.
You can then decide if you want to develop a competency in supporting
it, but that gets rid of the whole overhead in writing it from scratch.

> True IT professionalism and responsibility picks a general purpose
> store
> and app technology and makes it perform within the requirements, for a
> much
> reduced overall cost and with easy and cheap maintenance.
> That is what the IT enterprise market is all about. Ferraris are great
> show,
> but what is really cost effective for day to day use is a station
> The
> rest is hype.

To extend this analogy further, if what you really need is an 18-wheeler
with refrigeration and three different levels of chilled compartments,
you don't buy three times as many station wagons and put varying levels
of ice in them.  You build an 18-wheeler with what you need.

I'll give you an example - Back in the Day (tm) at, we had
a fraction of the budget of a lot of other web startups, and hence we
wrote our own monitoring software, and bought Linux boxes, and invested
in smart load balancers, etc.  I remember when I was building out our
colocation facilities, the other startups around us were all using nice
big Solaris boxes.  When we were rackmounting the VA Linux boxes we were
buying for $2k/ea, people would literally come ask me why I had so many
tiny boxes, and thought it freakish that anyone would run Linux for the
website.  After all, "Solaris is a REAL Operating System".

And for sure, we hit bugs in Linux that we would not have hit with
Solaris, and we had to accept a higher level of downtime at an
individual server level.  But we built that into our platform and our
load balancers and our monitoring infrastructure.  And in the end, we
were able to build, manage, scale, monitor, and operate that farm for
less than just the CapEx would have been to buy the equivalent capacity
in Sun gear and an off-the-shelf monitoring solution.
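
For anyone who hasn't built this kind of thing, the idea of "building
individual server downtime into the platform" can be sketched roughly as
below: the balancer rotates through the pool and simply skips anything
monitoring has marked down.  This is a simplified illustration, not the
software we actually ran; HealthAwarePool and the server names are
hypothetical.

```python
import itertools


class HealthAwarePool:
    """Round-robin server pool that skips servers marked unhealthy."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._cycle = itertools.cycle(self.servers)

    def mark_down(self, server):
        """Called by monitoring when a health check fails."""
        self.healthy.discard(server)

    def mark_up(self, server):
        """Called by monitoring when a server passes its checks again."""
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        """Return the next healthy server in rotation."""
        if not self.healthy:
            raise RuntimeError("no healthy servers available")
        # At most one full lap is needed to reach a healthy server.
        for _ in range(len(self.servers)):
            server = next(self._cycle)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


if __name__ == "__main__":
    pool = HealthAwarePool(["web1", "web2", "web3"])
    pool.mark_down("web2")  # a cheap box died; traffic carries on
    print([pool.pick() for _ in range(4)])
```

With many cheap boxes behind something like this, losing one machine
costs a slice of capacity rather than an outage, which is the trade we
were making.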

These days, almost everyone uses Linux somewhere in their
infrastructure.  Many people still use Solaris.  They each serve a
purpose.  But this was something that was "new" and "hyped" and turned
out to actually be pretty darn good.

> > So why can't we have both?
> Of course we can have - and need! - both.  Ferraris do exist and serve
> purpose.  What we can hardly afford is yet another round of demented
> black"
> where the whole of IT is told to ditch tried and proven cost effective
> technology for something that can only fit, at the very and costly best, a
> niche.
> Which is what those articles are clearly promoting and why they need to be
> exposed for the fraud they are.

It's not "fraud", it's just "hype", something that is rampant in
technology, and the world in general.  It would be nice if reporters
were a little more skeptical.


