Understanding Terracotta caching

  • From: Sandra Becker <sbecker6925@xxxxxxxxx>
  • To: oracle-l <oracle-l@xxxxxxxxxxxxx>
  • Date: Thu, 21 Jan 2010 10:18:40 -0700

Platform:  IBM Series z
OS:         SUSE 10
Oracle:    EE 10gR2

*Current situation*:

Concern that the application is not scalable.  Some large tables are have
heavy loads, a mix of read/write, heavier on the reads.  Some inefficient
code on top of a poor design have been causing performance issues.  We both
write and host the application for our customers.

*Tables*:

   1. c - used by a small portion of the transactions,  a mix of read/write,
   code/indexes fairly efficient, have never seen this table as a bottleneck
   2. d - critical table used in approximately 60% of all code, 33M+ rows,
   lots of DML and reads, inefficient code, poor design, frequently a
   bottleneck
   3. i - important table, 7.5M+ rows, inefficient code, lots of DML and
   reads, sometimes it and its children are a bottleneck
   4. m - critical table, 7.4M+ rows, inefficient code, combination of
   columns in tables d and i because "joins are bad", all inserts to d and i
   requie inserts to this table as well, updates on several columns in d and i
   require updates on the same columns in this table, the state column can have
   any one of 13 different values, frequently a bottleneck
   5. p - heavily used, 1.1M+ rows, predominately reads a little DML, has
   been a bottleneck at times in the past 6 months
   6. u - very heavily used, 151K+ rows, predominately reads some DML, some
   inefficient code but not generally a bottleneck

*Proposal:*
Use Terracotta caching to make the application more scalable as we add more
customers and users.

The database architect is proposing that table c be fully cached because he
thinks it is important.  He also proposes only one state of the 13 for table
m be cached.  The working theory is that this single state accounts for most
of the reads against the table and the other 12 states are frequently
updated/inserted.  We both agree that the u table should be cached.  He is
planning to refresh the cache every 5 to 10 minutes.  When I asked him what
were we expecting the end user to see, he said there shouldn't be any change
in performance but this would make the application scalable.  I also asked
what was the expected benefit if he was freshing the cache that often.  I
didn't get an answer.

I suggested that the p table be cached rather than the c table since it is
used more often in the application and doesn't have as much DML.  I also
suggested that we do some research to determine if the state on the m table
he wants to cache is really the most often requested state for reads.  He
nixed that idea saying he knew it was.  By reviewing the sql being executed
today for the m table, I see that it shows it is not the most frequently
requested state for reads but I can't get him to consider he's wrong.  I
also went to the Terracotta website to get more information about their EX
caching and see if they had any recommendations, which they did.

*Questions:*

   1. My understanding of caching in general and Terracotta in particular is
   you should cache heavily used tables that don't experience a lot of DML.  Is
   this an incorrect understanding?  Terracotta says less than 50% of
   transacations against the table should be DML so that's partly what I base
   my opinion on.
   2. My theory is we should focus on those parts of the application that we
   *KNOW* are the bottlenecks rather than just *ASSUMING* we know where they
   are.  If we don't know, then we should research.  I had some success
   researching/tuning other parts of inefficient code so I'm certain we can
   effect some positive changes in this area.  Am I totally off base here
   thinking we should do more research/testing?
   3. I suggested we spend more time tuning the parts of the code we know
   are inefficient along with looking at the caching.  That was rejected as
   taking too much time even though the same code must be reviewed to use the
   caching.  I'm really lost on this one as to why you wouldn't look at both
   options at the same time.  Is that unreasonable?
   4. Is refreshing the cache every 5 to 10 minutes a reasonable time
   period?  Is it too often, not often enough, etc.  Since the refresh has to
   read the table, what should I be looking at to see if the benefits are
   really there?
   5. Terracotta says to look at the cache/hit ratio for the table to
   determine if it really is a good candidate.  How do I look at cache/hit
   ratios for tables?

Thanks in advance for any clarifcation you can give me.

Sandy
Transzap, Inc.

Other related posts: