Re: SSD usage

  • From: Tanel Poder <tanel@xxxxxxxxxxxxxx>
  • To: Tim Hall <tim@xxxxxxxxxxxxxxx>
  • Date: Wed, 16 Dec 2015 21:22:10 -0600

If you measure write performance on an idle Exadata machine without any
other load going on, you are not comparing flash vs disk, you are comparing
flash vs the battery-backed 512MB RAM cache in the "RAID" controllers
within each storage cell!

This is how the "disk" that's supposed to have a couple of milliseconds of
average latency (it still rotates and needs to seek + calibrate to next
track even in sequential writes) gives you sub-millisecond write
latencies... it's not the disk write, it's the controller's RAM write that
gets acknowledged.

And now when you run a real workload on the machine (lots of random IOs on
the disks and Smart Scans hammering them too), your disk writes won't
always be acknowledged by the controller RAM cache. Comparing *busy* flash
disks to *busy* spinning disks will give you different results than
comparing *idle* flash disks to *idle* spinning disks (with non-dirty write
cache).
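
To make the idle-versus-busy difference concrete, here is a minimal sketch
that treats average write latency as a weighted mix of cache-acknowledged
and disk-acknowledged writes. The latency values (0.2 ms for the controller
cache, 8 ms for the disk) and the hit fractions are illustrative
assumptions, not measurements from any Exadata cell.

```python
def expected_write_latency_ms(cache_hit_fraction, cache_ms=0.2, disk_ms=8.0):
    """Average write latency when a fraction of writes is acknowledged by
    the controller's RAM cache and the rest must wait for the disk.

    cache_ms and disk_ms are assumed, illustrative values.
    """
    if not 0.0 <= cache_hit_fraction <= 1.0:
        raise ValueError("cache_hit_fraction must be between 0 and 1")
    return cache_hit_fraction * cache_ms + (1.0 - cache_hit_fraction) * disk_ms


# Idle machine: assume every write lands in the (non-dirty) cache.
idle = expected_write_latency_ms(1.0)
# Busy machine: assume only half of the writes are absorbed by the cache.
busy = expected_write_latency_ms(0.5)
print(f"idle: {idle:.2f} ms, busy: {busy:.2f} ms")
```

Under these assumed numbers the same "disk" goes from 0.2 ms to about 4 ms
per write once half of the writes miss the cache.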

So, I'm not arguing here that flash is somehow faster for sequential writes
than a bunch of disks when talking about throughput. But if you care about
latency (of your commits) you need to be aware of everything else that will
be going on on these disks (and account for this in your benchmarks).

Without queueing time included, a busy flash device will "seek" where
needed and perform the write in under a millisecond, a busy disk device in
6-10 milliseconds. So your commits will end up having to wait longer
(yes, your throughput will be ok due to the LGWR writing multiple
transactions' redo out in a single write, but this doesn't change the fact
that individual commit latency suffers).
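
A rough back-of-the-envelope sketch of that throughput-versus-latency point
follows. It assumes a log writer that issues one write at a time, ignores
queueing (as above), and uses made-up service times and batch sizes; the
function name and parameters are hypothetical, not anything from Oracle.

```python
def group_commit(write_ms, commits_per_write):
    """Return (commits_per_second, commit_latency_ms) for a log writer that
    issues one write at a time and flushes `commits_per_write` commits in
    each write. Queueing time is ignored, so each commit waits one write.
    """
    if write_ms <= 0 or commits_per_write < 1:
        raise ValueError("write_ms must be > 0 and commits_per_write >= 1")
    writes_per_second = 1000.0 / write_ms
    return writes_per_second * commits_per_write, write_ms


# Assumed figures: 0.5 ms per write on busy flash, 8 ms on busy disk.
flash_tput, flash_latency = group_commit(write_ms=0.5, commits_per_write=1)
disk_tput, disk_latency = group_commit(write_ms=8.0, commits_per_write=16)

print(f"flash: {flash_tput:.0f} commits/s at {flash_latency} ms per commit")
print(f"disk:  {disk_tput:.0f} commits/s at {disk_latency} ms per commit")
```

With these assumed inputs both devices reach the same commits per second,
because the disk batches 16 commits per write, yet each commit on disk
still waits 16 times longer.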

This latency issue of course will be mitigated when you are using a decent
storage array with enough (well-managed) write cache.

So I'd say there are the following things you can compare (and you need to
be aware of which hardware you are really benchmarking):

1) Flash storage
2) Disk storage without (write) cache
3) Disk storage with crappy (write) cache
4) Disk storage with lots of well-managed & isolated (write) cache

And the second thing to be aware of:

1) Are you the single user on an idle storage array?
2) Are you just one of the many users in a heavily utilized (and randomly
seeking) storage array?

So, as usual, run a realistic workload and test it out yourself (if you
have the hardware :)
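
As a starting point for such a test, here is a small sketch that times
synchronous appends (write + fsync) to a scratch file and reports median and
tail latency. It is only a rough proxy: it measures the file system, the
device and every cache in between, not Oracle redo writes, and the block
size and write count are arbitrary assumptions. To see the effect described
above, run it once on an idle system and once while a realistic random-IO
load is hitting the same storage.

```python
import os
import statistics
import tempfile
import time


def measure_sync_write_latency(directory, writes=200, block_size=4096):
    """Time small synchronous appends to a scratch file in `directory`.

    Returns a list of per-write latencies in milliseconds.
    """
    payload = b"\0" * block_size
    latencies_ms = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(writes):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # wait until the storage stack acknowledges the write
            latencies_ms.append((time.perf_counter() - start) * 1000.0)
    finally:
        os.close(fd)
        os.unlink(path)
    return latencies_ms


def summarize(latencies_ms):
    """Median, approximate 99th percentile and maximum of the samples."""
    ordered = sorted(latencies_ms)
    p99_index = min(len(ordered) - 1, int(len(ordered) * 0.99))
    return {
        "median_ms": statistics.median(ordered),
        "p99_ms": ordered[p99_index],
        "max_ms": ordered[-1],
    }


if __name__ == "__main__":
    # Point `scratch` at the storage you actually want to test.
    with tempfile.TemporaryDirectory() as scratch:
        print(summarize(measure_sync_write_latency(scratch)))
```

Looking at the tail (p99, max) rather than only the average matters here,
since commit latency suffers from the slow writes, not the typical ones.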

Tanel.

On Wed, Dec 16, 2015 at 10:40 AM, Tim Hall <tim@xxxxxxxxxxxxxxx> wrote:

Hi.

The experience of Exadata Smart Flash Logging may prove useful in
making a decision. This feature allows Exadata to make two writes, one
to flash and one to disk. Whichever completes first wins. If you read
some of the posts about this feature, you will see a number of people
saying that very few writes complete first on flash. In Jason Arneil's
post, only 2.7% of log writes completed first on flash. In all other cases,
disk beat flash. Results will vary depending on the type of workload.
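
The "whichever completes first wins" idea can be sketched as simple
arithmetic over paired latencies. This is not how Exadata implements the
feature; the function names and the sample latencies below are made up for
illustration, and ties are arbitrarily counted as a win for disk.

```python
def flash_win_fraction(flash_ms, disk_ms):
    """Fraction of writes for which flash acknowledged before disk.

    `flash_ms` and `disk_ms` are paired per-write latencies. Ties count
    for disk (an arbitrary assumption).
    """
    if not flash_ms or len(flash_ms) != len(disk_ms):
        raise ValueError("need two non-empty lists of equal length")
    wins = sum(1 for f, d in zip(flash_ms, disk_ms) if f < d)
    return wins / len(flash_ms)


def effective_latencies(flash_ms, disk_ms):
    """Latency seen by the writer when the first acknowledgement wins."""
    return [min(f, d) for f, d in zip(flash_ms, disk_ms)]


# Made-up sample: disk (really its controller cache) is usually faster,
# but one disk write stalls at 9 ms.
flash = [0.4, 0.4, 0.4, 0.4]
disk = [0.2, 0.2, 0.2, 9.0]

print(flash_win_fraction(flash, disk))
print(effective_latencies(flash, disk))
```

In this sample flash wins only one write in four, yet that one win is what
keeps the slowest write at 0.4 ms instead of 9 ms, so a low flash-win
percentage does not by itself say how much the outliers were trimmed.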

What does this mean in general terms, especially for non-Exadata
people? Redo on flash is not guaranteed to improve performance. In
some workloads, it could actually reduce performance.

As Kellyn mentioned in the quoted text above, you may get better bang
for your buck putting other things on flash like data files. As
always, the answer is "it depends". :) You need to make a change, test
it, rinse and repeat. :)

Cheers

Tim...
--
//www.freelists.org/webpage/oracle-l


