RE: ASM vs. dNFS?

From: "Hameed, Amir" <Amir.Hameed@xxxxxxxxx>
To: kyle Hailey <kylelf@xxxxxxxxx>
Date: Wed, 24 Feb 2016 02:12:34 +0000

In our case, there is no router between the switches, servers or the EMC Data
Movers. Jumbo frames are configured end-to-end. Network engineers from Cisco,
EMC and Oracle (these are M5000 servers) reviewed the configuration end-to-end
multiple times but could not find anything. We also reached out to Kevin
Closson, and he helped us run a lot of diagnostic tests which clearly showed
that the throughput was not what it should be, but no one was able to point out
the root cause. The issue was escalated to higher-ups within Oracle, and
Oracle’s dNFS development team gave us a beta patch to see whether increasing
the dNFS buffer size would alleviate the problem, but that patch had an even
more adverse impact on performance. This diagnostic exercise lasted for months
but no smoking gun was found. Now, it is entirely possible that the bottleneck
is somewhere in our infrastructure, but if all the vendors together are not
able to determine what is causing the slowness, then I would think that, with
the umpteen layers involved in the NFS architecture, it is quite a complicated
beast.
We also had two other Oracle ERP systems configured with dNFS that had a
similar issue. Those systems were running on RHL and UCS, and the moment we
moved them back to FC, their performance improved considerably and the
customers stopped complaining.

From: kyle Hailey [mailto:kylelf@xxxxxxxxx]
Sent: Tuesday, February 23, 2016 7:44 PM
To: Hameed, Amir
Cc: sethmiller.sm@xxxxxxxxx; Ruel, Chris; oracle-l@xxxxxxxxxxxxx
Subject: Re: ASM vs. dNFS?

I thought the FC vs NFS debate was dead back when Kevin Closson jokingly posted
"real men only use
FC<http://kevinclosson.net/2007/06/14/manly-men-deploy-oracle-with-fibre-channel-only-oracle-over-nfs-is-weird/>"
almost a decade ago.  A recent example is Top 7 Reasons FC is
doomed<http://www.mellanox.com/blog/2015/12/top-7-reasons-why-fibre-channel-is-doomed/>

NFS is the future, has larger bandwidth than FC, market is growing faster than
FC, cheaper, easier, more flexible, cloud ready and improving faster than FC.

In my benchmarking of FC and NFS, throughput and latency are on par given NICs
and HBAs of similar speed and a properly set up network fabric.

Simple issues like having routers on the NFS path can kill performance.

Latency:

NFS has a longer code path than FC and with it comes some extra latency, but
usually not that much. In my tests one could push 8K over 10GbE in about 200us
with NFS, whereas over FC you can get it in around 50us. That's 4x slower on
NFS, but that's without any disk I/O. If disk I/O is 6.00ms, then the extra
0.15ms of transfer time is lost in the wash. On top of that, FC is often not
that well tuned, so what could be done in 50us ends up taking 100-150us, which
is almost the same as NFS.
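
To make the "lost in the wash" arithmetic concrete, here is a minimal Python
sketch using only the figures quoted above (6.00ms of disk I/O, roughly 50us
transfer over FC, roughly 200us over NFS). The function name is made up for
illustration, and the simple additive model is an assumption, not a
measurement.

```python
def total_latency_ms(disk_io_ms, transfer_us):
    """Total I/O latency in ms: disk service time plus wire transfer time."""
    return disk_io_ms + transfer_us / 1000.0


# Figures taken from the text above; the additive model is an assumption.
fc_ms = total_latency_ms(6.00, 50)    # FC:  6.00 ms disk + 0.05 ms transfer
nfs_ms = total_latency_ms(6.00, 200)  # NFS: 6.00 ms disk + 0.20 ms transfer

# Relative penalty of NFS once disk I/O is included.
extra_pct = (nfs_ms - fc_ms) / fc_ms * 100

print(f"FC total:  {fc_ms:.2f} ms")
print(f"NFS total: {nfs_ms:.2f} ms")
print(f"NFS is {extra_pct:.1f}% slower end to end")
```

So a 4x difference in transfer time alone shrinks to a difference of a few
percent once a 6ms disk I/O is part of every request.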

I've heard that efforts are being made to shorten the NFS code path, but I
don't have details.

Throughput

NFS is awesome for throughput. It's easy to configure and on things like VMware
it is easy to bond multiple NICs. You can even change the config dynamically
while the VMs are running.

NFS already has 100GbE NICs and is shooting for 200GbE next year.

FC on the other hand has just gotten 32G and doesn't look like that will start
to get deployed until next year and even then will be expensive.

Analyzing Performance on NFS

If you are having performance issues on NFS and can't figure out why, one cool
thing to do is take a tcpdump on the receiver side as well as the sender side
and compare the timings. The problem is in either the sender, the network or
the receiver. Once you know which, the analysis can be dialed in. See
http://datavirtualizer.com/tcp-trace-analysis-for-nfs/
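
As a rough illustration of comparing the two captures, here is a small Python
sketch. It assumes you have already extracted, from each tcpdump, the request
and reply timestamps per NFS call, keyed by some identifier that matches on
both sides (the RPC XID is one candidate). The function name, the input format
and the sample numbers are all hypothetical; parsing the captures is not shown.

```python
def split_latency(client_times, server_times):
    """Split per-call latency into server time and network time.

    Each argument maps call id -> (request_ts, reply_ts) in seconds, taken
    from the capture on that host. Only differences within one host's capture
    are used, so the two clocks do not need to be synchronized.
    """
    rows = []
    for call_id, (c_req, c_rep) in client_times.items():
        if call_id not in server_times:
            continue  # call not seen in the server-side capture
        s_req, s_rep = server_times[call_id]
        client_latency = c_rep - c_req   # as seen on the wire at the client
        server_latency = s_rep - s_req   # as seen on the wire at the server
        network_latency = client_latency - server_latency
        rows.append((call_id, client_latency, server_latency, network_latency))
    return rows


# Made-up sample: one call, with host clocks that disagree by ~490 seconds.
client = {1: (10.0000, 10.0012)}
server = {1: (500.0003, 500.0008)}

for call_id, c_lat, s_lat, n_lat in split_latency(client, server):
    print(f"call {call_id}: client {c_lat * 1e6:.0f}us, "
          f"server {s_lat * 1e6:.0f}us, network {n_lat * 1e6:.0f}us")
```

If the server-side latency dominates, look at the NFS server and its storage;
if the difference between the two dominates, look at the network. Time spent on
the client above the capture point will not show up in either trace and has to
be compared against what the application itself reports.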

Best
Kyle

On Tue, Feb 23, 2016 at 12:19 PM, Hameed, Amir
<Amir.Hameed@xxxxxxxxx<mailto:Amir.Hameed@xxxxxxxxx>> wrote:
My two cents based on my experience with running dNFS since 2011.
If you have an IO intensive system, you may want to stick with FC. dNFS has
been working fine for us for those systems that do not do a lot of throughput,
like SOA databases, etc. However, for heavy-duty ERP systems, even though we
have implemented 10GbE end-to-end (from hosts to switches to NAS/heads), we are
barely meeting the performance requirements. All of our vendors, including
Oracle, the storage vendor and the network vendor, looked at their
infrastructure for literally months, but no one was able to pinpoint where the
bottleneck was coming from. We ended
up moving two of our Oracle ERP systems back to FC and will move the remaining
ERP systems in the near future.

From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx] On
Behalf Of Seth Miller
Sent: Tuesday, February 23, 2016 3:02 PM
To: Ruel, Chris
Cc: oracle-l@xxxxxxxxxxxxx<mailto:oracle-l@xxxxxxxxxxxxx>
Subject: Re: ASM vs. dNFS?

Chris,

You do not need ASM for OCR and voting disks. These can be on a supported
cluster file system or over standard NFS.

You are correct that ASM disk groups are all or nothing for storage snapshots.
However, it is not necessary that each database be in its own disk group. For
example, I have two databases in the same disk group, I snap the storage of
that disk group, mount it up to another server and just delete the files of the
other database. This way, you've reduced the management overhead and used the
space more efficiently without using any extra space or losing the ability to
do snapshots. This addresses points 2 and 3.

It is unclear to me why storage snapshots for ASM disk groups required you to
use RDMs. Could you not snap multiple VMDKs at the same time?

All that being said, dNFS has lots of benefits over ASM as well and, as I
assume you were alluding to, is not mutually exclusive with ASM. NFS in general
is obviously much more flexible than ASM, including the ability to use CloneDB.

Don't forget that you have a third option which may include just enough of
either option to be worth trying, ACFS.

Seth Miller

On Tue, Feb 23, 2016 at 9:35 AM, Ruel, Chris
<Chris.Ruel@xxxxxxx<mailto:Chris.Ruel@xxxxxxx>> wrote:
This is sort of long so bear with me…

TL;DR: Who has compared ASM vs. dNFS (not used together) and what did you
choose and why?

I was wondering if anyone on the list has opinions on, or has evaluated, ASM
vs. dNFS in a mutually exclusive configuration for datafile/arch/ctl/redo
storage?

We have been using ASM on NetApp over Fibre Channel for many years now with
great success.  We particularly like the ability to add/remove spinning disks
or SSD on the fly.  We can even "upgrade" our filers with no downtime by
adding in and removing LUNs and letting ASM do its rebalance thing.

Recently, some new technology changes have become available for us.  These
changes are in the form of moving our compute platform to UCS/Flexpod
environments and the introduction of VMware.  Operating on the UCS gives us
access to 10GbE (currently our infrastructure is primarily 1GbE) which brings the
option of using dNFS to the table.

Now, I am just starting down the path of comparing the two for pluses and
minuses and I do not have all the data yet.  Thought I would reach out to the
list.

There are a few things that attract us to dNFS:

1.       Less complication…maybe?  In RAC environments, I still think we need
ASM for OCR/Voting…someone correct me if I am wrong.  But, we will not have to
manage ASM disk groups like we do now.  However, after so many years of using
ASM, our team is pretty well versed in it…so, is it really an added
complication?

2.       Better ability to use NetApp snapshots:

a.       We can do file level recovery with dNFS which cannot be done with ASM

b.      Right now we have to manage separate disk groups for each database
(when we have multiple databases on a node/cluster) if we want to use NetApp
snaps since restore is done all-or-nothing at the disk group level.  In some
cases, we have hit the maximum number of disk groups (63) in ASM.  I think
multiple disk groups like this also result in more overhead managing and
monitoring.  Furthermore, more disk groups seem to waste more space as it is
sometimes hard to predict storage needs…I think in the end the best approach is
to over allocate storage instead of having to manage it constantly.

3.       Our primary OS platform is OELinux x86-64.  Linux has a LUN path limit
of 1024.  That sounds like a lot, but, with multiple LUNs per disk group and
multipathing in place, each disk group takes up a minimum of 8 LUNs.  This is
not to mention LUNs supporting the OS and shares.  Since we need to have
separate disk groups for each database to support snaps, a cluster with a lot
of compute power will either hit the LUN path limit or the ASM disk group limit
before we run out of compute.  My understanding is that we do not have this
limit problem with dNFS.

4.       It seems that dNFS will lend itself better to VMware:

a.       Setting up snaps with ASM on VMware led us down the path of using
RDM's (which have limits feature wise) instead of VMDK's.

b.      VMDK's with dNFS seem like less configuration which will allow for
quicker provisioning on VMware.  VMDK's are also the preferred approach
according to VMware.

c.       Using ASM and LUNs with VMware still is an issue with the 1024 LUN
path limit.  However, it moves to the physical hosts in the ESX cluster...not
just the guest OS.  Therefore, we are seriously limiting the number of guest
OS's on our ESX clusters before we run out of compute because we will hit the LUN
path limit first.
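
The limit arithmetic from points 2b, 3 and 4c can be sketched in a few lines of
Python. This is only an illustration: it uses the two limits quoted above (1024
LUN paths, 63 ASM disk groups), reads "8 LUNs per disk group" as 8 paths
consumed per disk group, and the function name and the `reserved_paths`
parameter (for OS and share LUNs) are hypothetical.

```python
LINUX_LUN_PATH_LIMIT = 1024  # figure quoted in point 3
ASM_DISK_GROUP_LIMIT = 63    # figure quoted in point 2b


def max_disk_groups(paths_per_disk_group, reserved_paths=0):
    """Return (usable disk groups, which limit is hit first).

    paths_per_disk_group: LUN paths consumed by one disk group
                          (LUNs per group times paths per LUN).
    reserved_paths:       paths already used by OS LUNs, shares, etc.
    """
    by_paths = (LINUX_LUN_PATH_LIMIT - reserved_paths) // paths_per_disk_group
    if by_paths < ASM_DISK_GROUP_LIMIT:
        return by_paths, "LUN paths"
    return ASM_DISK_GROUP_LIMIT, "ASM disk groups"


print(max_disk_groups(8))                      # minimum footprint per group
print(max_disk_groups(32))                     # larger groups, more paths each
print(max_disk_groups(16, reserved_paths=64))  # some paths taken by OS/shares
```

With the minimum footprint of 8 paths per disk group, the ASM disk group limit
is reached first; once disk groups carry more LUNs or more paths each, the LUN
path limit becomes the binding one.
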
So, that's it in a nutshell.  I am sure there is a lot more to it but I
appreciate any input people may have.

Thanks,

Chris..

_____________________________________________________________________
Chris Ruel * Oracle Database Administrator * Lincoln Financial Group
cruel@xxxxxxx<mailto:cruel@xxxxxxx> * Desk:317.759.2172<tel:317.759.2172> *
Cell 317.523.8482<tel:317.523.8482>

