I want to thank everyone who took the time to respond. It was very insightful
information and I learned a few things. While researching this myself, I did
find that the dNFS vs. ASM debate is not a religious war like many things. Not
too many people seem to have a strong opinion yet one way or another. Don't
get me wrong, there are proponents for both sides but no one seems ready to
sacrifice their first born for their beliefs...strange behavior for internet
debate...I applaud it. It does make it easier for me to see the facts without
too much emotion behind them.
One thing I am beginning to wonder is whether it is even time for my company
to consider dNFS. For one, I have yet to find any really compelling reasons to
switch...key word is YET...I am still researching and have just begun testing
myself. One big reason to keep our current ASM configuration is that it is
what my company is very familiar with, and it is used in every database we
have. Not sure I need to "fix what isn't broken". Also, moving to dNFS will
most likely introduce 2-3 years of having both ASM and dNFS as we migrate our
systems from one to the other...this would require us to support two
configurations.
Let me give some feedback on some of the replies I got...
Mladen: That depends on how you do ASM. If the drives are iSCSI on a machine
without the proper HBA, then dNFS is a clear choice, since it's much easier to
administer and will even perform better than iSCSI. For FC connections and
iSCSI with the proper HBA, ASM will perform better. Since RAC is ALWAYS about
performance, you should choose what performs better. Generating an artificial
load similar to your workload by Swingbench or HammerOra should provide a good
benchmark.
I will have to go back and review our HBA setup to make sure I understand it
and that it is "right". So far, in my testing using Swingbench, I have found
throughput to be pretty even between ASM and dNFS. However, I think I need
to drive more activity to push the underlying storage to see which one falls
off first.
Seth:
You do not need ASM for OCR and voting disks. These can be on a supported
cluster file system or over standard NFS.
I did not know this. I thought non-ASM OCR/voting was only supported for
upgrades of the clusterware from <11.2 to 11.2. I have never tried launching
the GI installer without candidate ASM disks ready; with them present, the
installer goes straight into the CRS Disk Group setup screen. Even the
documentation says block and raw devices are no longer supported, but I guess
NFS is not considered a block storage device?
http://docs.oracle.com/cd/E11882_01/install.112/e41961/storage.htm#CWLIN312
If I am interpreting this incorrectly, I am open to a learning moment here!
It is unclear to me why storage snapshots for ASM disk groups required you to
use RDMs. Could you not snap multiple VMDKs at the same time?
We tried this but had trouble getting it to work. It could be a problem with
our setup or a lack of knowledge on our part, but we have some pretty good
NetApp and VMware folks.
All that being said, dNFS has lots of benefits over ASM as well and as I assume
you were alluding to, is not mutually exclusive to ASM. NFS in general is
obviously much more flexible than ASM including the ability to use CloneDB.
Yes! And this is what attracts us to dNFS, but we want to make sure we
understand what, if anything, we are giving up by ditching ASM...especially in
terms of performance.
Amir: If you have an IO intensive system, you may want to stick with FC. dNFS
has been working fine for us for those systems that do not do a lot of
throughput, like SOA databases, etc. However, for heavy-duty ERP systems, even
though we have implemented 10gbe end-to-end (from hosts to switches to
NAS/heads), we are barely meeting the performance. All of our vendors,
including Oracle, storage vendor and network vendor looked at their
infrastructure for literally months but no one was able to pinpoint where the
bottleneck was coming from. We ended up moving two of our Oracle ERP systems
back to FC and will move the remaining ERP systems in the near future.
This is the sort of thing we are afraid of encountering.
Kyle: NFS is the future, has larger bandwidth than FC, market is growing faster
than FC, cheaper, easier, more flexible, cloud ready and improving faster than
FC.
In my benchmarking, FC and NFS throughput and latency are on par given similar
speed NICs and HBAs and a properly set up network fabric.
Simple issues like having routers on the NFS path can kill performance.
Latency:
NFS has a longer code path than FC and with it comes some extra latency, but
usually not that much. In my tests one could push 8K over 10GbE in about 200us
with NFS, whereas over FC you can get it around 50us. Now that's 4x slower on
NFS, but that's without any disk I/O. If disk I/O is 6.00 ms then adding 0.15ms
transfer time is lost in the wash. On top of that, FC is often not that well
tuned, so what could be done in 50us ends up taking 100-150us and is almost the
same as NFS.
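
The arithmetic behind "lost in the wash" can be sketched as follows. This is
only a back-of-the-envelope check using the figures quoted above; the function
and variable names are illustrative, and the 150us value is simply the upper
end of the quoted 100-150us range.

```python
# Rough check of the latency figures quoted in the text.
# Names are illustrative; all numbers come from the paragraph above.

def total_latency_ms(transfer_us: float, disk_ms: float) -> float:
    """End-to-end I/O latency in ms: disk service time plus wire transfer time."""
    return disk_ms + transfer_us / 1000.0

DISK_MS = 6.0           # disk service time from the text
NFS_US = 200.0          # 8K over 10GbE via NFS
FC_TUNED_US = 50.0      # 8K over well-tuned FC
FC_TYPICAL_US = 150.0   # upper end of the quoted 100-150us range

nfs = total_latency_ms(NFS_US, DISK_MS)
fc_tuned = total_latency_ms(FC_TUNED_US, DISK_MS)
fc_typical = total_latency_ms(FC_TYPICAL_US, DISK_MS)

print(f"NFS:        {nfs:.2f} ms")
print(f"FC tuned:   {fc_tuned:.2f} ms")
print(f"FC typical: {fc_typical:.2f} ms")
# Relative penalty of NFS once disk time dominates the total
print(f"NFS penalty vs tuned FC: {(nfs - fc_tuned) / fc_tuned:.1%}")
```

With disk time included, the 4x difference in transfer time shrinks to a few
percent of the total I/O time.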
I've heard that efforts are being made to shorten the NFS code path, but I
don't have details.
Throughput
NFS is awesome for throughput. It's easy to configure and on things like VMware
it is easy to bond multiple NICs. You can even change the config dynamically
while the VMs are running.
NFS already has 100GbE NICs and is shooting for 200GbE next year.
FC on the other hand has just gotten 32G, and it doesn't look like that will
start to get deployed until next year, and even then it will be expensive.
Analyzing Performance on NFS
If you are having performance issues on NFS and can't figure out why, one cool
thing to do is take a tcpdump on the receiver as well as the sender side and
compare the timings. The problem is either the sender, the network or the
receiver. Once you know which, the analysis can be dialed in.
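
A minimal sketch of that comparison, assuming the two captures have already
been parsed into per-request timestamps keyed by RPC XID (the parsing step,
the data layout and the sample values are all hypothetical). Each elapsed time
is computed on one host's own clock, so the two machines do not need
synchronized clocks.

```python
# Each capture maps RPC XID -> (request_ts, reply_ts) in seconds, as seen on
# that host. Turning tcpdump output into this shape is not shown here.

def breakdown(client, server):
    """Split client-observed latency into server time and network time (us)."""
    rows = []
    for xid in client.keys() & server.keys():   # only requests seen on both sides
        c_req, c_rep = client[xid]
        s_req, s_rep = server[xid]
        client_us = (c_rep - c_req) * 1e6        # what the database host waited
        server_us = (s_rep - s_req) * 1e6        # time spent inside the NFS server
        rows.append((xid, client_us, server_us, client_us - server_us))
    return sorted(rows)

# Made-up sample data; note the two hosts' clocks are deliberately far apart.
client_capture = {0x1A: (10.000000, 10.000900), 0x1B: (10.001000, 10.001300)}
server_capture = {0x1A: (55.500000, 55.500100), 0x1B: (55.501000, 55.501150)}

for xid, client_us, server_us, network_us in breakdown(client_capture, server_capture):
    print(f"xid={xid:#x} client={client_us:.0f}us "
          f"server={server_us:.0f}us network={network_us:.0f}us")
```

A large "network" share points at the path between the hosts (or their
network stacks), while a large "server" share points at the storage side.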
Thanks for the links and the thoughtful insight...again, more of what I am
looking for. It does indeed sound like NFS will be a better choice in the
future...but, is it enough reason for us to consider switching right now? For
one, we just got 10gE...100gE is not even a twinkle in our infrastructure's eye
as far as I know. As long as it performs on par with FC and the flexibility
and available features with dNFS pan out, that could be reason enough to
switch...tough decisions ahead.
Stefan: My clients are using both ASM with FC and dNFS or kNFS for older Oracle
releases.
I recently did an I/O benchmark at a client environment (vSphere 6, OEL 6.7 as
guest, Oracle 12c, NetApp NFS, 10GE, no Jumbo Frames, W-RSIZE 64k) with SLOB
and we reached close to the max of 1GB/s with an average single block I/O
latency of 4 ms (if it was coming from disk it was roughly 8-10 ms; the rest
was coming from storage cache).
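
As a rough sanity check on those figures, the sketch below relates the
reported throughput to the 10GE line rate and uses Little's law to estimate
how many reads were in flight. It assumes an 8 KiB database block size and
that the whole 1GB/s was single block reads; neither is stated in the text.

```python
LINK_GBIT = 10          # 10GE link, from the text
THROUGHPUT_BPS = 1e9    # ~1 GB/s reported (decimal GB assumed)
BLOCK_BYTES = 8192      # ASSUMPTION: 8 KiB database block size
AVG_LATENCY_S = 0.004   # 4 ms average single block read, from the text

# Raw line rate in bytes/s, before any Ethernet/IP/NFS protocol overhead
line_rate_bytes = LINK_GBIT * 1e9 / 8

# ASSUMPTION: all of the throughput is single block reads
iops = THROUGHPUT_BPS / BLOCK_BYTES

# Little's law: average requests in flight = arrival rate * time in system
outstanding = iops * AVG_LATENCY_S

print(f"Line rate:        {line_rate_bytes / 1e9:.2f} GB/s")
print(f"Link utilization: {THROUGHPUT_BPS / line_rate_bytes:.0%}")
print(f"Implied IOPS:     {iops:,.0f}")
print(f"Reads in flight:  {outstanding:,.0f}")
```

Under those assumptions the link is close to saturated, and sustaining that
rate at 4 ms per read takes several hundred concurrent outstanding reads.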
I will just comment on some of your points.
2a) You can do this with ASM or dNFS by RMAN. I highly recommend that you do
not rely on storage snapshot / backup mechanisms alone, as you will not notice
any physical or logical block corruption until it may be too late. Trust me, I
have seen more than enough such cases.
4b) When you are using dNFS in a VMware environment for Oracle you have no
VMDKs for the Oracle files (data,temp,control,redo,arch) at all. You map the
NFS share directly into the VM and access it via dNFS inside the VM. You only
have VMDKs for the OS (and Oracle software), for example. In addition, to scale
with dNFS you may want to skip NIC teaming at the VMware level and instead
present each interface to the VM and let dNFS do all the load balancing, etc.
(e.g. ARP).
In sum, nowadays there is no reason to demonize NFS for Oracle (with dNFS). It
works very well with good performance (FC like).
... I am a kid from the FC decade and I am saying this ;-)
Thanks for your experience and comments. We are not using Snaps for our total
backup solution...we still use RMAN as a first priority. Snaps are there if we
can use them and for cloning. However, I am glad you reminded me of that fact
as I have been considering coming up with a snap only strategy for our larger
databases (as long as we can mirror the snaps to a geographically separate
site). I see I will have to remember to run RMAN commands (or DBV) to make
sure corruption is not an issue.
Chris..
_____________________________________________________________________
Chris Ruel * Oracle Database Administrator * Lincoln Financial Group
cruel@xxxxxxx * Desk:317.759.2172 * Cell 317.523.8482