Linux fs.aio-max-nr Leak?

  • From: Kenny Payton <k3nnyp@xxxxxxxxx>
  • To: "oracle-l@xxxxxxxxxxxxx" <oracle-l@xxxxxxxxxxxxx>
  • Date: Mon, 29 Sep 2014 10:50:46 -0400

Curious what others are seeing with async I/O requests.  I'm running 11gR2
on ASM and have seen what appear to be aio request leaks for a number of
years now on various versions.  Oracle's recommendation is to set
fs.aio-max-nr to 1M, but I've seen the count float upwards of 5M
outstanding requests between bounces.  Bouncing the database frees them up
and starts over, but typically I just bump the max on the server
dynamically and go on about my day.  Most recently we hit our 5M ceiling
unexpectedly, because our monitor had been broken by a recent monitoring
system upgrade.  We have bumped our ceiling to 10M and have our monitor
working again, reporting when we cross 50%.
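
For anyone who wants a similar check, here is a minimal sketch of that kind
of monitor, not the one we actually run.  It assumes the standard Linux
/proc/sys/fs/aio-nr and /proc/sys/fs/aio-max-nr files; the function names
and the WARN/OK strings are made up for illustration, and the 50%
threshold is just the figure mentioned above.

```python
from pathlib import Path

# Kernel counters: aio-nr is the running total of events requested via
# io_setup() for all active AIO contexts; aio-max-nr is the ceiling.
AIO_NR = Path("/proc/sys/fs/aio-nr")
AIO_MAX_NR = Path("/proc/sys/fs/aio-max-nr")


def read_counter(path):
    """Read a single integer from a /proc/sys file."""
    return int(Path(path).read_text().strip())


def aio_usage(aio_nr, aio_max_nr):
    """Return aio-nr as a fraction of aio-max-nr."""
    if aio_max_nr <= 0:
        raise ValueError("aio-max-nr must be positive")
    return aio_nr / aio_max_nr


def check(threshold=0.5, nr_path=AIO_NR, max_path=AIO_MAX_NR):
    """Return (status, message); status is WARN at or above the threshold."""
    nr = read_counter(nr_path)
    max_nr = read_counter(max_path)
    usage = aio_usage(nr, max_nr)
    status = "WARN" if usage >= threshold else "OK"
    message = f"{status} aio-nr={nr} aio-max-nr={max_nr} usage={usage:.1%}"
    return status, message
```

Calling check() on a schedule and alerting on a WARN status would cover the
50% case; the paths are parameters only so the logic can be exercised
against ordinary files.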

These are pretty active databases.  The instance in question for this event
is 20T in size, on all-flash storage, and runs around 15k IOPS.  The OS is
Oracle Linux 6.3.

Typically sessions return an error to the client stating that max aio has
been reached, but in this particular case we had an odd scenario.  A number
of sessions wrote the message to their trace files, but instead of
returning the error to the client and aborting the statement, the sessions
spun on CPU.  Neither strace nor a 10046 trace returned any output from the
processes, and ultimately we had to kill -9 them to free up the resources.
The one thing all of these sessions had in common was that they were
accessing LOB segments, some with updates and others with plain selects.
Possibly a bug in the LOB access code path that is not handling the aio
error from the OS.

Thanks,
Kenny
