Re: AWS RDS Detecting failover

From: Schneider <schneider@xxxxxxxxxxxxxx>
To: stbaldwin@xxxxxxxx
Date: Wed, 4 Oct 2017 15:20:31 -0700

On Mon, Oct 2, 2017 at 12:48 PM, Steve T. Baldwin <stbaldwin@xxxxxxxx> wrote:

In my testing, when a client is connected to a Multi-AZ RDS instance and I
force a failover, that client doesn't 'see' it. If it makes any DB request
during or after the failover, it ends up timing out - eventually.
Unfortunately this timeout is controlled by the TCP keepalive setting, which
defaults to 2 hours. Not very helpful when the actual failover can be
complete in a couple of minutes.

FWIW, I remember having a similar "hanging dead connections" issue
after several planned maintenance operations on RAC clusters (rolling
updates that restarted each instance) a couple of years ago. For some
reason we were unable to use ONS and FCF to automatically clean up
cached connections. If I recall correctly, we ended up updating the
Linux TCP keepalive parameters across our fleet to improve the situation.
One positive was that much of our fleet was configured through Ansible
at the time, so we were able to roll the updates out in a controlled
yet automated manner across multiple data centers.
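
For illustration only, here is a rough Python sketch of the same idea
applied to a single socket rather than fleet-wide. It is not what we
ran back then; the function names and the 60/10/6 timer values are my
own assumptions, and the TCP_KEEP* options are the Linux per-socket
counterparts of the tcp_keepalive_time, tcp_keepalive_intvl and
tcp_keepalive_probes kernel settings. Whether a given database driver
lets you reach its socket is a separate question.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=6):
    """Turn on TCP keepalive for one socket, with short probe timers.

    idle, interval and count are illustrative values, not recommendations.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The per-socket timer options are platform specific (present on Linux),
    # so only set them where Python exposes the constants.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
    return sock


def worst_case_detection_seconds(idle, interval, count):
    """Idle time before the first probe, plus every unanswered probe interval."""
    return idle + interval * count


if __name__ == "__main__":
    s = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
    s.close()
    # Linux kernel defaults: 7200 s idle, 75 s interval, 9 probes.
    print(worst_case_detection_seconds(7200, 75, 9))
    # The illustrative values used above.
    print(worst_case_detection_seconds(60, 10, 6))
```

With the stock Linux defaults an idle dead peer goes unnoticed for
over two hours (7200 + 75 * 9 seconds); the shorter timers above bring
that down to about two minutes, at the cost of some extra probe
traffic on idle connections.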

So this problem might not be limited to cloud deployments; maybe this
is something you could run into with other HA/failover setups too?

-Jeremy
--
//www.freelists.org/webpage/oracle-l

References:
- AWS RDS Detecting failover
  - From: Steve T. Baldwin
