Re: OOM killer terminating database on AWS EC2

  • From: Fernando Andrade <correo@xxxxxxxxxxxxx>
  • To: oracle-l@xxxxxxxxxxxxx
  • Date: Mon, 13 Jan 2020 17:03:49 -0500

Hi Sandy

In AWS you can use SES to send the emails, and CloudWatch to monitor at the process level:

https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Agent-procstat-process-metrics.html#CloudWatch-Agent-procstat-configuration
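
As a rough illustration of what that page describes, here is a minimal Python sketch that builds a CloudWatch agent config fragment with one procstat entry. The pattern "ora_.*_dwprod" and the choice of measurements are only assumptions based on the process name in Sandy's post; adjust both, and merge the fragment into whatever agent config you already use.

```python
import json


def procstat_config(pattern, measurements):
    """Build a CloudWatch agent config fragment with a single procstat entry."""
    return {
        "metrics": {
            "metrics_collected": {
                "procstat": [
                    {
                        # "pattern" is matched as a regex against the command line
                        "pattern": pattern,
                        # metric names as listed on the procstat documentation page
                        "measurement": list(measurements),
                    }
                ]
            }
        }
    }


# Assumed pattern: Oracle background processes of the dwprod instance.
config = procstat_config("ora_.*_dwprod", ["memory_rss", "memory_vms", "cpu_usage"])
config_json = json.dumps(config, indent=2)
print(config_json)
```

The printed JSON is only the metrics section; the agent still needs its usual credentials and region setup.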

FJA

On 1/13/2020 3:44 PM, Mark J. Bobak wrote:

Hi Sandy,

I know it's (almost certainly) happening *way* above your level, but dropping Oracle support on *any* database, let alone a production database, is foolishness, and certainly *not* a cost savings, not in the long run.....

I run Oracle on EC2, w/ mail enabled, and so far, have never run into an OOM situation.  The system has to be *really* low on memory for the kernel's OOM killer to wake up and start killing stuff.  When it does, Oracle is a big target, because it (almost certainly) is (and should be) the big memory consumer on your (EC2) instance.
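
To see which processes the kernel currently considers the best targets, one option is to read each process's oom_score from /proc. The sketch below is only an illustration (the function name and the top-5 cutoff are my own choices); it assumes a Linux /proc layout and returns an empty list where /proc is not available.

```python
import os


def oom_ranking(proc_root="/proc", top=5):
    """Return up to `top` (oom_score, pid, name) tuples, highest score first."""
    if not os.path.isdir(proc_root):
        return []
    rows = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "oom_score")) as f:
                score = int(f.read().strip())
            with open(os.path.join(base, "comm")) as f:
                name = f.read().strip()
        except (OSError, ValueError):
            continue  # process exited meanwhile, or is not readable
        rows.append((score, int(entry), name))
    rows.sort(reverse=True)
    return rows[:top]


if __name__ == "__main__":
    for score, pid, name in oom_ranking():
        print(f"{score:>6} {pid:>7} {name}")
```

A higher oom_score means a more likely victim. The per-process knob is /proc/<pid>/oom_score_adj, which ranges from -1000 to 1000; writing -1000 exempts that one process from the OOM killer.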

Some questions:
1.)  What instance type(s) are you running?  Do you have instance store volumes configured for swap?  Do you have swap configured at all?  What is the level of swap usage you are seeing?
2.)  How is your Oracle memory usage configured?  Do you have hugepages configured?  (Please say yes....)
3.)  What do the outputs of 'free -h' and 'top' tell you? How about 'vmstat'?  'sar -B'?
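
If it helps to capture the swap and hugepages answers in one go, the same numbers that 'free' reports can be read from /proc/meminfo. This is just a minimal sketch; the function names and the summary keys are made up for illustration, and it assumes the standard meminfo field names (MemAvailable, SwapTotal, SwapFree, HugePages_Total, HugePages_Free).

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo text into {field: int}.

    Most values are in kB; the HugePages_* counters are page counts.
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def summarize(info):
    """Reduce parsed meminfo to the swap and hugepages facts asked about above."""
    swap_total = info.get("SwapTotal", 0)
    huge_total = info.get("HugePages_Total", 0)
    return {
        "swap_configured": swap_total > 0,
        "swap_used_kb": swap_total - info.get("SwapFree", 0),
        "hugepages_configured": huge_total > 0,
        "hugepages_in_use": huge_total - info.get("HugePages_Free", 0),
        "mem_available_kb": info.get("MemAvailable", 0),
    }


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        print(summarize(parse_meminfo(f.read())))
```

A result with hugepages_configured set to False on a box with a large SGA would be the first thing to look at.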

-Mark


On Mon, Jan 13, 2020 at 2:33 PM Sandra Becker <sbecker6925@xxxxxxxxx> wrote:

    Server:   AWS EC2
    RHEL:   7.6
    Oracle:  12.1.0.2

    We have a database on an AWS EC2 server that the OOM killer has
    terminated twice in the last 5 days; both times the victim was the
    ora_dbw0_dwprod process.  On 1/8 postfix was enabled to allow us
    to email the DBA team through an AWS relay server when a backup
    failed.  We stopped running daily backups and cronjobs that did a
    quick check for expired accounts.  We've left postfix enabled for
    sending emails.  We are searching for answers but have none yet as
    to why this is happening.  We also no longer have Oracle support
    available to us (management saving money again).

    Questions:

     1. Could postfix be related to the memory issues even though we
        haven't sent any emails since the first crash 5 days ago?
     2. How can we monitor the memory usage of an EC2 instance?
     3. How do you disable the OOM killer in EC2 should we decide to
        go that route?  (we have it disabled on our on-prem servers) 
        The docs I've found so far have not been helpful.

    I appreciate any help you can give us or pointing us in the right
    direction.

    Thank you,
-- Sandy B.
