Re: OEL - fork: Resource temporarily unavailable

4000 concurrent sessions per node? I assume you mean active sessions.
That raises a question: how much resource does each node have available
to support all these connections?

You mentioned 128 GB of RAM, but is it enough?

Are you also experiencing performance problems with the existing sessions?

Anyway, "fork: Resource temporarily unavailable" is pretty much
self-explanatory. In general, it indicates a resource problem.

From your strace output, it looks like the clone call is failing to create
a child process.

*****
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
child_tidptr=0x2b9c118e8670) = -1 EAGAIN (Resource temporarily unavailable)
********

According to the clone man page, the call failed to create a child process
because too many processes were already running (the EAGAIN error for clone).

http://linux.die.net/man/2/clone

According to the fork man page, the call can fail with the following
errors:

http://linux.die.net/man/2/fork
*****************
1. EAGAIN

fork() cannot allocate sufficient memory to copy the parent's page tables
and allocate a task structure for the child.

2. EAGAIN

It was not possible to create a new process because the caller's
RLIMIT_NPROC resource limit was encountered. To exceed this limit, the
process must have either the CAP_SYS_ADMIN or the CAP_SYS_RESOURCE
capability.

3. ENOMEM

fork() failed to allocate the necessary kernel structures because memory is
tight.
*****************
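To tell whether the second EAGAIN case (RLIMIT_NPROC) applies, you can compare the limit with the number of tasks the user already owns. Here is a minimal Python sketch, assuming it is run as the OS user that owns the database processes and in an environment with the same limits. The helper names count_tasks_for_uid and describe_limit are mine, not from any tool. On Linux the limit is charged per thread rather than per process, so the sketch sums the Threads field of /proc/<pid>/status.

```python
import os
import resource


def count_tasks_for_uid(uid, proc_root="/proc"):
    """Count tasks (threads) whose real UID is `uid` by scanning proc_root."""
    total = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                fields = dict(
                    line.rstrip("\n").split(":", 1) for line in f if ":" in line
                )
        except OSError:
            continue  # the process exited while we were scanning
        uid_field = fields.get("Uid", "").split()
        # The first value on the Uid line is the real UID.
        if uid_field and int(uid_field[0]) == uid:
            # The limit is charged per thread, so add the thread count.
            total += int(fields.get("Threads", "1"))
    return total


def describe_limit(value):
    """Render an rlimit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    used = count_tasks_for_uid(os.getuid())
    print(f"tasks owned by uid {os.getuid()}: {used}")
    print(f"RLIMIT_NPROC soft={describe_limit(soft)} hard={describe_limit(hard)}")
```

If the task count is close to the soft limit, the RLIMIT_NPROC case is the likely one. Keep in mind that the limits reported here are those of the shell running the script; the limits of an already running process are listed in /proc/<pid>/limits.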

Check your memory consumption and see if there is enough memory available.
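As a rough way to do that check, the sketch below reads /proc/meminfo and prints a few fields. The function name parse_meminfo and the choice of fields are my own; PageTables is included because the first EAGAIN case above is about copying the parent's page tables. Most values are reported in kB, while the HugePages_* entries are plain counts.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo content into a dict of integer values."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only "Key: <number> [unit]" lines.
        if sep and parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    for key in ("MemTotal", "MemFree", "Cached", "SwapTotal", "SwapFree",
                "PageTables", "CommitLimit", "Committed_AS"):
        if key in info:
            print(f"{key:>14}: {info[key]:>12} kB")
```

Running it while the sessions are connected, and again when the error appears, should show whether free memory and swap are shrinking or page tables are growing.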

You also indicated that 32K processes were running on the server when
the issue occurred. Have you checked for any defunct/zombie processes?
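One way to check is to tally process states from /proc/<pid>/stat, where state Z marks a zombie. This is only a sketch; the helper names are hypothetical. The command name in that file is wrapped in parentheses and can itself contain spaces or parentheses, so the state is taken as the first field after the last ")".

```python
import glob
from collections import Counter


def state_from_stat(stat_line):
    """Return the one-letter process state from a /proc/<pid>/stat line."""
    # Split on the last ")" because the command name may contain ")" itself.
    return stat_line.rsplit(")", 1)[1].split()[0]


def count_states(stat_lines):
    """Tally process states, ignoring lines that lack a command field."""
    return Counter(state_from_stat(line) for line in stat_lines if ")" in line)


def read_stat_lines(pattern="/proc/[0-9]*/stat"):
    """Read the stat file of every process currently visible under /proc."""
    lines = []
    for path in glob.glob(pattern):
        try:
            with open(path) as f:
                lines.append(f.read())
        except OSError:
            continue  # the process exited while we were scanning
    return lines


if __name__ == "__main__":
    states = count_states(read_stat_lines())
    print(f"total processes: {sum(states.values())}")
    print(f"zombies (state Z): {states.get('Z', 0)}")
```

Zombies use very little memory, but each one still holds a process table entry, so a large number of them counts toward the process total.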

Aside from what I've indicated above, you may also be hitting some of the
known 11.2 bugs, such as 8841501, 9356344, 9398412, 9944177, 9234660,
9855476 (check MOS Note 1062676.1).

Although CPU might not be the problem in this particular case, running 4K
processes concurrently may also cause heavy CPU utilization. How many
CPUs (cores) does each node have? Since this is a RAC environment, if CPU
is 100% utilized (unless you use Resource Manager), the heavily loaded node
may also get evicted. Has this happened?

You might also consider using a connection pooling mechanism or, if you
already use one, check that it is being used appropriately and efficiently.
Stephane's comment about shared servers is also valid.


Hope this helps.

~Mihajlo


> On 09/22/2011 06:12 AM, Upendra N wrote:
> > yeah. This is a very heavily used app/db, we see 4000 co...
>
> http://www.freelists.org/webpage/oracle-l
>
>
>
