[nanomsg] Re: [Non-DoD Source] Re: On pthread_atfork(), and fork()-safe implementation

From: "Garrett D'Amore" <garrett@xxxxxxxxxx>
To: "nanomsg@xxxxxxxxxxxxx" <nanomsg@xxxxxxxxxxxxx>
Date: Wed, 14 Dec 2016 17:28:08 -0800

Please refer to the Open Group specification of fork() for this. fork()
duplicates the process memory image, but the child contains only a copy of
the calling thread; the other threads are not duplicated. I take that to
mean that memory other than that for the pthread itself will be
“duplicated”, and that if the only reference to said memory (directly or
indirectly) was on a stack frame of a thread that got left behind, then
the memory will be orphaned.
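
As a minimal sketch of the copy semantics described above (not code from
nanomsg or nng; the function name fork_copies_heap() is made up for
illustration), the following shows that a heap allocation made before
fork() is present in the child with the same contents, and that a write in
the child does not affect the parent's copy:

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the child saw the parent's value and
 * the parent's copy was unaffected by the child's write, 0 if not, and
 * -1 on error. */
int fork_copies_heap(void)
{
    int *value = malloc(sizeof *value);
    if (value == NULL)
        return -1;
    *value = 7;

    pid_t pid = fork();
    if (pid < 0) {
        free(value);
        return -1;
    }
    if (pid == 0) {
        /* Child: the allocation exists here too, with the same contents. */
        int seen = *value;
        *value = 99; /* modifies only the child's copy */
        _exit(seen == 7 ? 0 : 1);
    }

    /* Parent: wait for the child, then check our own copy. */
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) {
        free(value);
        return -1;
    }
    int ok = WIFEXITED(status) && WEXITSTATUS(status) == 0 && *value == 7;
    free(value);
    return ok;
}
```

The orphaning case is the same mechanism: had the only pointer to that
block lived on the stack of a thread other than the one calling fork(),
the child would still hold the block but have no thread left to free it.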

On Wed, Dec 14, 2016 at 1:57 PM, Karan, Cem F CIV USARMY RDECOM ARL (US)
<cem.f.karan.civ@xxxxxxxx> wrote:

Makes sense.  I can see how the standard library calls are difficult to
deal with; you have no control over how that memory is allocated.  I'm
going to expose my ignorance here; does fork() duplicate structures
allocated by pthreads?  I know it duplicates file descriptors, but I don't
know what else it duplicates.  Offhand, do you know of a complete list of
what fork() duplicates?

Thanks,
Cem Karan

-----Original Message-----
From: nanomsg-bounce@xxxxxxxxxxxxx [mailto:nanomsg-bounce@xxxxxxxxxxxxx] On Behalf Of Garrett D'Amore
Sent: Wednesday, December 14, 2016 2:31 PM
To: nanomsg@xxxxxxxxxxxxx
Subject: [nanomsg] Re: [Non-DoD Source] Re: On pthread_atfork(), and fork()-safe implementation

All active links contained in this email were disabled. Please verify
the identity of the sender, and confirm the authenticity of all links
contained within the message prior to copying and pasting the address to
a Web browser.

________________________________

Initially, I’m not handling it at all… the expectation is that the thing
that happens after fork() is exec().  (In which case the whole image is
replaced, so it does not matter.)  If you reenter the library in the
child after the fork, the library will “panic”: emit an error message to
stderr, and call abort().  Essentially I’m making it explicit that this
operation is verboten.
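
A minimal sketch of what such a "panic on re-entry" check could look like
follows. The message above does not say how the fork is detected; comparing
getpid() against the pid recorded at initialization is an assumption made
here (a flag set from a pthread_atfork() child handler would be another
option), and all lib_* names are hypothetical rather than nng or nanomsg
API:

```c
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* pid of the process that initialized the (hypothetical) library */
static pid_t lib_owner_pid;

void lib_init(void)
{
    lib_owner_pid = getpid();
}

/* Called at the top of every public entry point. */
static void lib_check_fork(void)
{
    if (getpid() != lib_owner_pid) {
        fputs("panic: library re-entered in child after fork()\n", stderr);
        abort();
    }
}

/* Stand-in for a real library call. */
int lib_do_something(void)
{
    lib_check_fork();
    return 42;
}

/* Demo: fork, re-enter the library in the child, and report whether the
 * child was killed by SIGABRT.  Returns 1 if so, 0 if not, -1 on error. */
int child_reentry_aborts(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        lib_do_something(); /* expected to abort() */
        _exit(0);           /* reached only if the check did not fire */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) && WTERMSIG(status) == SIGABRT;
}
```

A child that goes straight to exec() never calls back into the library, so
it never reaches the check.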

In the future, if I ever do the work to make this really fork safe,
where we completely reinitialize the library in the child, I’ll need to
use a different allocator.  I can either track all memory objects
allocated via my nni_alloc() routine on a linked list, or if I have an
allocator that lets me manage the entire arena, I can just discard the
arena.  The latter is preferable in that it eliminates a lot of overhead,
and I don’t have to figure out how to keep track of the structures
allocated separately.
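
To make the "discard the arena" option concrete, here is a minimal bump
allocator sketch. It is not the nni_alloc() implementation; the arena_*
names, the fixed capacity, and the 16-byte alignment are assumptions chosen
for illustration. Discarding resets a single offset, so no per-object
bookkeeping is needed:

```c
#include <stddef.h>
#include <stdlib.h>

#define ARENA_ALIGN 16 /* assumed alignment for every allocation */

typedef struct arena {
    unsigned char *base; /* start of the backing block */
    size_t         cap;  /* total bytes available */
    size_t         used; /* bytes handed out so far */
} arena_t;

/* Obtain the backing block once; returns 0 on success, -1 on failure. */
int arena_init(arena_t *a, size_t cap)
{
    a->base = malloc(cap);
    a->cap  = a->base ? cap : 0;
    a->used = 0;
    return a->base ? 0 : -1;
}

/* Bump allocation: round the offset up to ARENA_ALIGN and advance it. */
void *arena_alloc(arena_t *a, size_t size)
{
    size_t start = (a->used + ARENA_ALIGN - 1) / ARENA_ALIGN * ARENA_ALIGN;
    if (start > a->cap || size > a->cap - start)
        return NULL; /* a real allocator would grow the arena here */
    a->used = start + size;
    return a->base + start;
}

/* Discard every object at once, e.g. when reinitializing after fork(). */
void arena_discard(arena_t *a)
{
    a->used = 0;
}

/* Return the backing block to the system. */
void arena_destroy(arena_t *a)
{
    free(a->base);
    a->base = NULL;
    a->cap  = 0;
    a->used = 0;
}
```

After arena_discard() the next allocation reuses the start of the block,
which is what makes wholesale reclamation cheap compared with walking a
linked list of individually tracked objects.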

If I have an arena-based or slab-based approach, I can avoid having to
block the fork operation except when I grow the arena.  But when I do, or
on every alloc/free if I keep a global linked list, I’d have to acquire a
resource that blocks fork (such as a reader-writer lock that is acquired
in write mode in prefork(), and released in parent()).

This is a fair bit of complexity for an uncommon use case, so I’m not
investing any time in it at present, now that I’ve verified that there is
a clear path to making this work in my implementation if there is a
need.  Btw, this same approach could be used with nanomsg currently;
you’d just basically have to provide an alternate implementation for
nn_alloc() and nn_free().  With these the linked list approach looks
not so bad since they are already doing some fairly dirty things under
the covers.

Note that this will do *nothing* to recover memory allocated by the C
library for underlying things like pthread structures or internal
buffers used for open files.  You can see why the problem is rather
thorny, so much so that the POSIX committee was convinced to abandon
efforts to solve it.

On Wed, Dec 14, 2016 at 10:37 AM, Karan, Cem F CIV USARMY RDECOM ARL (US)
<cem.f.karan.civ@xxxxxxxx> wrote:

      As you wish.  Can you post how you handle the memory leaks after
      the fork()?  I'm curious to see what you're going to do.

      Thanks,
      Cem Karan

      > -----Original Message-----
      > From: nanomsg-bounce@xxxxxxxxxxxxx [mailto:nanomsg-bounce@xxxxxxxxxxxxx] On Behalf Of Garrett D'Amore
      > Sent: Wednesday, December 14, 2016 10:57 AM
      > To: nanomsg@xxxxxxxxxxxxx
      > Subject: [nanomsg] Re: [Non-DoD Source] Re: On pthread_atfork(), and fork()-safe implementation

      >

      >
      >
      > ________________________________
      >
      >
      >
      > Yeah…um… as an operating system engineer, I generally believe I can
      > keep track of my own objects.  (Kind of critical when you work inside
      > a kernel.)  In fact, in libnng I require that the system provide the
      > *size* of the object with the object at free() time.  This is to
      > permit porting to platforms where this is a requirement.  (Such a
      > requirement exists in the Solaris kernel.)  It also allows the use of
      > much much more efficient allocators, like slab allocators, and not
      > having to stash the size with the object (which is often
      > automatically known at *compile* time), so you save quite a bit of
      > lookup, and can improve the odds of having your object aligned on a
      > natural boundary (such as a page).
      >
      > Portability to embedded systems is really important to me in this
      > effort, and so a GC is kind of out of the question.

      >
      >
      > On Wed, Dec 14, 2016 at 6:50 AM, Michael Powell <mwpowellhtx@xxxxxxxxx> wrote:

      >
      >
      >       On Wed, Dec 14, 2016 at 9:07 AM, Karan, Cem F CIV USARMY RDECOM ARL
      >       (US) <cem.f.karan.civ@xxxxxxxx> wrote:

      >       > Have you considered using a garbage collector?  E.g.
      >       > http://www.hboehm.info/gc/ .  Looking through the header file,
      >       > it appears that there are calls specifically for handling forks
      >       > (GC_set_handle_fork(), GC_atfork_prepare(), GC_atfork_parent(),
      >       > GC_atfork_child(), and GC_start_mark_threads()).  Based on the
      >       > documentation surrounding GC_start_mark_threads(), it appears
      >       > that the collector can handle fork()s that are not followed by
      >       > an exec().  That may solve the memory leak issues cleanly.
      >       > There are also functions to register finalization methods, so
      >       > that should handle dealing with file pointers, etc. that you
      >       > want to close eventually.
      >
      >       What does a GC gain you, but to make further excuses for poor coding
      >       practices, in the first place? Been there, done that, don't need
      >       another T-shirt.
      >
      >
      >       > I use that particular collector in my own work, and it is quite
      >       > fast; I allocate a ridiculous number of short-lived objects, and
      >       > even then, the profiler shows that garbage collection takes less
      >       > than 1% of the runtime.  I've never tried forking a child though,
      >       > so I don't know how well that part works.
      >       >
      >       > Thanks,
      >       > Cem Karan
      >       >
      >       >> -----Original Message-----
      >       >> From: nanomsg-bounce@xxxxxxxxxxxxx [mailto:nanomsg-bounce@xxxxxxxxxxxxx] On Behalf Of Garrett D'Amore
      >       >> Sent: Wednesday, December 14, 2016 1:53 AM
      >       >> To: nanomsg@xxxxxxxxxxxxx
      >       >> Subject: [Non-DoD Source] [nanomsg] Re: On pthread_atfork(), and fork()-safe implementation

      >       >>

      >       >>
      >       >>
      >       >> ________________________________
      >       >>
      >       >>
      >       >>
      >       >> Well I thought I had a brilliant idea, and I spent a number of hours
      >       >> this evening trying to bake in a solution.  I eventually had to throw
      >       >> my hands up in the air.
      >       >>
      >       >> I can see that it *is* possible to build a solution that leaks *only*
      >       >> any memory used by mutexes and condvars.  That’s definitely possible.
      >       >> The problem is, the work you have to do for this is extreme, and it
      >       >> requires you to basically build the equivalent of an operating system
      >       >> in some ways.  I had a scheme to suspend threads, and mark regions
      >       >> fork-safe vs. unsafe, etc.  The problem is that in order to avoid
      >       >> leaking memory, you pretty much *have* to manage your own heap, as in
      >       >> every single memory object in your system has to be globally
      >       >> discoverable.  This turns out to be rather inconvenient if you don’t
      >       >> also want to build your own memory manager, since some memory objects
      >       >> are going to be used by threads, and frankly I had objects that were
      >       >> “orphaned” in that they didn’t have any global state to them, only
      >       >> locally used inside functions in threads.
      >       >>
      >       >> One day I may come back to this, by supplying my own memory manager
      >       >> that will let me reclaim every allocated object in the system
      >       >> (perhaps simply by reclaiming the entire heap in one fell swoop).
      >       >> I’d also need a way to reclaim files, and handle mutexes and condvars
      >       >> “magically”.  I’m pretty sure I know how to do that, and that it can
      >       >> be done in the platform layer.  Which means it can be done in the
      >       >> future, as a fairly straight-forward retrofit, once I decide I’m
      >       >> willing to take the larger action to stop using “ordinary” memory
      >       >> management.  I’ve got enough other stuff to do in the meantime, that
      >       >> I’m taking my earlier action, which is to panic when the user
      >       >> attempts to reenter the library from the child after fork().

      >       >>
      >       >>  - Garrett
      >       >>
      >       >>

      >       >> On Tue, Dec 13, 2016 at 8:51 AM, Garrett D'Amore <garrett@xxxxxxxxxx> wrote:

      >       >>
      >       >>
      >       >>       Thanks.  I had planned to design a fork safe version of things in
      >       >>       the new design. I had implemented freeze and thaw and reset entry
      >       >>       points at various points and was pretty sure that this would have
      >       >>       worked well.  Until I discovered that the child side version was
      >       >>       not allowed to call any mutex functions or to call free.
      >       >>
      >       >>       I will think about this some more.  Delaying the child side action
      >       >>       might be reasonable and lead to a working solution.

      >       >>
      >       >>       Sent from my iPhone
      >       >>
      >       >>
      >       >>       > On Dec 13, 2016, at 12:20 AM, Franklin Mathieu <franklinmathieu@xxxxxxxxx> wrote:
      >       >>       >
      >       >>       > I'm going to give my 2 cents on the matter as I was the one that initially
      >       >>       > opened the github issue regarding fork()-safety and I had the time
      >       >>       > to work with different approaches on the matter.
      >       >>       >
      >       >>       > I've been maintaining a unit testing framework for C that relies on
      >       >>       > worker processes to run tests safely, and as such, for the longest
      >       >>       > time, this had been implemented with fork() without a subsequent exec().
      >       >>       > I recently switched the I/O layer of the framework to use nanomsg
      >       >>       > because it was simple, and it was much more "correct" than what
      >       >>       > I had been doing before with pipe() shenanigans.
      >       >>       >
      >       >>       > However, as nanomsg isn't fork()-safe, I took a stab at implementing
      >       >>       > a fork()-safety mechanism, which ended up being brittle but was
      >       >>       > "good enough" for my purposes, and I reworked other dependencies
      >       >>       > to make sure they handled forks correctly.
      >       >>       >
      >       >>       > The problem with fork()-safety is that unless you think of it right at the
      >       >>       > design of the software, you're going to end up doing something hack-ish;
      >       >>       > which means that the rewrite could be a good starting point to actually
      >       >>       > implement the structural basis towards fork()-safety. POSIX might be
      >       >>       > right on target with the problems caused by pthread_atfork(), but in
      >       >>       > practice there is a lot of wiggle room to do what we must to make
      >       >>       > things work at fork.
      >       >>       >
      >       >>       > With all of that being said, I've given up myself on fork()-safety.
      >       >>       > The fact is that there is no single silver bullet to address this,
      >       >>       > that a lot of software is expecting exec() to be called after a fork(),
      >       >>       > and that there aren't many use cases for having worker processes.

      >       >>       >
      >       >>       > I ended up writing a library dedicated to spawning worker
      >       >>       > processes [1] in a manner that calls fork() then re-exec()s the current
      >       >>       > executable with a patched main function, which while not ideal, is
      >       >>       > in my opinion less of a hack than having to make the software and
      >       >>       > all of its dependencies fork()-safe.
      >       >>       >
      >       >>       > This is why I understand your decision of giving up and panicking
      >       >>       > the process on fork-reentry. You might also be able to compromise
      >       >>       > by only allowing calls to nng_socket_create after fork, which could under
      >       >>       > the covers completely drop the current invalid state and just reinitialize
      >       >>       > the library. This would cause a resource leak, but allow the usage
      >       >>       > of sockets in the child for those that really want it.

      >       >>       >

      >       >>       > [1]: https://github.com/diacritic/BoxFort

      >       >>       >
      >       >>       > 2016-12-12 19:31 GMT+01:00 Garrett D'Amore <garrett@xxxxxxxxxx>:
      >       >>       >> The following conversation relates to using fork() with nanomsg (or future
      >       >>       >> rewrites), where you do *not* immediately call exec().  Using fork() and
      >       >>       >> then immediately calling exec() is fine, and will continue to work as it
      >       >>       >> always has.
      >       >>       >>
      >       >>       >> But some people want to use fork() to spawn children, e.g. a child worker
      >       >>       >> process, that communicates back to the parent somehow.  This is never going
      >       >>       >> to work.
      >       >>       >>
      >       >>       >> I’ve been doing a bit more research into pthread_atfork() as part of an
      >       >>       >> attempt to make my new nng library properly fork()-safe.  I’ve more or less
      >       >>       >> given up though.
      >       >>       >>
      >       >>       >> The reason for this is that even the OpenGroup has given up; see
      >       >>       >> http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_atfork.html
      >       >>       >> and especially the RATIONALE section, for the logic behind this.  They
      >       >>       >> have even indicated plans to deprecate the pthread_atfork() API altogether.

      >       >>       >>
      >       >>       >> Essentially, it isn’t possible to make a version of the library fork() safe
      >       >>       >> as it would be necessary to free resources, do locks, etc., i.e. all those
      >       >>       >> Async-Signal-Unsafe calls.
      >       >>       >>
      >       >>       >> So, for libnng, and possibly in the future for libnanomsg, I will be
      >       >>       >> changing the API so that if you attempt to call back into the library after
      >       >>       >> fork(), it will actually panic the process.
      >       >>       >>
      >       >>       >> I probably will also arrange for pthread_atfork() to be called to close any
      >       >>       >> file descriptors that were not marked close-on-exec…

      >       >>       >>
      >       >>       >> Stay tuned for more details.
      >       >>       >>
      >       >>       >> - Garrett
      >       >>       >
      >       >>       >
      >       >>       > --
      >       >>       > Franklin "Snaipe" Mathieu
      >       >>       > 🝰 https://diacritic.io
      >       >>       >
      >       >>
      >       >>
      >       >
      >
      >
      >

Follow-Ups:
- [nanomsg] Re: [Non-DoD Source] Re: On pthread_atfork(), and fork()-safe implementation
  - From: Franklin Mathieu

References:
- [nanomsg] On pthread_atfork(), and fork()-safe implementation
  - From: Garrett D'Amore
- [nanomsg] Re: On pthread_atfork(), and fork()-safe implementation
  - From: Franklin Mathieu
- [nanomsg] Re: On pthread_atfork(), and fork()-safe implementation
  - From: Garrett D'Amore
- [nanomsg] Re: On pthread_atfork(), and fork()-safe implementation
  - From: Garrett D'Amore
- [nanomsg] Re: [Non-DoD Source] Re: On pthread_atfork(), and fork()-safe implementation
  - From: Karan, Cem F CIV USARMY RDECOM ARL (US)
- [nanomsg] Re: [Non-DoD Source] Re: On pthread_atfork(), and fork()-safe implementation
  - From: Michael Powell
- [nanomsg] Re: [Non-DoD Source] Re: On pthread_atfork(), and fork()-safe implementation
  - From: Garrett D'Amore
- [nanomsg] Re: [Non-DoD Source] Re: On pthread_atfork(), and fork()-safe implementation
  - From: Karan, Cem F CIV USARMY RDECOM ARL (US)
- [nanomsg] Re: [Non-DoD Source] Re: On pthread_atfork(), and fork()-safe implementation
  - From: Garrett D'Amore
- [nanomsg] Re: [Non-DoD Source] Re: On pthread_atfork(), and fork()-safe implementation
  - From: Karan, Cem F CIV USARMY RDECOM ARL (US)
