[nanomsg] Re: [Non-DoD Source] Re: On pthread_atfork(), and fork()-safe implementation

From: Michael Powell <mwpowellhtx@xxxxxxxxx>
To: "nanomsg@xxxxxxxxxxxxx" <nanomsg@xxxxxxxxxxxxx>
Date: Wed, 14 Dec 2016 09:50:40 -0500

On Wed, Dec 14, 2016 at 9:07 AM, Karan, Cem F CIV USARMY RDECOM ARL
(US) <cem.f.karan.civ@xxxxxxxx> wrote:

Have you considered using a garbage collector?  E.g.
http://www.hboehm.info/gc/.  Looking through the header file, it appears that
there are calls specifically for handling forks (GC_set_handle_fork(),
GC_atfork_prepare(), GC_atfork_parent(), GC_atfork_child(), and
GC_start_mark_threads()).  Based on the documentation surrounding
GC_start_mark_threads(), it appears that the collector can handle fork()s
that are not followed by an exec().  That may solve the memory leak issues
cleanly.  There are also functions to register finalization methods, so that
should handle dealing with file pointers, etc. that you want to close
eventually.

What does a GC gain you, but to make further excuses for poor coding
practices, in the first place? Been there, done that, don't need
another T-shirt.

I use that particular collector in my own work, and it is quite fast; I
allocate a ridiculous number of short-lived objects, and even then, the
profiler shows that garbage collection takes less than 1% of the runtime.
I've never tried forking a child though, so I don't know how well that part
works.

Thanks,
Cem Karan

-----Original Message-----
From: nanomsg-bounce@xxxxxxxxxxxxx [mailto:nanomsg-bounce@xxxxxxxxxxxxx] On
Behalf Of Garrett D'Amore
Sent: Wednesday, December 14, 2016 1:53 AM
To: nanomsg@xxxxxxxxxxxxx
Subject: [Non-DoD Source] [nanomsg] Re: On pthread_atfork(), and fork()-safe
implementation


Well I thought I had a brilliant idea, and I spent a number of hours this
evening trying to bake in a solution.  I eventually had to throw my
hands up in the air.

I can see that it *is* possible to build a solution that leaks *only* any
memory used by mutexes and condvars.  That’s definitely possible.
The problem is, the work you have to do for this is extreme, and it requires
you to basically build the equivalent of an operating system in
some ways.  I had a scheme to suspend threads, and mark regions fork-safe
vs. unsafe, etc.  The problem is that in order to avoid leaking
memory, you pretty much *have* to manage your own heap, as in every single
memory object in your system has to be globally discoverable.
This turns out to be rather inconvenient if you don’t also want to build
your own memory manager, since some memory objects are going
to be used by threads, and frankly I had objects that were “orphaned” in
that they didn’t have any global state to them, only locally used
inside functions in threads.

One day I may come back to this, by supplying my own memory manager that
will let me reclaim every allocated object in the system
(perhaps simply by reclaiming the entire heap in one fell swoop).  I’d also
need a way to reclaim files, and handle mutexes and condvars
“magically”.  I’m pretty sure I know how to do that, and that it can be done
in the platform layer.  Which means it can be done in the
future, as a fairly straightforward retrofit, once I decide I’m willing to
take the larger action to stop using “ordinary” memory
management.  I’ve got enough other stuff to do in the meantime, that I’m
taking my earlier action, which is to panic when the user
attempts to reenter the library from the child after fork().
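
The panic-on-reentry behaviour described above could look roughly like the following. This is only a minimal sketch, not nng's actual code: the names lib_init(), lib_check_fork(), lib_do_work() and signal_after_child_reentry() are hypothetical, and it assumes that comparing getpid() against the pid recorded at initialization is an acceptable way to detect a forked child.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical library state: pid of the process that initialized us. */
static pid_t lib_owner_pid = 0;

/* Hypothetical init routine: remember which process owns the state. */
void lib_init(void)
{
    lib_owner_pid = getpid();
}

/* Called at the top of every public entry point. */
void lib_check_fork(void)
{
    assert(lib_owner_pid != 0); /* lib_init() must have been called */
    if (lib_owner_pid != getpid()) {
        /* We are in a forked child.  Use only async-signal-safe calls
         * (write, abort) to report the problem and panic. */
        static const char msg[] = "library re-entered after fork()\n";
        ssize_t n = write(STDERR_FILENO, msg, sizeof msg - 1);
        (void) n;
        abort();
    }
}

/* Hypothetical public entry point. */
int lib_do_work(void)
{
    lib_check_fork();
    return 0;
}

/* Demo helper: fork, re-enter the library in the child, and return the
 * signal that terminated the child (0 if it exited normally). */
int signal_after_child_reentry(void)
{
    int status = 0;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        lib_do_work(); /* expected to abort() */
        _exit(0);
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

The parent keeps working normally; only a child that calls back into the library without an exec() is terminated with SIGABRT.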

- Garrett

On Tue, Dec 13, 2016 at 8:51 AM, Garrett D'Amore <garrett@xxxxxxxxxx> wrote:

      Thanks.  I had planned to design a fork-safe version of things in the
      new design.  I had implemented freeze, thaw, and reset entry points at
      various points and was pretty sure that this would have worked well,
      until I discovered that the child-side version was not allowed to call
      any mutex functions or to call free.

      I will think about this some more.  Delaying the child-side action
      might be reasonable and lead to a working solution.
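
One way to picture "delaying the child-side action" is sketched below. It is an illustration under stated assumptions, not the design that was actually attempted: the child handler registered with pthread_atfork() only sets a flag, and the next library call in the child drops the inherited state (leaking it) and builds a fresh one. All names (lib_state, lib_init, lib_generation, child_generation_after_fork) are hypothetical. Note that calling malloc() and pthread_mutex_init() in the child of a multithreaded process is still outside what POSIX guarantees, so this relies on behaviour that merely tends to work in practice.

```c
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical library state; "generation" counts reinitializations. */
struct lib_state {
    pthread_mutex_t lock;
    int generation;
};

static struct lib_state *state = NULL;
static volatile sig_atomic_t forked = 0;

static struct lib_state *new_state(int generation)
{
    struct lib_state *s = malloc(sizeof *s);
    if (s == NULL)
        abort();
    pthread_mutex_init(&s->lock, NULL);
    s->generation = generation;
    return s;
}

/* Child-side fork handler: only record that a fork happened.
 * No locking and no free() here. */
static void on_child(void)
{
    forked = 1;
}

int lib_init(void)
{
    state = new_state(0);
    return pthread_atfork(NULL, NULL, on_child);
}

/* Hypothetical entry point: performs the deferred reset if needed. */
int lib_generation(void)
{
    int g;
    assert(state != NULL); /* lib_init() must have been called */
    if (forked) {
        forked = 0;
        /* Abandon (leak) the inherited state instead of freeing it. */
        state = new_state(state->generation + 1);
    }
    pthread_mutex_lock(&state->lock);
    g = state->generation;
    pthread_mutex_unlock(&state->lock);
    return g;
}

/* Demo helper: returns the generation observed by a forked child. */
int child_generation_after_fork(void)
{
    int status = 0;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(lib_generation());
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

The parent's state is untouched by the fork; only the child sees a new generation, at the cost of leaking whatever the old state owned.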

      Sent from my iPhone

      > On Dec 13, 2016, at 12:20 AM, Franklin Mathieu <franklinmathieu@xxxxxxxxx> wrote:

      >
      > I'm going to give my 2 cents on the matter, as I was the one that
      > initially opened the GitHub issue regarding fork()-safety and I had
      > the time to work with different approaches on the matter.
      >
      > I've been maintaining a unit testing framework for C that relies on
      > worker processes to run tests safely, and as such, for the longest
      > time, this had been implemented with fork() without a subsequent
      > exec().  I recently switched the I/O layer of the framework to use
      > nanomsg because it was simple, and it was much more "correct" than
      > what I had been doing before with pipe() shenanigans.
      >
      > However, as nanomsg isn't fork()-safe, I took a stab at implementing
      > a fork()-safety mechanism, which ended up being brittle but was
      > "good enough" for my purposes, and I reworked other dependencies
      > to make sure they handled forks correctly.
      >
      > The problem with fork()-safety is that unless you think of it right
      > at the design of the software, you're going to end up doing something
      > hack-ish; which means that the rewrite could be a good starting point
      > to actually implement the structural basis towards fork()-safety.
      > POSIX might be right on target with the problems caused by
      > pthread_atfork(), but in practice there is a lot of wiggle room to do
      > what we must to make things work at fork.
      >
      > With all of that being said, I've given up myself on fork()-safety.
      > The fact is that there is no single silver bullet to address this,
      > that a lot of software is expecting exec() to be called after a
      > fork(), and that there aren't many use cases for having worker
      > processes.
      >
      > I ended up writing a library dedicated to spawning worker
      > processes [1] in a manner that calls fork() then re-exec()s the
      > current executable with a patched main function, which while not
      > ideal, is in my opinion less of a hack than having to make the
      > software and all of its dependencies fork()-safe.
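
The fork()-then-exec() pattern mentioned above can be sketched generically as follows. This does not show how BoxFort itself works (its patched main function is not reproduced here); run_worker() is a hypothetical helper that forks, immediately execs the given program in the child, and returns the child's exit status. To re-exec the current program, a caller would pass the path of its own executable (on Linux, "/proc/self/exe" is one assumption-laden way to get it) plus whatever arguments tell it to act as a worker.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `exe` with `argv` in a child process and
 * return its exit status, or -1 on failure or abnormal termination. */
int run_worker(const char *exe, char *const argv[])
{
    int status = 0;
    pid_t pid;

    assert(exe != NULL && argv != NULL);

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: exec right away, so no inherited library state is
         * ever touched.  _exit(127) runs only if the exec fails. */
        execv(exe, argv);
        _exit(127);
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the child replaces its image immediately, nothing between fork() and exec() needs to be more than async-signal-safe, which is exactly the case the libraries in question already support.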
      >
      > This is why I understand your decision of giving up and panicking
      > the process on fork-reentry. You might also be able to compromise
      > by only allowing calls to nng_socket_create after fork, which could
      > under the covers completely drop the current invalid state and just
      > reinitialize the library. This would cause a resource leak, but allow
      > the usage of sockets in the child for those that really want it.
      >
      > [1]: https://github.com/diacritic/BoxFort
      >
      > 2016-12-12 19:31 GMT+01:00 Garrett D'Amore <garrett@xxxxxxxxxx>:
      >> The following conversation relates to using fork() with nanomsg (or
      >> future rewrites), where you do *not* immediately call exec().  Using
      >> fork() and then immediately calling exec() is fine, and will continue
      >> to work as it always has.
      >>
      >> But some people want to use fork() to spawn children, e.g. a child
      >> worker process, that communicates back to the parent somehow.  This
      >> is never going to work.
      >>
      >> I’ve been doing a bit more research into pthread_atfork() as part of
      >> an attempt to make my new nng library properly fork()-safe.  I’ve
      >> more or less given up though.
      >>
      >> The reason for this is that even the Open Group has given up.  See
      >> http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_atfork.html
      >> and especially the RATIONALE section, for the logic behind this.
      >> They have even indicated plans to deprecate the pthread_atfork() API
      >> altogether.
      >>
      >> Essentially, it isn’t possible to make a version of the library
      >> fork()-safe, as it would be necessary to free resources, take locks,
      >> etc., i.e. all those Async-Signal-Unsafe calls.
      >>
      >> So, for libnng, and possibly in the future for libnanomsg, I will be
      >> changing the API so that if you attempt to call back into the
      >> library after fork(), it will actually panic the process.
      >>
      >> I probably will also arrange for pthread_atfork() to be called to
      >> close any file descriptors that were not marked close-on-exec…
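
A rough sketch of that last idea, closing descriptors from a pthread_atfork() child handler, might look like this. It is not taken from nng: track_fd(), the fixed-size table, and fd_closed_in_child() are hypothetical, and the registration path is not thread-safe. The handler itself only calls close(), which is async-signal-safe.

```c
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_TRACKED 16

/* Hypothetical table of library-owned descriptors lacking close-on-exec. */
static int tracked[MAX_TRACKED];
static int ntracked = 0;

/* Child-side fork handler: close every tracked descriptor. */
static void close_tracked_in_child(void)
{
    for (int i = 0; i < ntracked; i++)
        close(tracked[i]);
}

/* Register a descriptor; installs the fork handler on first use.
 * Returns 0 on success, -1 if the table is full or registration fails. */
int track_fd(int fd)
{
    static int installed = 0;

    assert(fd >= 0);
    if (ntracked == MAX_TRACKED)
        return -1;
    if (!installed) {
        if (pthread_atfork(NULL, NULL, close_tracked_in_child) != 0)
            return -1;
        installed = 1;
    }
    tracked[ntracked++] = fd;
    return 0;
}

/* Demo helper: returns 1 if `fd` is closed in a forked child,
 * 0 if it is still open there, -1 on error. */
int fd_closed_in_child(int fd)
{
    int status = 0;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(fcntl(fd, F_GETFD) == -1 ? 1 : 0);
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Descriptors that were never registered are inherited by the child as usual, and the parent's copies stay open in either case.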
      >>
      >> Stay tuned for more details.
      >>
      >> - Garrett
      >
      >
      > --
      > Franklin "Snaipe" Mathieu
      > 🝰 https://diacritic.io
      >
