[nanomsg] Re: How to bind on a random port?

  • From: "Jason E. Aten" <j.e.aten@xxxxxxxxx>
  • To: nanomsg <nanomsg@xxxxxxxxxxxxx>
  • Date: Sun, 16 Nov 2014 08:22:57 -0800

I suspect we may be simply thinking about solving different problems, but
I'll try to outline my use case below.

Let's ignore the problems of using root-only ports (those below number
1024) for the moment, even though practically that does make
depending on running a service on port 1 an impossibility in many
situations. Also let's ignore strings versus ports, since that
is just a size-of-the-namespace issue, and less fundamental.  Let's focus
on dynamic vs static port allocation.

Static strings might work for servers (to use a phone analogy, where I'm a
business and I want a listed number in the phone book), but static strings
don't work for clients (exactly when I don't want to list my number, and in
fact for security I may want an unknown-in-advance and/or unpredictable
number).

Example: When a client socket connect()s, it cannot reuse a well known port
on the client end. So the tcp stack allocates the client a random, unused
port.

Whenever we have an analogous situation to that, then we'll need a dynamic
port, no?
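
To make the connect() example concrete, here is a minimal sketch using plain
Python stdlib sockets (not nanomsg). The function name client_ephemeral_port
is made up for illustration. It only shows that a client which never calls
bind() still ends up with a kernel-chosen local port.

```python
import socket

def client_ephemeral_port():
    """Connect to a local listener and return (server_port, client_port)."""
    # The listener itself uses a kernel-chosen port so this sketch never
    # collides with a real service on the machine.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))
        server.listen(1)
        server_port = server.getsockname()[1]
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
            # No bind() on the client side: connect() makes the kernel
            # pick an unused local port for us.
            client.connect(("127.0.0.1", server_port))
            client_port = client.getsockname()[1]
    return server_port, client_port

if __name__ == "__main__":
    print(client_ephemeral_port())
```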

* * *

Keeping that analogy in mind, I see a separate set of use cases where a
dynamically allocated port is required.

My use case is that I have a simple job processing system with multiple
worker processes on the same host.

Each worker wishes to open a port to receive a reply message meant
specifically for them. Since we wish to de-couple requests and the replies
to those requests, each worker must have its own listening port.

At startup, each worker requests a job. As part of that job request,
say to a job-dispatching service, the worker tells the
dispatcher where to send the job, in the form of an IP:port pair.

In order to have multiple independent processes running on the same
machine, and so that replies go directly to the worker that requested
that particular job, each worker must listen for replies on its own port.
Since they can't all share the same port, they need to allocate
a currently unused port.  Really this is just like the socket connect()
from the client example, perhaps generalized a bit.
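
Here is a rough sketch of what each worker would like to do, written with
plain Python stdlib sockets rather than nanomsg. The names open_reply_socket
and start_workers are hypothetical. Binding to port 0 asks the kernel for any
unused port, and the worker keeps the socket open so the port stays its own.

```python
import socket

def open_reply_socket(host="127.0.0.1"):
    """Bind a listening socket to a kernel-assigned port.

    Returns (sock, "host:port"); the string is what a worker would send
    to the dispatcher as its reply address.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))  # port 0: let the kernel choose an unused port
    sock.listen(1)
    port = sock.getsockname()[1]
    return sock, f"{host}:{port}"

def start_workers(n):
    """Simulate n workers on one host, each with its own reply socket."""
    return [open_reply_socket() for _ in range(n)]

if __name__ == "__main__":
    workers = start_workers(3)
    for sock, address in workers:
        print("worker listening for replies on", address)
        sock.close()
```

Because every socket stays bound until its worker closes it, no two workers
can be handed the same port.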

* * *

Practically speaking, the re-try functionality of nanomsg is very
important. But failure to bind a port even once at startup time is an
initialization error that retry cannot solve automagically for us.  We
should distinguish and elevate a failure to bind a port the first time, so
that applications using nanomsg can take appropriate remedial measures.
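
To illustrate the kind of distinction I mean, here is a sketch with plain
Python stdlib sockets. This is not the nanomsg API; PortInUse, bind_once and
bind_with_fallback are names I made up. The first bind either succeeds or
fails immediately with a specific error, and the application decides what
to do next (here, as one possible remedy, it falls back to a kernel-chosen
port).

```python
import errno
import socket

class PortInUse(Exception):
    """Hypothetical error type: the first bind failed, port already taken."""

def bind_once(host, port):
    """Try to bind exactly once and report 'port in use' distinctly."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
    except OSError as exc:
        sock.close()
        if exc.errno == errno.EADDRINUSE:
            raise PortInUse(f"{host}:{port} is already in use") from exc
        raise  # any other failure is passed through unchanged
    sock.listen(1)
    return sock

def bind_with_fallback(host, preferred_port):
    """One possible remedial measure: fall back to a kernel-chosen port."""
    try:
        return bind_once(host, preferred_port)
    except PortInUse:
        return bind_once(host, 0)
```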

It's rather kludgy to have to determine a port outside of nanomsg at the
moment, because doing so has a built-in race condition (the gap in time
between when I ask the kernel for a socket, release it, and then tell
nanomsg to take that port).  I suppose the other way things could work
would just be to provide a way to tell nanomsg to use the socket that I've
already allocated.
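
The race can be shown in a few lines. This sketch again uses plain Python
stdlib sockets, with made-up names (probe_free_port, demonstrate_race), and
it simulates the "other process" inside the same program so the outcome is
repeatable. The second bind stands in for the later bind that nanomsg would
perform.

```python
import errno
import socket

def probe_free_port(host="127.0.0.1"):
    """Ask the kernel for an unused port, then release it (the racy step)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.bind((host, 0))
        return probe.getsockname()[1]

def demonstrate_race(host="127.0.0.1"):
    """Return the errno of the late bind, or None if it succeeded."""
    port = probe_free_port(host)
    # The window: the probe socket is closed, so anyone may take the port.
    squatter = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    squatter.bind((host, port))
    # Stand-in for the library binding to the port we "found" earlier.
    late = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        late.bind((host, port))
        return None
    except OSError as exc:
        return exc.errno
    finally:
        late.close()
        squatter.close()

if __name__ == "__main__":
    print(demonstrate_race() == errno.EADDRINUSE)
```

Handing the library an already-bound socket, or letting it bind to port 0
itself and report the result, would close that window.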


Best regards,
Jason



On Sun, Nov 16, 2014 at 6:17 AM, Martin Sustrik <sustrik@xxxxxxxxxx> wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 16/11/14 05:45, Matt Howlett wrote:
> > It should be treated as a failure to connect. The implementation
> > should try to re-connect assuming that the interface will become
> > available eventually.
> >
> >
> > I think it would be very common for people to want an atomic way
> > of checking if a port is free, binding to it if it is, and moving
> > on if it is not (I want this and I notice a number of others here
> > do too). I also think that in most scenarios where a port is
> > already in use, it's likely to stay that way for a time period long
> > enough that the application wanting to bind to it is not going to
> > find it useful to wait for it to become free (I'm not an expert
> > though and could be missing a common scenario where this
> > functionality is actually useful - examples?). So based on that, I
> > think the former behavior is the best, or should at least be an
> > option.
>
> Ok, let me give you the big picture.
>
> Long ago the authors of the Internet protocol suite decided to use
> 16-bit integer IDs to address different "services" on a host. I guess
> back then the decision must have seemed reasonable, but later on it
> became clear that the namespace is too small and services often
> clash by using the same ID (port).
>
> In 1988 there was an attempt to solve the problem (at least for TCP)
> by defining a protocol that used arbitrary string as IDs instead of
> integers (TCPMUX, RFC 1078, TCP port 1) but it never got wide adoption.
>
> If you think of it, using strings solves the problem fully. You can
> use a fully qualified name as a service ID, it can contain the name of
> the vendor etc. In such a world there would be no service name clashes
> and thus no need for dynamically assigned service IDs.
>
> Unfortunately, TCPMUX was ignored and instead everybody sticks to port
> numbers and hacks around the problem using dynamically assigned ports.
>
> Yet, dynamically assigned IDs are a spectacularly bad solution:
>
> 1. The dynamically assigned ID has to be somehow communicated to the
> peers at runtime. That adds a whole new layer of complexity to the
> applications.
>
> 2. Given that service IDs change as the services are restarted,
> there's no way to define service-specific policies at network level
> (e.g. allocating 100MB/s of bandwidth for service X).
>
> 3. The solution is prone to inconsistencies and race conditions, with
> different nodes having different ideas about what the ID of service X
> happens to be. You can think of it as a very simple distributed
> database which means you have to deal with CAP theorem: Either you can
> always determine the port number to use (Availability) or everybody
> can agree on the same number (Consistency) -- but never both.
>
> In the context of nanomsg (which does automatic reconnecting) the problem
> becomes even more pronounced: Say you bind to a wi-fi interface, get
> port X assigned, then the interface goes away (you are out of range of
> the wi-fi), then becomes available once more. You can try to re-bind
> to the same port, but it may already be taken. You can ask for a new
> dynamically allocated port but then it's certainly going to be
> different from the old one. The change would have to be communicated
> back to the application. How? Callback? And even if so, what would
> the application do in such a case? Etc.
>
> The bottom line is that rather than using dynamically
> assigned ports we should be using string-based service names. I
> was thinking of actually implementing TCPMUX, but the idea surfaced
> again yesterday when speaking of WebSocket transport and using
> WebSocket URLs basically as service names.
>
> Well, that's more or less it. Sorry for the long rant but I hope it'll
> explain why things are done as they are and what is my preferred
> solution to the problem of service ID clashes.
>
> Martin
>
