[nanomsg] Re: How to bind on a random port?

  • From: "Jason E. Aten" <j.e.aten@xxxxxxxxx>
  • To: nanomsg <nanomsg@xxxxxxxxxxxxx>
  • Date: Sat, 15 Nov 2014 15:26:56 -0800

Shixi, is that the line your stack trace is originating from?

https://github.com/nanomsg/nanomsg/blob/master/src/transports/tcp/ctcp.c#L619

It does seem problematic to me that there is an errnum_assert() here.

Martin/everyone: it would seem reasonable to return an error instead of
aborting here, no?

Thanks.
Jason




On Sat, Nov 15, 2014 at 7:28 AM, xreborner (Shixi Chen) <xreborner@xxxxxxxxx> wrote:

> The problem is that, if it fails, it just aborts. So no chance to retry.
>
> On Sat, Nov 15, 2014 at 4:21 PM, Jason E. Aten <j.e.aten@xxxxxxxxx> wrote:
>
>> This.
>>
>> Echoing Matt's comment -- I just bind to port 0 using non-nanomsg socket
>> calls (so the kernel picks a free port). Then I note the port, then close
>> that socket and reopen on that port in nanomsg. Since there is still a
>> short period in which that port might get taken, I also retry if that
>> fails again, but usually it succeeds.
>>
>>
>> On Fri, Nov 14, 2014 at 4:51 PM, Matt Howlett <matt.howlett@xxxxxxxxx>
>> wrote:
>>
>>>
>>> The behavior of nn_bind was also unexpected to me. My work-around is to
>>> find a free port outside of nanomsg, then immediately bind to it. I stagger
>>> the start up of my workers (~1 worker per core per machine) so in practice
>>> I never get a race condition. Not ideal, but it works. If you can control
>>> when all of the processes that bind to random ports start up on each node,
>>> you can do the same thing, though it sounds like your situation might be
>>> more difficult.
>>>
>>>
>>>
>>> On Fri, Nov 14, 2014 at 10:30 PM, xreborner (Shixi Chen) <
>>> xreborner@xxxxxxxxx> wrote:
>>>
>>>> Unfortunately, I'm running a distributed computation application on a
>>>> cluster with thousands of machines. On each machine, multiple background
>>>> tasks may be running and may already occupy some random ports. If I
>>>> just choose a random port (in fact, I use more than one) and use it,
>>>> there is roughly a 1% probability of failure on each machine. If my
>>>> application is running on 200 machines, then it almost always fails.
>>>>
>>>> On Fri, Nov 14, 2014 at 11:05 PM, Martin Sustrik <sustrik@xxxxxxxxxx>
>>>> wrote:
>>>>
>>>>> -----BEGIN PGP SIGNED MESSAGE-----
>>>>> Hash: SHA1
>>>>>
>>>>> On 14/11/14 15:55, xreborner (Shixi Chen) wrote:
>>>>> > So, if I don't know any port numbers that are available, it is
>>>>> > impossible to use nanomsg?
>>>>>
>>>>> Yes. Although on a typical machine, almost all ports are unused, so
>>>>> just picking one and using it tends to work.
>>>>>
>>>>> > My program is to be run on a remote cluster, where no port numbers
>>>>> > are known to be reserved. I was using zeromq and my solution was to
>>>>> > call zmq_bind repeatedly. I'm considering switching to nanomsg since
>>>>> > it looks better (and also due to some problems in zeromq). Is there
>>>>> > any plan to solve this problem in the future?
>>>>>
>>>>> I was thinking of implementing tcpmux (RFC 1078) but it's not coming any
>>>>> time soon.
>>>>>
>>>>> Martin
>>>>> -----BEGIN PGP SIGNATURE-----
>>>>> Version: GnuPG v1.4.11 (GNU/Linux)
>>>>>
>>>>> iQEcBAEBAgAGBQJUZho9AAoJENTpVjxCNN9YouMH/RI/d9AismH7RuEH7aY6oOQV
>>>>> snl5ad/wZsupguf5uGtYfomnJOMtMrwLo+qEHK+u5JCWmBN73VikfJuJtwZs/lsg
>>>>> umD1xt6tGvOyxmI1V1bzXkNASyUktPpjedA0xgbBXlw8KwsDTTKIRaVCwNQt+FND
>>>>> tKKMHIQKJ9B0qmD8UrlT8fg1qwLsG/HUgr1JrkVw1+yLnaGXzwCdxWO49F3X+dEl
>>>>> aXwIO1cZrcpB+hPb7lemn4pWQDa//JiIbE4wbg7aT4ecgIWFd4UheHQfSBr8ZniH
>>>>> XjeGlJcJ4IDos9DzfNTKgj07lgGoMB+lt/7M+qr+Mh4AjJZTgYM11nGyp0ljpQg=
>>>>> =jgr1
>>>>> -----END PGP SIGNATURE-----
>>>>>
>>>>>
>>>>
>>>
>>
>
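
The work-around Matt and Jason describe in the thread (let the kernel pick a
free port by binding to port 0, note the port, close the socket, bind to that
port for real, and retry if someone else grabbed it in between) can be
sketched roughly as follows. This is only an illustration, assuming Python's
standard socket module stands in for the "non-nanomsg socket calls".
find_free_port and bind_with_retry are hypothetical names, and bind_fn is a
placeholder for whatever performs the actual nanomsg bind.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the kernel for a currently unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.bind((host, 0))          # port 0: the kernel picks a free port
        return probe.getsockname()[1]  # note the port; the probe closes on exit


def bind_with_retry(bind_fn, host="127.0.0.1", attempts=5):
    """Call bind_fn(host, port) on fresh candidate ports until one succeeds.

    bind_fn is a hypothetical placeholder for the real bind step. It must
    raise OSError on failure, otherwise there is nothing to retry on.
    """
    last_error = None
    for _ in range(attempts):
        port = find_free_port(host)
        try:
            return port, bind_fn(host, port)
        except OSError as exc:  # port was taken between the probe and the bind
            last_error = exc
    raise RuntimeError(f"no port bound after {attempts} attempts") from last_error
```

As Shixi notes, the retry loop only helps if a failed bind is reported as an
error. If the bind step aborts the process instead, the except branch is
never reached.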
