[nanomsg] Re: Design of a distributed computing framework

From: Alex Elsayed <eternaleye@xxxxxxxxx>
To: nanomsg@xxxxxxxxxxxxx
Date: Tue, 24 Feb 2015 22:41:14 -0800

Stefano wrote:

> On 20 February 2015 at 19:42, Garrett D'Amore
> <garrett@xxxxxxxxxx> wrote:
>>
>>> On Feb 20, 2015, at 11:22 AM, Stefano
>>> <phd.st.p@xxxxxxxxx> wrote:
>>>
>>> Hi All,
>>>
>>> I have been using nanomsg on a couple of projects with great results
>>> so far. Motivated by this, I'd like to leverage on nanmosg again for a
>>> distributed computing framework I am working on.
>>>
>>> Let's start with the high-level description of the scenario.
>>>
>>> There are a number of stateless computational units, the workers,
>>> which are able to receive some data, perform a computation, and send
>>> the result back.
>>> Then there a number of clients with computations to perform.
>>> Finally there is a single master server whose address is known to all
>>> workers and clients and whose role is to organise the workload (the
>>> set of all computations at a given time) by matching clients to
>>> workers.
>>> In general the time required for a worker to complete a computation is
>>> non-trivial, let's say up to 1 minute, both clients and workers are
>>> going to be distributed around the globe and each of them do not
>>> necessarily have a public IP address (in the sense that it might be
>>> under a ISP sub-net).
>>>
>>> I have read the official nanomsg documentation, plus the blog
>>> articles, and selected sections from the zeromq guide, but it's not
>>> clear to me what is the simplest way to proceed.
>>>
>>> A single REP/REQ channel is not going to work as multiple requests
>>> need to be made at any time before waiting for the replies (each
>>> computation might take up to 1 minute as per above, to which the time
>>> required for data transmission has to be added).
>>
>> You want to use a raw mode REQ socket.  (XREQ).  You can then have
>> multiple requests outstanding.  But at that point you’re responsible for
>> matching replies to requests, and for doing any timeout or retransmit
>> activity.  The workers don’t have to have any of that complexity though.
> 
> Is this functionality documented in nanomsg?
> http://nanomsg.org/v0.5/nn_reqrep.7.html is not mentioning XREQ at all.
> 
>>
>>>
>>> A possible alternative would be to manually handle a collection of
>>> PAIR connections. However, this would require the use of 1 port for
>>> each connection so it would not scale, plus we have the issue of how
>>> to actually coordinate the initiation of such connections as the
>>> surveyor pattern could not be used (we cannot rely on clients and
>>> workers knowing their ip address) and a lot of nanomsg convenient
>>> features (load balancing with priorities, failure handling,
>>> reconnections, ...) would be wasted.
>>>
>>> Finally, let me add that it would be even better if it would be
>>> possible to make clients and workers being able to connect to each
>>> other to avoid passing by the master, but for this I don't have any
>>> decent idea yet.
>>
>> You can do that, but you probably want to create new sockets, and
>> understand what the topology is.
> 
> The problem I don't know how to solve is that only master is
> guaranteed to have a public IP, client and worker might be using an
> ISP that do not offer a directly reachable public IP.
> This means that they only connections that are guaranteed to work are
> connect() toward the master IP.
> I am confident that there is a solution to this, but I'm not an expert
> in the field so I don't know how this problem is usually approached.

The hat-trick is that which side is REQ and which side is REP is independent 
of which side calls bind() and which side calls connect().

You can certainly have the master server bind() one REP (the job submission 
endpoint) and one REQ (the work dispatching endpoint). Clients would 
connect() to the former, and workers would connect() to the latter.

>>
>>>
>>> Please let me know if something is unclear.
>>>
>>> Any suggestion that could point me to the right direction would be
>>> really appreciated, and thanks for publishing nanomsg under a liberal
>>> license.
>>
>> I wish I had some C code to run in raw mode that I could easily give you.
>>  Unfortunately, all my work for  asynchronous REQ clients is written in
>> Go, using my mangos framework (which is wire compatible with nanomsg.) 
>> If you want to see the logic for that, I’d be happy to send you a
>> snippet.  Admittedly, its more complex than just synchronous mode.  (In
>> my case I wanted to support asynchronous client RPC.  That RPC framework
>> will some day be open source I hope, but we’re still improving it and not
>> ready to share it publicly yet.  It includes clients & servers written in
>> Go, C, and Erlang to date.)
> 
> Yes, that would help greatly.
> I had a look at Go in the past and this project is being developed in
> LuaJIT so there is no need to restrict code snippets to C.
> Thanks for the prompt and informative reply.
> 
> Stefano
> 
>>
>>         - Garrett
>>
>>

Follow-Ups:
- [nanomsg] Re: Design of a distributed computing framework
  - From: Stefano

References:
- [nanomsg] Design of a distributed computing framework
  - From: Stefano
- [nanomsg] Re: Design of a distributed computing framework
  - From: Garrett D'Amore
- [nanomsg] Re: Design of a distributed computing framework
  - From: Stefano

[nanomsg] Re: Design of a distributed computing framework

Other related posts: