[nanomsg] Re: Design of a distributed computing framework

  • From: Garrett D'Amore <garrett@xxxxxxxxxx>
  • To: nanomsg@xxxxxxxxxxxxx
  • Date: Fri, 20 Feb 2015 11:42:15 -0800

> On Feb 20, 2015, at 11:22 AM, Stefano <phd.st.p@xxxxxxxxx> wrote:
> 
> Hi All,
> 
> I have been using nanomsg on a couple of projects with great results
> so far. Motivated by this, I'd like to leverage on nanmosg again for a
> distributed computing framework I am working on.
> 
> Let's start with the high-level description of the scenario.
> 
> There are a number of stateless computational units, the workers,
> which are able to receive some data, perform a computation, and send
> the result back.
> Then there a number of clients with computations to perform.
> Finally there is a single master server whose address is known to all
> workers and clients and whose role is to organise the workload (the
> set of all computations at a given time) by matching clients to
> workers.
> In general the time required for a worker to complete a computation is
> non-trivial, let's say up to 1 minute, both clients and workers are
> going to be distributed around the globe and each of them do not
> necessarily have a public IP address (in the sense that it might be
> under a ISP sub-net).
> 
> I have read the official nanomsg documentation, plus the blog
> articles, and selected sections from the zeromq guide, but it's not
> clear to me what is the simplest way to proceed.
> 
> A single REP/REQ channel is not going to work as multiple requests
> need to be made at any time before waiting for the replies (each
> computation might take up to 1 minute as per above, to which the time
> required for data transmission has to be added).

You want to use a raw mode REQ socket.  (XREQ).  You can then have multiple 
requests outstanding.  But at that point you’re responsible for matching 
replies to requests, and for doing any timeout or retransmit activity.  The 
workers don’t have to have any of that complexity though.

> 
> A possible alternative would be to manually handle a collection of
> PAIR connections. However, this would require the use of 1 port for
> each connection so it would not scale, plus we have the issue of how
> to actually coordinate the initiation of such connections as the
> surveyor pattern could not be used (we cannot rely on clients and
> workers knowing their ip address) and a lot of nanomsg convenient
> features (load balancing with priorities, failure handling,
> reconnections, ...) would be wasted.
> 
> Finally, let me add that it would be even better if it would be
> possible to make clients and workers being able to connect to each
> other to avoid passing by the master, but for this I don't have any
> decent idea yet.

You can do that, but you probably want to create new sockets, and understand 
what the topology is.

> 
> Please let me know if something is unclear.
> 
> Any suggestion that could point me to the right direction would be
> really appreciated, and thanks for publishing nanomsg under a liberal
> license.

I wish I had some C code to run in raw mode that I could easily give you.  
Unfortunately, all my work for  asynchronous REQ clients is written in Go, 
using my mangos framework (which is wire compatible with nanomsg.)  If you want 
to see the logic for that, I’d be happy to send you a snippet.  Admittedly, its 
more complex than just synchronous mode.  (In my case I wanted to support 
asynchronous client RPC.  That RPC framework will some day be open source I 
hope, but we’re still improving it and not ready to share it publicly yet.  It 
includes clients & servers written in Go, C, and Erlang to date.)

        - Garrett


Other related posts: