[nanomsg] Re: Design of a distributed computing framework

  • From: Garrett D'Amore <garrett@xxxxxxxxxx>
  • To: nanomsg@xxxxxxxxxxxxx
  • Date: Fri, 20 Feb 2015 14:26:14 -0800

> On Feb 20, 2015, at 1:53 PM, Stefano <phd.st.p@xxxxxxxxx> wrote:
> 
> On 20 February 2015 at 19:42, Garrett D'Amore <garrett@xxxxxxxxxx 
> <mailto:garrett@xxxxxxxxxx>> wrote:
>> 
>>> On Feb 20, 2015, at 11:22 AM, Stefano <phd.st.p@xxxxxxxxx> wrote:
>>> 
>>> Hi All,
>>> 
>>> I have been using nanomsg on a couple of projects with great results
>>> so far. Motivated by this, I'd like to leverage on nanmosg again for a
>>> distributed computing framework I am working on.
>>> 
>>> Let's start with the high-level description of the scenario.
>>> 
>>> There are a number of stateless computational units, the workers,
>>> which are able to receive some data, perform a computation, and send
>>> the result back.
>>> Then there a number of clients with computations to perform.
>>> Finally there is a single master server whose address is known to all
>>> workers and clients and whose role is to organise the workload (the
>>> set of all computations at a given time) by matching clients to
>>> workers.
>>> In general the time required for a worker to complete a computation is
>>> non-trivial, let's say up to 1 minute, both clients and workers are
>>> going to be distributed around the globe and each of them do not
>>> necessarily have a public IP address (in the sense that it might be
>>> under a ISP sub-net).
>>> 
>>> I have read the official nanomsg documentation, plus the blog
>>> articles, and selected sections from the zeromq guide, but it's not
>>> clear to me what is the simplest way to proceed.
>>> 
>>> A single REP/REQ channel is not going to work as multiple requests
>>> need to be made at any time before waiting for the replies (each
>>> computation might take up to 1 minute as per above, to which the time
>>> required for data transmission has to be added).
>> 
>> You want to use a raw mode REQ socket.  (XREQ).  You can then have multiple 
>> requests outstanding.  But at that point you’re responsible for matching 
>> replies to requests, and for doing any timeout or retransmit activity.  The 
>> workers don’t have to have any of that complexity though.
> 
> Is this functionality documented in nanomsg?
> http://nanomsg.org/v0.5/nn_reqrep.7.html 
> <http://nanomsg.org/v0.5/nn_reqrep.7.html> is not mentioning XREQ at all.

I’d have to search the docs.  it should be documented, but the exact location 
isn’t immediately at my finger tips.  The RFC for req/rep certainly covers it 
but doesn’t go into the libnanomsg API.  Basically, you need to use AF_SP_RAW 
instead of AF_SP.  And you have to be prepared for the fact that then you have 
to handle the headers and all the rest of the “cooked” mode processing in your 
application.


> 
>> 
>>> 
>>> A possible alternative would be to manually handle a collection of
>>> PAIR connections. However, this would require the use of 1 port for
>>> each connection so it would not scale, plus we have the issue of how
>>> to actually coordinate the initiation of such connections as the
>>> surveyor pattern could not be used (we cannot rely on clients and
>>> workers knowing their ip address) and a lot of nanomsg convenient
>>> features (load balancing with priorities, failure handling,
>>> reconnections, ...) would be wasted.
>>> 
>>> Finally, let me add that it would be even better if it would be
>>> possible to make clients and workers being able to connect to each
>>> other to avoid passing by the master, but for this I don't have any
>>> decent idea yet.
>> 
>> You can do that, but you probably want to create new sockets, and understand 
>> what the topology is.
> 
> The problem I don't know how to solve is that only master is
> guaranteed to have a public IP, client and worker might be using an
> ISP that do not offer a directly reachable public IP.
> This means that they only connections that are guaranteed to work are
> connect() toward the master IP.
> I am confident that there is a solution to this, but I'm not an expert
> in the field so I don't know how this problem is usually approached.

This is sort of STUN type thing.  I’m not too familiar with the details.  I 
don’t know if you can make this today with nanomsg itself.  I’ve only *just* 
started looking at this today — because someone suggested WebRTC was a 
potential transport for mangos (which sounds really really cool.)  I doubt I’ll 
make any immediate progress on it though.

> 
>> 
>>> 
>>> Please let me know if something is unclear.
>>> 
>>> Any suggestion that could point me to the right direction would be
>>> really appreciated, and thanks for publishing nanomsg under a liberal
>>> license.
>> 
>> I wish I had some C code to run in raw mode that I could easily give you.  
>> Unfortunately, all my work for  asynchronous REQ clients is written in Go, 
>> using my mangos framework (which is wire compatible with nanomsg.)  If you 
>> want to see the logic for that, I’d be happy to send you a snippet.  
>> Admittedly, its more complex than just synchronous mode.  (In my case I 
>> wanted to support asynchronous client RPC.  That RPC framework will some day 
>> be open source I hope, but we’re still improving it and not ready to share 
>> it publicly yet.  It includes clients & servers written in Go, C, and Erlang 
>> to date.)
> 
> Yes, that would help greatly.
> I had a look at Go in the past and this project is being developed in
> LuaJIT so there is no need to restrict code snippets to C.
> Thanks for the prompt and informative reply.

Ok, I need to figure out how to “sanitize” the work I’ve done, or rather 
provide under a suitable license since its part of a proprietary (for now) 
framework.  Stay tuned.  (What might be cool is an RPC implementation that 
demonstrates Go RPCs using Gobs over mangos. That would be a nice example to 
have in the framework, I think.)

        - Garrett

> 
> Stefano
> 
>> 
>>        - Garrett

Other related posts: