[nanomsg] Re: The name service for nanomsg

  • From: Alex Elsayed <eternaleye@xxxxxxxxx>
  • To: nanomsg@xxxxxxxxxxxxx
  • Date: Sat, 07 Sep 2013 02:24:23 -0700

On Saturday, September 07, 2013 07:51:11 AM Martin Sustrik wrote:
> Hi Nico,
> 
> > For some use cases I'd think an IGMP-like protocol would be best.
> > Specifically for pub/sub, where a URI might denote/resolve to one or
> > more group membership management services, where publishers, routers,
> > and subscribers join as such and where the topology is worked out
> > dynamically (this is easy enough for publishers and subscribers, but a
> > bit harder for routers as real topology information would be nice to
> > have but possibly difficult to extract from the network).
> 
> As an analogy, I would say pub/sub subscriptions basically work as IGMP.
> 
> As for the name service (btw, we should think of a better name for the
> thing) the analogy is more like a network admin running around with
> cables, plugging in switches etc.
> 
> The core difference being that the former is fully automatic, while the
> latter is fully human-driven.

As far as names go, "topology rendezvous service" or "topology parameter 
distribution service" are more meaningful - but both are really messy, because 
in my view they describe something trying to solve too many things at once.

The thing is, I really do think we will want to separate even the admin part 
into multiple layers. There are really three distinct classes of data we 
want to let the admin control (phrased from the POV of a topology participant):

* "The other side of this interaction has property X"
* "I have property Y"
* "The relationship between myself and the other side has property Z"

In general, X is better off being queried (DNS is the canonical example).
In general, Y is better off being propagated to the participants ahead of time 
(DHCP handing out addresses).
In general, Z is a pain and a half because it requires global knowledge, not 
only of the *desired* state but of the *current* state as well, which can 
change dynamically and have emergent properties.

So far, the discussion has lumped all these together and tried to solve them 
all in one go. Even if we do provide a single API to the *coder*, it may be 
best to have different protocols for these things rather than one overarching 
protocol.

The actual tasks being discussed so far basically amount to:

* Mapping from an ID to a locator (Very much X)
* Bind vs Connect (Z)
* Parameters (A mix of X, Y, and Z depending on the parameter)

DNS SRV plus a query-side wildcarding capability may be the nicest way to 
solve the ID-to-locator bit. It requires a change to SRV via the IETF, but 
since the change is to the query interaction rather than the record format, it 
might be easier to swing.

I don't think I explained this idea adequately in my other mail, so I'll 
correct that now.

Currently, a DNS SRV record takes a form like:
_service._transport.host TTL IN SRV <prio> <weight> <port> <canonical host>

and a query takes the form:
_service._transport.host
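
To make the pieces of that record concrete, here is a rough sketch that splits 
a line of the form above into its fields. The names (SrvRecord, parse_srv) and 
the sample "_sp._tcp.example.org" record are made up for illustration; this is 
not a real zone-file parser and it ignores everything except this one layout.

```python
from typing import NamedTuple


class SrvRecord(NamedTuple):
    """Fields of one SRV line in the layout shown above (hypothetical type)."""
    service: str
    transport: str
    host: str
    ttl: int
    priority: int
    weight: int
    port: int
    target: str


def parse_srv(line: str) -> SrvRecord:
    """Parse '_service._transport.host TTL IN SRV prio weight port target'."""
    fields = line.split()
    if len(fields) != 8:
        raise ValueError("expected 8 whitespace-separated fields")
    owner, ttl, rr_class, rr_type, prio, weight, port, target = fields
    if rr_class != "IN" or rr_type != "SRV":
        raise ValueError("not an IN SRV record")
    # The owner name carries the service and transport as its first two labels.
    labels = owner.split(".", 2)
    if len(labels) != 3:
        raise ValueError("owner name needs _service._transport.host")
    service, transport, host = labels
    if not (service.startswith("_") and transport.startswith("_")):
        raise ValueError("service and transport labels must start with '_'")
    return SrvRecord(service[1:], transport[1:], host,
                     int(ttl), int(prio), int(weight), int(port), target)


if __name__ == "__main__":
    # Hypothetical record; "sp" is not a registered service name.
    print(parse_srv("_sp._tcp.example.org 3600 IN SRV 10 5 5555 node1.example.org"))
```

The point to notice is that the transport is baked into the owner name, so it 
is part of what a query has to spell out.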

Now, for most usages this is great, since it's common for a service to use 
only one transport. However, SP can (potentially) operate over arbitrary 
transports, so it does not suit us particularly well: queries are literal, so 
in order to look up a locator we'd need to enumerate every transport and 
query each one.

My suggestion is to see if in the *query* we can wildcard the transport, as:
_service.*.host
and have that result in the DNS server returning all results for _service, 
regardless of transport.
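
A toy model of the difference, with no real DNS involved: the "zone" is just a 
dict, and wildcard_query simulates the behaviour I'm proposing (no DNS server 
does this today). All names, records, and the "sp" service label are 
hypothetical.

```python
from typing import Dict, List, Tuple

Answer = Tuple[int, int, int, str]  # (priority, weight, port, target)

# Hypothetical zone contents, keyed by the full owner name.
ZONE: Dict[str, List[Answer]] = {
    "_sp._tcp.example.org": [(10, 5, 5555, "node1.example.org")],
    "_sp._udp.example.org": [(10, 5, 5556, "node2.example.org")],
    "_sp._sctp.example.org": [(20, 5, 5557, "node3.example.org")],
    "_other._tcp.example.org": [(10, 5, 80, "www.example.org")],
}


def literal_query(zone: Dict[str, List[Answer]], name: str) -> List[Answer]:
    """Today's behaviour: the query name must match the owner name exactly."""
    return list(zone.get(name, []))


def enumerate_transports(zone, service: str, host: str,
                         transports: List[str]) -> Dict[str, List[Answer]]:
    """Client-side workaround: one literal query per transport we know about."""
    found = {}
    for transport in transports:
        answers = literal_query(zone, f"_{service}._{transport}.{host}")
        if answers:
            found[transport] = answers
    return found


def wildcard_query(zone, service: str, host: str) -> Dict[str, List[Answer]]:
    """Simulated proposal: '_service.*.host' matches any single transport label."""
    prefix, suffix = f"_{service}.", f".{host}"
    found = {}
    for name, answers in zone.items():
        if name.startswith(prefix) and name.endswith(suffix):
            label = name[len(prefix):len(name) - len(suffix)]
            # Accept exactly one '_transport' label between service and host.
            if label.startswith("_") and "." not in label:
                found[label[1:]] = list(answers)
    return found


if __name__ == "__main__":
    print(enumerate_transports(ZONE, "sp", "example.org", ["tcp", "udp"]))
    print(wildcard_query(ZONE, "sp", "example.org"))
```

With enumeration the client only finds transports it already thought to ask 
about (the sctp record above is missed if it only tries tcp and udp), whereas 
the wildcard form returns whatever the admin published.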

That solves the ID to locator issue handily.

Bind vs. connect is a thorny problem, because it's not just a parameter that 
can be set unilaterally. It's a role to play in another, lower-layer 
distributed algorithm of bind/listen/connect/accept.

Transitioning from bind to connect (or vice versa) while running is honestly 
something I don't think any admin will do without having some serious wibblies 
about it, at least in part because while the transition is running you 
suddenly have two incompatible topologies sharing a name - two hosts set to 
connect() cannot communicate directly, nor can two set to bind().
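
To illustrate the mid-transition problem, here is a minimal model (not nanomsg 
code; Role, can_link and broken_links are made-up names). A link only works 
when exactly one side binds and the other connects.

```python
from enum import Enum
from typing import Dict, List, Tuple


class Role(Enum):
    BIND = "bind"
    CONNECT = "connect"


def can_link(a: Role, b: Role) -> bool:
    """Two peers can talk directly only if their roles differ."""
    return a is not b


def broken_links(roles: Dict[str, Role],
                 links: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return the intended links whose two ends currently share a role."""
    return [(x, y) for x, y in links if not can_link(roles[x], roles[y])]


if __name__ == "__main__":
    # Hypothetical topology: one device, two endpoints.
    roles = {"device": Role.BIND, "a": Role.CONNECT, "b": Role.CONNECT}
    links = [("device", "a"), ("device", "b")]
    print(broken_links(roles, links))   # steady state

    # Halfway through a role flip: "a" has switched, the device has not.
    roles["a"] = Role.BIND
    print(broken_links(roles, links))   # the device-a link is now unusable
```

Any rollout that flips roles one host at a time passes through states like the 
second one, which is why the transition itself is the scary part.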

I strongly suspect that best practices, even if bind/connect is available to 
the admin, would quickly converge on "always deploy a device as bind() and 
have the ends connect(). Rolling out another device for failover/load 
balancing later is just adding a new DNS record; switching bind() to connect() 
on the endpoints is more pain than it's worth."

This is especially true because of the behavior of devices - it makes very 
little sense for a device to do anything other than bind(), so as soon as they 
roll out the first device every endpoint in the topology will use connect().

As far as parameters go, I'm not sure what the best solution is (primarily 
because some are X, some are Y, and some are Z).

Thoughts?
