[nanomsg] Re: Name service experiments

  • From: Martin Sustrik <sustrik@xxxxxxxxxx>
  • To: nanomsg@xxxxxxxxxxxxx
  • Date: Wed, 11 Sep 2013 08:17:07 +0200

Hi Paul,

Impressive work in a couple of days!

I've finally done a simple name service.

https://github.com/tailhook/dotns

It requires python3 (tested on 3.3) and has no other dependencies
(the needed library binding pieces are built in). You might also need
to install (or update) nanomsg from the name_service branch:

https://github.com/tailhook/nanomsg/tree/name_service


What Has Been Done
--------------------------------

I drew a pretty complex topology with graphviz:

https://docs.google.com/file/d/0Byd_FTQksoKLY3lyZnlQalFtRnc/edit

And tried to make a name service that is able to set up that topology.
The aforementioned dotns utility parses the topology.dot [1] file and
creates addresses for nodes based on the info there.
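
For the sake of the discussion, the input looks roughly like this (a
simplified, made-up fragment, not the actual file):

    digraph stock_quotes {
        /* simplified: the real file also needs per-process attributes */
        client -> balancer;
        balancer -> worker1;
        balancer -> worker2;
    }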

Nice. I hadn't hoped for topologies this complex so soon.

In the meantime:

1. Implemented a nanodev utility, which is basically a CLI for nn_device

Right. That'll need to be a part of the package.
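
Presumably something along these lines (a minimal sketch with the socket type hard-coded to BUS and error checking omitted; the real nanodev will need to take the protocol as a parameter):

    #include <stdio.h>
    #include <nanomsg/nn.h>
    #include <nanomsg/bus.h>

    int main (int argc, char *argv [])
    {
        int s1, s2;

        if (argc != 3) {
            fprintf (stderr, "usage: nanodev <addr1> <addr2>\n");
            return 1;
        }

        /*  Devices require raw (AF_SP_RAW) sockets.  */
        s1 = nn_socket (AF_SP_RAW, NN_BUS);
        s2 = nn_socket (AF_SP_RAW, NN_BUS);
        nn_bind (s1, argv [1]);
        nn_bind (s2, argv [2]);

        /*  Forward messages between the sockets; returns only on error.  */
        nn_device (s1, s2);
        fprintf (stderr, "nn_device: %s\n", nn_strerror (nn_errno ()));
        return 1;
    }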

2. Added an NN_OVERRIDE_HOSTNAME environment variable to override the
host name obtained by gethostname() inside nanomsg, so that I can spawn
the whole cluster on my single machine

We should think more about this. Not being able to put 2 components on the same box without overloading the hostname is clumsy.
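
Presumably the lookup inside the library now boils down to something like this (nn_get_nodename is a made-up name for whatever the helper is actually called):

    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    static int nn_get_nodename (char *buf, size_t len)
    {
        /*  The environment variable, if set, wins over the hostname.  */
        const char *override = getenv ("NN_OVERRIDE_HOSTNAME");
        if (override) {
            strncpy (buf, override, len - 1);
            buf [len - 1] = '\0';
            return 0;
        }
        return gethostname (buf, len);
    }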

3. Added socket type to the NS request and priority to the NS reply

Ack.


Graph-based Topology
---------------------------------

Building the name service based on a graph was an interesting research
project, but it is not something that can become production quality.

Pros:

1. You can visualise your structure ahead of time

2. Connections are written in intuitive form (A -> B)

Cons:

1. All nodes must (probably) be added to the topology, not just the rules

2. Dot is not well-suited for this: you need a unique name for each
process, duplicate labels with hidden attributes, etc.

Summary: it won't work in a production environment. It's better to
output a graph based on some DSL than to derive the info from the graph.

+1

It's nice to have a tool to draw the graph; however, the full complexity of the setup can only be expressed by a language of some kind.

Protocol Limitations
-----------------------------

1. We can't connect a device for the NN_PAIR protocol, because there is
no way to distinguish between the two sockets of a single device (other
protocols have two different socket types, e.g. REQ vs. REP). Do we need
a device for PAIR sockets as part of a topology?

Devices for PAIR are probably useless, but devices for BUS would suffer similar problems IMO.
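
To illustrate the ambiguity, a sketch (not actual device code): with REQ/REP the two sides of a device are different socket types, so a name service record can refer to each of them unambiguously; with PAIR both sides look identical:

    #include <nanomsg/nn.h>
    #include <nanomsg/reqrep.h>
    #include <nanomsg/pair.h>

    void illustrate (void)
    {
        /*  REQ/REP device: the sides are distinguishable by type.  */
        int front = nn_socket (AF_SP_RAW, NN_REP);  /*  faces requesters  */
        int back = nn_socket (AF_SP_RAW, NN_REQ);   /*  faces repliers  */

        /*  PAIR device: both sides are NN_PAIR, so the name service
            reply has no way to say which end is which.  */
        int a = nn_socket (AF_SP_RAW, NN_PAIR);
        int b = nn_socket (AF_SP_RAW, NN_PAIR);

        (void) front; (void) back; (void) a; (void) b;
    }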

2. We need to override the hostname to run multiple nodes on a
development machine. (It's possible to use CLONE_NEWUTS on Linux, but
that's rather too complex.)

The Complicated Stuff About Topologies
-----------------------------------------------------------

While working on the project I ran into the following issues:

1. It's possible that a topology contains a cycle. E.g. in the diagram
above, if all workers are killed, the messages would loop forever. There
is probably no good protection against this except a limit on the trace
size, as specified in the draft RFC. Is the limit implemented?

Not yet :(

2. When setting up a complex topology, it's possible that all nodes
downstream of a device are off. E.g. in the picture, if workers 1 and 2
in box "wally" are off but the "balancer" is not, messages will still be
sent to it. It may be fixed in two ways:

2a) Adding a lower-priority connection from the balancer to somewhere
else. Even adding it to the upstream device would work (with appropriate
cycle detection and efficiency statistics). E.g. add a connection
between the "balancer" in box "wally" and the device in box "laura".

2b) By updating the name service records, using feedback from
monitoring. That requires updating name records "on the fly", which is
probably a topic for another big discussion.

My understanding was that a device with no downstream nodes will stop receiving messages and eventually apply pushback to the upstream node. The node then stops sending new messages to the device and sends them to a lower-priority connection instead.

If no downstream node becomes available for an extended period of time, the stuck requests will eventually be resent by the original requesters and routed elsewhere.
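
For reference, this is what a prioritised fallback looks like when set up by hand with the existing NN_SNDPRIO option (the addresses and the wrapper function are made up):

    #include <nanomsg/nn.h>
    #include <nanomsg/pipeline.h>

    int make_push_socket (void)
    {
        int s = nn_socket (AF_SP, NN_PUSH);
        int hi = 1;   /*  1 is the highest priority, 16 the lowest  */
        int lo = 8;

        /*  Preferred route: the local workers. NN_SNDPRIO applies to
            endpoints added after the option is set.  */
        nn_setsockopt (s, NN_SOL_SOCKET, NN_SNDPRIO, &hi, sizeof (hi));
        nn_connect (s, "tcp://wally:5555");

        /*  Fallback route, used only while no higher-priority peer is
            able to accept messages.  */
        nn_setsockopt (s, NN_SOL_SOCKET, NN_SNDPRIO, &lo, sizeof (lo));
        nn_connect (s, "tcp://laura:5555");

        return s;
    }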

3. The priorities stuff is complex to set up. Note that there are only
three low-priority connections in the scheme, and they are what makes
the whole cluster switch over when no workers are available. Note also
that only the priorities prevent messages from bouncing around the
cluster forever (given there is at least a single worker). All in all,
there are many tiny things that are complex to set up unless there are
appropriate tools.

So what exactly do you feel is complex about prioritised connections? Loop avoidance?

Socket Options
----------------------

I've hesitated to put socket options into the name service. But after
some experimenting, I think there are strong reasons to put at least
these there:

* IPv4ONLY, so that programmers don't need to know what it is (see the
snippet after this list)

* SO_REUSEPORT, for processes that should bind to a single port, since
it's very dependent on the topology, and it's dangerous to just use
SO_REUSEPORT everywhere
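
To illustrate the first one, this is all the name service would be doing
on the programmer's behalf (a sketch; the function name and the value
are made up):

    #include <nanomsg/nn.h>
    #include <nanomsg/reqrep.h>

    int open_req_socket (void)
    {
        int s = nn_socket (AF_SP, NN_REQ);
        int opt = 0;   /*  0 = allow IPv6 as well as IPv4  */
        nn_setsockopt (s, NN_SOL_SOCKET, NN_IPV4ONLY, &opt, sizeof (opt));
        return s;
    }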

Regarding other options like SNDBUF and RCVBUF, they may or may not be
put there. Their tuning is probably a task for the administrator, but
they don't change very often, and they depend on the application more
than on the exact place a component occupies in a topology.

I would say SNDBUF and RCVBUF depend on things like throughput, how fast you want the backpressure to kick in, etc., which seems to be something the admin knows rather than the programmer.
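
E.g. a smaller send buffer makes the pushback kick in sooner; a single knob the admin could tune via the name service instead of in code (s being any nanomsg socket, 128kB an arbitrary example):

    int kb128 = 128 * 1024;   /*  bytes  */
    nn_setsockopt (s, NN_SOL_SOCKET, NN_SNDBUF, &kb128, sizeof (kb128));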

What's next
-------------------

I don't believe DNS can solve our problem fully. So if someone wants to
try, it's a good time to step up and make a prototype. (Otherwise I'll
do a simple mapping of the name/config service to DNS somewhere near the
point we agree on the protocol, and start integrating the changes into
mainline.)

I have a kind of dilemma about this one.

DNS is a database that's already there, installed and accessible from *everywhere*. That's such a tremendous advantage that it almost beats any possible downside. Further, its distributed, global character and caching capabilities make it almost irresistible.

That being said, some requirements we have don't seem to map to DNS (rule evaluation), but that may well be a problem with ourselves rather than with DNS as a technology. After all, DNS is just a generic distributed database with the possibility of storing generic data types (TXT records).
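
Just to make that concrete, a topology record could conceivably be published like this (a completely made-up encoding, for illustration only):

    ; hypothetical zone file fragment
    _stock_quotes._nanomsg.example.com.  IN TXT  "connect tcp://wally:5555 prio=1"
    _stock_quotes._nanomsg.example.com.  IN TXT  "connect tcp://laura:5555 prio=8"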

What I think is worth investigating is how to build a topology based on
rules, and what exact info is needed for the name/config service to
build the topology with as little user intervention as possible. So I'll
start experimenting with the rules.

+1

I don't believe we can do anything sane without rules.

The big question is what kind of criteria the rules should be based on. It's kind of complex, as it seems to be a mix of different things: geospatial information, system characteristics, etc.

One option would be to use your hostname overload mechanism in a more generic way. The value would not be treated as a hostname override, but rather as an arbitrary string, set by the admin, to be used by the name server as a criterion for rule evaluation.
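
Say (an entirely hypothetical example):

    # the admin tags the node with an arbitrary, structured string...
    NN_OVERRIDE_HOSTNAME="dc1/rack3/wally" ./worker

    # ...and the name server's rules match on the tag, e.g. "nodes under
    # dc1/ connect to the dc1 device, everyone else to the fallback".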

It would be great if everyone showed the most complicated topologies
they have seen in real life :)

One interesting research area is topologies spanning multiple administrative units. You get this every time someone from outside of your company connects to your topology. The node is not part of your organisation's topology graph, and it may in fact be a device hiding a significant part of the topology behind it.

Martin
