[nanomsg] Name service experiments

  • From: Paul Colomiets <paul@xxxxxxxxxxxxxx>
  • To: nanomsg@xxxxxxxxxxxxx
  • Date: Tue, 10 Sep 2013 20:54:43 +0300

Hi,

I've finally done a simple name service.

https://github.com/tailhook/dotns

It requires Python 3 (tested on 3.3) and has no other dependencies
(the library binding pieces are built in). You may also need to install
(or update) nanomsg from the name_service branch:

https://github.com/tailhook/nanomsg/tree/name_service


What Has Been Done
--------------------------------

I drew a pretty complex topology with graphviz:

https://docs.google.com/file/d/0Byd_FTQksoKLY3lyZnlQalFtRnc/edit

And tried to make a name service that is able to set up that topology. The
aforementioned dotns utility parses the topology.dot [1] file and creates
addresses for nodes based on the info there.

In the meantime:

1. Implemented the nanodev utility, which is basically a CLI for nn_device

2. Added an NN_OVERRIDE_HOSTNAME environment variable to override the host
name obtained by gethostname() inside nanomsg, so that I can spawn the
whole cluster on my single machine

3. Added the socket type to the NS request and a priority to the NS reply
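To illustrate point 2: the override boils down to consulting the
environment variable before falling back to gethostname(). This is a
stdlib Python sketch of the idea, not the actual nanomsg C code:

```python
import os
import socket

def effective_hostname():
    """Return the node's host name for name-service lookups.

    NN_OVERRIDE_HOSTNAME takes precedence, so several "boxes" can be
    simulated on a single development machine.
    """
    return os.environ.get("NN_OVERRIDE_HOSTNAME") or socket.gethostname()
```

So you can spawn one node per simulated box, e.g.
NN_OVERRIDE_HOSTNAME=wally ./worker and NN_OVERRIDE_HOSTNAME=laura ./worker
on the same machine.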


Graph-based Topology
---------------------------------

Building a name service based on a graph was an interesting research
project, but it is not something that can become production quality.

Pros:

1. You can see your whole structure ahead of time

2. Connections are written in intuitive form (A -> B)

Cons:

1. All nodes must (probably) be added to the topology, not just rules

2. Dot is not well suited for this: you need a unique name for each
process, duplicate labels with hidden attributes, etc.

Summary: it won't work in a production environment. It's better to
generate a graph from some DSL than to derive information from the graph.
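For a flavor of what dotns has to do, here is a minimal sketch (my own
simplification, not the actual dotns parser) that pulls "A -> B" edges out
of a trivial dot file; real dot files with labels and hidden attributes are
much messier, which is part of the problem described above:

```python
import re

# Matches only the simplest dot edge lines, e.g.:  worker1 -> balancer;
EDGE_RE = re.compile(r"^\s*(\w+)\s*->\s*(\w+)\s*;?\s*$")

def parse_edges(dot_text):
    """Extract (source, target) pairs from simple 'A -> B' lines."""
    edges = []
    for line in dot_text.splitlines():
        m = EDGE_RE.match(line)
        if m:
            edges.append((m.group(1), m.group(2)))
    return edges

example = """
digraph topology {
    worker1 -> balancer;
    worker2 -> balancer;
    balancer -> device;
}
"""
```

Anything beyond this (subgraphs, node attributes, duplicate labels)
requires a real dot parser, which is where the pain starts.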


Protocol Limitations
-----------------------------

1. A device can't be connected for the NN_PAIR protocol, because there is
no way to distinguish between the two sockets of a single device (other
protocols have two different socket types, e.g. REQ vs. REP). Do we need a
device for the PAIR socket as part of a topology?

2. The hostname needs to be overridden to run multiple nodes on a
development machine (it's possible to use CLONE_NEWUTS on Linux, but
that's rather too complex)

The Complicated Stuff About Topologies
-----------------------------------------------------------

While working on the project I ran into the following issues:

1. It's possible that a topology contains a cycle. E.g. in the diagram
above, if all workers are killed, messages would loop forever. There is
probably no good protection against this except a limit on the trace size,
as specified in the draft RFC. Is the limit implemented?

2. When setting up a complex topology, it's possible that all nodes
downstream of a device are off. E.g. in the picture, if workers 1 and 2 in
box "wally" are off but the "balancer" is not, messages will still be sent
to that node. It may be fixed in two ways:

2a) Adding a lower-priority connection from the balancer to somewhere
else. Even adding it to the upstream device would work (with appropriate
cycle detection and efficiency statistics). E.g. add a connection between
the "balancer" in box "wally" and the device in box "laura".

2b) By updating the name service records using feedback from monitoring.
That requires updating name records "on the fly", which is probably a
topic for another big discussion.

3. Priorities are complex to set up. Note that there are only three
low-priority connections on the scheme that make the whole cluster switch
over when no workers are available. Note also that only the priorities
prevent messages from bouncing around the cluster forever (in the case
where there is at least a single worker). All in all, there are many tiny
things that are complex to set up unless there are appropriate tools.
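The failover semantics behind points 2 and 3 can be sketched in a few
lines. Assuming nanomsg's convention for NN_SNDPRIO (priorities 1-16,
where a lower number means higher priority, and lower-priority endpoints
are used only when all higher-priority ones are unavailable), a hypothetical
selection helper would look like this:

```python
def pick_endpoints(endpoints):
    """Given (priority, address, alive) tuples, return the addresses of
    the best priority level that still has at least one live endpoint.
    Lower numbers mean higher priority, as with nanomsg's NN_SNDPRIO."""
    alive = [(prio, addr) for prio, addr, up in endpoints if up]
    if not alive:
        return []
    best = min(prio for prio, _ in alive)
    return [addr for prio, addr in alive if prio == best]
```

With both workers up, traffic stays on the priority-1 connections; only
when every worker is down does the priority-2 fallback (e.g. the device in
box "laura") receive anything, which is exactly what keeps messages from
bouncing around while at least one worker is alive.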


Socket Options
----------------------

I hesitated to put socket options into the name service. But after some
experimenting, I think there are strong reasons to include at least these:

* IPv4ONLY, so that programmers don't need to know what it is

* SO_REUSEPORT, for processes that should bind to a single port, since
it's very dependent on the topology, and it's dangerous to just use
SO_REUSEPORT everywhere

Regarding other options like SNDBUF and RCVBUF, they might or might not
belong there. Tuning them is probably a task for an administrator, but
they don't change very often, and they depend on the application more than
on the exact place a component occupies in the topology.


What's next
-------------------

I don't believe DNS can solve our problem fully. So if someone wants to
try, it's a good time to step up and make a prototype. (Otherwise I'll do
a simple mapping of the name/config service to DNS somewhere near the
point where we agree on the protocol and start integrating the changes
into mainline.)

What I think is worth investigating is how to build a topology based on
rules, and exactly what info the name/config service needs to build a
topology with as little user intervention as possible. So I'll start
experimenting with rules.

It would be great if everyone showed the most complicated topologies they
have seen in real life :)


Any suggestions are welcome.


References:

[1]
https://github.com/tailhook/dotns/blob/master/examples/twodc/topology.dot

-- 
Paul
