Hi Paul, impressive work in a couple of days!
I've finally done a simple name service: https://github.com/tailhook/dotns It requires Python 3 (tested on 3.3) and has no other dependencies (the library binding pieces are built in). You might also need to install (or update) nanomsg from the name_service branch: https://github.com/tailhook/nanomsg/tree/name_service

What Has Been Done
--------------------------------

I drew a pretty complex topology with graphviz: https://docs.google.com/file/d/0Byd_FTQksoKLY3lyZnlQalFtRnc/edit and tried to make a name service that can set up that topology. The aforementioned dotns utility parses the topology.dot [1] file and creates addresses for the nodes based on the info there.
Nice. I hadn't hoped for such complex topologies so soon.
In the meantime:

1. Implemented the nanodev utility, which is basically a CLI for nn_device
Right. That'll need to be a part of the package.
2. Added the NN_OVERRIDE_HOSTNAME environment variable to override the hostname returned by gethostname() inside nanomsg, so that I can spawn a whole cluster on my single machine
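The intended lookup order can be sketched in Python (a sketch only; the real logic lives in C inside nanomsg, and `effective_hostname` is an invented name for illustration):

```python
import os
import socket

def effective_hostname() -> str:
    """Return the hostname the name service should see for this process.

    NN_OVERRIDE_HOSTNAME, when set and non-empty, takes precedence over
    the value reported by the operating system, so several "boxes" can
    be simulated on a single development machine.
    """
    override = os.environ.get("NN_OVERRIDE_HOSTNAME")
    if override:
        return override
    return socket.gethostname()
```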
We should think more about this. Not being able to put two components on the same box without overriding the hostname is clumsy.
3. Added socket type to the NS request and priority to the NS reply
Ack.
Graph-based Topology
---------------------------------

Building the name service based on a graph was an interesting research project, but it is not something that can become production quality.

Pros:
1. You can review your structure ahead of time
2. Connections are written in an intuitive form (A -> B)

Cons:
1. All nodes (probably) must be added to the topology, not just rules
2. Dot is not well suited for this: you need a unique name for each process, duplicate labels with hidden attributes, etc.

Summary: it won't work in a production environment. It's better to output a graph based on some DSL than to derive information from the graph.
+1 It's nice to have a tool to draw the graph; however, the full complexity of the setup can only be expressed by a language of some kind.
Protocol Limitations
-----------------------------

1. Can't create a device for the NN_PAIR protocol, because there is no way to distinguish between the two sockets of a single device (other protocols have two different socket types, e.g. REQ vs. REP). Do we need a device for the PAIR socket as part of a topology?
Devices for PAIR are probably useless, but devices for BUS would suffer from similar problems, IMO.
2. Need to override the hostname for running multiple nodes on a development machine. (It's possible to use CLONE_NEWUTS on Linux, but that's rather too complex.)

The Complicated Stuff About Topologies
-----------------------------------------------------------

While working on the project I ran into the following issues:

1. It's possible that the topology contains a cycle. E.g. in the diagram above, if all workers are killed, the messages would loop forever. There is probably no good protection against this except a limit on the trace size, as specified in the draft RFC. Is the limit implemented?
Not yet :(
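For reference, the guard itself should be trivial; a Python sketch of what each device could check before re-forwarding (MAX_TRACE_SIZE and the trace representation are invented here; the draft RFC would fix the real values):

```python
MAX_TRACE_SIZE = 32  # hypothetical limit, not the RFC's actual number

def forward_allowed(trace: list, max_trace: int = MAX_TRACE_SIZE) -> bool:
    """Decide whether a message may be re-sent by a device.

    Each device appends its own channel id to the message trace before
    forwarding, so a cycle grows the trace without bound.  Enforcing a
    cap turns an infinite loop into a dropped message.
    """
    return len(trace) < max_trace
```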
2. When setting up a complex topology, it's possible that all nodes downstream of a device are off. E.g. in the picture, if workers 1 and 2 in box "wally" are off but the "balancer" is not, messages will still be sent to that node. It may be fixed in two ways:

2a) Adding a lower-priority connection from the balancer to somewhere else. Even adding it to the upstream device would work (with appropriate cycle detection and efficiency statistics). E.g. add a connection between the "balancer" in box "wally" and the device in box "laura".

2b) Updating the name service records using feedback from monitoring. That requires updating name records on the fly, which is probably a topic for another big discussion.
My understanding was that the device with no downstream nodes will stop receiving messages and eventually apply pushback to the upstream node. The node then stops sending new messages to the device and will send them to a lower priority connection instead.
If no downstream node becomes available for an extended period of time, the stuck requests will be eventually resent by the original requesters and routed elsewhere.
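That mechanism can be sketched as a toy model (not the nanomsg implementation; the (priority, writable) representation is invented for illustration, with lower numbers meaning higher priority and `writable` going False once pushback fills the send buffer):

```python
def pick_connection(connections):
    """Pick the index of the connection to send to, honouring priorities.

    `connections` is a list of (priority, writable) pairs.  The sender
    uses the best priority class that still accepts messages, so when
    pushback disables the high-priority link to a dead subtree, the
    low-priority fallback takes over automatically.
    """
    candidates = [(prio, i) for i, (prio, writable) in enumerate(connections)
                  if writable]
    if not candidates:
        return None  # everything is stuck: propagate pushback upstream
    best = min(prio for prio, _ in candidates)
    # among equal-priority writable links, real code would round-robin;
    # here we simply take the first one
    return next(i for prio, i in candidates if prio == best)
```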
3. The priority setup is complex. Note that there are only three low-priority connections in the scheme, and they are what makes the whole cluster switch over when no workers are available. Note also that only the priorities prevent messages from bouncing around the cluster forever (in the case where at least a single worker exists). All in all, there are many tiny things that are complex to set up unless there are appropriate tools.
So what exactly do you feel is complex about prioritised connections? Loop avoidance?
Socket Options
----------------------

I hesitated to put socket options into the name service. But after some experimenting, I think there are strong reasons to include at least these:

* IPV4ONLY, so that programmers don't need to know what it is
* SO_REUSEPORT, for processes that should bind to a single port, since it's very dependent on the topology and it's dangerous to just use SO_REUSEPORT everywhere

Regarding other options like SNDBUF and RCVBUF, they might or might not be put there. Their tuning is probably a task for the administrator, but they don't change very often, and they depend on the application more than on the exact place a component occupies in the topology.
I would say SNDBUF and RCVBUF depend on things like throughput, how fast you want the backpressure to kick in, etc., which seems to be something the admin knows rather than the programmer.
What's next
-------------------

I don't believe DNS can solve our problem fully. So if someone wants to try, it's a good time to step up and make a prototype. (Otherwise I'll do a simple mapping of the name/config service to DNS somewhere near the point we agree on the protocol, and start integrating the changes into mainline.)
I am having a kind of dilemma about this one. DNS is a database that's already there, installed and accessible from *everywhere*. That's such a tremendous advantage that it almost beats any possible downside. Further, its distributed, global character and caching capabilities make it almost irresistible.
That being said, some requirements we have don't seem to map to DNS (rule evaluation), but that may well be a problem with ourselves rather than with DNS as a technology. After all, DNS is just a generic distributed database with the possibility of storing generic data types (TXT records).
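To make the TXT-record idea concrete, here is a toy encoding of a name-service record into a single TXT-style string (the key=value format is invented; a real deployment would have to agree on and version a format, and stay under the 255-byte TXT string limit):

```python
def encode_txt(record: dict) -> str:
    """Pack a flat record into one space-separated key=value string."""
    return " ".join(f"{k}={v}" for k, v in sorted(record.items()))

def decode_txt(text: str) -> dict:
    """Inverse of encode_txt; values must not contain whitespace."""
    return dict(item.split("=", 1) for item in text.split())
```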
What I think is worth investigating is how to build a topology based on rules, and exactly what info the name/config service needs to build the topology with as little user intervention as possible. So I'll start experimenting with rules.
+1 I don't believe we can do anything sane without rules. The big question is what kind of criteria the rules should be based on. It's kind of complex, as it seems to involve diverse things: geospatial information, system characteristics, etc.
One option would be to use your hostname-override mechanism in a more generic way. The value would not be treated as a hostname override, but rather as an arbitrary string set by the admin, to be used by the name server as a criterion for rule evaluation.
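As a sketch of that idea (everything here is invented for illustration): the admin-set string becomes a set of tags, and the name server returns the config of the first rule whose required tags all match:

```python
def node_tags(value: str) -> set:
    """Interpret the admin-set string (e.g. the override environment
    variable) as a comma-separated set of tags, not a hostname."""
    return {t.strip() for t in value.split(",") if t.strip()}

def first_matching_rule(tags: set, rules):
    """Return the config of the first rule whose required tags are all
    present in `tags`.  `rules` is a list of (required_tags, config)
    pairs; the whole shape is hypothetical."""
    for required, config in rules:
        if set(required) <= tags:
            return config
    return None
```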
It would be great if everyone showed the most complicated topologies they have seen in real life :)
One interesting research area is topologies spanning multiple administrative units. You get this every time someone from outside your company connects to your topology. The node is not part of your organisation's topology graph, and it may in fact be a device that hides a significant part of the topology behind it.
Martin