[nanomsg] Re: Introduction and questions

  • From: Martin Sustrik <sustrik@xxxxxxxxxx>
  • To: nanomsg@xxxxxxxxxxxxx
  • Date: Mon, 24 Jun 2013 07:34:08 +0200

Hi Jimmy,

I do not like the way multipart messages are handled in zmq.

I'm sure there are very good reasons to glop it onto the socket. I
found the threads where it was introduced in ZMQ but I did not find
the reasons for its inclusion compelling enough to justify the added
complexity.

I'd rather have it be a separate system/microprotocol entirely that
you choose to layer on top of a socket or not that needn't even be
part of the core package.

Exactly. That's currently the case with nanomsg. There's no support for multi-part messages. Later on we can build it on top (a different lib even?) as a light-weight alternative to more complex presentation layer protocols (JSON/XML).
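
As a rough illustration of how light such a layer could be, here is a minimal Python sketch that packs several parts into one ordinary message and splits them again on the receiving side. The function names and the 4-byte big-endian length prefix are assumptions made for this example only; nothing like this exists in nanomsg.

```python
import struct


def join_parts(parts):
    """Pack a list of byte strings into one message body.

    Hypothetical encoding: each part is preceded by its length as a
    4-byte big-endian unsigned integer.
    """
    out = bytearray()
    for part in parts:
        out += struct.pack(">I", len(part))
        out += part
    return bytes(out)


def split_parts(message):
    """Reverse join_parts; raise ValueError if the message is truncated."""
    parts = []
    offset = 0
    while offset < len(message):
        if offset + 4 > len(message):
            raise ValueError("truncated length prefix")
        (size,) = struct.unpack_from(">I", message, offset)
        offset += 4
        if offset + size > len(message):
            raise ValueError("truncated part")
        parts.append(message[offset:offset + size])
        offset += size
    return parts
```

The socket underneath only ever sees a single opaque message, so the multipart logic can live in a separate library.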

3. We can simply create a new protocol. Sounds like the best option to me,
but it would mean more work with writing the protocol down etc. On the other
hand, writing it down would allow us to do that in RFC format, so that it
can be easily passed to IETF when the time comes.

New protocol(s) would probably be simplest.

I don't know if jumping straight into encoding it into RFCese would be
the easiest path, but the project (any project, really) could only
benefit from clear documentation, regardless of format. If RFC is
easiest for you, that's fine.

I think it should be modular.

One nanomsg protocol, then a protocol for each class of sockets
(PAIR/REQ/REP, PUB/SUB etc). The same way there isn't one RFC for
TCP/IP/HTTP/FTP/TELNET/ET AL.

Yes. That's what I was thinking myself.

Even if the REQ/REP protocol is just "you don't have to do anything
special, just send and/or receive data," and the PUB/SUB protocol is
just that the first bit of the message is a null terminated topic. I'm
sure there'd be more to them than that, but you get the idea.
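
The "null terminated topic" idea can be sketched in a few lines of Python. This is only an illustration of the simplified description above, not the actual PUB/SUB wire format; the function names are made up, and the prefix matching used for subscriptions is an assumption.

```python
def make_pub_message(topic, payload):
    """Prepend a NUL-terminated topic to the payload."""
    if b"\x00" in topic:
        raise ValueError("topic must not contain a NUL byte")
    return topic + b"\x00" + payload


def split_pub_message(message):
    """Split a message into (topic, payload) at the first NUL byte."""
    topic, sep, payload = message.partition(b"\x00")
    if not sep:
        raise ValueError("no topic terminator found")
    return topic, payload


def matches(message, subscription):
    """Assumed rule: a subscription matches any topic it is a prefix of."""
    topic, _ = split_pub_message(message)
    return topic.startswith(subscription)
```

Because the split happens at the first NUL byte, the payload itself may contain NUL bytes without confusing the receiver.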

Yes. There will definitely be more. The protocols look superficially simple, but they are not. There are a lot of issues to consider, and the design choices should be clearly worded in the RFCs.

I don't see it anywhere in the code, but I may have missed it, but I'd
like to see a handshake when you connect two sockets. Nothing fancy.
Just, "hi I'm nanomsg wire format x.xx and this is a type Y socket,
version x.xx"† and a simple OK or NOK response (and anything that's
not one of those two is an implicit NOK) before getting down to
business.

Check src/transports/utils/protohdr.h and protohdr.c. That's the state machine that exchanges 8 bytes when a connection is established. So far, there's no actual data filled in, but that's easy to add.

Each party, IMO, should advertise at least the following properties:

1. Some constant tag so that SP communication can be distinguished from other TCP connections. Currently it's 4 bytes like this: \0\0SP

2. Protocol ID (i.e. PUB/SUB, REQ/REP etc.). This also includes the version. If there's a new version of PUB/SUB, it can just get a new ID. No need for an explicit version field.

3. Role of the endpoint in the protocol (e.g. PUB vs. SUB).

4. Topology ID. So, for example, if you have two pub/sub topologies on your network (e.g. stock quotes vs. stock trades), you want to assign them different IDs so that a node from one topology cannot be accidentally connected to the other topology. This property needs some more thinking about, though.
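
To make the four properties concrete, here is a minimal Python sketch of how they might fit into a fixed 8-byte header. Only the 4-byte \0\0SP tag and the 8-byte total come from the text above; the widths and order of the other fields, the numeric constants and the function names are all assumptions for illustration.

```python
import struct

# Hypothetical layout (only the tag and the total size are from the text):
#   bytes 0-3  constant tag \0\0SP
#   bytes 4-5  protocol ID, big-endian (a new protocol version gets a new ID)
#   byte  6    role of the endpoint within the protocol
#   byte  7    topology ID
HEADER = struct.Struct(">4sHBB")
TAG = b"\x00\x00SP"

# Made-up values, for illustration only.
PROTO_PUBSUB = 2
ROLE_PUB = 0
ROLE_SUB = 1


def pack_header(protocol_id, role, topology_id):
    """Build the 8-byte header an endpoint sends when a connection opens."""
    return HEADER.pack(TAG, protocol_id, role, topology_id)


def unpack_header(data):
    """Parse a peer's header; raise ValueError if it is not an SP header."""
    if len(data) != HEADER.size:
        raise ValueError("header must be exactly 8 bytes")
    tag, protocol_id, role, topology_id = HEADER.unpack(data)
    if tag != TAG:
        raise ValueError("not an SP connection")
    return protocol_id, role, topology_id


def same_topology(local, remote):
    """True if both headers name the same protocol ID and topology ID."""
    return local[0] == remote[0] and local[2] == remote[2]
```

Whether two roles are allowed to talk to each other depends on the protocol, so that check is deliberately left out of the sketch.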

† it would probably look more like NM:1;REQ:2 which is easy enough for
humans to read, machines to parse, and lets scalability protocols be
added willy nilly without having to come up with magic numbers to
identify them that everyone has to agree on in advance.
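
For comparison, the text-based greeting and the OK/NOK reply could be sketched in Python as follows. The NM:1;REQ:2 format and the OK/NOK replies are taken from the proposal above; the function names and the sets of supported versions are hypothetical.

```python
# Hypothetical capabilities of the local endpoint.
SUPPORTED_WIRE_VERSIONS = {1}
SUPPORTED_SOCKETS = {("REQ", 2), ("REP", 2)}


def parse_greeting(line):
    """Parse 'NM:1;REQ:2' into (wire_version, socket_type, socket_version).

    Returns None if the greeting is malformed.
    """
    parts = line.strip().split(";")
    if len(parts) != 2:
        return None
    fields = []
    for part in parts:
        name, sep, version = part.partition(":")
        # isascii() guards against non-ASCII digits that int() would reject.
        if not sep or not name or not (version.isascii() and version.isdigit()):
            return None
        fields.append((name, int(version)))
    (wire_name, wire_version), (socket_type, socket_version) = fields
    if wire_name != "NM":
        return None
    return wire_version, socket_type, socket_version


def reply_to(line):
    """Return 'OK' if the greeting is acceptable, otherwise 'NOK'."""
    greeting = parse_greeting(line)
    if greeting is None:
        return "NOK"
    wire_version, socket_type, socket_version = greeting
    if (wire_version in SUPPORTED_WIRE_VERSIONS
            and (socket_type, socket_version) in SUPPORTED_SOCKETS):
        return "OK"
    return "NOK"


def peer_accepted(response):
    """Anything other than an explicit OK counts as an implicit NOK."""
    return response.strip() == "OK"
```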

I personally prefer binary encoding (e.g. a fixed 8-byte header), as it makes it easier for hardware to deal with, even in high-volume scenarios (backbone routers etc.).

Also, when new connection-less transports are added, the header will be included in each packet. Thus, making it as short as possible to avoid excessive bandwidth overhead seems like a good idea.

Of course, the UDP header could be binary while the TCP header is text-based. However, it kind of feels cleaner to strive for a similar header style across different transports.

Every message after that, as far as I can see, can just be the message
length header. The protocol for the sockets can prepend anything it
needs before framing the message. Or even, if need be, have a second
protocol-specific handshake to negotiate any special considerations.
Having each SP in charge of whatever extra it needs to add on top of
the framing but letting the lower level take care of negotiations
should also help keep the code modular and easy to expand, too.
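
The split between framing and per-protocol additions might look like the Python sketch below. The 8-byte big-endian length prefix and the 4-byte request ID that the REQ layer prepends are assumptions chosen for the example, and so are the function names.

```python
import struct

# Assumed framing: every message is preceded by its length as an
# 8-byte big-endian unsigned integer.
LENGTH = struct.Struct(">Q")


def frame(body):
    """Lower layer: prefix an opaque body with its length."""
    return LENGTH.pack(len(body)) + body


def unframe(buffer):
    """Return (body, rest) for the first complete frame in buffer.

    Returns (None, buffer) if the frame has not fully arrived yet.
    """
    if len(buffer) < LENGTH.size:
        return None, buffer
    (size,) = LENGTH.unpack_from(buffer)
    end = LENGTH.size + size
    if len(buffer) < end:
        return None, buffer
    return buffer[LENGTH.size:end], buffer[end:]


def req_wrap(request_id, payload):
    """Protocol layer (hypothetical): REQ prepends a 4-byte request ID."""
    return struct.pack(">I", request_id) + payload


def req_unwrap(body):
    """Split a REQ body back into (request_id, payload)."""
    (request_id,) = struct.unpack_from(">I", body)
    return request_id, body[4:]
```

The framing code never looks inside the body, so each scalability protocol is free to define its own prefix without touching the lower layer.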

Yes. That's the idea.

One thing to keep in mind is that creating a new implementation has the
drawback of not automatically getting new features from nanomsg. If someone,
say, implements InfiniBand transport for nanomsg, the Go implementation
wouldn't get it for free. Same applies to possible new messaging patterns.

That is a concern. Of course, it cuts both ways. If it turns out much
easier to get a new messaging pattern up and running in the Go port,
it could very well turn into a playground for new patterns even if new
transports come somewhat more slowly. There are downsides, but I think
it would be a net win.

There's also no reason not to have a nanomsg port to Go and a Go
binding, and you choose which best suits your needs.

Yes.

In the long run I would like to have SPs implemented directly in the kernel so that every language has access to the same functionality without the need for additional native libraries (we've already done a PoC for that), but looking at the DBus-in-Linux-kernel saga, it doesn't seem likely to happen any time soon.

In the meantime, both bindings and ports sound like reasonable ideas.

Martin

