[nanomsg] Re: Introduction and questions

From: jimmy frasche <soapboxcicero@xxxxxxxxx>
To: nanomsg@xxxxxxxxxxxxx
Date: Mon, 24 Jun 2013 16:10:58 -0700

On Sun, Jun 23, 2013 at 4:08 PM, Gonzalo Diethelm
<gonzalo.diethelm@xxxxxxxxx> wrote:
> Therefore, one must answer: what should be ported to Go?
>
> One way to approach this, as it was already pointed out, would be defining
> the nanomsg protocol and implementing that in Go. Perhaps another approach
> could be implementing an even more low-level "communications and
> synchronization engine" in Go that allows you to then implement something
> like the nanomsg protocol on top of that. This might even make it possible
> to implement the ZeroMQ protocol on top of that as well (if that makes any
> sense)... Still, you would have to clearly state what this really low-level
> engine should be composed from. Perhaps this is exactly what nanomsg has
> accomplished; I really need to look into the code in more details.

As I see it there are two conceptual pieces to nanomsg.

One is the protocols, not just stuff like REQ/REP but the nanomsg wire
format itself.

The other is the "taming of the socket" and a lot of that goes away in
Go since its sockets are already pretty tame.

In one respect that makes it much easier, because so much is already
done. The downside is that that's where all the blurry lines are going
to be: there will be a lot of "is this already in Go? Is this an
artifact of this specific implementation, or is it necessary?". Once
those blurry lines have been clearly delineated, I think it will be
mostly downhill from there.

To be a true port it would need both, but I don't see why we couldn't
implement the simpler protocols first. Aside from the feeling of
accomplishment, it would give us something to use to test the tamed
sockets against a C-nanomsg echo server.

On Sun, Jun 23, 2013 at 10:34 PM, Martin Sustrik <sustrik@xxxxxxxxxx> wrote:
> Check src/transports/utils/protohdr.h&.c That's the state machine that
> exchanges 8 bytes when connection is established. So far, there's no actual
> data filled in, but that's easy to add that.

It looks like it's been renamed streamhdr.{c,h}. Very useful. Thanks.

> Each party, IMO, should advertise at least the following properties:
>
> 1. Some constant tag so that SP communication can be distinguished from
> other TCP connections. Currently it's 4 bytes like this: \0\0SP
>
> 2. Protocol ID (i.e. PUB/SUB, REQ/REP etc.) This also includes version. If
> there's a new version of PUB/SUB it can just get new ID. No need for
> explicit version field.
>
> 3. Role of the endpoint in the protocol (e.g. PUB vs. SUB).

In the handshake example I gave, I did something that may be either
clever or stupid, but that seems to have slipped by unnoticed.

The socket didn't say "I'm speaking the REQ/REP protocol version 2 and
I'm a REQ socket".

It said "I'm a REQ version 2 socket". And the other end says "yes, I
can talk to you" or "I don't know what that means, go away".

Regardless of ASCII versus binary encoding, I think that's a good way
to go about it.

Let's say that for some reason I have some super fancy custom SP that
has some similarities to pub/sub on some level, just enough that I can
handle attaching a sub socket to the topology (but I have no way of
handling a pub socket, for whatever reason). The only thing the sub
socket needs to know is "I can work with you", and the only thing my
SP cares about is whether you are one of my sockets or a socket I can
work with. If you say "I'm pub/sub 2, specifically a sub", that
information is still there, but "I'm a sub 2" is the minimum
information required.

Maybe that's an unimportant, nonsense distinction, but it makes sense to me.
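
As a rough illustration of that distinction, the handshake could be
sketched in Go as below. The names SockType, Peer and Greet, and the
"FANCY" socket type, are made up for this sketch; none of them come
from nanomsg. Each end announces only its own socket type and version,
and the other end answers yes or no.

```go
package main

import "fmt"

// SockType is a hypothetical identifier: a socket name plus a version,
// e.g. "SUB version 2". Nothing here is defined by nanomsg itself.
type SockType struct {
	Name    string
	Version int
}

// Peer lists the socket types this end is willing to talk to.
type Peer struct {
	Self    SockType
	Accepts map[SockType]bool
}

// Greet answers the other side's announcement: nil means "yes, I can
// talk to you", an error means "I don't know what that means, go away".
func (p Peer) Greet(remote SockType) error {
	if p.Accepts[remote] {
		return nil
	}
	return fmt.Errorf("%s %d cannot talk to %s %d",
		p.Self.Name, p.Self.Version, remote.Name, remote.Version)
}

func main() {
	// A custom SP that can take sub sockets but has no way to handle pub.
	fancy := Peer{
		Self: SockType{"FANCY", 1},
		Accepts: map[SockType]bool{
			{"FANCY", 1}: true,
			{"SUB", 2}:   true,
		},
	}
	fmt.Println(fancy.Greet(SockType{"SUB", 2}))
	fmt.Println(fancy.Greet(SockType{"PUB", 2}))
}
```

The custom SP never has to claim it speaks "pub/sub 2"; it only has to
recognise "SUB 2" as something it can work with.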

> 4. Topology ID. So, for example, if you have two pub/sub topologies on your
> network (e.g. stock quotes vs. stock trades) you want to assign them
> different IDs so that node from one topology cannot be accidentally
> connected to the other topology. This property needs some more thinking
> about though.

That seems like something for the PUB/SUB protocol to deal with, not
the nanomsg protocol itself. Any entangling of the two means that the
nanomsg code may have to be aware of the pub/sub code and vice versa,
which means you can't write one without the other, and maintenance of
one is more likely to affect the other. That seems like a bad road to
go down.

It could still be in the nanomsg header and separate in the
implementation if it's some blob of protocol-specific bytes that the
protocol gives to nanomsg to package, but then I don't see the
advantage of that over just letting the SP come up with its own header
for its own needs. Otherwise you either have a bunch of empty bits or
not enough to fit what you need.

> I personally prefer binary encoding (e.g. fixed 8 byte header) as it makes
> it easier for hardware to deal with it, even in high-volume scenarios
> (backbone routers etc.)
>
> Also, when there are new connection-less transports added, the header will
> be included into each packet. Thus, making it as short as possible so avoid
> excessive bandwidth overhead seems like a good idea.
>
> Of course, UDP header could be binary while TCP header is text-based,
> however, it kind of feels cleaner to strive for similar header style for
> different transports.

Those are good arguments for packing the encoding as tight as
possible. And the nanomsg format should be the same regardless of
transport (even if particular transports such as UDP require an extra
transport-specific header before the nanomsg header).
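
For comparison, a fixed 8-byte binary header could be handled in Go
roughly like this. Only the 4-byte \0\0SP tag comes from the quoted
message; the layout of the remaining four bytes (a big-endian 16-bit
ident followed by 16 reserved bits) and the function names are
assumptions made for this sketch.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

// tag is the 4-byte marker quoted above: \0\0SP.
var tag = []byte{0, 0, 'S', 'P'}

const headerLen = 8

// EncodeHeader builds a fixed 8-byte header. The last four bytes (a
// big-endian 16-bit ident plus 16 reserved bits) are an assumed layout,
// not something nanomsg defines.
func EncodeHeader(ident uint16) []byte {
	h := make([]byte, headerLen)
	copy(h, tag)
	binary.BigEndian.PutUint16(h[4:6], ident)
	return h
}

// DecodeHeader checks the length and the tag, then returns the ident.
func DecodeHeader(h []byte) (uint16, error) {
	if len(h) != headerLen {
		return 0, errors.New("header must be exactly 8 bytes")
	}
	if !bytes.Equal(h[:4], tag) {
		return 0, errors.New("not an SP connection")
	}
	return binary.BigEndian.Uint16(h[4:6]), nil
}

func main() {
	h := EncodeHeader(0x0030) // arbitrary ident, for the demo only
	ident, err := DecodeHeader(h)
	fmt.Printf("% x -> ident %d, err %v\n", h, ident, err)
}
```

The same two functions would work unchanged over TCP or inside a
datagram, which is the point of keeping the format independent of the
transport.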

I didn't consider connectionless transports. Perhaps the socket
type/version should go in the 'transport specific header' and/or a
transport specific handshake can determine a one-byte identifying
token to use in communication between that pair of sockets? Maybe that
last one is too complex and fiddly though.

Unless the UDP thing really squashes it, I think the socket type's
name being ASCII, even if the rest is binary, is good, assuming it
doesn't have to be plastered on every message.

The problem with numbers is that the numbers have to be standardized.
Even if they're assigned sequentially at first, years on the (name,
version) to number table starts to get weird and troublesome to follow
as new protocols are added between new versions of old ones. Soon you
have sockets being compatible with 3, 27, 28, 104, and 5689, and that
map would have to be in the RFC. If two people come up with SPs on
their own and happen to choose the same ident, someone's going to have
to switch their system over if either wants to open source their SP.
Likewise, if I have a custom SP not worth open sourcing that uses ident
111 and a new nanomsg comes out that uses 111 for the new version of
sub sockets, I have to change it over even though my socket isn't
named sub.
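
The collision can be shown with a toy registry in Go. Registry,
Register and the "MYSP" socket name are hypothetical and exist only to
make the problem concrete; no such table is defined by nanomsg.

```go
package main

import "fmt"

// SockType is a (name, version) pair, as in the table described above.
type SockType struct {
	Name    string
	Version int
}

// Registry is a hypothetical ident -> (name, version) table of the kind
// a standard would have to maintain.
type Registry map[uint16]SockType

// Register claims an ident. It fails when the ident already belongs to
// a different socket type.
func (r Registry) Register(ident uint16, t SockType) error {
	if old, taken := r[ident]; taken && old != t {
		return fmt.Errorf("ident %d already means %s %d",
			ident, old.Name, old.Version)
	}
	r[ident] = t
	return nil
}

func main() {
	reg := Registry{}
	// A private SP picks 111 today...
	fmt.Println(reg.Register(111, SockType{"MYSP", 1}))
	// ...and a later standard assigns 111 to a new version of sub.
	fmt.Println(reg.Register(111, SockType{"SUB", 3}))
}
```

With names on the wire the two sockets would never be confused; with
numbers, one of them has to move.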

Maybe the efficiency is worth having to keep a spreadsheet of the
(socket, version) -> ident map as part of the standard and having
everyone else work around that. I don't like it. I'd rather fritter a
few extra bytes on peace of mind, but I don't have to like it. Not my
protocol, but that's my two cents on the subject.

On Mon, Jun 24, 2013 at 7:25 AM, Gonzalo Diethelm
<gonzalo.diethelm@xxxxxxxxx> wrote:
> That separation sounds good. I would expect that the implementation of
> transports doesn't know anything about protocols, and protocols don't know
> anything about transports; both of them should only know about core, which
> implements the glue between them.

Even better if they don't know about core and core only has to know
about them :)
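
In Go that separation falls out fairly naturally, because a type
satisfies an interface without importing the package that declares it.
A minimal sketch follows; the interface names, RoundTrip, and the toy
loopback transport and prefixed protocol are all invented for
illustration and say nothing about how nanomsg is actually laid out.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// Transport moves opaque messages and knows nothing about protocols.
type Transport interface {
	Send(msg []byte) error
	Recv() ([]byte, error)
}

// Protocol shapes messages and knows nothing about transports.
type Protocol interface {
	Outgoing(body []byte) []byte
	Incoming(msg []byte) []byte
}

// RoundTrip is the only "core" here: glue written purely against the
// two interfaces above.
func RoundTrip(t Transport, p Protocol, body []byte) ([]byte, error) {
	if err := t.Send(p.Outgoing(body)); err != nil {
		return nil, err
	}
	msg, err := t.Recv()
	if err != nil {
		return nil, err
	}
	return p.Incoming(msg), nil
}

// loopback is a toy transport: whatever is sent can be received again.
type loopback struct{ queue [][]byte }

func (l *loopback) Send(msg []byte) error {
	l.queue = append(l.queue, msg)
	return nil
}

func (l *loopback) Recv() ([]byte, error) {
	if len(l.queue) == 0 {
		return nil, errors.New("nothing to receive")
	}
	msg := l.queue[0]
	l.queue = l.queue[1:]
	return msg, nil
}

// prefixed is a toy protocol that marks each message with a fixed prefix.
type prefixed struct{ prefix []byte }

func (p prefixed) Outgoing(body []byte) []byte {
	return append(append([]byte{}, p.prefix...), body...)
}

func (p prefixed) Incoming(msg []byte) []byte {
	return bytes.TrimPrefix(msg, p.prefix)
}

func main() {
	reply, err := RoundTrip(&loopback{}, prefixed{[]byte("REQ:")}, []byte("hello"))
	fmt.Printf("%q %v\n", reply, err)
}
```

Neither loopback nor prefixed mentions the interfaces or RoundTrip, so
each could live in its own package with no dependency on the core.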

> Can you think of a really minimal subset of nanomsg that one could design
> and implement? Something like "one protocol, one transport, minimal core"?

I imagine the simplest is REQ/REP over TCP (or just one of the two,
with the other side in C for the time being).

My guess as to what the port would look like would be:

Each network connection has its own goroutine that owns said
connection, operates its state machine, and does any transport
specific operations necessary.

It communicates with a controller (one per nanomsg socket) that
handles the queue and message (un)packing, per socket type, and
communicates with the nanosocket.

The nanosocket just sends commands to the controller and receives
replies and is in whatever goroutine the client is using it in.

Martin, does that sound like the correct architecture once you clear
away all the low-level stuff?
