[nanomsg] Re: Introduction and questions

From: Martin Sustrik <sustrik@xxxxxxxxxx>
To: jimmy frasche <soapboxcicero@xxxxxxxxx>
Date: Tue, 25 Jun 2013 07:19:09 +0200

On 25/06/13 01:10, jimmy frasche wrote:

Let's say that for some reason I have some super fancy custom SP that
has some similarities to pub/sub on some level, just enough that I can
handle attaching a sub socket to the topology (but I have no way of
handling a pub socket, for whatever reason). The only thing the sub
socket needs to know is that "I can work with you" and the only thing
my SP cares about is are you one of my sockets or a socket I can work
with. If you say I'm pub/sub 2, specifically a sub, that information
is still there but I'm a sub 2 is the minimum information required.

Let's have a look at the whole SP thing from the philosophical point of view: What's a protocol, say REQ/REP? It's a specification of a distributed algorithm. The algorithm solves some specific problem. It constrains the user in what he can do, but on the other hand delivers certain behaviour and guarantees the user can rely on. All the nodes in the topology cooperate to deliver the desired behaviour.

Now, if you connect a socket to the topology which advertises itself as SUB but has different behaviour, you break the contract. The user cannot reason about the behaviour of the topology as a whole any more. He can't rely on the guarantees given by the PUB/SUB specification any more. Etc.

That's why the protocol field is separate. By specifying PUB/SUB in this field you are basically saying "this node is going to play by the rules of the PUB/SUB specification, implement the distributed algorithm specified therein and will cooperate with other PUB/SUB nodes to form a well-behaved topology."

In other words, if you want a protocol that's similar to PUB/SUB but differs slightly from it, define a new SP protocol with a different protocol ID. Internally, you can of course re-use the PUB/SUB implementation if you find that useful.
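A minimal sketch in Go of what "a new protocol with a different protocol ID" could look like at connection time. The numeric IDs, the `ProtoFancy*` names and the `peerOf` table are hypothetical and exist only for this example; they are not nanomsg's actual values.

```go
package main

import "fmt"

// ProtocolID names one SP specification. All values here are made up.
type ProtocolID uint16

const (
	ProtoPub      ProtocolID = 1   // PUB side of PUB/SUB
	ProtoSub      ProtocolID = 2   // SUB side of PUB/SUB
	ProtoFancyPub ProtocolID = 100 // hypothetical protocol that resembles PUB
	ProtoFancySub ProtocolID = 101 // hypothetical protocol that resembles SUB
)

// peerOf records which protocol each protocol is allowed to talk to.
var peerOf = map[ProtocolID]ProtocolID{
	ProtoPub:      ProtoSub,
	ProtoSub:      ProtoPub,
	ProtoFancyPub: ProtoFancySub,
	ProtoFancySub: ProtoFancyPub,
}

// CanConnect reports whether a local socket may accept the remote one.
// A socket that behaves differently from SUB carries its own ID and is
// therefore refused by a genuine PUB socket.
func CanConnect(local, remote ProtocolID) bool {
	want, ok := peerOf[local]
	return ok && want == remote
}

func main() {
	fmt.Println("PUB accepts SUB:", CanConnect(ProtoPub, ProtoSub))
	fmt.Println("PUB accepts fancy SUB:", CanConnect(ProtoPub, ProtoFancySub))
}
```

The fancy protocol is free to share code with PUB/SUB internally; only the ID on the wire has to differ.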

4. Topology ID. So, for example, if you have two pub/sub topologies on your
network (e.g. stock quotes vs. stock trades) you want to assign them
different IDs so that a node from one topology cannot be accidentally
connected to the other topology. This property needs some more thinking
about though.


That seems like something for the PUB/SUB protocol to deal with, not
the nanomsg protocol itself. Any intertangling of the two means that
possibly the nanomsg code has to be aware of the pub/sub code and vice
versa and that means you can't write one without the other and
maintenance of one is more likely to affect the other. That seems like
a bad road to go down.

It could still be in the nanomsg header and separate in the
implementation if it's some blob of protocol-specific bytes that the
protocol gives to nanomsg to package, but then I don't see the
advantage of that over just letting the SP come up with its own header
for its own needs. Otherwise you either have a bunch of empty bits or
not enough to fit what you need.


First, it's a generic thing, not specific to PUB/SUB alone.

For example, if you are architecting a stock exchange you'll need the following topologies:


1. Posting orders (REQ/REP)
2. Stock quote distribution (PUB/SUB)
3. Trade distribution (PUB/SUB)
4. Management of individual components (REQ/REP)
etc.

The goal of the topology ID is to prevent, for example, a management client connecting to the order book.
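A rough Go sketch of that check, assuming both sides exchange a small handshake record before any messages flow. The `Hello` struct, its field widths and the topology numbers are assumptions made for the example, not part of any nanomsg specification.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical topology IDs for the stock exchange example.
const (
	TopoOrders     uint16 = 1 // posting orders (REQ/REP)
	TopoQuotes     uint16 = 2 // stock quote distribution (PUB/SUB)
	TopoTrades     uint16 = 3 // trade distribution (PUB/SUB)
	TopoManagement uint16 = 4 // component management (REQ/REP)
)

// Hello is an assumed handshake record sent by each peer.
type Hello struct {
	Protocol uint16
	Topology uint16
}

// ErrWrongTopology is returned when the peers belong to different topologies.
var ErrWrongTopology = errors.New("peer belongs to a different topology")

// Accept refuses a peer whose topology ID differs from the local one,
// even when the protocol itself would be compatible.
func Accept(local, remote Hello) error {
	if local.Topology != remote.Topology {
		return ErrWrongTopology
	}
	return nil
}

func main() {
	orderBook := Hello{Protocol: 1, Topology: TopoOrders}
	mgmtClient := Hello{Protocol: 1, Topology: TopoManagement}
	fmt.Println(Accept(orderBook, mgmtClient))
}
```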

An additional advantage is that by specifying the topology IDs you suddenly have the network traffic categorised based on *business* criteria. Thus, with adequate tools, the network admin can check, for example, what's the bandwidth consumed by the stock quote feed. Also, he can specify a bandwidth limit for the stock quotes so that it doesn't exhaust the bandwidth needed by other feeds.
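To illustrate the accounting idea, here is a small Go sketch of a per-topology byte counter with an optional budget. The `Meter` type and its methods are invented for this example, and a real tool would also need a time window and would more likely live in network equipment than in the application.

```go
package main

import "fmt"

// Meter counts bytes per topology ID and enforces optional byte budgets.
type Meter struct {
	used  map[uint16]uint64
	limit map[uint16]uint64
}

// NewMeter returns an empty meter with no limits configured.
func NewMeter() *Meter {
	return &Meter{used: map[uint16]uint64{}, limit: map[uint16]uint64{}}
}

// SetLimit caps the number of bytes a topology may consume.
func (m *Meter) SetLimit(topology uint16, lim uint64) {
	m.limit[topology] = lim
}

// Allow records n bytes for the topology and returns true, unless doing so
// would exceed the configured limit, in which case nothing is recorded.
func (m *Meter) Allow(topology uint16, n uint64) bool {
	if lim, ok := m.limit[topology]; ok && m.used[topology]+n > lim {
		return false
	}
	m.used[topology] += n
	return true
}

// Used reports the bytes recorded so far for the topology.
func (m *Meter) Used(topology uint16) uint64 {
	return m.used[topology]
}

func main() {
	const quotes, trades = 2, 3 // hypothetical topology IDs
	m := NewMeter()
	m.SetLimit(quotes, 100)
	fmt.Println(m.Allow(quotes, 60), m.Allow(quotes, 60), m.Allow(trades, 1000))
	fmt.Println(m.Used(quotes), m.Used(trades))
}
```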

I personally prefer binary encoding (e.g. fixed 8 byte header) as it makes
it easier for hardware to deal with it, even in high-volume scenarios
(backbone routers etc.)

Also, when there are new connection-less transports added, the header will
be included into each packet. Thus, making it as short as possible to avoid
excessive bandwidth overhead seems like a good idea.

Of course, UDP header could be binary while TCP header is text-based,
however, it kind of feels cleaner to strive for similar header style for
different transports.
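A sketch of what a fixed 8-byte binary header might look like in Go. The layout used here (two magic bytes, a version byte, a reserved byte, a 16-bit protocol ID and a 16-bit topology ID, big-endian) is an assumption chosen for illustration and is not the format nanomsg settled on.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// HeaderSize is the fixed length of the hypothetical header.
const HeaderSize = 8

// Header holds the fields carried in the assumed 8-byte layout.
type Header struct {
	Version  uint8
	Protocol uint16
	Topology uint16
}

// Marshal encodes the header into exactly HeaderSize bytes.
func (h Header) Marshal() [HeaderSize]byte {
	var b [HeaderSize]byte
	b[0], b[1] = 'S', 'P' // magic
	b[2] = h.Version
	// b[3] is reserved and stays zero.
	binary.BigEndian.PutUint16(b[4:6], h.Protocol)
	binary.BigEndian.PutUint16(b[6:8], h.Topology)
	return b
}

// Unmarshal decodes a header and rejects short input or a bad magic.
func Unmarshal(b []byte) (Header, error) {
	if len(b) < HeaderSize {
		return Header{}, errors.New("short header")
	}
	if b[0] != 'S' || b[1] != 'P' {
		return Header{}, errors.New("bad magic")
	}
	return Header{
		Version:  b[2],
		Protocol: binary.BigEndian.Uint16(b[4:6]),
		Topology: binary.BigEndian.Uint16(b[6:8]),
	}, nil
}

func main() {
	wire := Header{Version: 1, Protocol: 2, Topology: 7}.Marshal()
	h, err := Unmarshal(wire[:])
	fmt.Println(len(wire), h, err)
}
```

Because every field sits at a fixed offset, a parser needs no scanning or string comparison, which is the property that matters for hardware and for per-packet headers on connection-less transports.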


Those are good arguments for packing the encoding as tight as possible.
And the nanomsg format should be the same regardless of transport
(even if particular transports such as UDP require an extra transport
specific header before the nanomsg header)

I didn't consider connectionless transports. Perhaps the socket
type/version should go in the 'transport specific header' and/or a
transport specific handshake can determine a one-byte identifying
token to use in communication between that pair of sockets? Maybe that
last one is too complex and fiddly though.

More importantly, the line between the "transport-specific" and "transport-agnostic" parts is pretty blurry. And given that we are speaking about few-byte headers here, I would just make the whole header transport-specific. That'll provide the most flexibility for the transports.

Unless the UDP thing really squashes it I think the socket type's name
being ASCII, even if the rest is binary is good, assuming it doesn't
have to be plastered on every message.

The problem with numbers is that the numbers have to be standardized
and even if they're used sequentially initially years on the (name,
version) to number table starts to get weird and troublesome to follow
as new protocols are added between new versions of old and soon you
have sockets being compatible with 3, 27, 28, 104, and 5689, and that
map would have to be in the RFC. If two people come up with SPs on
their own and happen to choose the same ident someone's going to have
to switch their system over if either wants to open source their SP.
Likewise if I have a custom SP not worth open sourcing that uses ident
111 and a new nanomsg comes out that uses 111 for the new version of
sub sockets, I have to change it over even though my socket isn't
named sub.

Maybe the efficiency is worth having to keep a spreadsheet of (socket,
version) ->  ident map as part of the standard and having everyone else
work around that. I don't like it. I'd rather fritter a few extra
bytes on peace of mind, but I don't have to like it. Not my protocol:
but that's my two cents on the subject.

You'll have the same problem with textual names. The words that make sense as socket types are rather limited in number so you are going to get clashes.

In either case you need a central authority to keep the list of existing protocols/socket types. The obvious choice for that is IANA. (See, e.g. the list of TCP ports managed by IANA.) Till then we can just keep the table on the web page somewhere.
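The clash problem is the same whichever key is used, which a tiny registry sketch in Go makes visible. The `Registry` type, the "name/version" string convention and the number 111 (taken from the example earlier in the thread) are illustrative only.

```go
package main

import "fmt"

// Registry is a toy stand-in for a central table of assigned protocols.
type Registry struct {
	byID   map[uint16]string
	byName map[string]uint16
}

// NewRegistry returns an empty registry.
func NewRegistry() *Registry {
	return &Registry{byID: map[uint16]string{}, byName: map[string]uint16{}}
}

// Register assigns id to name and fails if either is already taken,
// so numeric and textual identifiers clash in exactly the same way.
func (r *Registry) Register(id uint16, name string) error {
	if prev, ok := r.byID[id]; ok {
		return fmt.Errorf("id %d already assigned to %q", id, prev)
	}
	if prev, ok := r.byName[name]; ok {
		return fmt.Errorf("name %q already assigned id %d", name, prev)
	}
	r.byID[id] = name
	r.byName[name] = id
	return nil
}

func main() {
	r := NewRegistry()
	fmt.Println(r.Register(111, "sub/2"))
	fmt.Println(r.Register(111, "my-custom-sp/1")) // numeric clash
	fmt.Println(r.Register(112, "sub/2"))          // textual clash
}
```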

Each network connection has its own goroutine that owns said
connection, operates its state machine, and does any transport
specific operations necessary.

It communicates with a controller (one per nanomsg socket) that
handles the queue and message (un)packing, per socket type, and
communicates with the nanosocket.

The nanosocket just sends commands to the controller and receives
replies and is in whatever goroutine the client is using it in.

Martin, does that sound like the correct architecture once you clear
away all the low-level stuff?

I think there's an "endpoint" object missing. So, when you do nn_bind("tcp://127.0.0.1:5555"), an endpoint is created, which itself has a list of connections.
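A sketch of that object hierarchy in Go: a socket owns endpoints, and each endpoint owns the connections made through it. The `Socket`, `Endpoint` and `Conn` types and their methods are hypothetical names for this example; no real networking is done and this is not the nanomsg C API.

```go
package main

import "fmt"

// Conn stands for one established network connection.
type Conn struct {
	Remote string
}

// Endpoint is created by a bind (or connect) call and owns its connections.
type Endpoint struct {
	Addr  string
	conns []*Conn
}

// Accept records a new connection arriving on this endpoint.
func (e *Endpoint) Accept(remote string) *Conn {
	c := &Conn{Remote: remote}
	e.conns = append(e.conns, c)
	return c
}

// Socket is the user-visible object; it may hold several endpoints.
type Socket struct {
	endpoints []*Endpoint
}

// Bind creates an endpoint for addr and attaches it to the socket.
func (s *Socket) Bind(addr string) *Endpoint {
	e := &Endpoint{Addr: addr}
	s.endpoints = append(s.endpoints, e)
	return e
}

// ConnCount sums the connections across all endpoints of the socket.
func (s *Socket) ConnCount() int {
	n := 0
	for _, e := range s.endpoints {
		n += len(e.conns)
	}
	return n
}

func main() {
	var s Socket
	ep := s.Bind("tcp://127.0.0.1:5555")
	ep.Accept("127.0.0.1:40001")
	ep.Accept("127.0.0.1:40002")
	fmt.Println(len(s.endpoints), s.ConnCount())
}
```

In the goroutine design described above, each `Conn` would be driven by its own goroutine, with the endpoint sitting between the connections and the per-socket controller.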


Martin

