On 25/06/13 01:10, jimmy frasche wrote:
Let's say that for some reason I have some super fancy custom SP that resembles pub/sub just enough that I can handle attaching a sub socket to the topology (but I have no way of handling a pub socket, for whatever reason). The only thing the sub socket needs to know is "I can work with you", and the only thing my SP cares about is whether you are one of my sockets or a socket I can work with. If you say "I'm pub/sub 2, specifically a sub", that information is still there, but "I'm a sub 2" is the minimum information required.
Let's have a look at the whole SP thing from the philosophical point of view: What's a protocol, say REQ/REP? It's a specification of a distributed algorithm. The algorithm solves some specific problem. It constrains what the user can do, but on the other hand delivers certain behaviour and guarantees the user can rely on. All the nodes in the topology cooperate to deliver the desired behaviour.
Now, if you connect a socket to the topology which advertises itself as SUB but has different behaviour, you break the contract. The user cannot reason about the behaviour of the topology as a whole any more. He can't rely on the guarantees given by the PUB/SUB specification any more. Etc.
That's why the protocol field is separate. By specifying PUB/SUB in this field you are basically saying "this node is going to play by the rules of the PUB/SUB specification, implement the distributed algorithm specified therein and will cooperate with other PUB/SUB nodes to form a well-behaved topology."
In other words, if you want a protocol that's similar to PUB/SUB but differs slightly from it, define a new SP protocol with a different protocol ID. Internally, you can of course re-use the PUB/SUB implementation if you find that useful.
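A minimal sketch of what "define a new SP protocol with a different protocol ID" could look like, assuming each socket advertises a numeric protocol ID and checks its peer against a table of acceptable partners. The ID values, the FancyPub/FancySub names and the CanConnect helper are all hypothetical; nothing here is taken from the nanomsg code or spec.

```go
package main

import "fmt"

// ProtocolID names the distributed algorithm a socket promises to follow.
// All numeric values are made up for this sketch.
type ProtocolID uint16

const (
	Pub      ProtocolID = 1
	Sub      ProtocolID = 2
	FancyPub ProtocolID = 100 // hypothetical custom protocol, publisher side
	FancySub ProtocolID = 101 // hypothetical custom protocol, subscriber side
)

// partners lists, for each local protocol, the remote protocols it may talk to.
// The custom protocol gets its own entries instead of posing as PUB/SUB.
var partners = map[ProtocolID][]ProtocolID{
	Pub:      {Sub},
	Sub:      {Pub},
	FancyPub: {FancySub},
	FancySub: {FancyPub},
}

// CanConnect reports whether a local socket accepts the protocol the remote advertises.
func CanConnect(local, remote ProtocolID) bool {
	for _, p := range partners[local] {
		if p == remote {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println("SUB <-> PUB:", CanConnect(Sub, Pub))
	fmt.Println("FancyPub <-> SUB:", CanConnect(FancyPub, Sub))
	fmt.Println("FancyPub <-> FancySub:", CanConnect(FancyPub, FancySub))
}
```

In this sketch FancySub is free to re-use the SUB implementation internally; the point is only that it never advertises itself as SUB on the wire.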
4. Topology ID. So, for example, if you have two pub/sub topologies on your network (e.g. stock quotes vs. stock trades) you want to assign them different IDs so that a node from one topology cannot be accidentally connected to the other topology. This property needs some more thinking about though.
That seems like something for the PUB/SUB protocol to deal with, not the nanomsg protocol itself. Any entangling of the two means that the nanomsg code possibly has to be aware of the pub/sub code and vice versa, which means you can't write one without the other and maintenance of one is more likely to affect the other. That seems like a bad road to go down. It could still be in the nanomsg header and separate in the implementation if it's some blob of protocol-specific bytes that the protocol gives to nanomsg to package, but then I don't see the advantage of that over just letting the SP come up with its own header for its own needs. Otherwise you either have a bunch of empty bits or not enough to fit what you need.
First, it's a generic thing, not specific to PUB/SUB alone.
For example, if you are architecting a stock exchange you'll need the following topologies:
1. Posting orders (REQ/REP)
2. Stock quote distribution (PUB/SUB)
3. Trade distribution (PUB/SUB)
4. Management of individual components (REQ/REP)
etc.
The goal of the topology ID is to prevent, for example, a management client connecting to the order book.
An additional advantage is that by specifying the topology IDs you suddenly have the network traffic categorised based on *business* criteria. Thus, with adequate tools, the network admin can check, for example, how much bandwidth the stock quote feed consumes. Also, he can specify a bandwidth limit for the stock quotes so that they don't exhaust the bandwidth needed by other feeds.
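To make the management-client example concrete, here is a rough sketch of a connection check that looks at both the protocol and the topology ID. It assumes the peer sends both in some kind of greeting; the Hello and Listener types, the error values and the topology numbers are invented for illustration and are not part of any agreed format.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical topology IDs for the stock exchange example.
const (
	TopoOrders     uint16 = 1
	TopoQuotes     uint16 = 2
	TopoTrades     uint16 = 3
	TopoManagement uint16 = 4
)

var (
	ErrProtocol = errors.New("peer speaks an incompatible protocol")
	ErrTopology = errors.New("peer belongs to a different topology")
)

// Hello is what a connecting peer announces about itself (assumed, not specified).
type Hello struct {
	Protocol string // e.g. "REQ", "SUB"
	Topology uint16
}

// Listener describes what a bound socket is willing to accept.
type Listener struct {
	Accepts  string // protocol expected from the peer
	Topology uint16
}

// Check rejects peers with the wrong protocol or the wrong topology ID.
func (l Listener) Check(h Hello) error {
	if h.Protocol != l.Accepts {
		return ErrProtocol
	}
	if h.Topology != l.Topology {
		return ErrTopology
	}
	return nil
}

func main() {
	orderBook := Listener{Accepts: "REQ", Topology: TopoOrders}

	// Same protocol (REQ) in both cases; only the topology ID differs.
	fmt.Println("order client:", orderBook.Check(Hello{"REQ", TopoOrders}))
	fmt.Println("management client:", orderBook.Check(Hello{"REQ", TopoManagement}))
}
```

The protocol check alone would let the management client through, since both topologies are REQ/REP; the topology ID is what tells them apart.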
I personally prefer binary encoding (e.g. fixed 8 byte header) as it makes it easier for hardware to deal with it, even in high-volume scenarios (backbone routers etc.) Also, when new connection-less transports are added, the header will be included in each packet. Thus, making it as short as possible to avoid excessive bandwidth overhead seems like a good idea. Of course, the UDP header could be binary while the TCP header is text-based, however, it kind of feels cleaner to strive for a similar header style for different transports.
Those are good arguments for packing the encoding as tight as possible. And the nanomsg format should be the same regardless of transport (even if particular transports such as UDP require an extra transport-specific header before the nanomsg header). I didn't consider connectionless transports. Perhaps the socket type/version should go in the 'transport specific header' and/or a transport-specific handshake can determine a one-byte identifying token to use in communication between that pair of sockets? Maybe that last one is too complex and fiddly though.
More importantly, the line between the "transport-specific" and "transport-agnostic" parts is pretty blurry. And given that we are speaking about headers of a few bytes here, I would just make the whole header transport-specific. That'll provide the most flexibility for the transports.
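For illustration, a fixed 8 byte binary header could be packed and unpacked as below. The field layout (1 byte version, 1 byte flags, 2 byte protocol ID, 4 byte topology ID, big-endian) is purely an assumption made for this sketch; the thread has not settled on any layout, and the Header type and its methods are hypothetical names.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// HeaderSize is the fixed length of the sketched header in bytes.
const HeaderSize = 8

// Header uses a made-up layout: version(1) flags(1) protocol(2) topology(4).
type Header struct {
	Version  uint8
	Flags    uint8
	Protocol uint16
	Topology uint32
}

// Encode packs the header into exactly 8 bytes in network byte order.
func (h Header) Encode() [HeaderSize]byte {
	var b [HeaderSize]byte
	b[0] = h.Version
	b[1] = h.Flags
	binary.BigEndian.PutUint16(b[2:4], h.Protocol)
	binary.BigEndian.PutUint32(b[4:8], h.Topology)
	return b
}

// Decode reads a header from the first 8 bytes of b.
func Decode(b []byte) (Header, error) {
	if len(b) < HeaderSize {
		return Header{}, errors.New("header too short")
	}
	return Header{
		Version:  b[0],
		Flags:    b[1],
		Protocol: binary.BigEndian.Uint16(b[2:4]),
		Topology: binary.BigEndian.Uint32(b[4:8]),
	}, nil
}

func main() {
	h := Header{Version: 1, Protocol: 2, Topology: 7}
	wire := h.Encode()
	fmt.Printf("wire: % x\n", wire[:])

	back, err := Decode(wire[:])
	fmt.Println(back, err)
}
```

Because every field sits at a fixed offset, a router or other hardware can read the protocol or topology ID without parsing anything, which is the argument made above for binary over text.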
Unless the UDP thing really squashes it, I think the socket type's name being ASCII, even if the rest is binary, is good, assuming it doesn't have to be plastered on every message. The problem with numbers is that the numbers have to be standardized, and even if they're used sequentially initially, years on the (name, version) to number table starts to get weird and troublesome to follow as new protocols are added between new versions of old ones. Soon you have sockets being compatible with 3, 27, 28, 104, and 5689, and that map would have to be in the RFC. If two people come up with SPs on their own and happen to choose the same ident, someone's going to have to switch their system over if either wants to open source their SP. Likewise, if I have a custom SP not worth open sourcing that uses ident 111 and a new nanomsg comes out that uses 111 for the new version of sub sockets, I have to change it over even though my socket isn't named sub. Maybe the efficiency is worth having to keep a spreadsheet of the (socket, version) -> ident map as part of the standard and having everyone else work around that. I don't like it. I'd rather fritter a few extra bytes on peace of mind, but I don't have to like it. Not my protocol, but that's my two cents on the subject.
You'll have the same problem with textual names. The words that make sense as socket types are rather limited in number so you are going to get clashes.
In either case you need a central authority to keep the list of existing protocols/socket types. The obvious choice for that is IANA. (See, e.g. the list of TCP ports managed by IANA.) Till then we can just keep the table on the web page somewhere.
Each network connection has its own goroutine that owns said connection, operates its state machine, and does any transport specific operations necessary. It communicates with a controller (one per nanomsg socket) that handles the queue and message (un)packing, per socket type, and communicates with the nanosocket. The nanosocket just sends commands to the controller and receives replies and is in whatever goroutine the client is using it in. Martin, does that sound like the correct architecture once you clear away all the low-level stuff?
I think there's an "endpoint" object missing. So, when you do nn_bind ("tcp://127.0.0.1:5555") an endpoint is created, which, itself, has a list of connections.
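As a rough sketch of that ownership, the structure could look like the following in Go: a socket holds endpoints, and each endpoint holds its connections. The type and method names are hypothetical, no real networking is done, and the per-connection goroutines from the description above are deliberately left out.

```go
package main

import "fmt"

// Conn stands in for one network connection (no real I/O in this sketch).
type Conn struct {
	Remote string
}

// Endpoint is created by one bind call and owns the connections made through it.
type Endpoint struct {
	Addr  string
	Conns []*Conn
}

// Socket is the user-facing object; it owns a list of endpoints.
type Socket struct {
	Endpoints []*Endpoint
}

// Bind creates a new endpoint for addr and attaches it to the socket.
func (s *Socket) Bind(addr string) *Endpoint {
	ep := &Endpoint{Addr: addr}
	s.Endpoints = append(s.Endpoints, ep)
	return ep
}

// Accept records a new connection on the endpoint.
func (ep *Endpoint) Accept(remote string) *Conn {
	c := &Conn{Remote: remote}
	ep.Conns = append(ep.Conns, c)
	return c
}

// ConnCount returns the number of connections across all endpoints.
func (s *Socket) ConnCount() int {
	n := 0
	for _, ep := range s.Endpoints {
		n += len(ep.Conns)
	}
	return n
}

func main() {
	var s Socket
	ep := s.Bind("tcp://127.0.0.1:5555")
	ep.Accept("10.0.0.1:40001") // example peer addresses, made up
	ep.Accept("10.0.0.2:40002")
	fmt.Println(len(s.Endpoints), "endpoint,", s.ConnCount(), "connections")
}
```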
Martin