On Sun, Jun 23, 2013 at 4:08 PM, Gonzalo Diethelm <gonzalo.diethelm@xxxxxxxxx> wrote:
> Therefore, one must answer: what should be ported to Go?
>
> One way to approach this, as it was already pointed out, would be defining
> the nanomsg protocol and implementing that in Go. Perhaps another approach
> could be implementing an even more low-level "communications and
> synchronization engine" in Go that allows you to then implement something
> like the nanomsg protocol on top of that. This might even make it possible
> to implement the ZeroMQ protocol on top of that as well (if that makes any
> sense)... Still, you would have to clearly state what this really low-level
> engine should be composed from. Perhaps this is exactly what nanomsg has
> accomplished; I really need to look into the code in more details.

As I see it there are two conceptual pieces to nanomsg. One is the protocols: not just stuff like REQ/REP but the nanomsg wire format itself. The other is the "taming of the socket", and a lot of that goes away in Go since its sockets are already pretty tame. In one respect that makes the port much easier because so much is already done. The downside is that that's where all the blurry lines are going to be, because there will have to be a lot of "is this already in Go? Is this an artifact of this specific implementation or is it necessary?". Once those blurry lines have been clearly delineated I think it will be mostly downhill from there.

To be a true port it would need both, but I don't see why we couldn't implement the simpler protocols first. Aside from the feeling of accomplishment, it would give us something to use to test the tamed sockets against a C-nanomsg echo server.

On Sun, Jun 23, 2013 at 10:34 PM, Martin Sustrik <sustrik@xxxxxxxxxx> wrote:
> Check src/transports/utils/protohdr.h&.c That's the state machine that
> exchanges 8 bytes when connection is established. So far, there's no actual
> data filled in, but that's easy to add that.

It looks like it's been renamed streamhdr.{c,h}. Very useful. Thanks.

> Each party, IMO, should advertise at least the following properties:
>
> 1. Some constant tag so that SP communication can be distinguished from
> other TCP connections. Currently it's 4 bytes like this: \0\0SP
>
> 2. Protocol ID (i.e. PUB/SUB, REQ/REP etc.) This also includes version. If
> there's a new version of PUB/SUB it can just get new ID. No need for
> explicit version field.
>
> 3. Role of the endpoint in the protocol (e.g. PUB vs. SUB).

In the handshake example I gave I did something that may be either clever or stupid but that seems to have slipped by unnoticed. The socket didn't say "I'm speaking the REP/REQ protocol version 2 and I'm a REQ socket". It said "I'm a REQ version 2 socket". And the other end says "yes I can talk to you" or "I don't know what that means, go away".

Regardless of ASCII versus binary encoding, I think that's a good way to go about it. Let's say that for some reason I have some super fancy custom SP that has some similarities to pub/sub on some level, just enough that I can handle attaching a sub socket to the topology (but I have no way of handling a pub socket, for whatever reason). The only thing the sub socket needs to know is "I can work with you", and the only thing my SP cares about is whether you are one of my sockets or a socket I can work with. If you say "I'm pub/sub 2, specifically a sub", that information is still there, but "I'm a sub 2" is the minimum information required. Maybe that's an unimportant, nonsense distinction, but it makes sense to me.

> 4. Topology ID. So, for example, if you have two pub/sub topologies on your
> network (e.g. stock quotes vs. stock trades) you want to assign them
> different IDs so that node from one topology cannot be accidentally
> connected to the other topology. This property needs some more thinking
> about though.

That seems like something for the PUB/SUB protocol to deal with, not the nanomsg protocol itself.
Any intertangling of the two means the nanomsg code may have to be aware of the pub/sub code and vice versa. Then you can't write one without the other, and maintenance of one is more likely to affect the other. That seems like a bad road to go down. It could still live in the nanomsg header and stay separate in the implementation if it's an opaque blob of protocol-specific bytes that the protocol hands to nanomsg to package, but then I don't see the advantage over just letting the SP come up with its own header for its own needs. Otherwise you end up with either a bunch of empty bits or not enough room to fit what you need.

> I personally prefer binary encoding (e.g. fixed 8 byte header) as it makes
> it easier for hardware to deal with it, even in high-volume scenarios
> (backbone routers etc.)
>
> Also, when there are new connection-less transports added, the header will
> be included into each packet. Thus, making it as short as possible so avoid
> excessive bandwidth overhead seems like a good idea.
>
> Of course, UDP header could be binary while TCP header is text-based,
> however, it kind of feels cleaner to strive for similar header style for
> different transports.

Those are good arguments for packing the encoding as tight as possible. And the nanomsg format should be the same regardless of transport (even if particular transports such as UDP require an extra transport-specific header before the nanomsg header).

I didn't consider connectionless transports. Perhaps the socket type/version should go in the "transport-specific header", and/or a transport-specific handshake could determine a one-byte identifying token to use in communication between that pair of sockets? Maybe that last one is too complex and fiddly though.

Unless the UDP thing really squashes it, I think the socket type's name being ASCII, even if the rest is binary, is good, assuming it doesn't have to be plastered on every message.

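To make the fixed-header option concrete, here is a rough Go sketch of an 8-byte binary connection header. Only the 4-byte \0\0SP tag and the 8-byte total come from this thread; putting a big-endian 16-bit ident right after the tag and leaving the last two bytes zero is purely my assumption, and the names (Header, Marshal, Unmarshal, CanTalkTo) are made up for illustration.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

// HeaderSize is the 8 bytes exchanged when a connection is established.
const HeaderSize = 8

// spTag is the constant tag quoted in the thread: \0\0SP.
var spTag = []byte{0, 0, 'S', 'P'}

// ErrNotSP is returned when the peer is not speaking SP at all.
var ErrNotSP = errors.New("peer did not send the SP tag")

// Header is a hypothetical layout: tag (4 bytes), ident (2 bytes,
// big-endian), reserved (2 bytes, zero). Only the tag is from the thread.
type Header struct {
	Ident uint16 // stands for one (socket type, version) pair
}

// Marshal encodes the header into its fixed 8-byte wire form.
func (h Header) Marshal() [HeaderSize]byte {
	var b [HeaderSize]byte
	copy(b[:4], spTag)
	binary.BigEndian.PutUint16(b[4:6], h.Ident)
	return b
}

// Unmarshal checks the tag and extracts the ident.
func Unmarshal(b []byte) (Header, error) {
	if len(b) < HeaderSize {
		return Header{}, errors.New("short header")
	}
	if !bytes.Equal(b[:4], spTag) {
		return Header{}, ErrNotSP
	}
	return Header{Ident: binary.BigEndian.Uint16(b[4:6])}, nil
}

// CanTalkTo is the "yes I can talk to you" check: each socket only keeps
// the set of peer idents it knows how to work with.
func CanTalkTo(accepted map[uint16]bool, peer Header) bool {
	return accepted[peer.Ident]
}

func main() {
	wire := Header{Ident: 2}.Marshal()
	peer, err := Unmarshal(wire[:])
	if err != nil {
		fmt.Println("go away:", err)
		return
	}
	accepted := map[uint16]bool{2: true}
	fmt.Printf("% x compatible=%v\n", wire[:], CanTalkTo(accepted, peer))
}
```

A name-based variant would swap the uint16 for a short ASCII string such as "SUB2" and key the accepted set on that string. That costs a few more bytes per handshake, which is the trade-off being argued here.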
The problem with numbers is that they have to be standardized. Even if they're assigned sequentially at first, years on the (name, version)-to-number table starts to get weird and troublesome to follow as new protocols are added between new versions of old ones. Soon you have sockets being compatible with 3, 27, 28, 104, and 5689, and that map would have to be in the RFC. If two people come up with SPs on their own and happen to choose the same ident, someone's going to have to switch their system over if either wants to open source their SP. Likewise, if I have a custom SP not worth open sourcing that uses ident 111 and a new nanomsg comes out that uses 111 for the new version of sub sockets, I have to change it over even though my socket isn't named sub.

Maybe the efficiency is worth having to keep a spreadsheet of the (socket, version) -> ident map as part of the standard and having everyone else work around that. I don't like it. I'd rather fritter a few extra bytes on peace of mind, but I don't have to like it. Not my protocol, but that's my two cents on the subject.

On Mon, Jun 24, 2013 at 7:25 AM, Gonzalo Diethelm <gonzalo.diethelm@xxxxxxxxx> wrote:
> That separation sounds good. I would expect that the implementation of
> transports doesn't know anything about protocols, and protocols don't know
> anything about transports; both of them should only know about core, which
> implements the glue between them.

Even better if they don't know about core and core only has to know about them :)

> Can you think of a really minimal subset of nanomsg that one could design
> and implement? Something like "one protocol, one transport, minimal core"?

I imagine the simplest is TCP/IP REQ/REP (or just one of the two, with the other side in C for the time being).

My guess as to what the port would look like would be: Each network connection has its own goroutine that owns said connection, operates its state machine, and does any transport-specific operations necessary.
It communicates with a controller (one per nanomsg socket) that handles the queue and the message (un)packing for that socket type, and that in turn communicates with the nanosocket. The nanosocket just sends commands to the controller and receives replies, and it lives in whatever goroutine the client is using it from.

Martin, does that sound like the correct architecture once you clear away all the low-level stuff?