So I didn’t see a reply to this. I’d really like to move forward with this — I have a need for “fixed” surveyor methods in my application. I’m writing the code that does REQ/REP style processing for now; I think this is more than sufficient for all current needs. I’d hate to defer fixing this pending the requirements of an as-yet-nonexistent UDP transport.

I’ve certainly convinced myself that even UDP can live with the 32-bit “pipe IDs” that are currently being embedded in the headers. Doing so will require some modest amount of state on the peers, but frankly that’s not unreasonable, and I think it’s far better than carrying all that state in the headers themselves. (I see grave concerns with carrying identifying information like intermediate IP addresses in the headers… *that* design is fundamentally flawed as far as I’m concerned.)

- Garrett

> On Feb 20, 2015, at 11:16 AM, Garrett D'Amore <garrett@xxxxxxxxxx> wrote:
>
>> On Feb 20, 2015, at 12:49 AM, Martin Sustrik <sustrik@xxxxxxxxxx> wrote:
>>
>> On 2015-02-19 22:08, Garrett D'Amore wrote:
>>
>>> Thinking about it further, I think this is a *bad* idea. The problem is that we then don’t have a way to infer stack depth easily — which makes it impossible to count hops, and is therefore problematic for loop prevention.
>>>
>>> Additionally, there may be value in keeping more state (even for UDP) with a pipe rather than with the peer. Therefore, I’m going to propose that a UDP transport implementation could create pseudo-pipes, with a cache and timeout associated with them, as well as some upper bound.
>>>
>>> For example, time out any pipe without traffic seen in the last 60 seconds. Then when a new message is received from a different peer, create a pipe ID for it, storing the IP address & port of the peer. When traffic comes in from the same peer, or goes out to it, bump the timer on it.
>>>
>>> Figure a maximum of “n” UDP pipes to be opened; for example, 10,000. In the worst case, you’d need to store something like 64 bits for the IP address and port (more for IPv6), plus room for a sweep-hand timer (for mark-and-sweep based timeout, which would be simplest), so data buckets are 8 bytes. Figure another 32 bytes for tracking linked-list linkage (linking buckets in a hash table), plus maybe another 8 bytes of overhead — call it 64 bytes per UDP pipe. The sum total of this is 64KB per 1,000 pipes, which comes in at well under a MB for an entire 10,000 pipes. If you want to support up to 1M active unique peers, it gets a little more expensive, but it’s still only on the order of 100MB, which is not that big a deal for modern computers. I doubt many single servers have to deal with 1M unique visitors per minute, and those that do are pretty darned beefy. :-) (Actually, looking at, say, Google — which had the highest web visitor count back in May of 2012, at 173M unique visitors per *month* — that works out to only about 4,004 unique visitors per *minute*. So having a limit of 1,000, or even 10,000, max open pipes for one service instance doesn’t seem limiting.)
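To make the above concrete, replying inline here: the pseudo-pipe cache might look roughly like this in Go. This is an untested sketch; the names and structure are illustrative, not actual mangos internals.

    package udp

    import (
        "errors"
        "net"
        "sync"
        "time"
    )

    type pseudoPipe struct {
        id       uint32       // 32-bit pipe ID handed up to the protocol layer
        addr     *net.UDPAddr // peer's IP address and port
        lastSeen time.Time    // bumped on any inbound or outbound traffic
    }

    type pipeCache struct {
        mu     sync.Mutex
        byAddr map[string]*pseudoPipe // keyed by "ip:port"
        byID   map[uint32]*pseudoPipe
        nextID uint32
        max    int           // upper bound on open pseudo-pipes, e.g. 10000
        idle   time.Duration // timeout, e.g. 60 * time.Second
    }

    // lookup finds the pseudo-pipe for a peer, creating one on first sight.
    func (c *pipeCache) lookup(addr *net.UDPAddr) (*pseudoPipe, error) {
        c.mu.Lock()
        defer c.mu.Unlock()
        if p, ok := c.byAddr[addr.String()]; ok {
            p.lastSeen = time.Now() // traffic seen: bump the timer
            return p, nil
        }
        if len(c.byAddr) >= c.max {
            return nil, errors.New("pseudo-pipe limit reached")
        }
        c.nextID++
        p := &pseudoPipe{id: c.nextID, addr: addr, lastSeen: time.Now()}
        c.byAddr[addr.String()] = p
        c.byID[p.id] = p
        return p, nil
    }

    // sweep reaps any pseudo-pipe with no traffic inside the idle window.
    func (c *pipeCache) sweep() {
        c.mu.Lock()
        defer c.mu.Unlock()
        cutoff := time.Now().Add(-c.idle)
        for key, p := range c.byAddr {
            if p.lastSeen.Before(cutoff) {
                delete(c.byAddr, key)
                delete(c.byID, p.id)
            }
        }
    }

A periodic sweep() call (say, every few seconds) implements the mark-and-sweep timeout without needing a timer per pipe.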
>> First: Why have pseudo-connections at all? (Ignoring the issue of variable-length backtrace records.)
>
> Again, it’s tracking whatever state might be necessary to process the packet *and* return the reply. To get through your topology, state is required. The question is whether all of that state lives in the packet, or whether you are willing to let devices along the path participate in state keeping. Since the state is really only required for routing replies, not every protocol needs it. For example, pub/sub only really needs a hop count, which can travel with the frame. (That’s missing today, but it’s another problem to fix later for loop prevention.)
>
> There’s another point here too… the middle components may have state that doesn’t fit well in 32 bits, and could even be pretty large. Forcing that to travel with the frame is onerous.
>
> And then there is a privacy problem. If all the state needed is kept with the frame, then it is exposed on the wire. This may expose things about my internal network (IP addresses and so forth) that I consider private to me. That has two potential side effects. One is security oriented (my internal network gets exposed via this protocol); the other is architectural (people can start attempting to *use* that knowledge in their applications, violating all the nice clean layering that we’ve built; having parseable headers is, I think, ultimately a road to hell).
>
>> Second: My conceptual image of a UDP socket is a universal radio transmitter/receiver. It can get data from anyone and send data to anyone. No restrictions aside from the limited packet length. If we are going to have a udp:// transport I would like to preserve that conceptual image. If, on the other hand, we are going to build a more connection-like transport on top of UDP, let's call it something different. In short, transport functionality should correspond to the transport name.
>
> I don’t see how that is at odds with what I’ve described, for the protocols where that makes sense (e.g. BUS). Now that said, I’m only thinking about unicast UDP. If you want to figure out ways to use broadcast or multicast UDP, *that* feels like a bigger departure — I think some of the protocols (such as req/rep) fall down in the face of this.
>
>> Third: Here's another use case for variable-length items, just off the top of my head: Imagine a REQ/REP or SURVEYOR topology spanning from inside of a company to the outside world. The company may not want to expose details of its network to the world (via the traceback records) and thus may choose to place a device at the edge of its network that takes the current stack of the request and encrypts it, creating a single mangled record. When the replies arrive at the edge, they are decrypted and the message is routed forward into the corporate network.
>
> That level of privacy is *easier* to achieve by just ripping off the header entirely and writing a new one — in fact, if you keep some state there, you can save the backtrace in that state. You could of course implement the mangling you just described today instead. But in that case it’s still going to appear to have a set number of hops. If the mangled header has a different size, that will cause confusion. It would be bad to substitute a much longer header than the message had on ingress, because that would appear to be adding hops to a naive examiner.
>
> You know, it occurs to me that we could probably dispense with a lot of these issues if we just changed the final request-ID part of the header from a 32-bit word (1 + 31 bits) to a different format; for example, 1 + 7 + 24 bits. The 24 bits would be a pipe ID, and the 7 bits could carry a hop count. That would leave room for up to 16 million pipes — and really, who can handle more than that simultaneously? And you’d be able to count up to 127 hops — frankly nobody wants messages bouncing around their network for more hops than that! :-)
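Replying inline again for concreteness: the 1 + 7 + 24 split above might pack and unpack like this in Go (an untested sketch, illustrative names only):

    package header

    const (
        endFlag  = uint32(1) << 31          // top bit: marks the final header word
        hopShift = 24
        hopMask  = uint32(0x7f) << hopShift // 7 bits: hop count, up to 127 hops
        pipeMask = uint32(0xffffff)         // 24 bits: pipe ID, up to ~16M pipes
    )

    // pack builds the final header word from a hop count and a pipe ID.
    func pack(hops uint8, pipeID uint32) uint32 {
        return endFlag | ((uint32(hops) << hopShift) & hopMask) | (pipeID & pipeMask)
    }

    // unpack splits a final header word back into hop count and pipe ID.
    func unpack(w uint32) (hops uint8, pipeID uint32) {
        return uint8((w & hopMask) >> hopShift), w & pipeMask
    }

The top bit keeps its existing meaning as the end-of-header marker, so anything that merely scans for the final word would not need to change.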
That > would leave room for up to 16 million pipes, and really who can handle more > than that simultaneously? And you’d be able to count up to 127 hops — and > frankly nobody wants messages bouncing around their network for more hops > than that! :-) > > If we made *that* change, then we could dispense with most of the header > payload rules, except to require the following: > > a) devices always strip off the same size header that they attach. > b) headers are always grown in increments of 32-bits. > c) each intermediate 32-bit word of a header must have the upper bit cleared. > > What transports or protocols do beyond that then becomes a transport/protocol > decision. > > Now it turns out that in my implementation of mangos, the protocol is > responsible for adding / removing “pipe IDs” to the header, because the > protocol doesn’t know transport details. Internally all transports just have > a 32-bit ID assigned by the system, for each pipe they present. Breaking > that abstraction would cause serious internal redesign to be done, and that’s > not something I’d like to do. But I also keep “connection” state details to > offer up to APIs as well. For example, for TLS connections I can present the > TLS peer certificate that was presented (if any), for websocket I give access > to the actual enclosing HTTP headers, and and for TCP and things on top if > it, I give access to the peer’s TCP endpoint address. (In the future I hope > to offer access to peer credentials for IPC, and on systems that offer it, on > local TCP connections too. There is some ahem — work — to do to make that > happen for systems because Go doesn’t expose the necessary system calls — > yet. I’m probably going to send patches upstream to Go to fix that for > illumos/Solaris at least.) > > - Garrett