[nanomsg] Re: draft surveyor RFC

  • From: Martin Sustrik <sustrik@xxxxxxxxxx>
  • To: nanomsg@xxxxxxxxxxxxx
  • Date: Sun, 08 Mar 2015 13:39:58 +0100

Hi Garrett,

It's entirely up to you. I just wanted to mention that the transports may appreciate being able to store more than 31 bits in the backtrace record -- but it is no hard requirement. The fact that req/rep uses 31-bit IDs today may even imply that surveyor should do the same for consistency's sake.

Martin

On 2015-03-07 20:00, Garrett D'Amore wrote:
Bueller?  Bueller?

Would really really like a solution to this.  Any other opinions (for
my approach, or against it)?  Or should I just go ahead and submit a
pull request at this point?

        - Garrett

On Feb 25, 2015, at 9:57 AM, Garrett D'Amore <garrett@xxxxxxxxxx> wrote:

So I didn’t see a reply to this. I’d really like to move forward with this — I have a need for “fixed” surveyor methods in my application. I’m writing the code that does REQ/REP style processing for now - I think this is more than sufficient for all current needs. I’d hate to defer fixing this pending the requirements of an as yet non-existent UDP transport.

I’ve certainly convinced myself that even UDP can live with the 32-bit “pipe IDs” that are currently being embedded in the headers. Doing so will require some modest amount of state on the peers, but frankly that’s not unreasonable, and I think it’s far better than carrying all that state in the headers themselves. (I see grave concerns with carrying identifying information, like intermediate IP addresses, in the headers.)

        - Garrett

On Feb 20, 2015, at 11:16 AM, Garrett D'Amore <garrett@xxxxxxxxxx> wrote:


On Feb 20, 2015, at 12:49 AM, Martin Sustrik <sustrik@xxxxxxxxxx> wrote:

On 2015-02-19 22:08, Garrett D'Amore wrote:

Thinking about it further, I think this is a *bad* idea. The problem
is that we then don’t have a way to infer stack depth easily — which
makes it impossible to count hops, and is therefore problematic for
loop prevention.
Additionally, there may be value in keeping more state (even for UDP) with a pipe than just the peer's address. Therefore, I’m going to propose that a UDP transport implementation could create pseudo-pipes, with a cache and
timeout associated with them, as well as some upper bound.
For example, time out any pipe without traffic seen in the last 60
seconds. Then when a new message is received from a different peer,
create a pipe ID for it, storing the IP address & port of the peer.
When traffic comes in from the same peer, or goes out to it, bump the
timer on it.
Figure a maximum of “n” UDP pipes to be opened; for example, 10,000. In the worst case, you’d need to store something like 64 bits
for the IP address and port (more for IPv6), plus room for a sweep-hand
timer (for a mark-and-sweep based timeout, which would be
simplest), so data buckets are 8 bytes; figure another 32 bytes
for tracking linked-list linkage (linking buckets in a hash table),
plus maybe another 8 bytes of overhead, so call it 64 bytes per
pipe. The sum total of this is 64K per 1,000 pipes, which comes in at less than a MB for an entire 10,000 pipes. If you want to support up to 1M active unique peers, it gets a little more expensive, but it’s still under 100MB, which is not that big a deal for modern computers. I doubt many single servers have to deal with 1M unique visitors per
minute, and those that do are pretty darned beefy. :-)  (Actually,
looking at, say, Google, which had the highest web visitor count
back in May of 2012 at 173M unique visitors per
month: that works out to only about 4,004 unique visitors per *minute*. So having a limit of 1,000, or even 10,000, max open pipes for one service
instance doesn’t seem limiting.)
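
To make the bookkeeping concrete, here's a rough sketch of the kind of pseudo-pipe cache I have in mind (Go, since that's what mangos is written in; all the names are invented for illustration, and the 60-second sweep is just the number from above):

    package udp

    import (
        "net"
        "sync"
        "time"
    )

    // pseudoPipe is one cache bucket: the peer's address plus a
    // last-seen stamp for the mark-and-sweep timeout.  With IPv4 this
    // lands in the ballpark of the 64 bytes per entry estimated above.
    type pseudoPipe struct {
        id       uint32       // pipe ID handed up to the protocol layer
        peer     *net.UDPAddr // IP address & port of the peer
        lastSeen time.Time    // bumped on any send or receive
    }

    // pipeCache maps peers to pseudo-pipes, with an upper bound.
    type pipeCache struct {
        sync.Mutex
        pipes  map[uint32]*pseudoPipe // by pipe ID
        byAddr map[string]uint32      // "ip:port" -> pipe ID
        nextID uint32
        max    int // upper bound, e.g. 10000
    }

    func newPipeCache(max int) *pipeCache {
        return &pipeCache{
            pipes:  make(map[uint32]*pseudoPipe),
            byAddr: make(map[string]uint32),
            max:    max,
        }
    }

    // lookup returns the pseudo-pipe for a peer, creating one if
    // needed and bumping its timer; it fails when the cache is full.
    func (c *pipeCache) lookup(peer *net.UDPAddr) (*pseudoPipe, bool) {
        c.Lock()
        defer c.Unlock()
        if id, ok := c.byAddr[peer.String()]; ok {
            p := c.pipes[id]
            p.lastSeen = time.Now()
            return p, true
        }
        if len(c.pipes) >= c.max {
            return nil, false // at the upper bound; drop or evict
        }
        c.nextID++
        p := &pseudoPipe{id: c.nextID, peer: peer, lastSeen: time.Now()}
        c.pipes[p.id] = p
        c.byAddr[peer.String()] = p.id
        return p, true
    }

    // sweep drops any pipe without traffic in the last 60 seconds;
    // run it periodically from a timer.
    func (c *pipeCache) sweep() {
        c.Lock()
        defer c.Unlock()
        cutoff := time.Now().Add(-60 * time.Second)
        for id, p := range c.pipes {
            if p.lastSeen.Before(cutoff) {
                delete(c.byAddr, p.peer.String())
                delete(c.pipes, id)
            }
        }
    }

Lookup and timer-bump are O(1), and the periodic sweep is about the cheapest way to implement the timeout.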

First: Why have pseudo-connections at all? (Ignoring the issue of variable-length backtrace records.)

Again, it’s tracking whatever state might be necessary to process the packet *and* return the reply. To get a reply back through your topology, state is required. The question is whether all of the state lives in the packet, or whether you are willing to let devices along the path participate in state keeping. Since the state is really only required for routing replies, not every protocol needs it. For example, pub/sub only really needs a hop count, which can travel with the frame. (That hop count is missing today, but that’s another problem, to be fixed later for loop prevention.)

There’s another point here too… the middle components may have state that doesn’t fit well in 32 bits, and could even be pretty large. Forcing that to travel with the frame is onerous.

And then there is a privacy problem. If all the state needed is kept with the frame, then it is exposed on the wire. This may expose things about my internal network (IP addresses and so forth) that I consider private. That has two potential side effects. One is security oriented (my internal network gets exposed via this protocol); the other is architectural (people can start attempting to *use* that knowledge in their applications, violating all the nice clean layering that we’ve built; having parseable headers is, I think, ultimately a road to hell).



Second: My conceptual image of a UDP socket is a universal radio transmitter/receiver. It can get data from anyone and send data to anyone, with no restrictions aside from the limited packet length. If we are going to have a udp:// transport, I would like to preserve that conceptual image. If, on the other hand, we are going to build a more connection-like transport on top of UDP, let's call it something different. In short, transport functionality should correspond to the transport name.

I don’t see how that is at odds with what I’ve described, for the protocols where that makes sense (e.g. BUS). Now that said, I’m only thinking about unicast UDP. If you’re wanting to figure out ways to use broadcast or multicast UDP, *that* feels like a bigger departure — I think some of the protocols (such as req/rep) fall down in the face of this.


Third: Here's another use case for variable-length items, just off the top of my head: imagine a REQ/REP or SURVEYOR topology spanning from inside a company to the outside world. The company may not want to expose details of its network to the world (via the traceback records), and thus may choose to place a device at the edge of its network that takes the current stack of the request and encrypts it, creating a single mangled record. When the replies arrive at the edge, they are decrypted and the message is routed onward into the corporate network.

That level of privacy is *easier* to achieve by just ripping off the header entirely and writing a new one - in fact, if you keep some state at the edge, you can save the backtrace in that state. You could of course implement the mangling bit you just described today instead, but in that case the message is still going to appear to have a set number of hops. If the mangled header has a different size, that will cause confusion, and it would be bad to store a much longer header than the message had on ingress, because that would appear to be adding hops to a naive examiner.
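
Just to illustrate what I mean by ripping off the header and saving the backtrace in your state, a hypothetical edge device could do something like this (names invented; the single replacement word follows the usual upper-bit end-marker convention):

    // Hypothetical edge device: rip off the whole backtrace, stash it
    // locally, and substitute a single 32-bit word (upper bit set,
    // marking it as the final word).  IDs must fit in 31 bits.
    type edgeDevice struct {
        saved  map[uint32][]byte // local ID -> original backtrace
        nextID uint32
    }

    func newEdgeDevice() *edgeDevice {
        return &edgeDevice{saved: make(map[uint32][]byte)}
    }

    // onRequest swaps the inbound backtrace for a one-word stand-in.
    func (e *edgeDevice) onRequest(backtrace []byte) []byte {
        e.nextID++
        e.saved[e.nextID] = backtrace
        id := e.nextID | 1<<31 // set the end-marker bit
        return []byte{byte(id >> 24), byte(id >> 16), byte(id >> 8), byte(id)}
    }

    // onReply restores the saved backtrace for the return trip.
    func (e *edgeDevice) onReply(id uint32) []byte {
        bt := e.saved[id&^(1<<31)] // mask off the end-marker bit
        delete(e.saved, id&^(1<<31))
        return bt
    }

This also keeps the header exactly one word long on both sides of the edge, so a naive examiner sees a consistent hop count.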

You know, it occurs to me that we could probably dispense with a lot of these problems if we just changed the final request ID part of the header from a 32-bit word (1 + 31 bits) to a different format; for example, 1 + 7 + 24 bits. The 24 bits would be a pipe ID, and the 7 bits could carry a hop count. That would leave room for up to 16 million pipes, and really, who can handle more than that simultaneously? And you’d be able to count up to 127 hops — and frankly, nobody wants messages bouncing around their network for more hops than that! :-)
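
To make the proposed layout concrete, the packing would look something like this (helper names are mine, and the exact split is of course up for debate):

    // Proposed layout of the final request-ID word:
    // 1 end-marker bit + 7 hop-count bits + 24 pipe-ID bits.
    const (
        endBit   = uint32(1) << 31
        hopShift = 24
        hopMask  = uint32(0x7F)
        pipeMask = uint32(0xFFFFFF) // room for ~16 million pipes
    )

    func packFinal(hops uint8, pipeID uint32) uint32 {
        return endBit | (uint32(hops)&hopMask)<<hopShift | pipeID&pipeMask
    }

    func unpackFinal(w uint32) (hops uint8, pipeID uint32) {
        return uint8(w >> hopShift & hopMask), w & pipeMask
    }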

If we made *that* change, then we could dispense with most of the header payload rules, except to require the following:

a) devices always strip off the same size header that they attach.
b) headers are always grown in increments of 32 bits.
c) each intermediate 32-bit word of a header must have the upper bit cleared.

What transports or protocols do beyond that then becomes a transport/protocol decision.
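
Under rules (b) and (c), anyone, whether a device or a debugger, can still find the end of a backtrace without understanding any of the intermediate words. A sketch (helper name mine):

    import "encoding/binary"

    // headerLen walks a header built under rules (b) and (c): 32-bit
    // words, upper bit clear on every word except the last.  It
    // returns the header length in bytes, or -1 if there is no
    // properly terminated backtrace.
    func headerLen(buf []byte) int {
        for off := 0; off+4 <= len(buf); off += 4 {
            if binary.BigEndian.Uint32(buf[off:])&(1<<31) != 0 {
                return off + 4 // found the final word
            }
        }
        return -1
    }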

Now it turns out that in my implementation of mangos, the protocol is responsible for adding/removing “pipe IDs” to the header, because the protocol doesn’t know transport details. Internally, all transports just have a 32-bit ID, assigned by the system, for each pipe they present. Breaking that abstraction would require serious internal redesign, and that’s not something I’d like to do.

But I also keep “connection” state details to offer up to APIs as well. For example, for TLS connections I can present the TLS peer certificate that was presented (if any); for websocket I give access to the actual enclosing HTTP headers; and for TCP and things on top of it, I give access to the peer’s TCP endpoint address. (In the future I hope to offer access to peer credentials for IPC, and on systems that offer it, on local TCP connections too. There is some, ahem, work to do to make that happen, because Go doesn’t expose the necessary system calls — yet. I’m probably going to send patches upstream to Go to fix that for illumos/Solaris at least.)
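
The shape of that abstraction is roughly the following; this is a paraphrase for illustration, not the actual mangos API:

    // A paraphrase of the internal abstraction, not the real mangos
    // API: transports present pipes carrying a system-assigned 32-bit
    // ID, and protocols only ever see that ID; everything else is
    // exposed as opaque connection properties.
    type Pipe interface {
        ID() uint32 // system-assigned pipe ID
        Send(msg []byte) error
        Recv() ([]byte, error)
        // Property exposes transport-specific state: a TLS peer
        // certificate, the enclosing HTTP headers for websocket, the
        // peer's TCP endpoint address, and so on.
        Property(name string) (interface{}, error)
    }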

        - Garrett

