[nanomsg] Re: draft surveyor RFC

  • From: Garrett D'Amore <garrett@xxxxxxxxxx>
  • To: nanomsg@xxxxxxxxxxxxx
  • Date: Wed, 25 Feb 2015 12:57:59 -0500

So I didn’t see a reply to this.  I’d really like to move forward — I have a 
need for “fixed” surveyor methods in my application.  I’m writing code that 
does REQ/REP-style processing for now, which I think is more than sufficient 
for all current needs.  I’d hate to defer this fix pending the requirements of 
an as-yet non-existent UDP transport.

I’ve certainly convinced myself that even UDP can live with the 32-bit “pipe 
IDs” that are currently embedded in the headers.  Doing so will require some 
modest amount of state on the peers, but frankly that’s not unreasonable, and 
I think it’s far better than carrying all that state in the headers themselves.  
(I have grave concerns about carrying identifying information like intermediate 
IP addresses in the headers… *that* design is fundamentally flawed as far as 
I’m concerned.)
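
(To make that concrete, here is a minimal sketch in Go of the kind of
backtrace handling I mean: a device pushes the 32-bit ID of the pipe a
request arrived on, and pops it off again to pick the return pipe for the
reply.  The names are illustrative; this is not mangos code.)

package main

import (
    "encoding/binary"
    "fmt"
)

// pushPipeID prepends a 32-bit pipe ID to the backtrace header.
func pushPipeID(header []byte, pipeID uint32) []byte {
    buf := make([]byte, 4+len(header))
    binary.BigEndian.PutUint32(buf, pipeID)
    copy(buf[4:], header)
    return buf
}

// popPipeID removes and returns the topmost pipe ID.
func popPipeID(header []byte) (uint32, []byte) {
    return binary.BigEndian.Uint32(header), header[4:]
}

func main() {
    var header []byte
    header = pushPipeID(header, 0x80000001) // request ID at the origin
    header = pushPipeID(header, 42)         // pipe ID added by a device
    id, rest := popPipeID(header)           // device picks the reply pipe
    fmt.Printf("reply goes out pipe %d; %d header bytes remain\n", id, len(rest))
}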

        - Garrett

> On Feb 20, 2015, at 11:16 AM, Garrett D'Amore <garrett@xxxxxxxxxx> wrote:
> 
>> 
>> On Feb 20, 2015, at 12:49 AM, Martin Sustrik <sustrik@xxxxxxxxxx> wrote:
>> 
>> On 2015-02-19 22:08, Garrett D'Amore wrote:
>> 
>>> Thinking about it further, I think this is a *bad* idea.  The problem
>>> is that we then don’t have a way to infer stack depth easily — which
>>> makes it impossible to count hops, and therefore problematic for loop
>>> prevention.
>>> Additionally, there may be value in keeping more state (even for UDP)
>>> with a pipe than just the peer’s address.  Therefore, I’m going to
>>> propose that a UDP transport implementation could create pseudo-pipes,
>>> with a cache and timeout associated with them, as well as some upper
>>> bound.
>>> For example, time out any pipe without traffic seen in the last 60
>>> seconds.  Then, when a new message is received from a different peer,
>>> create a pipe ID for it, storing the IP address & port of the peer.
>>> When traffic comes in from the same peer, or goes out to it, bump the
>>> timer on it.
>>> Figure a maximum of “n” UDP pipes to be opened — say, 10,000.  In the
>>> worst case, you’d need to store something like 64 bits for the IP
>>> address and port (more for IPv6), plus room for a sweep-hand timer
>>> (mark-and-sweep-based timeout would be simplest), so data buckets are
>>> 8 bytes; figure another 32 bytes for linked-list linkage (linking
>>> buckets in a hash table), plus maybe another 8 bytes of overhead, so
>>> call it 64 bytes per UDP pipe.  The sum total is 64KB per 1,000 pipes,
>>> which comes in at well under a MB for the entire 10,000.  If you want
>>> to support up to 1M active unique peers, it gets a little more
>>> expensive, but it’s still well under 100MB, which is not that big a
>>> deal for modern computers.  I doubt many single servers have to deal
>>> with 1M unique visitors per minute, and those that do are pretty
>>> darned beefy. :-)  (Actually, looking at, say, Google — which had the
>>> highest web visitor count back in May of 2012, at 173M unique visitors
>>> per month — that works out to only about 4,004 unique visitors per
>>> *minute*.  So a limit of 1,000, or even 10,000, max open pipes for one
>>> service instance doesn’t seem limiting.)
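>>>
>>> (Roughly, and purely as a sketch in Go: the table shape, the 60-second
>>> cutoff, and all the names here are assumptions, not real transport
>>> code.)
>>>
>>> package udppipe
>>>
>>> import (
>>>     "net"
>>>     "sync"
>>>     "time"
>>> )
>>>
>>> // pseudoPipe is the cached state for one UDP peer.
>>> type pseudoPipe struct {
>>>     id       uint32
>>>     addr     *net.UDPAddr
>>>     lastSeen time.Time
>>> }
>>>
>>> // pipeTable maps peer addresses to pseudo-pipes.
>>> type pipeTable struct {
>>>     sync.Mutex
>>>     pipes  map[string]*pseudoPipe // keyed by "ip:port"
>>>     nextID uint32
>>> }
>>>
>>> // lookup finds (or creates) the pipe for addr and bumps its timer.
>>> func (t *pipeTable) lookup(addr *net.UDPAddr) *pseudoPipe {
>>>     t.Lock()
>>>     defer t.Unlock()
>>>     if t.pipes == nil {
>>>         t.pipes = make(map[string]*pseudoPipe)
>>>     }
>>>     key := addr.String()
>>>     p, ok := t.pipes[key]
>>>     if !ok {
>>>         t.nextID++
>>>         p = &pseudoPipe{id: t.nextID, addr: addr}
>>>         t.pipes[key] = p
>>>     }
>>>     p.lastSeen = time.Now()
>>>     return p
>>> }
>>>
>>> // sweep drops pipes with no traffic in the last 60 seconds; run it
>>> // periodically for the mark-and-sweep timeout described above.
>>> func (t *pipeTable) sweep() {
>>>     t.Lock()
>>>     defer t.Unlock()
>>>     cutoff := time.Now().Add(-60 * time.Second)
>>>     for key, p := range t.pipes {
>>>         if p.lastSeen.Before(cutoff) {
>>>             delete(t.pipes, key)
>>>         }
>>>     }
>>> }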
>> 
>> First: Why have pseudo-connections at all? (Ignoring the issue of 
>> variable-length backtrace records.)
> 
> Again, it’s tracking whatever state might be necessary to process the packet 
> *and* return the reply.  To get through your topology, state is required.  The 
> question is whether all the state lives in the packet, or whether you are 
> willing to let devices along the path participate in keeping it.  Since the 
> state is really only required for routing replies, not every protocol needs 
> it.  For example, pub/sub only really needs a hop count, which can travel 
> with the frame.  (That’s missing today, but it’s another problem, to be fixed 
> later for loop prevention.)
> 
> There’s another point here too… the middle components may have state that 
> doesn’t fit well in 32 bits, and could even be pretty large.  Forcing that to 
> travel with the frame is onerous.
> 
> And then there is a privacy problem.  If all the needed state is kept with 
> the frame, then it is exposed on the wire.  This may expose things about my 
> internal network (IP addresses and so forth) that I consider private.  
> That has two potential side effects.  One is security-oriented (my internal 
> network gets exposed via this protocol); the other is architectural (people 
> can start attempting to *use* that knowledge in their applications, violating 
> all the nice clean layering that we’ve built — having parseable headers is, I 
> think, ultimately a road to hell).
> 
> 
>> 
>> Second: My conceptual image of a UDP socket is a universal radio 
>> transmitter/receiver.  It can get data from anyone and send data to anyone, 
>> with no restrictions aside from the limited packet length.  If we are going 
>> to have a udp:// transport, I would like to preserve that conceptual image.  
>> If, on the other hand, we are going to build a more connection-like 
>> transport on top of UDP, let's call it something different.  In short, the 
>> transport functionality should correspond to the transport name.
> 
> I don’t see how that is at odds with what I’ve described, for the protocols 
> where it makes sense (e.g., BUS).  That said, I’m only thinking about 
> unicast UDP.  If you want to use broadcast or multicast UDP, *that* feels 
> like a bigger departure — I think some of the protocols (such as req/rep) 
> fall down in the face of it.
> 
>> 
>> Third: Here's another use case for variable-length items, just off the top 
>> of my head.  Imagine a REQ/REP or SURVEYOR topology spanning from inside a 
>> company to the outside world.  The company may not want to expose details of 
>> its network to the world (via the traceback records), and thus may choose to 
>> place a device at the edge of its network that takes the current stack of 
>> the request and encrypts it, creating a single mangled record.  When the 
>> replies arrive at the edge, they are decrypted and the message is routed 
>> forward into the corporate network.
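>>
>> (A rough sketch of what that edge device might do.  AES-GCM and the key
>> handling here are my assumptions, purely for illustration:)
>>
>> // Package edge sketches a device that replaces the accumulated
>> // backtrace with one opaque record, and restores it for replies.
>> package edge
>>
>> import (
>>     "crypto/aes"
>>     "crypto/cipher"
>>     "crypto/rand"
>>     "errors"
>> )
>>
>> // seal encrypts the whole backtrace into a single mangled record.
>> func seal(key, backtrace []byte) ([]byte, error) {
>>     block, err := aes.NewCipher(key) // key: 16, 24, or 32 bytes
>>     if err != nil {
>>         return nil, err
>>     }
>>     gcm, err := cipher.NewGCM(block)
>>     if err != nil {
>>         return nil, err
>>     }
>>     nonce := make([]byte, gcm.NonceSize())
>>     if _, err := rand.Read(nonce); err != nil {
>>         return nil, err
>>     }
>>     // Prepend the nonce so open can find it when the reply returns.
>>     return gcm.Seal(nonce, nonce, backtrace, nil), nil
>> }
>>
>> // open decrypts the record back into the original backtrace.
>> func open(key, record []byte) ([]byte, error) {
>>     block, err := aes.NewCipher(key)
>>     if err != nil {
>>         return nil, err
>>     }
>>     gcm, err := cipher.NewGCM(block)
>>     if err != nil {
>>         return nil, err
>>     }
>>     n := gcm.NonceSize()
>>     if len(record) < n {
>>         return nil, errors.New("record too short")
>>     }
>>     return gcm.Open(nil, record[:n], record[n:], nil)
>> }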
> 
> That level of privacy is *easier* to achieve by just ripping off the header 
> entirely and writing a new one — in fact, if you keep some state at the edge, 
> you can save the backtrace in that state.  You could, of course, implement 
> the mangling you just described today instead.  But in that case the message 
> is still going to appear to have a set number of hops.  If the mangled header 
> has a different size, that will cause confusion.  It would be bad to emit a 
> much longer header than the message had on ingress, because to a naive 
> examiner that would appear to add hops.
> 
> You know, it occurs to me that we could probably dispense with a lot of these 
> concerns if we just changed the final request-ID part of the header from a 
> 32-bit word (1 + 31 bits) to a different format — for example, 1 + 7 + 24 
> bits.  The 24 bits would be a pipe ID, and the 7 bits could carry a hop 
> count.  That would leave room for up to 16 million pipes — and really, who 
> can handle more than that simultaneously?  And you’d be able to count up to 
> 127 hops — frankly, nobody wants messages bouncing around their network for 
> more hops than that! :-)
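>
> (Sketched out, with names of my own invention:)
>
> package header
>
> const (
>     endBit   = 1 << 31 // set only on the final word of the header
>     hopShift = 24
>     hopMask  = 0x7f
>     pipeMask = 0xffffff
> )
>
> // pack builds a 1+7+24 word: end marker, hop count, pipe ID.
> func pack(end bool, hops uint8, pipeID uint32) uint32 {
>     w := uint32(hops&hopMask)<<hopShift | pipeID&pipeMask
>     if end {
>         w |= endBit
>     }
>     return w
> }
>
> // unpack splits a word back into hop count and pipe ID.
> func unpack(w uint32) (hops uint8, pipeID uint32) {
>     return uint8(w >> hopShift & hopMask), w & pipeMask
> }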
> 
> If we made *that* change, then we could dispense with most of the header 
> payload rules, except to require the following:
> 
> a) devices always strip off the same size header that they attach.
> b) headers are always grown in increments of 32-bits.
> c) each intermediate 32-bit word of a header must have the upper bit cleared.
> 
> What transports or protocols do beyond that then becomes a transport/protocol 
> decision.
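>
> (Rule (c) is what lets any device find the end of a header without
> understanding its contents.  A sketch of that scan, again with invented
> names:)
>
> package header
>
> import (
>     "encoding/binary"
>     "errors"
> )
>
> // splitHeader walks 32-bit words until it finds one with the upper
> // bit set (the final word), then returns the header and body parts.
> func splitHeader(msg []byte) (header, body []byte, err error) {
>     for i := 0; i+4 <= len(msg); i += 4 {
>         w := binary.BigEndian.Uint32(msg[i : i+4])
>         if w&(1<<31) != 0 { // upper bit set: this is the final word
>             return msg[:i+4], msg[i+4:], nil
>         }
>     }
>     return nil, nil, errors.New("no terminating header word found")
> }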
> 
> Now it turns out that in my implementation of mangos, the protocol is 
> responsible for adding and removing “pipe IDs” on the header, because the 
> protocol doesn’t know transport details.  Internally, all transports just 
> have a 32-bit ID, assigned by the system, for each pipe they present.  
> Breaking that abstraction would require a serious internal redesign, and 
> that’s not something I’d like to do.  But I also keep “connection” state 
> details to offer up through APIs.  For example, for TLS connections I can 
> present the peer certificate that was presented (if any); for websocket I 
> give access to the actual enclosing HTTP headers; and for TCP, and things on 
> top of it, I give access to the peer’s TCP endpoint address.  (In the future 
> I hope to offer access to peer credentials for IPC, and, on systems that 
> support it, for local TCP connections too.  There is some — ahem — work to do 
> to make that happen, because Go doesn’t expose the necessary system calls 
> yet.  I’m probably going to send patches upstream to Go to fix that, for 
> illumos/Solaris at least.)
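>
> (Hypothetically, the shape of that is something like the interface below.
> This is not the actual mangos API; the names are invented for
> illustration.)
>
> package pipes
>
> import (
>     "crypto/tls"
>     "net/http"
> )
>
> // ConnInfo is one possible shape for the per-connection state a
> // transport could expose for each 32-bit pipe ID it hands out.
> type ConnInfo interface {
>     // PipeID is the system-assigned 32-bit pipe identifier.
>     PipeID() uint32
>     // RemoteAddr is the peer's endpoint address (TCP and layers above).
>     RemoteAddr() string
>     // TLSState returns the connection state, including the peer
>     // certificate, for TLS transports; nil otherwise.
>     TLSState() *tls.ConnectionState
>     // HTTPRequest returns the enclosing upgrade request for websocket
>     // transports; nil otherwise.
>     HTTPRequest() *http.Request
> }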
> 
>       - Garrett
