[nanomsg] Re: draft surveyor RFC

  • From: Garrett D'Amore <garrett@xxxxxxxxxx>
  • To: nanomsg@xxxxxxxxxxxxx
  • Date: Fri, 20 Feb 2015 08:16:17 -0800

> On Feb 20, 2015, at 12:49 AM, Martin Sustrik <sustrik@xxxxxxxxxx> wrote:
> 
> On 2015-02-19 22:08, Garrett D'Amore wrote:
> 
>> Thinking about it further, I think this is a *bad* idea.  The problem
>> is that we then don’t have a way to infer stack depth easily — which
>> makes it impossible to count hops, and is therefore problematic for
>> loop prevention.
>> Additionally, there may be value in keeping more state (even for UDP)
>> with a pipe than with the peer.  Therefore, I’m going to propose that a
>> UDP transport implementation could create pseudo-pipes, with a cache and
>> timeout associated with them, as well as some upper bound.
>> For example, time out any pipe without traffic seen in the last 60
>> seconds.  Then when a new message is received from a different peer,
>> create a pipe ID for it, storing the IP address & port of the peer.
>> When traffic comes in from the same peer, or goes out to it, bump the
>> timer on it.
>> Figure a maximum of “n” UDP pipes to be opened.  For example, 10,000
>> ports.  In the worst case, you’d need to store something like 64 bits
>> for the IP address and port (more for IPv6), plus room for a sweep
>> hand timer (for mark and sweep based timeout, which would be
>> simplest), so data buckets are 8 bytes, and figure another 32 bytes
>> for tracking linked list linkage (linking buckets in a hash table) —
>> plus guess maybe another 8 bytes of overhead, so 64 bytes per UDP
>> port.  The sum total of this is 64 KB per 1000 ports, which comes in
>> at less than a MB for an entire 10,000 ports.  If you want to support
>> up to 1M active unique peers, it gets a little more expensive, but it’s
>> still only 100 MB, which is not that big a deal for modern computers.
>> I doubt many single servers have to deal with 1M unique visitors per
>> minute, and those that do are pretty darned beefy. :-)  (Actually,
>> looking at say Google — which had the highest web visitor count for
>> the month back in May of 2012 — they had 173 M unique visitors per
>> month, which is actually only about 4004 unique visitors per *minute*.
>> So having a limit of 1,000, or even 10,000, max open pipes for one
>> service instance doesn’t seem limiting.)
> 
> First: Why have pseudo-connections at all? (Ignoring the issue of 
> variable-length backtrace records.)

Again, it’s tracking whatever state might be necessary to process the packet 
*and* return the reply.  To get through your topology, state is required.  The 
question is whether all the state lives in the packet, or whether you are 
willing to let devices along the path participate in state keeping.  Since the 
state in question is really only required for routing replies, not every 
protocol needs it.  For example, pub/sub only really needs a hop count, which 
can travel with the frame.  (That’s missing today, but loop prevention is 
another problem to fix later.)
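
To make the pseudo-pipe idea above concrete, here is a rough sketch (purely 
illustrative; the names and numbers are made up, not any real implementation) 
of the sort of table a UDP transport could keep: entries keyed by the peer’s 
address and port, bumped on traffic in either direction, and reaped by a 
mark-and-sweep pass once idle past the timeout:

    package udptransport

    import (
        "net"
        "sync"
        "time"
    )

    // pseudoPipe tracks the minimal per-peer state needed to route replies
    // back over a connectionless UDP transport.
    type pseudoPipe struct {
        id       uint32       // locally assigned pipe ID
        addr     *net.UDPAddr // peer IP address and port
        lastSeen time.Time    // bumped on any traffic to or from the peer
    }

    // pipeTable maps peer addresses to pseudo-pipes, with an upper bound on
    // the number of live entries and an idle timeout.
    type pipeTable struct {
        mu      sync.Mutex
        pipes   map[string]*pseudoPipe // keyed by "ip:port"
        nextID  uint32
        maxSize int           // e.g. 10,000 pseudo-pipes
        idle    time.Duration // e.g. 60 * time.Second
    }

    // lookup returns the pseudo-pipe for a peer, creating one if needed, and
    // bumps its timer.
    func (t *pipeTable) lookup(addr *net.UDPAddr) *pseudoPipe {
        t.mu.Lock()
        defer t.mu.Unlock()
        key := addr.String()
        if p, ok := t.pipes[key]; ok {
            p.lastSeen = time.Now()
            return p
        }
        if len(t.pipes) >= t.maxSize {
            return nil // refuse new peers rather than grow without bound
        }
        t.nextID++
        p := &pseudoPipe{id: t.nextID, addr: addr, lastSeen: time.Now()}
        t.pipes[key] = p
        return p
    }

    // reap is the sweep hand: drop any pseudo-pipe idle longer than the timeout.
    func (t *pipeTable) reap() {
        t.mu.Lock()
        defer t.mu.Unlock()
        cutoff := time.Now().Add(-t.idle)
        for key, p := range t.pipes {
            if p.lastSeen.Before(cutoff) {
                delete(t.pipes, key)
            }
        }
    }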

There’s another point here too… the middle components may have state that 
doesn’t fit well in 32 bits, and could even be pretty large.  Forcing that to 
travel with the frame is onerous.

And then there is a privacy problem.  If all the state needed is kept with the 
frame, then it is exposed on the wire.  This may expose things about my 
internal network (IP addresses and so forth) that I consider private.  That has 
two potential side effects.  One is security-oriented (my internal network gets 
exposed via this protocol); the other is architectural (people can start 
attempting to *use* that knowledge in their applications, violating all the 
nice clean layering that we’ve built; having parseable headers is, I think, 
ultimately a road to hell).


> 
> Second: My conceptual image of a UDP socket is a universal radio 
> transmitter/receiver. It can get data from anyone and send data to anyone. No 
> restrictions aside from the limited packet length. If we are going to have a 
> udp:// transport I would like to preserve that conceptual image. If, on the 
> other hand, we are going to build a more connection-like transport on top of 
> UDP, let's call it something different. In short, transport functionality 
> should correspond to the transport name.

I don’t see how that is at odds with what I’ve described, for the protocols 
where that makes sense (e.g. BUS).  Now that said, I’m only thinking about 
unicast UDP.  If you’re wanting to figure out ways to use broadcast or 
multicast UDP, *that* feels like a bigger departure — I think some of the 
protocols (such as req/rep) fall down in the face of this.

> 
> Third: Here's another use case for variable-length items, just off the top of 
> my head: Imagine a REQ/REP or SURVEYOR topology spanning from inside of a 
> company to the outside world. The company may not want to expose details of 
> its network to the world (via the traceback records) and thus may choose to 
> place a device at the edge of its network that takes the current stack of the 
> request and encrypts it, creating a single mangled record. When the replies 
> arrive at the edge, they are decrypted and the message is routed forward into 
> the corporate network.

That level of privacy is *easier* to achieve by just ripping off the header 
entirely and writing a new one - in fact, if you have some state at the edge, 
you can save the backtrace in that state.  You could of course implement the 
mangling you just described today instead.  But in that case it’s still going 
to appear to have a set number of hops.  If the mangled header has a different 
size, that will cause confusion.  It would be bad to store a much longer header 
than what the message had on ingress, because that would appear to be adding 
hops to a naive examiner.
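
Just to illustrate that stateful approach (hypothetical code, nothing that 
exists today): an edge device with a little local state can strip the backtrace 
entirely, stash it keyed by a freshly minted 32-bit token, and send a 
fixed-size header onward; when the reply comes back, it looks the token up and 
restores the original backtrace, so the outside never sees the internal hops.

    package edge

    import (
        "encoding/binary"
        "sync"
    )

    // edgeState holds backtraces stripped from outbound requests, keyed by
    // the 32-bit token substituted for them.
    type edgeState struct {
        mu    sync.Mutex
        next  uint32
        stash map[uint32][]byte
    }

    // outbound replaces the accumulated backtrace with a single opaque 32-bit
    // token word, keeping the on-the-wire header a fixed size.
    func (e *edgeState) outbound(backtrace []byte) []byte {
        e.mu.Lock()
        defer e.mu.Unlock()
        e.next++
        e.stash[e.next] = backtrace
        header := make([]byte, 4)
        binary.BigEndian.PutUint32(header, e.next)
        return header
    }

    // inbound looks up the token carried by a reply and restores the original
    // backtrace so the reply can be routed back into the internal network.
    func (e *edgeState) inbound(header []byte) ([]byte, bool) {
        if len(header) < 4 {
            return nil, false
        }
        e.mu.Lock()
        defer e.mu.Unlock()
        token := binary.BigEndian.Uint32(header[:4])
        backtrace, ok := e.stash[token]
        if ok {
            delete(e.stash, token)
        }
        return backtrace, ok
    }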

You know, it occurs to me that we could probably dispense with a lot of these 
issues if we just changed the final request-ID part of the header from a 32-bit 
word (1 + 31 bits) to a different format; for example, 1 + 7 + 24 bits.  The 24 
bits would be a pipe ID, and the 7 bits could carry a hop count.  That would 
leave room for up to 16 million pipes, and really, who can handle more than 
that simultaneously?  And you’d be able to count up to 127 hops — and frankly 
nobody wants messages bouncing around their network for more hops than that! :-)
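
Roughly, the packing could look like this (illustrative only; the names are 
mine, and the layout is just the proposal above):

    package sp

    // Proposed layout of the final 32-bit word:
    //   bit 31      : flag bit marking the final word (as in the current 1 + 31 split)
    //   bits 30..24 : hop count, 0-127
    //   bits 23..0  : pipe ID, up to roughly 16 million pipes
    const (
        finalFlag = uint32(1) << 31
        hopShift  = 24
        hopMask   = uint32(0x7F)
        pipeMask  = uint32(0x00FFFFFF)
    )

    // packFinalWord builds the proposed 1 + 7 + 24 bit word.
    func packFinalWord(hops uint8, pipeID uint32) uint32 {
        return finalFlag | (uint32(hops)&hopMask)<<hopShift | (pipeID & pipeMask)
    }

    // unpackFinalWord splits it back out again.
    func unpackFinalWord(w uint32) (hops uint8, pipeID uint32, final bool) {
        final = w&finalFlag != 0
        hops = uint8((w >> hopShift) & hopMask)
        pipeID = w & pipeMask
        return hops, pipeID, final
    }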

If we made *that* change, then we could dispense with most of the header 
payload rules, except to require the following:

a) devices always strip off the same size header that they attach.
b) headers are always grown in increments of 32-bits.
c) each intermediate 32-bit word of a header must have the upper bit cleared.

What transports or protocols do beyond that then becomes a transport/protocol 
decision.
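
To make (a) through (c) concrete, a device’s forwarding path might look roughly 
like this (again just a sketch, assuming the device identifies its local pipes 
by a 32-bit ID):

    package device

    import "encoding/binary"

    // pushHop grows the header by exactly one 32-bit word carrying the local
    // pipe ID the request arrived on.  Per rule (c), the upper bit of this
    // intermediate word is kept clear.
    func pushHop(header []byte, pipeID uint32) []byte {
        word := make([]byte, 4)
        binary.BigEndian.PutUint32(word, pipeID&0x7FFFFFFF)
        return append(word, header...)
    }

    // popHop strips exactly the word pushHop added (rule (a): take off the
    // same size you put on), yielding the pipe ID to route the reply back out
    // on and the rest of the header to forward.
    func popHop(header []byte) (pipeID uint32, rest []byte, ok bool) {
        if len(header) < 4 {
            return 0, nil, false
        }
        return binary.BigEndian.Uint32(header[:4]), header[4:], true
    }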

Now it turns out that in my implementation of mangos, the protocol is 
responsible for adding / removing “pipe IDs” to the header, because the 
protocol doesn’t know transport details.  Internally, all transports just have 
a 32-bit ID, assigned by the system, for each pipe they present.  Breaking that 
abstraction would require a serious internal redesign, and that’s not something 
I’d like to do.  But I also keep “connection” state details to offer up to APIs 
as well.  For example, for TLS connections I can present the TLS peer 
certificate that was presented (if any), for websocket I give access to the 
actual enclosing HTTP headers, and for TCP and things on top of it, I give 
access to the peer’s TCP endpoint address.  (In the future I hope to offer 
access to peer credentials for IPC, and, on systems that offer it, on local TCP 
connections too.  There is some — ahem — work to do to make that happen on some 
systems, because Go doesn’t expose the necessary system calls — yet.  I’m 
probably going to send patches upstream to Go to fix that for illumos/Solaris 
at least.)
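
Roughly, the shape of that abstraction is sketched below (to be clear, this is 
not the actual mangos API, just an illustration of the idea: transports hand up 
an opaque 32-bit pipe ID plus optional, keyed connection properties):

    package transport

    import "crypto/x509"

    // Pipe is roughly what a transport presents upward: an opaque 32-bit ID
    // the protocol layer uses for routing, plus send/receive, plus optional
    // keyed connection properties.
    type Pipe interface {
        ID() uint32
        Send(msg []byte) error
        Recv() ([]byte, error)

        // GetProp exposes transport-specific connection state (peer
        // certificate, HTTP headers, remote address, ...) without the
        // protocol layer knowing any transport details.
        GetProp(name string) (interface{}, bool)
    }

    // peerDetails shows the kind of lookups a consumer might attempt; each
    // transport only answers for the keys it understands.
    func peerDetails(p Pipe) {
        if v, ok := p.GetProp("tls-peer-cert"); ok {
            if cert, ok := v.(*x509.Certificate); ok {
                _ = cert.Subject // TLS transport: peer certificate details
            }
        }
        if v, ok := p.GetProp("tcp-peer-addr"); ok {
            _ = v // TCP-based transports: remote endpoint address
        }
        if v, ok := p.GetProp("http-headers"); ok {
            _ = v // websocket transport: the enclosing HTTP request headers
        }
    }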

        - Garrett

