[nanomsg] Re: draft surveyor RFC

  • From: Drew Crawford <drew@xxxxxxxxxxxxxxxxxx>
  • To: nanomsg@xxxxxxxxxxxxx
  • Date: Sat, 7 Mar 2015 14:11:30 -0600

My opinion can be summarized as follows.

1.  I don’t suffer from the problem motivating this change.
2.  As a corollary to 1, I won’t implement it.  
3. That is not to say that it is a bad solution to the motivating problem, just 
that I don’t need a solution to the motivating problem.
4. I do in principle object to “unnecessarily narrow” problems being solved 
inside new protocol specifications, of which I think this is an example.
5. But it is not the only example, and I do not think it is an especially 
egregious departure from existing nanomsg practice, so the real villain is in 
another castle.
6. At some level, we have to decide what the purpose of the RFC directory is.
6a.  If it is descriptive, then this document adequately describes what you are 
doing, and I don’t object to that.  Nor do I see how anyone could object to any 
RFC that adequately describes what somebody is doing.
6b.  If it is prescriptive, then I simply won’t follow it.  But I’m not 
invested enough in the motivating problem to present an argument that those who 
follow it are wrong.

Drew


> On Mar 7, 2015, at 1:00 PM, Garrett D'Amore <garrett@xxxxxxxxxx> wrote:
> 
> Bueller?  Bueller?
> 
> Would really really like a solution to this.  Any other opinions (for my 
> approach, or against it)?  Or should I just go ahead and submit a pull 
> request at this point?
> 
>       - Garrett
> 
>> On Feb 25, 2015, at 9:57 AM, Garrett D'Amore <garrett@xxxxxxxxxx 
>> <mailto:garrett@xxxxxxxxxx>> wrote:
>> 
>> So I didn’t see a reply to this.  I’d really like to move forward with this 
>> — I have a need for “fixed” surveyor methods in my application.  I’m writing 
>> the code that does REQ/REP style processing for now - I think this is more 
>> than sufficient for all current needs.  I’d hate to defer fixing this 
>> pending the requirements of an as-yet non-existent UDP transport.
>> 
>> I’ve certainly convinced myself that even UDP can live with the 32-bit “pipe 
>> IDs” that are currently being embedded in the headers.  Doing so will 
>> require some modest amount of state on the peers, but frankly that’s not 
>> unreasonable, and I think it’s far better than carrying all that state in 
>> the headers themselves.  (I have grave concerns with carrying identifying 
>> information like intermediate IP addresses in the headers on the wire.)
>> 
>>      - Garrett
>> 
>>> On Feb 20, 2015, at 11:16 AM, Garrett D'Amore <garrett@xxxxxxxxxx 
>>> <mailto:garrett@xxxxxxxxxx>> wrote:
>>> 
>>>> 
>>>> On Feb 20, 2015, at 12:49 AM, Martin Sustrik <sustrik@xxxxxxxxxx 
>>>> <mailto:sustrik@xxxxxxxxxx>> wrote:
>>>> 
>>>> On 2015-02-19 22:08, Garrett D'Amore wrote:
>>>> 
>>>>> Thinking about it further, I think this is a *bad* idea.  The problem
>>>>> is that we then don’t have a way to infer stack depth easily — which
>>>>> makes it impossible to count hops, and is therefore problematic for
>>>>> loop prevention.
>>>>> Additionally, there may be value in keeping more state (even for UDP)
>>>>> with a pipe than with the peer.  Therefore, I’m going to propose that
>>>>> a UDP transport implementation could create pseudo-pipes, with a cache
>>>>> and timeout associated with them, as well as some upper bound.
>>>>> For example, time out any pipe without traffic seen in the last 60
>>>>> seconds.  Then when a new message is received from a different peer,
>>>>> create a pipe ID for it, storing the IP address & port of the peer.
>>>>> When traffic comes in from the same peer, or goes out to it, bump the
>>>>> timer on it.
>>>>> Figure a maximum of “n” UDP pipes to be opened.  For example, 10,000.
>>>>> In the worst case, you’d need to store something like 64 bits for the
>>>>> IP address and port (more for IPv6), plus room for a sweep-hand timer
>>>>> (for a mark-and-sweep based timeout, which would be simplest), so data
>>>>> buckets are 8 bytes; figure another 32 bytes for tracking linked-list
>>>>> linkage (linking buckets in a hash table) — plus maybe another 8 bytes
>>>>> of overhead, so call it 64 bytes per UDP pipe.  The sum total of this
>>>>> is 64 KB per 1,000 pipes, which comes in at under a MB for the entire
>>>>> 10,000.  If you want to support up to 1M active unique peers, it gets
>>>>> a little more expensive, but it’s still only on the order of 100 MB,
>>>>> which is not that big a deal for modern computers.  I doubt many
>>>>> single servers have to deal with 1M unique visitors per minute, and
>>>>> those that do are pretty darned beefy. :-)  (Actually, looking at,
>>>>> say, Google — which had the highest web visitor count back in May of
>>>>> 2012, at 173M unique visitors per month — that works out to only about
>>>>> 4,004 unique visitors per *minute*.  So having a limit of 1,000, or
>>>>> even 10,000, max open pipes for one service instance doesn’t seem
>>>>> limiting.)
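>>>>> 
>>>>> To make the idea concrete, here is a minimal sketch of such a
>>>>> pseudo-pipe cache in Go.  All names are illustrative; this is not
>>>>> code that exists in nanomsg or mangos today:
>>>>> 
>>>>>     package udp
>>>>> 
>>>>>     import (
>>>>>         "net"
>>>>>         "sync"
>>>>>     )
>>>>> 
>>>>>     // pseudoPipe tracks one remote UDP peer as if it were a pipe.
>>>>>     type pseudoPipe struct {
>>>>>         id   uint32       // 32-bit pipe ID embedded in headers
>>>>>         addr *net.UDPAddr // remote IP address and port
>>>>>         mark bool         // sweep hand: set by sweep, cleared by traffic
>>>>>     }
>>>>> 
>>>>>     // pipeCache maps peer addresses to pseudo-pipes, bounded at max.
>>>>>     type pipeCache struct {
>>>>>         sync.Mutex
>>>>>         pipes  map[string]*pseudoPipe // keyed by addr.String()
>>>>>         nextID uint32
>>>>>         max    int
>>>>>     }
>>>>> 
>>>>>     func newPipeCache(max int) *pipeCache {
>>>>>         return &pipeCache{pipes: make(map[string]*pseudoPipe), max: max}
>>>>>     }
>>>>> 
>>>>>     // lookup finds or creates the pipe for a peer and bumps its timer.
>>>>>     func (c *pipeCache) lookup(addr *net.UDPAddr) *pseudoPipe {
>>>>>         c.Lock()
>>>>>         defer c.Unlock()
>>>>>         key := addr.String()
>>>>>         if p, ok := c.pipes[key]; ok {
>>>>>             p.mark = false // traffic seen; reset the sweep hand
>>>>>             return p
>>>>>         }
>>>>>         if len(c.pipes) >= c.max {
>>>>>             return nil // at capacity; drop until sweep frees slots
>>>>>         }
>>>>>         c.nextID++
>>>>>         p := &pseudoPipe{id: c.nextID, addr: addr}
>>>>>         c.pipes[key] = p
>>>>>         return p
>>>>>     }
>>>>> 
>>>>>     // sweep runs once per timeout interval (e.g. every 60 seconds);
>>>>>     // any pipe still marked from the previous pass saw no traffic in
>>>>>     // that interval and is reaped.
>>>>>     func (c *pipeCache) sweep() {
>>>>>         c.Lock()
>>>>>         defer c.Unlock()
>>>>>         for key, p := range c.pipes {
>>>>>             if p.mark {
>>>>>                 delete(c.pipes, key)
>>>>>             } else {
>>>>>                 p.mark = true // reap candidate on the next pass
>>>>>             }
>>>>>         }
>>>>>     }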
>>>> 
>>>> First: Why have pseudo-connections at all? (Ignoring the issue of 
>>>> variable-length backtrace records.)
>>> 
>>> Again, it’s tracking whatever state might be necessary to process the 
>>> packet *and* return the reply.  To get through your topology, state is 
>>> required.  The question is whether all the state lives in the packet, or 
>>> whether you are willing to let devices along the path participate in 
>>> keeping it.  Since the state in question is only required for routing 
>>> replies, not every protocol needs it.  For example, pub/sub only really 
>>> needs a hop count, which can travel with the frame.  (That’s missing 
>>> today, but it’s another problem, to be fixed later for loop prevention.)
>>> 
>>> There’s another point here too… the middle components may have state that 
>>> doesn’t fit well in 32 bits; it could even be pretty large.  Forcing that 
>>> to travel with the frame is onerous.
>>> 
>>> And then there is a privacy problem.  If all the needed state is kept with 
>>> the frame, then it is exposed on the wire.  This may expose things about 
>>> my internal network (IP addresses and so forth) that I consider private to 
>>> me.  That has two potential side effects.  One is security-oriented (my 
>>> internal network gets exposed via this protocol); the other is 
>>> architectural (people can start attempting to *use* that knowledge in 
>>> their applications, violating all the nice clean layering that we’ve 
>>> built; having parseable headers is, I think, ultimately a road to hell).
>>> 
>>> 
>>>> 
>>>> Second: My conceptual image of a UDP socket is a universal radio 
>>>> transmitter/receiver. It can get data from anyone and send data to 
>>>> anyone, with no restrictions aside from the limited packet length. If we 
>>>> are going to have a udp:// transport, I would like to preserve that 
>>>> conceptual image. If, on the other hand, we are going to build a more 
>>>> connection-like transport on top of UDP, let's call it something 
>>>> different. In short, transport functionality should correspond to the 
>>>> transport name.
>>> 
>>> I don’t see how that is at odds with what I’ve described, for the protocols 
>>> where that makes sense (e.g. BUS).  Now that said, I’m only thinking about 
>>> unicast UDP.  If you want to figure out ways to use broadcast or 
>>> multicast UDP, *that* feels like a bigger departure — I think some of the 
>>> protocols (such as req/rep) fall down in the face of this.
>>> 
>>>> 
>>>> Third: Here's another use case for variable-length items, just off the 
>>>> top of my head: Imagine a REQ/REP or SURVEYOR topology spanning from 
>>>> inside a company to the outside world. The company may not want to expose 
>>>> details of its network to the world (via the backtrace records) and thus 
>>>> may choose to place a device at the edge of its network that takes the 
>>>> current stack of the request and encrypts it, creating a single mangled 
>>>> record. When the replies arrive at the edge, they are decrypted and the 
>>>> message is routed forward into the corporate network.
>>> 
>>> That level of privacy is *easier* to achieve by just ripping off the 
>>> header entirely and writing a new one - in fact, if you have some state 
>>> here, you can save the backtrace in that state.  You could of course 
>>> implement the mangling you just described today instead, but in that case 
>>> it’s still going to appear to have a set number of hops.  If the mangled 
>>> header has a different size, that will cause confusion.  It would be bad 
>>> to store a much longer header than what the message had on ingress, 
>>> because that would appear to add hops to a naive examiner.
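>>> 
>>> To illustrate keeping the backtrace in edge state rather than on the 
>>> wire, here is a hypothetical sketch in Go; the names and the fixed 
>>> 4-byte token format are made up for illustration:
>>> 
>>>     package edge
>>> 
>>>     import (
>>>         "encoding/binary"
>>>         "sync"
>>>     )
>>> 
>>>     // edgeState holds backtraces stripped from outbound requests, so
>>>     // the wire carries only a fixed-size opaque token in their place.
>>>     type edgeState struct {
>>>         sync.Mutex
>>>         saved map[uint32][]byte // token -> original backtrace records
>>>         next  uint32
>>>     }
>>> 
>>>     func newEdgeState() *edgeState {
>>>         return &edgeState{saved: make(map[uint32][]byte)}
>>>     }
>>> 
>>>     // egress replaces the accumulated backtrace of an outbound request
>>>     // with a single 4-byte token, keeping the real records locally, so
>>>     // the header always appears to have the same number of hops.
>>>     func (e *edgeState) egress(backtrace []byte) []byte {
>>>         e.Lock()
>>>         defer e.Unlock()
>>>         e.next++
>>>         e.saved[e.next] = backtrace
>>>         tok := make([]byte, 4)
>>>         binary.BigEndian.PutUint32(tok, e.next)
>>>         return tok
>>>     }
>>> 
>>>     // ingress restores the original backtrace on a returning reply.
>>>     func (e *edgeState) ingress(tok []byte) ([]byte, bool) {
>>>         e.Lock()
>>>         defer e.Unlock()
>>>         id := binary.BigEndian.Uint32(tok)
>>>         bt, ok := e.saved[id]
>>>         if ok {
>>>             delete(e.saved, id)
>>>         }
>>>         return bt, ok
>>>     }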
>>> 
>>> You know, it occurs to me that we could probably dispense with a lot of 
>>> these issues if we just changed the final request ID part of the header 
>>> from a 32-bit word (1 + 31 bits) to a different format; for example, 
>>> 1 bit + 7 bits + 24 bits.  The 24 bits would be a pipe ID, and the 7 bits 
>>> could carry a hop count.  That would leave room for up to 16 million 
>>> pipes, and really, who can handle more than that simultaneously?  And 
>>> you’d be able to count up to 127 hops — and frankly nobody wants messages 
>>> bouncing around their network for more hops than that! :-)
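>>> 
>>> In code, the proposed split is trivial.  A sketch, assuming the 
>>> end-of-backtrace flag stays at bit 31 as it is today:
>>> 
>>>     // Proposed layout of the final 32-bit request word:
>>>     //   bit 31      : end-of-backtrace flag (unchanged from today)
>>>     //   bits 30..24 : hop count (0..127)
>>>     //   bits 23..0  : pipe ID (up to ~16.7 million pipes)
>>>     const endFlag uint32 = 1 << 31
>>> 
>>>     func packRequestWord(hops uint8, pipe uint32) uint32 {
>>>         return endFlag | uint32(hops&0x7f)<<24 | (pipe & 0xffffff)
>>>     }
>>> 
>>>     func unpackRequestWord(w uint32) (hops uint8, pipe uint32) {
>>>         return uint8(w>>24) & 0x7f, w & 0xffffff
>>>     }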
>>> 
>>> If we made *that* change, then we could dispense with most of the header 
>>> payload rules, except to require the following:
>>> 
>>> a) devices always strip off the same size header that they attach.
>>> b) headers are always grown in increments of 32-bits.
>>> c) each intermediate 32-bit word of a header must have the upper bit 
>>> cleared.
>>> 
>>> What transports or protocols do beyond that then becomes a 
>>> transport/protocol decision.
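>>> 
>>> With those rules, any device can compute stack depth in a backtrace 
>>> without understanding what other devices stored.  A sketch, assuming 
>>> the terminal word keeps its upper bit set as today:
>>> 
>>>     import "encoding/binary"
>>> 
>>>     // countHops walks a backtrace header under rules (a)-(c): records
>>>     // are 32-bit words (rule b), intermediate words have the upper bit
>>>     // clear (rule c), and the word with the upper bit set terminates
>>>     // the backtrace.
>>>     func countHops(header []byte) (hops int, ok bool) {
>>>         for len(header) >= 4 {
>>>             w := binary.BigEndian.Uint32(header)
>>>             if w&(1<<31) != 0 {
>>>                 return hops, true // terminal word reached
>>>             }
>>>             hops++
>>>             header = header[4:]
>>>         }
>>>         return hops, false // malformed: no terminator found
>>>     }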
>>> 
>>> Now it turns out that in my implementation of mangos, the protocol is 
>>> responsible for adding and removing “pipe IDs” in the header, because the 
>>> protocol doesn’t know transport details.  Internally, all transports just 
>>> have a 32-bit ID assigned by the system for each pipe they present.  
>>> Breaking that abstraction would require serious internal redesign, and 
>>> that’s not something I’d like to do.  But I also keep “connection” state 
>>> details to offer up to APIs as well.  For example, for TLS connections I 
>>> can present the TLS peer certificate that was presented (if any), for 
>>> websocket I give access to the actual enclosing HTTP headers, and for TCP 
>>> and things on top of it, I give access to the peer’s TCP endpoint 
>>> address.  (In the future I hope to offer access to peer credentials for 
>>> IPC, and on systems that offer it, on local TCP connections too.  There 
>>> is some — ahem — work to do to make that happen on some systems, because 
>>> Go doesn’t expose the necessary system calls yet.  I’m probably going to 
>>> send patches upstream to Go to fix that for illumos/Solaris at least.)
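>>> 
>>> Roughly, the shape is like this (illustrative only; not the actual 
>>> mangos interfaces):
>>> 
>>>     // Illustrative only; not the real mangos API.  Every transport
>>>     // pipe gets a system-assigned 32-bit ID for use in headers, and
>>>     // transport-specific connection state is exposed to applications
>>>     // as named properties instead of traveling in the wire headers.
>>>     type Pipe interface {
>>>         ID() uint32 // opaque pipe ID used by the protocol layer
>>> 
>>>         // GetProp exposes per-connection details: e.g. a TLS peer
>>>         // certificate, the enclosing HTTP request for websocket, or
>>>         // the peer's TCP endpoint address.
>>>         GetProp(name string) (interface{}, error)
>>>     }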
>>> 
>>>     - Garrett
>> 
> 
