[nanomsg] Re: draft surveyor RFC

  • From: Alex Elsayed <eternaleye@xxxxxxxxx>
  • To: nanomsg@xxxxxxxxxxxxx
  • Date: Sat, 07 Mar 2015 17:11:07 -0800

Drew Crawford wrote:

>> I think the RFCs need to be prescriptive at least in terms of
>> interoperability.
> Interoperability with what?
> For fun, I searched github for NN_SURVEYOR.  I found nanomsg forks,
> nanomsg bindings, and 6-line code samples.  I did not see any serious
> users of NN_SURVEYOR.  (Admittedly I did not look beyond the first few
> pages.)  And in any case nobody seems to be responding to your questions
> about the protocol’s future.

In my case, NN_SURVEYOR is very useful; it's just not in public code. And I 
didn't respond mainly because I figured he was asking if people had 
objections, and I have none.

> I doubt very much that there is enough nanomsg deployed at present that
> any breaking change in any protocol would deserve more than a handful of
> scheduled maintenance windows.  I probably run one of the larger nanomsg
> deployments in terms of user number and yet I am the one making breaking
> changes.  That’s a clue.

Sure, that's true. But if people expect code in Go using mangos and code in 
C using libnanomsg to interoperate, this is kind of an issue. Similarly for 
any other implementations of nanomsg.

Nanomsg is a protocol. Saying 'interoperability with what?' is, in my view, 
incredibly short-sighted.

>> Otherwise we should all just go our separate ways. That would be
>> unfortunate IMO.
> Well, let’s not kid ourselves.  This is already the present reality.
>   • libnanomsg is, in my estimation, unmaintained at the present time
>   • mangos is driving this SURVEYOR change, which few outside it seem to
>     care about strongly
>   • I have for more than a year now been semi-quietly shipping a fork
>     which breaks RFCs at minor points
>   • I am currently crossing the threshold between minor and major RFC
>     breakage and am making plans to move my breaking work into the open

And in my view, the proper way to do that would be... to draft revisions for 
the RFCs. If the semantics are wrong in a *protocol*, silently diverging in 
an implementation is the *last* thing anyone should do. Instead, propose 
revisions of the semantics. Which is exactly what Garrett is doing.

(I've already brought up my concerns over the security framework you've 
chosen in past discussions)

> The ship to produce a single universal standard has sailed.  It’s shuffled
> off the mortal coil.  It is an ex-parrot.

I strongly disagree. A protocol spec only dies when people refuse to work 
together on it. Which is why I feel very strongly that Garrett is very much 
doing the right thing here in proposing changes to the spec, while being 
rather lukewarm on an implementation that diverges from the specs without 
having proposed changes to address what you see as shortcomings.

> I don’t know if I would describe that as “unfortunate”.  It simply means
> people have different problems, and that there is not one solution to rule
> them all.  I know that even as I am breaking RFCs, I am still glad I have
> RFCs to break.  They contain useful ideas.  There are just some details
> that aren’t a good fit for my circumstances.

The whole point of nanomsg is that there's not 'one solution to rule them 
all' - that is, after all, the entire idea of having multiple different 
messaging patterns and transports. If the specs don't suit your needs, 
proposing revisions or new ones is entirely valid. The important thing is 
that it gets _actually discussed_ - which silently diverging and then 
unveiling an incompatible fork doesn't do.

> Returning to the RFC in question, I can tell you that I won’t implement
> it.  Again, that is not to say that the solution is bad, simply that I do
> not suffer from the motivating problem.  As such I don’t think I’m in a
> fair position to evaluate the solution in enough detail to have an
> informed opinion.

Since you've made clear that your intent is an incompatible fork regardless 
(due to silently differing semantics), there seems to be no help for 
it. Though I'll ask you to please ensure your implementation makes a change 
at the protocol layer to be clear that your SP is not nanomsg's SP - better 
to fail at negotiation time than suddenly explode when yours violates 
something other implementations see as an invariant.

> As far as building consensus on getting the RFC merged in, I think the
> operating question is what the support is from the implementations.
> Clearly I am out; but if there is a clear plan to bring it to both
> libnanomsg and mangos then I think there is reason enough to have it
> standardized.

I'm in favor of the RFC as a user, though I'm not an implementor.

> Drew
>> On Mar 7, 2015, at 4:31 PM, Garrett D'Amore
>> <garrett@xxxxxxxxxx> wrote:
>> I think the RFCs need to be prescriptive at least in terms of
>> interoperability.  Otherwise we should all just go our separate ways.
>> That would be unfortunate IMO.  The current implementation in libnanomsg
>> cannot support devices properly with survey nor can it support multiple
>> surveyors connected to a responder.  This is a bug that must be fixed.
>> One could argue that the approach of using backtraces could be an option,
>> maybe a new protocol altogether.  I imagine you'd prefer to have the
>> inclusion be optional so that trivial 1:1 topologies or those without
>> replies could skip it. But that seems like a substantial departure from
>> existing practice.
>> The question therefore I think is whether the old protocol needs to be
>> preserved in parallel to the new one and whether this approach to the fix
>> is generally agreeable.
>> I'd vote to drop the legacy broken protocol from libnanomsg but I can see
>> that some may want to preserve it for compatibility.
>> On Mar 7, 2015, at 12:11 PM, Drew Crawford
>> <drew@xxxxxxxxxxxxxxxxxx> wrote:
>>> My opinion can be summarized as follows.
>>> 1.  I don’t suffer from the problem motivating this change.
>>> 2.  As a corollary to 1, I won’t implement it.
>>> 3.  That is not to say that it is a bad solution to the motivating
>>> problem, just that I don’t need a solution to the motivating problem.
>>> 4.  I do in principle object to “unnecessarily narrow” problems being
>>> solved inside new protocol specifications, of which I think this is an
>>> example.
>>> 5.  But it is not the only example, and I do not think it is an
>>> especially egregious departure from existing nanomsg practice, so the
>>> real villain is another castle.
>>> 6.  At some level we have to decide what is the purpose of the RFC
>>> directory.
>>> 6a.  If it is descriptive, then this document adequately describes what
>>> you are doing, and I don’t object to that.  Nor do I understand how an
>>> objection would be possible to any RFC that adequately describes what
>>> somebody is doing.
>>> 6b.  If it is prescriptive, then I simply won’t follow it.  But I’m not
>>> invested enough in the motivating problem to present an argument that
>>> those who follow it are wrong.
>>> Drew
>>>> On Mar 7, 2015, at 1:00 PM, Garrett D'Amore
>>>> <garrett@xxxxxxxxxx> wrote:
>>>> Bueller?  Bueller?
>>>> Would really really like a solution to this.  Any other opinions (for
>>>> my approach, or against it)?  Or should I just go ahead and submit a
>>>> pull request at this point?
>>>> - Garrett
>>>>> On Feb 25, 2015, at 9:57 AM, Garrett D'Amore
>>>>> <garrett@xxxxxxxxxx> wrote:
>>>>> So I didn’t see a reply to this.  I’d really like to move forward with
>>>>> this — I have a need for “fixed” surveyor methods in my application. 
>>>>> I’m writing the code that does REQ/REP style processing for now - I
>>>>> think this is more than sufficient for all current needs.  I’d hate to
>>>>> defer fixing this pending the requirements for the creation of an as
>>>>> yet nonexistent UDP transport.
>>>>> I’ve certainly convinced myself that even UDP can live with the 32-bit
>>>>> “pipe IDs” that are currently being embedded in the headers.  Doing so
>>>>> will require some modest amount of state on the peers, but frankly
>>>>> that’s not unreasonable, and I think it’s far better than carrying all
>>>>> that state in the headers itself.  (I see grave concerns with carrying
>>>>> identifying information like intermediate IP addresses
>>>>> - Garrett
>>>>>> On Feb 20, 2015, at 11:16 AM, Garrett D'Amore
>>>>>> <garrett@xxxxxxxxxx> wrote:
>>>>>>> On Feb 20, 2015, at 12:49 AM, Martin Sustrik
>>>>>>> <sustrik@xxxxxxxxxx> wrote:
>>>>>>> On 2015-02-19 22:08, Garrett D'Amore wrote:
>>>>>>>> Thinking about it further, I think this is a *bad* idea.  The
>>>>>>>> problem is that we then don’t have a way to infer stack depth easily —
>>>>>>>> which makes it impossible to count hops and problematic therefore
>>>>>>>> for loop prevention.
>>>>>>>> Additionally, there may be value in keeping more state (even for
>>>>>>>> UDP) with a pipe than the peer.  Therefore, I’m going to propose
>>>>>>>> that a UDP transport implementation could create pseudo-pipes, with
>>>>>>>> a cache and timeout associated with them, as well as some upper
>>>>>>>> bound.  For example, time out any pipe without traffic seen in the
>>>>>>>> last 60 seconds.  Then when a new message is received from a
>>>>>>>> different peer, create a pipe ID for it, storing the IP address &
>>>>>>>> port of the peer.  When traffic comes in from the same peer, or goes
>>>>>>>> out to it, bump the timer on it.
>>>>>>>> Figure a maximum of “n” UDP pipes to be opened.  For example, 10,000
>>>>>>>> ports.  In the worst case, you’d need to store something like 64
>>>>>>>> bits for the IP address and port (more for IPv6), plus room for a
>>>>>>>> sweep hand timer (for mark and sweep based timeout, which would be
>>>>>>>> simplest), so data buckets are 8 bytes, and figure another 32 bytes
>>>>>>>> for tracking linked list linkage (linking buckets in hash table) —
>>>>>>>> plus guess maybe another 8 bytes of overhead, so 64 bytes per UDP
>>>>>>>> port.  The sum total of this is 64K per 1000 ports, which comes in
>>>>>>>> at less than a MB for an entire 10,000 ports.  If you want to
>>>>>>>> support up to 1M active unique peers, it gets a little more
>>>>>>>> expensive, but it’s still only 100MB, which is not that big a deal
>>>>>>>> for modern computers.  I doubt many single servers have to deal with
>>>>>>>> 1M unique visitors per minute, and those that do are pretty darned
>>>>>>>> beefy. :-)  (Actually, looking at say Google — which had the highest
>>>>>>>> web visitor count for the month back in May of 2012, they had 173 M
>>>>>>>> unique visitors per month, which is actually only 4004 unique
>>>>>>>> visitors per *minute*.  So having a limit of 1000, or even 10000 max
>>>>>>>> open pipes for one service instance doesn’t seem limiting.)
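
The pseudo-pipe idea quoted above could be sketched roughly as follows. This is only an illustration in Python, not code from mangos or libnanomsg: the class name `PseudoPipeTable`, its methods, and the injectable `clock` are all hypothetical, and a plain timestamp scan stands in for the mark-and-sweep timer mentioned in the text. Pipe IDs are kept within 31 bits on the assumption that they must fit a header word whose upper bit is clear.

```python
import time


class PseudoPipeTable:
    """Maps UDP peers (ip, port) to pipe IDs, with an idle timeout.

    Hypothetical sketch: names and eviction policy are illustrative only.
    """

    def __init__(self, max_pipes=10_000, idle_timeout=60.0,
                 clock=time.monotonic):
        self.max_pipes = max_pipes        # upper bound "n" on open pipes
        self.idle_timeout = idle_timeout  # seconds without traffic
        self.clock = clock                # injectable so it can be tested
        self._by_peer = {}                # (ip, port) -> pipe ID
        self._by_id = {}                  # pipe ID -> [(ip, port), last_seen]
        self._next_id = 1

    def touch(self, peer):
        """Return the pipe ID for `peer`, creating it if needed.

        Call this for traffic in either direction; it bumps the idle timer.
        Returns None when the table is full even after sweeping.
        """
        now = self.clock()
        pipe_id = self._by_peer.get(peer)
        if pipe_id is not None:
            self._by_id[pipe_id][1] = now
            return pipe_id
        self.sweep()
        if len(self._by_peer) >= self.max_pipes:
            return None
        pipe_id = self._alloc_id()
        self._by_peer[peer] = pipe_id
        self._by_id[pipe_id] = [peer, now]
        return pipe_id

    def peer_for(self, pipe_id):
        """Look up the (ip, port) to send a reply to, or None if expired."""
        entry = self._by_id.get(pipe_id)
        return entry[0] if entry else None

    def sweep(self):
        """Drop every pipe idle for longer than idle_timeout."""
        now = self.clock()
        stale = [pid for pid, (_, seen) in self._by_id.items()
                 if now - seen > self.idle_timeout]
        for pid in stale:
            peer, _ = self._by_id.pop(pid)
            del self._by_peer[peer]
        return len(stale)

    def _alloc_id(self):
        # Stay within 31 bits and skip IDs that are still in use.
        while True:
            pid = self._next_id
            self._next_id = (self._next_id % 0x7FFFFFFF) + 1
            if pid not in self._by_id:
                return pid
```

A transport built this way would call `touch()` on every datagram sent or received and `peer_for()` when routing a reply back out. Sweeping only when a new peer shows up is a simplification; a real implementation would more likely sweep on a timer.
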
>>>>>>> First: Why have pseudo-connections at all? (Ignoring the issue of
>>>>>>> variable-length backtrace records.)
>>>>>> Again, it’s tracking whatever state might be necessary to process the
>>>>>> packet *and* return the reply.  To get through your topology, state
>>>>>> is required.  The question is whether all the state lives in the
>>>>>> packet, or you are willing to let devices along the path participate
>>>>>> in state keeping.  Since really the state that is necessary is only
>>>>>> required for routing replies, not every protocol needs it.  For
>>>>>> example, pub/sub only really needs a hop count which can travel with
>>>>>> the frame.  (And that’s missing, but another problem to fix later for
>>>>>> loop prevention.)
>>>>>> There’s another point here too… the middle components may have state
>>>>>> that doesn’t fit well in 32 bits, could even be pretty large.  Forcing
>>>>>> that to travel with the frame is onerous.
>>>>>> And, then there is a privacy problem.  If all the state needed is
>>>>>> kept with the frame, then it is exposed on the wire.   This may
>>>>>> expose things about my internal network (IP addresses and so forth)
>>>>>> that I consider private to me.  That has two potential side effects. 
>>>>>> One is security oriented (my internal network gets exposed via this
>>>>>> protocol), the other is architectural (people can start attempting to
>>>>>> *use* that knowledge in their applications, violating all the nice
>>>>>> clean layering that we’ve built; having parseable headers is I think
>>>>>> ultimately a road to hell.)
>>>>>>> Second: My conceptual image of a UDP socket is a universal radio
>>>>>>> transmitter/receiver. It can get data from anyone and send data to
>>>>>>> anyone. No restrictions aside from the limited packet length. If we
>>>>>>> are going to have udp:// transport I would like to preserve that
>>>>>>> conceptual image. If, on the other hand, we are going to build a
>>>>>>> more connection-like transport on top of UDP, let's call it
>>>>>>> something different. In short, transport functionality should
>>>>>>> correspond to the transport name.
>>>>>> I don’t see how that is at odds with what I’ve described, for the
>>>>>> protocols where that makes sense (e.g. BUS).  Now that said, I’m only
>>>>>> thinking about unicast UDP.  If you’re wanting to figure out ways to
>>>>>> use broadcast or multicast UDP, *that* feels like a bigger departure
>>>>>> — I think some of the protocols (such as req/rep) fall down in the
>>>>>> face of this.
>>>>>>> Third: Here's another use case for variable-length items, just off
>>>>>>> the top of my head: Imagine a REQ/REP or SURVEYOR topology spanning
>>>>>>> from inside of a company to the outside world. The company may not
>>>>>>> want to expose details of its network to the world (via the
>>>>>>> traceback records) and thus may choose to place a device at the edge
>>>>>>> of their network that takes the current stack of the request and
>>>>>>> encrypts it, creating a single mangled record. When the replies
>>>>>>> arrive at the edge, they are decrypted and the message is routed
>>>>>>> forward into the corporate network.
>>>>>> That level of privacy is *easier* to achieve by just ripping off the
>>>>>> header entirely and writing a new one - in fact, if you have some
>>>>>> state here you can save the backtrace in your state.  You could of
>>>>>> course implement that mangling bit you just described today instead. 
>>>>>> But in that case it’s going to still appear to have a set number of
>>>>>> hops.  If the mangled header has a different size, that will cause
>>>>>> confusion.  It would be bad to store a very much longer header than
>>>>>> what the message had on ingress, because that would appear to be
>>>>>> adding hops to a naive examiner.
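
The "rip off the header and write a new one" approach quoted above might look something like this. It is a sketch under stated assumptions, not an existing nanomsg or mangos API: `EdgeDevice`, `outbound`, `inbound`, and `forget` are hypothetical names, the replacement header is assumed to be a single big-endian 32-bit word with its top bit set (the 1 + 31 bit form the thread mentions), and expiry of saved state is left to the caller.

```python
import itertools


class EdgeDevice:
    """Hypothetical edge device that hides the internal backtrace.

    On the way out it stores the real header locally and emits an opaque
    token; on the way back it looks the real header up again.
    """

    def __init__(self):
        self._saved = {}                   # token -> original header bytes
        self._counter = itertools.count(1)

    def outbound(self, header):
        """Replace `header` with a 4-byte token; keep the original here."""
        token = 0x80000000 | (next(self._counter) & 0x7FFFFFFF)
        self._saved[token] = header
        return token.to_bytes(4, "big")

    def inbound(self, header):
        """Return the saved internal header for a reply, or None.

        The entry is not removed, since a survey can draw several replies.
        """
        token = int.from_bytes(header, "big")
        return self._saved.get(token)

    def forget(self, header):
        """Drop saved state, e.g. once a survey deadline has passed."""
        self._saved.pop(int.from_bytes(header, "big"), None)
```

Nothing about the internal network leaves the device in this scheme, at the cost of per-request state that has to be expired somehow.
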
>>>>>> You know, it occurs to me that we could probably dispense with a lot
>>>>>> of these if we just changed the final request ID part of the header
>>>>>> from a 32-bit word (1 + 31 bits) to have a different format; for
>>>>>> example, 1+7bits+24 bits.  The 24-bits would be a pipe ID, and the
>>>>>> 7-bits could carry a hop count.  That would leave room for up to 16
>>>>>> million pipes, and really who can handle more than that
>>>>>> simultaneously?  And you’d be able to count up to 127 hops — and
>>>>>> frankly nobody wants messages bouncing around their network for more
>>>>>> hops than that! :-)
>>>>>> If we made *that* change, then we could dispense with most of the
>>>>>> header payload rules, except to require the following:
>>>>>> a) devices always strip off the same size header that they attach.
>>>>>> b) headers are always grown in increments of 32-bits.
>>>>>> c) each intermediate 32-bit word of a header must have the upper bit
>>>>>> cleared.
>>>>>> What transports or protocols do beyond that then becomes a
>>>>>> transport/protocol decision.
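
To make the proposed 1+7+24-bit layout concrete, here is a minimal sketch in Python. The thread gives only the field widths, so several details are assumptions: the flag sits in the top bit and is set on the final word (the complement of rule c), the hop count takes the next 7 bits, the pipe ID takes the low 24, and words are big-endian on the wire. The function names are hypothetical.

```python
import struct

FLAG_FINAL = 0x80000000  # assumed: top bit marks the final header word
HOPS_MASK = 0x7F         # 7 bits: hop counts 0..127
PIPE_MASK = 0x00FFFFFF   # 24 bits: roughly 16 million pipe IDs


def pack_final_word(hops, pipe_id):
    """Build the final 32-bit header word from a hop count and pipe ID."""
    if not 0 <= hops <= HOPS_MASK:
        raise ValueError("hop count must fit in 7 bits")
    if not 0 <= pipe_id <= PIPE_MASK:
        raise ValueError("pipe ID must fit in 24 bits")
    return FLAG_FINAL | (hops << 24) | pipe_id


def unpack_final_word(word):
    """Return (hops, pipe_id) from a final header word."""
    if not word & FLAG_FINAL:
        raise ValueError("top bit clear: not a final word")
    return (word >> 24) & HOPS_MASK, word & PIPE_MASK


def parse_header(data):
    """Split a header into (intermediate_words, hops, pipe_id).

    Checks rule (b), a whole number of 32-bit words, and rule (c),
    every intermediate word has its upper bit cleared.
    """
    if not data or len(data) % 4 != 0:
        raise ValueError("header must be a non-empty multiple of 4 bytes")
    words = struct.unpack(">%dI" % (len(data) // 4), data)
    *intermediate, final = words
    if any(w & FLAG_FINAL for w in intermediate):
        raise ValueError("intermediate word has its upper bit set")
    hops, pipe_id = unpack_final_word(final)
    return list(intermediate), hops, pipe_id
```

Rule (a), that a device strips exactly as much as it attached, is a property of device behavior and is not checked by a parser like this one.
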
>>>>>> Now it turns out that in my implementation of mangos, the protocol is
>>>>>> responsible for adding / removing “pipe IDs” to the header, because
>>>>>> the protocol doesn’t know transport details.  Internally all
>>>>>> transports just have a 32-bit ID assigned by the system, for each
>>>>>> pipe they present.  Breaking that abstraction would cause serious
>>>>>> internal redesign to be done, and that’s not something I’d like to
>>>>>> do.  But I also keep “connection” state details to offer up to APIs
>>>>>> as well.  For example, for TLS connections I can present the TLS peer
>>>>>> certificate that was presented (if any), for websocket I give access
>>>>>> to the actual enclosing HTTP headers, and for TCP and things on
>>>>>> top of it, I give access to the peer’s TCP endpoint address.  (In the
>>>>>> future I hope to offer access to peer credentials for IPC, and on
>>>>>> systems that offer it, on local TCP connections too.  There is some
>>>>>> ahem — work — to do to make that happen for systems because Go
>>>>>> doesn’t expose the necessary system calls — yet.  I’m probably going
>>>>>> to send patches upstream to Go to fix that for illumos/Solaris at
>>>>>> least.)
>>>>>> - Garrett
