[nanomsg] Re: draft surveyor RFC

  • From: Martin Sustrik <sustrik@xxxxxxxxxx>
  • To: nanomsg@xxxxxxxxxxxxx
  • Date: Sun, 08 Mar 2015 13:39:58 +0100

Hi Garrett,

It's entirely up to you. I just wanted to mention that the transports may appreciate being able to store more than 31 bits in the backtrace record -- but it is no hard requirement. The fact that req/rep uses 31-bit IDs today may even imply that surveyor should do the same for consistency's sake.

Martin

On 2015-03-07 20:00, Garrett D'Amore wrote:
Bueller?  Bueller?

Would really really like a solution to this.  Any other opinions (for
my approach, or against it)?  Or should I just go ahead and submit a
pull request at this point?

        - Garrett

On Feb 25, 2015, at 9:57 AM, Garrett D'Amore <garrett@xxxxxxxxxx> wrote:

So I didn’t see a reply to this. I’d really like to move forward with this — I have a need for “fixed” surveyor methods in my application. I’m writing the code that does REQ/REP style processing for now - I think this is more than sufficient for all current needs. I’d hate to defer fixing this pending the requirements of an as yet non-existent UDP transport.

I’ve certainly convinced myself that even UDP can live with the 32-bit “pipe IDs” that are currently being embedded in the headers. Doing so will require some modest amount of state on the peers, but frankly that’s not unreasonable, and I think it’s far better than carrying all that state in the headers themselves. (I see grave concerns with carrying identifying information, like intermediate IP addresses, in the headers.)

        - Garrett

On Feb 20, 2015, at 11:16 AM, Garrett D'Amore <garrett@xxxxxxxxxx> wrote:


On Feb 20, 2015, at 12:49 AM, Martin Sustrik <sustrik@xxxxxxxxxx> wrote:

On 2015-02-19 22:08, Garrett D'Amore wrote:

Thinking about it further, I think this is a *bad* idea. The problem
is that we then don’t have a way to infer stack depth easily — which
makes it impossible to count hops, and is therefore problematic for
loop prevention.
Additionally, there may be value in keeping more state (even for UDP) with a pipe than just the peer's address. Therefore, I’m going to propose that a UDP transport implementation could create pseudo-pipes, with a cache and
timeout associated with them, as well as some upper bound.
For example, time out any pipe without traffic seen in the last 60
seconds. Then when a new message is received from a different peer,
create a pipe ID for it, storing the IP address & port of the peer.
When traffic comes in from the same peer, or goes out to it, bump the
timer on it.
Figure a maximum of “n” UDP pipes to be opened; for example, 10,000. In the worst case, you’d need to store something like 64 bits
for the IP address and port (more for IPv6), plus room for a sweep-hand
timer (for a mark-and-sweep based timeout, which would be
simplest), so data buckets are 8 bytes; figure another 32 bytes
for tracking linked-list linkage (linking buckets in a hash table),
plus maybe another 8 bytes of overhead, so call it 64 bytes per
pipe. The sum total of this is 64K per 1,000 pipes, which comes in at less than a MB for an entire 10,000 pipes. If you want to support up to 1M active unique peers, it gets a little more expensive, but it’s still under 100MB, which is not that big a deal for modern computers. I doubt many single servers have to deal with 1M unique visitors per
minute, and those that do are pretty darned beefy. :-)  (Actually,
looking at, say, Google, which had the highest web visitor count
back in May of 2012 at 173M unique visitors per
month: that works out to only about 4,004 unique visitors per *minute*. So having a limit of 1,000, or even 10,000, max open pipes for one service
instance doesn’t seem limiting.)
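
To make the bookkeeping concrete, here's a rough sketch of the kind of pseudo-pipe cache I have in mind (Go, since that's what mangos is written in; all the names are invented for illustration, and the 60-second sweep is just the number from above):

    package udp

    import (
        "net"
        "sync"
        "time"
    )

    // pseudoPipe is one cache bucket: the peer's address plus a
    // last-seen stamp for the mark-and-sweep timeout.  With IPv4 this
    // lands in the ballpark of the 64 bytes per entry estimated above.
    type pseudoPipe struct {
        id       uint32       // pipe ID handed up to the protocol layer
        peer     *net.UDPAddr // IP address & port of the peer
        lastSeen time.Time    // bumped on any send or receive
    }

    // pipeCache maps peers to pseudo-pipes, with an upper bound.
    type pipeCache struct {
        sync.Mutex
        pipes  map[uint32]*pseudoPipe // by pipe ID
        byAddr map[string]uint32      // "ip:port" -> pipe ID
        nextID uint32
        max    int // upper bound, e.g. 10000
    }

    func newPipeCache(max int) *pipeCache {
        return &pipeCache{
            pipes:  make(map[uint32]*pseudoPipe),
            byAddr: make(map[string]uint32),
            max:    max,
        }
    }

    // lookup returns the pseudo-pipe for a peer, creating one if
    // needed and bumping its timer; it fails when the cache is full.
    func (c *pipeCache) lookup(peer *net.UDPAddr) (*pseudoPipe, bool) {
        c.Lock()
        defer c.Unlock()
        if id, ok := c.byAddr[peer.String()]; ok {
            p := c.pipes[id]
            p.lastSeen = time.Now()
            return p, true
        }
        if len(c.pipes) >= c.max {
            return nil, false // at the upper bound; drop or evict
        }
        c.nextID++
        p := &pseudoPipe{id: c.nextID, peer: peer, lastSeen: time.Now()}
        c.pipes[p.id] = p
        c.byAddr[peer.String()] = p.id
        return p, true
    }

    // sweep drops any pipe without traffic in the last 60 seconds;
    // run it periodically from a timer.
    func (c *pipeCache) sweep() {
        c.Lock()
        defer c.Unlock()
        cutoff := time.Now().Add(-60 * time.Second)
        for id, p := range c.pipes {
            if p.lastSeen.Before(cutoff) {
                delete(c.byAddr, p.peer.String())
                delete(c.pipes, id)
            }
        }
    }

Lookup and timer-bump are O(1), and the periodic sweep is about the cheapest way to implement the timeout.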

First: Why have pseudo-connections at all? (Ignoring the issue of variable-length backtrace records.)

Again, it’s tracking whatever state might be necessary to process the packet *and* return the reply. To get a reply back through your topology, state is required. The question is whether all of the state lives in the packet, or whether you are willing to let devices along the path participate in state keeping. Since the state is really only required for routing replies, not every protocol needs it. For example, pub/sub only really needs a hop count, which can travel with the frame. (That hop count is missing today, but that’s another problem, to be fixed later for loop prevention.)

There’s another point here too… the middle components may have state that doesn’t fit well in 32 bits, and could even be pretty large. Forcing that to travel with the frame is onerous.

And then there is a privacy problem. If all the state needed is kept with the frame, then it is exposed on the wire. This may expose things about my internal network (IP addresses and so forth) that I consider private. That has two potential side effects. One is security oriented (my internal network gets exposed via this protocol); the other is architectural (people can start attempting to *use* that knowledge in their applications, violating all the nice clean layering that we’ve built; having parseable headers is, I think, ultimately a road to hell).



Second: My conceptual image of a UDP socket is a universal radio transmitter/receiver. It can get data from anyone and send data to anyone, with no restrictions aside from the limited packet length. If we are going to have a udp:// transport, I would like to preserve that conceptual image. If, on the other hand, we are going to build a more connection-like transport on top of UDP, let's call it something different. In short, transport functionality should correspond to the transport name.

I don’t see how that is at odds with what I’ve described, for the protocols where that makes sense (e.g. BUS). Now that said, I’m only thinking about unicast UDP. If you’re wanting to figure out ways to use broadcast or multicast UDP, *that* feels like a bigger departure — I think some of the protocols (such as req/rep) fall down in the face of this.


Third: Here's another use case for variable-length items, just off the top of my head: imagine a REQ/REP or SURVEYOR topology spanning from inside a company to the outside world. The company may not want to expose details of its network to the world (via the traceback records), and thus may choose to place a device at the edge of its network that takes the current stack of the request and encrypts it, creating a single mangled record. When the replies arrive at the edge, they are decrypted and the message is routed onward into the corporate network.

That level of privacy is *easier* to achieve by just ripping off the header entirely and writing a new one - in fact, if you keep some state at the edge, you can save the backtrace in that state. You could of course implement the mangling bit you just described today instead, but in that case the message is still going to appear to have a set number of hops. If the mangled header has a different size, that will cause confusion, and it would be bad to store a much longer header than the message had on ingress, because that would appear to be adding hops to a naive examiner.
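
Just to illustrate what I mean by ripping off the header and saving the backtrace in your state, a hypothetical edge device could do something like this (names invented; the single replacement word follows the usual upper-bit end-marker convention):

    // Hypothetical edge device: rip off the whole backtrace, stash it
    // locally, and substitute a single 32-bit word (upper bit set,
    // marking it as the final word).  IDs must fit in 31 bits.
    type edgeDevice struct {
        saved  map[uint32][]byte // local ID -> original backtrace
        nextID uint32
    }

    func newEdgeDevice() *edgeDevice {
        return &edgeDevice{saved: make(map[uint32][]byte)}
    }

    // onRequest swaps the inbound backtrace for a one-word stand-in.
    func (e *edgeDevice) onRequest(backtrace []byte) []byte {
        e.nextID++
        e.saved[e.nextID] = backtrace
        id := e.nextID | 1<<31 // set the end-marker bit
        return []byte{byte(id >> 24), byte(id >> 16), byte(id >> 8), byte(id)}
    }

    // onReply restores the saved backtrace for the return trip.
    func (e *edgeDevice) onReply(id uint32) []byte {
        bt := e.saved[id&^(1<<31)] // mask off the end-marker bit
        delete(e.saved, id&^(1<<31))
        return bt
    }

This also keeps the header exactly one word long on both sides of the edge, so a naive examiner sees a consistent hop count.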

You know, it occurs to me that we could probably dispense with a lot of these problems if we just changed the final request ID part of the header from a 32-bit word (1 + 31 bits) to a different format; for example, 1 + 7 + 24 bits. The 24 bits would be a pipe ID, and the 7 bits could carry a hop count. That would leave room for up to 16 million pipes, and really, who can handle more than that simultaneously? And you’d be able to count up to 127 hops — and frankly, nobody wants messages bouncing around their network for more hops than that! :-)
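
To make the proposed layout concrete, the packing would look something like this (helper names are mine, and the exact split is of course up for debate):

    // Proposed layout of the final request-ID word:
    // 1 end-marker bit + 7 hop-count bits + 24 pipe-ID bits.
    const (
        endBit   = uint32(1) << 31
        hopShift = 24
        hopMask  = uint32(0x7F)
        pipeMask = uint32(0xFFFFFF) // room for ~16 million pipes
    )

    func packFinal(hops uint8, pipeID uint32) uint32 {
        return endBit | (uint32(hops)&hopMask)<<hopShift | pipeID&pipeMask
    }

    func unpackFinal(w uint32) (hops uint8, pipeID uint32) {
        return uint8(w >> hopShift & hopMask), w & pipeMask
    }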

If we made *that* change, then we could dispense with most of the header payload rules, except to require the following:

a) devices always strip off the same size header that they attach.
b) headers are always grown in increments of 32 bits.
c) each intermediate 32-bit word of a header must have the upper bit cleared.

What transports or protocols do beyond that then becomes a transport/protocol decision.
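
Under rules (b) and (c), anyone, whether a device or a debugger, can still find the end of a backtrace without understanding any of the intermediate words. A sketch (helper name mine):

    import "encoding/binary"

    // headerLen walks a header built under rules (b) and (c): 32-bit
    // words, upper bit clear on every word except the last.  It
    // returns the header length in bytes, or -1 if there is no
    // properly terminated backtrace.
    func headerLen(buf []byte) int {
        for off := 0; off+4 <= len(buf); off += 4 {
            if binary.BigEndian.Uint32(buf[off:])&(1<<31) != 0 {
                return off + 4 // found the final word
            }
        }
        return -1
    }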

Now it turns out that in my implementation of mangos, the protocol is responsible for adding/removing “pipe IDs” to the header, because the protocol doesn’t know transport details. Internally, all transports just have a 32-bit ID, assigned by the system, for each pipe they present. Breaking that abstraction would require serious internal redesign, and that’s not something I’d like to do.

But I also keep “connection” state details to offer up to APIs as well. For example, for TLS connections I can present the TLS peer certificate that was presented (if any); for websocket I give access to the actual enclosing HTTP headers; and for TCP and things on top of it, I give access to the peer’s TCP endpoint address. (In the future I hope to offer access to peer credentials for IPC, and on systems that offer it, on local TCP connections too. There is some, ahem, work to do to make that happen, because Go doesn’t expose the necessary system calls — yet. I’m probably going to send patches upstream to Go to fix that for illumos/Solaris at least.)
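
The shape of that abstraction is roughly the following; this is a paraphrase for illustration, not the actual mangos API:

    // A paraphrase of the internal abstraction, not the real mangos
    // API: transports present pipes carrying a system-assigned 32-bit
    // ID, and protocols only ever see that ID; everything else is
    // exposed as opaque connection properties.
    type Pipe interface {
        ID() uint32 // system-assigned pipe ID
        Send(msg []byte) error
        Recv() ([]byte, error)
        // Property exposes transport-specific state: a TLS peer
        // certificate, the enclosing HTTP headers for websocket, the
        // peer's TCP endpoint address, and so on.
        Property(name string) (interface{}, error)
    }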

        - Garrett

