[nanomsg] Re: loop detection

  • From: Garrett D'Amore <garrett@xxxxxxxxxx>
  • To: nanomsg@xxxxxxxxxxxxx
  • Date: Thu, 19 Feb 2015 08:50:12 -0800

Actually, reading the RFC in more detail, I see that REQ/REP specifies Approach 
III with a maximum of 8 hops.  I’ll probably simply apply the same setting in 
mangos.  Furthermore, I intend to make this event user-visible, either via a 
statistic or via logging.

        - Garrett

> On Feb 18, 2015, at 8:13 PM, Garrett D'Amore <garrett@xxxxxxxxxx> wrote:
> 
> I’m working on updating the SURVEYOR protocol stuff, and I’ve realized we 
> have a pretty nasty problem in SP.
> 
> The SP protocols are all potentially subject to routing loops.  There is no 
> TTL in them, *and* there can be loops.  With some of the topologies, it’s 
> possible to build tragically bad routing loops that are explosive in nature 
> (think PUB/SUB, or worse, some of the BUS fabrics.)  The very worst of these 
> is my STAR protocol, where you can have exponential packet storms.
> 
> I think there are some fairly simple fixes here, but they will require some 
> things that some may object to:
> 
>       1. Wire protocol changes.  I don’t see how to fix these problems 
> without fixing the broken wire protocol - *unless* we want to have 
> intermediate nodes keep some kind of cache of every packet they’ve seen in 
> the last time t.  (5 seconds?)  That seems tragically bad.
>       2. Expanded header sets - more in detail below.
>       3. A little more time / logic in processing packets.
> 
> I can see several different approaches to this problem.  Here they are:
> 
> APPROACH I.
> 
>       a. Assume every node has a unique ID of some sort.  I’m going to 
> suggest 64-bit EUIs for now.  (E.g. MAC addresses.)  In the worst case a 
> random value can be used - the risk of collision is sufficiently small that 
> I’m not concerned about it.
>       b. Each hop along the path (device) adds a pipe ID, *and* its own 
> 64-bit ID, to the backtrace.  That means that each intermediate node adds 
> 96, or more likely 128, bits.  (I’d choose 128 over 96.)  Frankly, if the 
> pipe ID space weren’t so constrained, we could use 64 bits by constraining 
> pipe IDs to 16 bits and using 48-bit OUIs.  Except we also need a bit to 
> identify the end of the trace.
>       c. When a node (device) sees its own id in the backtrace, it discards 
> the message.  (Logging the presence of a routing loop might be good, or at 
> least bumping a stat.)
> 
>       pros: perfect filtering, entirely stateless
>       cons: substantial additional header sizes, and increased processing 
> overhead across devices
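A rough sketch of the Approach I check in Go (mangos’s language).  The header layout here - a 64-bit node ID plus a 32-bit pipe ID per hop, padded to 128 bits - is an assumption for illustration, not actual wire format:

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// entrySize assumes each hop appends a 64-bit node ID and a 32-bit pipe ID,
// padded out to 128 bits (the "128 over 96" choice above).
const entrySize = 16

// seenSelf reports whether our own node ID already appears in the backtrace,
// i.e. the message has looped back to us and must be discarded.
func seenSelf(trace []byte, self uint64) bool {
	var id [8]byte
	binary.BigEndian.PutUint64(id[:], self)
	for off := 0; off+entrySize <= len(trace); off += entrySize {
		if bytes.Equal(trace[off:off+8], id[:]) {
			return true
		}
	}
	return false
}

// appendHop pushes our node ID and the receiving pipe ID onto the trace.
func appendHop(trace []byte, self uint64, pipe uint32) []byte {
	var entry [entrySize]byte
	binary.BigEndian.PutUint64(entry[0:8], self)
	binary.BigEndian.PutUint32(entry[8:12], pipe)
	// entry[12:16] left as padding / reserved (e.g. an end-of-trace flag)
	return append(trace, entry[:]...)
}

func main() {
	var trace []byte
	trace = appendHop(trace, 0xAABBCCDD11223344, 7)
	fmt.Println(seenSelf(trace, 0xAABBCCDD11223344)) // true: our ID is present
}
```

Note this stays entirely stateless: the device only compares against its own ID, so the cost is the scan of the (now larger) header on every forwarded message.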
> 
> 
> APPROACH II.
> 
>       a. Instead of *every* node attaching its own ID, the *originating* 
> node could attach an originator ID next to the request ID, and the request 
> ID must be monotonically increasing.
>       b. Every node records the highest value it saw for each originator.  If 
> the difference is negative, and small (to account for wraparound), then the 
> message is discarded.
>       c. Intentional replays must bump the request ID (e.g. retries due to 
> timeout)
> 
>       pros: minimal additional header content, “perfect” filtering
>       cons: requires unique node ID, and requires devices to cache (and 
> expire) originator/ID pairs, replays need additional request ID bump.
> 
> 
> APPROACH III.  (Can be combined with I & II)
> 
>       a. Every intermediate device scans the headers, and counts the number 
> of hops. 
>       b. If hop count > X (a configured limit) then just discard.
> 
>       pros:  Easy to implement, mixes with others, can be implemented with no 
> wire protocol change
>       cons: Not complete - while it eliminates the worst tragedy, explosive 
> expansions are still possible.
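The Approach III check is about as simple as it sounds.  A sketch, assuming the backtrace stacks one 32-bit pipe ID per hop:

```go
package main

import (
	"errors"
	"fmt"
)

// maxHops is the configured limit (the REQ/REP RFC uses 8).
const maxHops = 8

var errTooManyHops = errors.New("hop limit exceeded; discarding message")

// checkHops counts the pipe IDs stacked in the backtrace header and
// rejects the message once the configured limit is exceeded.
func checkHops(header []byte) error {
	if len(header)/4 > maxHops { // one 4-byte pipe ID per hop
		return errTooManyHops
	}
	return nil
}

func main() {
	fmt.Println(checkHops(make([]byte, 8*4))) // at the limit: <nil>
	fmt.Println(checkHops(make([]byte, 9*4))) // over the limit: error
}
```

Since devices already prepend a pipe ID per hop for reply routing, this needs no new header content - hence no wire protocol change.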
> 
> APPROACH IV.
> 
>       a. Every node keeps a record (perhaps a hash checksum?) of traffic seen 
> “recently”, and discards duplicates
> 
>       pros: perfect filtering, no protocol changes required
>       cons: excessive amounts of processing, explosive memory requirements — 
> this is just a bad idea
> 
> I’m interested to hear what folks think.
> 
> For now I’m going to just use Approach III and basically punt on it.  But I’m 
> really favoring some other change to improve resilience to routing loops.
> 
> I haven’t looked to see what (if anything) ZMQ has done to solve for this.
> 
>       - Garrett
> 
> 

