[nanomsg] Re: understanding of BUS protocol

  • From: Garrett D'Amore <garrett@xxxxxxxxxx>
  • To: Jeff Archer <jsarcher@xxxxxxxxxxxxxxxxxxxxxx>, nanomsg@xxxxxxxxxxxxx
  • Date: Thu, 10 Apr 2014 10:02:19 -0700

Ok, I’ve been thinking about this more.

I think there are two possible ways to go with this.

The first way is to just go with what I have done, although I’d probably make 
the raw form stop forwarding.  This would be analogous to req/rep.  (I’d rename 
it to something.. maybe STAR?  Opinions welcome!)  (I’d also reimplement BUS to 
Martin’s spec for folks that prefer that behavior.)

The second way is to add the backtrace checks.  This would require every socket 
to generate a unique (random) 64-bit value, and check for its own ID in any 
messages, and discard them if found.  That would eliminate the first round of 
infinite loops.  The second bit is to add a maximum hop count, so that the 
backtrace is not allowed to grow beyond a certain depth.  Probably somewhere 
between 8 and 16 is a reasonable maximum.  That mitigates (but doesn’t solve!) 
a different problem involving message amplification.  The message amplification 
problem occurs if you want to build a fault tolerant architecture where you 
have *two* intermesh connections (e.g. two device nodes A and B in San Diego, 
and two more, C and D in Brussels.)  In this case the network can sustain the 
failure of any one of A and B, or C and D.  But the amplification occurs 
because C will duplicate messages from D, and A would duplicate those from B.
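
A minimal sketch of the backtrace check and hop limit described above, as a plain Python simulation rather than nanomsg code. The Node class, its methods, and MAX_HOPS = 8 are all hypothetical names and values chosen for illustration.

```python
import secrets

MAX_HOPS = 8  # assumed limit; the text suggests somewhere between 8 and 16


class Node:
    """Hypothetical forwarding node; not part of the nanomsg API."""

    def __init__(self, name):
        self.name = name
        self.node_id = secrets.randbits(64)  # unique (random) 64-bit value
        self.peers = []
        self.delivered = []

    def connect(self, other):
        self.peers.append(other)
        other.peers.append(self)

    def send(self, payload):
        # The originator starts the backtrace with its own ID.
        for peer in self.peers:
            peer.receive(payload, [self.node_id], self)

    def receive(self, payload, backtrace, sender):
        if self.node_id in backtrace:
            return  # our own ID is in the backtrace: the message looped
        if len(backtrace) >= MAX_HOPS:
            return  # the backtrace has reached the maximum depth
        self.delivered.append(payload)
        # Forward to every peer except the one the message came from.
        for peer in self.peers:
            if peer is not sender:
                peer.receive(payload, backtrace + [self.node_id], self)
```

In a fully connected triangle of three such nodes, a send from one node terminates, but each of the other two nodes receives the message twice (once directly, once via the third node).  That is a small instance of the amplification problem: the ID check stops the loop, not the duplicates.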

I’m still thinking of ways to eliminate this amplification.  I think to really 
do it, you need to have nodes that serve multiple meshes do something like a 
spanning tree protocol/negotiation.  This is a problem that is well understood 
in Ethernet switches, and a solution can be borrowed from that space, I believe.
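
To illustrate the spanning tree idea, here is a rough sketch that computes a tree over a known set of links and floods only along the tree.  It is a static, centralized simplification, not the real STP negotiation, and the function names and the A/B/C/D ring topology used below are assumptions made for the example.

```python
from collections import deque


def spanning_tree(links):
    """Return the links kept active, as frozensets of two node names.

    The root is the smallest node name, loosely mirroring STP root election.
    """
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    root = min(neighbors)
    seen = {root}
    active = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for peer in sorted(neighbors[node]):
            if peer not in seen:
                seen.add(peer)
                active.add(frozenset((node, peer)))
                queue.append(peer)
    return active


def broadcast(origin, active):
    """Flood a message over the active links; return copies seen per node."""
    copies = {}
    stack = [(origin, None)]
    while stack:
        node, came_from = stack.pop()
        for link in active:
            if node in link:
                (peer,) = link - {node}
                if peer != came_from:
                    copies[peer] = copies.get(peer, 0) + 1
                    stack.append((peer, node))
    return copies


# Assumed topology: A-B in San Diego, C-D in Brussels, plus A-C and B-D.
links = [("A", "B"), ("C", "D"), ("A", "C"), ("B", "D")]
active = spanning_tree(links)
copies = broadcast("D", active)
```

With these four links, one link is left out of the tree and every other node sees exactly one copy of a message sent by D.  The sketch does not recompute the tree when a node fails, which a real negotiation protocol would have to do to keep the fault tolerance.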

It strikes me that this new pattern (BCAST?) could be built on top of the same 
logic we use for STAR, but adding this additional header processing, and 
probably some additional messaging (à la STP, the spanning tree protocol) under 
the hood to prevent message loops, etc.  This starts to look a bit more 
heavyweight than nanomsg currently is, but it also seems like this would be 
incredibly useful for folks that want to quickly build fault tolerant broadcast 
applications.

Thoughts?  Is STAR going to be useful on its own?  Or do you really want the 
larger BCAST semantics?

-- 
Garrett D'Amore

On April 10, 2014 at 8:25:37 AM, Jeff Archer (jsarcher@xxxxxxxxxxxxxxxxxxxxxx) 
wrote:

I really like your STAR pattern idea.  I have a need for exactly this also.

On Thu, Apr 10, 2014 at 10:53 AM, Garrett D'Amore <garrett@xxxxxxxxxx> wrote:


> On Apr 10, 2014, at 2:10 AM, Martin Sustrik <sustrik@xxxxxxxxxx> wrote:
>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
>> On 10/04/14 10:47, Garrett D'Amore wrote:
>> In the everyone to everyone case what keeps the packets from
>> getting stuck in a loop?  Any cycle would be tragic right?  I think
>> a TTL is required.
>
> No. The trick is that BUS socket doesn't forward the messages
> automatically. It just passes any message it gets to the user and
> that's it.
>
> So, if you have a mesh with no devices, you are perfectly OK.
>
> As for devices, they are special applications that re-send every
> delivered message back to the socket it was received from. The socket
> is then responsible for not forwarding the message to the original sender.

Ok.  So this pattern means that the message is only sent to my directly 
connected peers.  That seems kind of unfortunate.  In particular with the 
underlying transports I must either explicitly create a device in my network or 
I must create a mesh if I want full delivery.
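
A tiny simulation of the behavior as I understand it from the description above: a BUS socket hands each message to its direct peers and never forwards.  BusSocket is a hypothetical model written for this example, not the nanomsg API.

```python
class BusSocket:
    """Hypothetical model of the BUS semantics described above."""

    def __init__(self, name):
        self.name = name
        self.peers = []
        self.inbox = []

    def connect(self, other):
        self.peers.append(other)
        other.peers.append(self)

    def send(self, payload):
        # Deliver to directly connected peers only; nothing is forwarded.
        for peer in self.peers:
            peer.inbox.append(payload)


# Chain a - b - c: a and c are not directly connected.
a, b, c = BusSocket("a"), BusSocket("b"), BusSocket("c")
a.connect(b)
b.connect(c)
a.send("first")

# Completing the mesh gives full delivery.
a.connect(c)
a.send("second")
```

In the chain, only b sees "first".  Once a and c are connected as well, both b and c see "second", which is why full delivery needs either a mesh or a device.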

I can see uses for the pattern but I think my approach is easier to use.  
Possibly with a modification.  More on that below.

>
> Of course, device admin must make sure that the device connects two
> otherwise un-connected meshes. If that's not the case, infinite loops
> will happen.

Right.  In my design the single socket device would be pointless.  More below.

>
> I admit that the design is kind of ugly, but that was the best I could
> think of at the time.
>
>> I'm not sure I like exposing pipe ids to applications.  It's
>> needed in your implementation because the forwarding is done by the
>> app rather than by the library. In my implementation the forwarding
>> is handled without the app's awareness.
>>
>> I think non-star graphs will work in my case just as well as yours.
>> I think both will suffer catastrophically in a true mesh or any
>> other arrangement with cycles in it.  The only way I know to fix
>> that is to break the cycle with logic at the application.  That can
>> be done with a device using two sockets explicitly.  You can do
>> that today with either implementation.
>
> The architecture would be definitely cleaner that way. What made me go
> the other way was that I wanted to have just a single type of device,
> rather than one hub-like and one gateway-like device. For all other
> patterns, at least, we have only one device type.
>
> Also, AF_RAW sockets for all other patterns allow for message
> interception/filtering/tagging/modification. If the forwarding is done
> inside the socket, the application won't be able to do that.
>
> If you have a sane way to address these issues, I'd be happy to change
> the design.

It seems like a good compromise would be to do the forwarding for normal 
sockets but leave the logic out for raw sockets. We need the header 
approach you designed to make that work.

Devices normally use raw sockets so that works.

Another thing we could do is to create a new pattern STAR or some such that 
follows my semantics.  I think this pattern is much easier for applications 
because it implicitly has the property that all nodes in the network get a copy 
of the message. The admin needs to be careful not to create cycles but with 
normal connect and accept semantics this would not usually be a problem.  In 
the typical case the server will have connections to every client and the 
clients will have only one connection to the server.  This makes the server a 
single point of failure.
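
A minimal sketch of the STAR semantics proposed above, assuming forwarding happens inside the socket and that the topology has no cycles.  StarSocket and its methods are hypothetical names for this simulation, not an existing nanomsg socket type.

```python
class StarSocket:
    """Hypothetical model of the proposed STAR pattern."""

    def __init__(self, name):
        self.name = name
        self.peers = []
        self.inbox = []

    def connect(self, other):
        self.peers.append(other)
        other.peers.append(self)

    def send(self, payload):
        # The application only sends the messages it originates.
        for peer in self.peers:
            peer._receive(payload, self)

    def _receive(self, payload, sender):
        self.inbox.append(payload)
        # Forwarding is done by the socket, without the app's involvement,
        # to every peer except the one the message came from.
        for peer in self.peers:
            if peer is not sender:
                peer._receive(payload, self)


# Typical case: every client has a single connection to the server.
server = StarSocket("server")
clients = [StarSocket("c%d" % i) for i in range(3)]
for client in clients:
    client.connect(server)
clients[0].send("hi")
```

Here the server and the other two clients each get one copy, and the sender gets none.  This sketch has no loop protection at all, so connecting two clients to each other would make the forwarding recurse without end.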

I want to think of ways to break the cycle for full meshes too.  I think 
putting a unique node ID in the header and just expanding the header would do 
the trick.  You can then also add a TTL or max hops as an extra layer of 
protection.

I will write something up.


>
>> Essentially my approach automates the bus logic. The only messages
>> a non device app ever needs to explicitly send are those it is
>> originating.
>
> Same is true for the existing model.
>
> Martin
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.11 (GNU/Linux)
> Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
>
> iQEcBAEBAgAGBQJTRl/pAAoJENTpVjxCNN9YtXkH/2HpRANw0iNh7dvQ+4kgAS2y
> tVvydncaoXCykTvyynb8T7eFoaF5dVKw7ECDC0XKcptfcil6KaG8XAAcXVjZ4qQe
> DdkTB1XNcmOrDjk1evaRuA+dgfkM8UcrZ9oTekuzcM6dRwbCGRLxmyvS9S5Q/bjY
> 7sAuY8SMnkiCLHTd12aeoYEdXVf7K2yCS5ZyYhWaxEDkQo6KNbMZ60STBaOAsppP
> 5fqt9rYaPXxgRmJ85m4IlcuovYgIVF0zloDyQmDTSqYnAkCP3199towQjbkC56K3
> YUgsg7mGEFcvApg7iNepB/stKGxmsa/0oQiwdcrF7HjQ9uIDLdNrHcEkQiM3Xdw=
> =t+C7
> -----END PGP SIGNATURE-----
>

