[nanomsg] Re: Trying to implement "directory" pattern

  • From: Martin Sustrik <sustrik@xxxxxxxxxx>
  • To: Paul Colomiets <paul@xxxxxxxxxxxxxx>
  • Date: Fri, 01 Mar 2013 09:10:21 +0100

On 28/02/13 10:18, Paul Colomiets wrote:

The difference is the buffer limit. How would you tune TCP buffering to
handle 130 million subscriptions? I think if you do, the machine will
be very easily open to a DoS attack.

The question here is whether we should even try to push 130M subscriptions to a TCP connection in one go. Maybe there's a viable way to iterate through the connection map and send subscriptions as bandwidth becomes available.
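
Something along these lines, as a rough sketch (hypothetical names, not actual nanomsg code): each peer holds just a cursor into the shared subscription set, and we write only as much as the non-blocking socket accepts right now:

#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

struct subscription { const char *topic; size_t len; };

struct peer_state {
    int fd;         /* non-blocking TCP socket to the peer */
    size_t cursor;  /* index of the next subscription to send */
};

/* Send subscriptions starting at peer->cursor until the kernel send
   buffer fills up. Returns 0 when all are sent, 1 if more remain,
   -1 on error. Handling of partial writes is elided for brevity. */
int flush_subscriptions(struct peer_state *peer,
                        const struct subscription *subs, size_t nsubs)
{
    while (peer->cursor < nsubs) {
        const struct subscription *s = &subs[peer->cursor];
        ssize_t rc = send(peer->fd, s->topic, s->len, MSG_DONTWAIT);
        if (rc < 0) {
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                return 1;   /* no bandwidth right now; resume later */
            return -1;
        }
        peer->cursor++;     /* the only per-peer state is this cursor */
    }
    return 0;
}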

    2. Pushback => a hanged-up publisher can stop the whole topology


Why would it stop? It would result in "inconsistent message delivery"
until the subscriptions are eventually sent. As subscriptions are
usually aggregated on intermediaries, I don't think there are use cases
where the subscription pipe is in a "pushback state" all the time. So
what we need is some sign to sysadmins that pushback is happening (and
that's a separate topic).

It may happen in the case of a hanged-up application. It doesn't read messages, so TCP pushback is applied. The next message cannot be sent. It cannot be stored either, as we are out of buffer space. It cannot be dropped, as we want the transport to be reliable. So the only thing to do is to stop sending new messages. That means that messages aren't sent even to the well-behaved peers. That way the failure propagates from the hanged-up application "sideways".
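
To illustrate the "sideways" propagation with a deliberately naive sketch (hypothetical code, not what nanomsg actually does): a fan-out loop over blocking sockets stalls everyone as soon as one reader stops.

#include <stddef.h>
#include <sys/socket.h>

/* If peer i has stopped reading and both its TCP window and our send
   buffer are full, this send() blocks indefinitely, so the peers that
   come after it in the loop never get the message either. */
void broadcast(const int *peer_fds, int npeers, const void *msg, size_t len)
{
    for (int i = 0; i < npeers; i++)
        send(peer_fds[i], msg, len, 0);
}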

However, it just occurred to me that if we use the iterator approach outlined above, there's only a constant amount of data to store per peer (the iterator itself). That would prevent the above scenario. This idea seems worth investigating.
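
For example, building on the hypothetical flush_subscriptions() sketch above, the "as bandwidth becomes available" part could look roughly like one iteration of an event loop that asks for writability only for peers that still have subscriptions pending, so a peer in pushback delays nobody but itself:

#include <poll.h>
#include <stddef.h>

void drive_peers(struct peer_state *peers, int npeers,
                 const struct subscription *subs, size_t nsubs)
{
    struct pollfd pfds[npeers];

    for (int i = 0; i < npeers; i++) {
        pfds[i].fd = peers[i].fd;
        /* Ask for POLLOUT only while this peer still has work. */
        pfds[i].events = peers[i].cursor < nsubs ? POLLOUT : 0;
        pfds[i].revents = 0;
    }

    if (poll(pfds, npeers, -1) > 0) {
        for (int i = 0; i < npeers; i++)
            if (pfds[i].revents & POLLOUT)
                flush_subscriptions(&peers[i], subs, nsubs);
    }
}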

    There's also the "reconnect" option, which is just an evil variation
    on pushback. Instead of waiting to send the remaining few bytes, it
    disconnects, reconnects and tries to send the whole subscription
    set anew.


The reconnect option is needed when there is something wrong with the
connection that a reconnect will fix. I don't know networking that well,
but it happens in some obscure situations. The evil is in the details, as
always. The description "Instead of waiting to send the remaining few
bytes, it disconnects" is the opposite of what I propose.

Yes. Agreed. I am not arguing against it. Just saying it doesn't help with the pushback problem.

    There seems to be no way out.

    If you see any other solution to the problem, please let me know.


You are too pessimistic :) Can you ask the guys who have millions of
subscriptions in zeromq a few questions:

1. Do you use subscription forwarding?
2. Does zeromq solve the task well, or are there problems with the zeromq
implementation?
3. What HWMs and SND/RCVBUFs are set?
4. How much memory is used by subscriptions (if it's possible to estimate)?

OK. Will do.

This would give us evidence of whether the "zeromq way" of buffering is
OK for subscriptions.

Agreed.

Martin
