[nanomsg] Re: Trying to implement "directory" pattern

  • From: Martin Sustrik <sustrik@xxxxxxxxxx>
  • To: Paul Colomiets <paul@xxxxxxxxxxxxxx>
  • Date: Sun, 03 Mar 2013 09:34:24 +0100

Hi Paul,

    On 28/02/13 10:18, Paul Colomiets wrote:

        The difference is the buffer limit. How would you tune TCP
        buffering to handle 130 million subscriptions? I think if you
        do, the machine will be open to DoS attacks very easily.


    The question here is whether we should even try to push 130M
    subscriptions to a TCP connection in one go. Maybe there's a viable
    way to iterate through the connection map and send subscriptions as
    bandwidth becomes available.


This idea came to my mind several times. But the problem is more
complex, as subscriptions are changing on the fly. While I still think
it's technically possible to optimize it down to an iterator and a
small amount of buffer space (a list of recent unsubscriptions of the
branches already visited), I don't think it's worthwhile, for the
following reasons (a rough sketch of the iterator approach follows the
points below):

1. You promised pluggable filters. This is a very complex task to
expect every plugin to handle.

That's a good point. It doesn't seem realistic to expect filter
developers to deal with the complexities of iterating over an evolving
dataset.

2. It's a complex task with a dozen edge cases. Simple solutions are
usually more reliable.

Ack.


3. It is an optimization that can be done later without affecting
users. It can be done when the need for it is demonstrated, and when
there is at least one big customer that will use it at real scale.

I am not sure about this one. If it turns out that preventing
"sideways" failure propagation cannot realistically be done, we'll have
to think outside the box and possibly adjust the affected patterns in
such a way as to cope with this scenario. If we do so, it will affect
the users. Let's rather not ignore the problem.
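
For concreteness, here is a minimal sketch of what the iterator
approach mentioned above could look like. Every name and type here is a
hypothetical stand-in, not nanomsg API; the real subscription store
would be the trie:

/* Hypothetical sketch: resume sending subscriptions whenever the TCP
   connection has buffer space, instead of queuing them all at once. */

#include <stddef.h>

#define BATCH 64              /* subscriptions sent per POLLOUT event */

struct sub_store {            /* stand-in for the subscription trie */
    const char **keys;        /* subscription strings, in trie order */
    size_t count;
};

struct resync_cursor {
    size_t next;              /* next subscription to send */
    /* A full version would also keep the small side buffer discussed
       above: recent unsubscriptions from branches already visited, so
       they can still be revoked on the peer. */
};

/* 'send_sub' is a hypothetical callback that writes one subscription
   frame to the connection and returns -1 on TCP pushback. Returns 1
   when the resync is complete, 0 when it should resume later. */
static int resync_step(const struct sub_store *store,
                       struct resync_cursor *cur,
                       int (*send_sub)(const char *key, void *arg),
                       void *arg)
{
    size_t sent = 0;
    while (cur->next < store->count && sent < BATCH) {
        if (send_sub(store->keys[cur->next], arg) < 0)
            return 0;         /* pushback; resume on next POLLOUT */
        cur->next++;
        sent++;
    }
    return cur->next == store->count;
}

Memory use is then bounded by the cursor and the small side list rather
than by the full subscription set.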

             2. Pushback => a hung-up publisher can stop the whole topology


        Why would it stop? It will result in "inconsistent message
        delivery" until the subscriptions are eventually sent. As
        subscriptions are usually aggregated on intermediaries, I don't
        think there are use cases where the subscription pipe is in a
        "pushback state" all the time. So what we need is some sign to
        sysadmins that pushback is happening (and that's a separate
        topic).


    It may happen in the case of a hung-up application. It doesn't read
    messages, so TCP pushback is applied. The next message cannot be
    sent. It cannot be stored either, as we are out of buffer space. It
    cannot be dropped, as we want the transport to be reliable. So the
    only thing to do is to stop sending new messages. That means that
    messages aren't sent even to the well-behaved peers. That way the
    failure propagates from the hung-up application "sideways".

Technically yes. But looking at the problem a bit more widely, the
picture is the following:

1. I assume that subscriptions take a much smaller amount of memory
than is needed to process the messages. So keeping another buffer for
subscriptions is not a problem. A small demonstration: if you have many
publishers, you keep a trie of subscriptions per publisher, and most of
the time there is only one slow/inactive publisher (or your admins are
dumb :) ). So you only duplicate the subscriptions of a single
publisher, which is a fraction of the whole socket memory. Also, a
buffer is usually more compact than a trie (though not always).

2. Let's keep at most 2x the size of the trie worth of subscription
updates in the buffer. If the next subscription would overflow that,
kill the connection; on reconnect the buffer will be 2x shorter. This
solves the situation you described: "Instead of waiting for sending the
remaining few bytes, it disconnects, reconnects and tries to send the
whole subscription set anew". Of course, the 2x factor can be tuned to
something nicer (a sketch follows below).

This way a slow publisher can never block the others, and the
reconnection problem is worked around smoothly.
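
A minimal sketch of the cap from point 2; the names and bookkeeping are
assumptions for illustration, not nanomsg internals:

#include <stddef.h>

struct sub_pipe {
    size_t trie_bytes;        /* memory held by the subscription trie */
    size_t pending_bytes;     /* subscription updates queued on pipe */
};

/* Returns 0 if the update was queued, -1 if the connection should be
   killed; after reconnect the peer receives the whole (smaller)
   subscription set anew. */
static int queue_sub_update(struct sub_pipe *p, size_t update_bytes)
{
    if (p->pending_bytes + update_bytes > 2 * p->trie_bytes)
        return -1;            /* cap exceeded: kill and reconnect */
    p->pending_bytes += update_bytes;
    return 0;
}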

Interesting idea. I have to digest it before answering...


             There seems to be no way out.

             If you see any other solution to the problem, please let
             me know.


        You are too pessimistic :) Can you ask the guys who have
        millions of subscriptions in zeromq a few questions:

        1. Do you use subscription forwarding?
        2. Does zeromq solve the task well, or are there problems with
        the zeromq implementation?
        3. What HWMs and SND/RCVBUFs are set?
        4. How much memory is used by subscriptions (if it's possible
        to estimate)?


    OK. Will do.


One final question, if it's not too late: Can pluggable filters make
the number of subscriptions much lower in their case? (I imagine some
thousands of filters can be replaced by a single regexp, or some other
kind of rule.)
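
To make the question concrete: a single compiled rule of this kind
could stand in for thousands of literal subscriptions. The topic scheme
below is invented for illustration and uses POSIX regex, not any
nanomsg API:

#include <regex.h>
#include <stdio.h>

int main(void)
{
    regex_t re;
    /* Hypothetical rule replacing many literal "stock.*.*"
       subscriptions with one expression. */
    if (regcomp(&re, "^stock\\.(NASDAQ|NYSE)\\.[A-Z]+$", REG_EXTENDED))
        return 1;
    const char *topic = "stock.NASDAQ.AAPL";
    printf("%s -> %s\n", topic,
           regexec(&re, topic, 0, NULL, 0) == 0 ? "match" : "no match");
    regfree(&re);
    return 0;
}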

AFAICS that's not the case. Each subscription is unique and
unpredictable.

In general, I would say that we should expect some users to use large
subscription sets. The question, of course, is whether algorithms for
such monster subscription sets shouldn't instead be built on top of
nanomsg, using raw PUB/SUB sockets.
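
As a rough sketch of what "built on top" could mean: a subscriber that
subscribes to everything and applies the monster set itself. The nn_*
calls are real nanomsg API; monster_set_match() is a hypothetical
application-level lookup, and for simplicity this uses an ordinary
AF_SP socket rather than AF_SP_RAW:

#include <nanomsg/nn.h>
#include <nanomsg/pubsub.h>
#include <stddef.h>

/* Hypothetical application-level filter over the monster set. */
extern int monster_set_match(const char *topic, size_t len);

int run_subscriber(const char *url)
{
    int s = nn_socket(AF_SP, NN_SUB);
    if (s < 0)
        return -1;

    /* Empty prefix = receive everything; no subscriptions are ever
       forwarded upstream. Filtering happens in the application. */
    if (nn_setsockopt(s, NN_SUB, NN_SUB_SUBSCRIBE, "", 0) < 0)
        return -1;
    if (nn_connect(s, url) < 0)
        return -1;

    for (;;) {
        void *msg;
        int n = nn_recv(s, &msg, NN_MSG, 0);
        if (n < 0)
            break;
        if (monster_set_match((const char *)msg, (size_t)n)) {
            /* deliver to the application here */
        }
        nn_freemsg(msg);
    }
    return nn_close(s);
}

Whether shipping every message to the subscriber beats subscription
forwarding at that scale is exactly the open question.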

Martin

