Hi Paul,
On 28/02/13 10:18, Paul Colomiets wrote:
>>> The difference is the buffer limit. How would you tune TCP buffering
>>> to handle 130 million subscriptions? I think if you do, the machine
>>> will be very easily open to a DoS attack.
>>
>> The question here is whether we should even try to push 130M
>> subscriptions to a TCP connection in one go. Maybe there's a viable
>> way to iterate through the connection map and send subscriptions as
>> bandwidth becomes available.
>
> This idea came to my mind several times. But the problem is more
> complex, as subscriptions are changing on the fly. While I still think
> it's technically possible to optimize it to an iterator and a small
> buffer space (a list of recent unsubscriptions of visited branches), I
> don't think it's worthwhile, for the following reasons:
>
> 1. You promised pluggable filters. It's a very complex task to expect
> from every plugin.
That's a good point. It doesn't seem realistic to expect filter developers to deal with the complexities of iterating over an evolving dataset.
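Just to make that concern concrete, here is a rough sketch of what even the simplest version of such an iterator involves (all names are made up, not nanomsg API; a real filter plugin would have to provide the equivalent of the correction buffer for its own data structure):

```python
import bisect

class SubscriptionStreamer:
    """Sketch: emit a large subscription set in small batches while the
    set keeps changing underneath.  Changes behind the emit cursor go to
    a small correction buffer; changes ahead of it are picked up by the
    normal sweep for free."""

    def __init__(self, subscriptions):
        self.keys = sorted(set(subscriptions))
        self.pos = 0              # index of the next key to emit
        self.corrections = []     # ('sub'|'unsub', key) behind the cursor

    def subscribe(self, key):
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return                # already subscribed
        self.keys.insert(i, key)
        if i < self.pos:
            # Inserted behind the cursor: the sweep won't reach it,
            # so it must be emitted as a correction.
            self.pos += 1
            self.corrections.append(('sub', key))

    def unsubscribe(self, key):
        i = bisect.bisect_left(self.keys, key)
        if i == len(self.keys) or self.keys[i] != key:
            return                # not subscribed
        del self.keys[i]
        if i < self.pos:
            # Already on the wire: a corrective 'unsub' must follow.
            self.pos -= 1
            self.corrections.append(('unsub', key))

    def next_batch(self, n):
        """Called whenever the pipe has room for n more messages."""
        batch, self.corrections = self.corrections[:n], self.corrections[n:]
        while len(batch) < n and self.pos < len(self.keys):
            batch.append(('sub', self.keys[self.pos]))
            self.pos += 1
        return batch
```

Even this toy version has to distinguish changes before and after the cursor, and the real thing would be walking a trie shared with the message path, so the edge cases multiply quickly.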
> 2. It's a complex task with a dozen edge cases. Simple solutions are
> usually more reliable.
Ack.
> 3. It is an optimization that can be done later without affecting
> users. It can be done when the need for it is demonstrated, and when
> there is at least one big customer that will use it at real scale.
I am not sure about this one. If it turns out that preventing "sideways" failure propagation cannot be realistically done, we'll have to think out of the box and possibly adjust the affected patterns in such a way as to cope with this scenario. If we do so, it'll affect the users. Let's rather not ignore the problem.
>>>> 2. Pushback => a hung-up publisher can stop the whole topology.
>>>
>>> Why would it stop? It will result in "inconsistent message delivery"
>>> until the subscriptions are eventually sent. As subscriptions are
>>> usually aggregated on intermediaries, I don't think there are use
>>> cases where the subscription pipe is in "pushback state" all the
>>> time. So what we need is some sign to sysadmins that pushback is
>>> happening (and that's a separate topic).
>>
>> It may happen in the case of a hung-up application. It doesn't read
>> messages, so TCP pushback is applied. The next message cannot be
>> sent. It cannot be stored either, as we are out of buffer space. It
>> cannot be dropped, as we want the transport to be reliable. So the
>> only thing to do is to stop sending new messages. That means that
>> messages aren't sent even to the well-behaved peers. That way the
>> failure propagates from the hung-up application "sideways".
>
> Technically yes. But looking at the problem slightly wider, the
> picture is the following:
>
> 1. I assume that subscriptions take a much smaller amount of memory
> than the memory needed to process the messages, so keeping another
> buffer for subscriptions is not a problem. A small demonstration: if
> you have many publishers, you keep a trie of subscriptions per
> publisher, and most of the time there is only one slow/inactive
> publisher (or your admins are dumb :)). So you only duplicate the
> subscriptions of a single publisher, which is a fraction of the whole
> socket memory. Also, a buffer is usually more compact than a trie
> (not always, though).
>
> 2. Let's keep at most 2x the size of the trie worth of subscriptions
> in the buffer. If the next subscription exceeds that, kill the
> connection. On reconnect the buffer will be 2x shorter. This solves
> the situation you described: "Instead of waiting for sending the
> remaining few bytes, it disconnects, reconnects and tries to send the
> whole subscription set anew". Of course, the 2x size can be tuned to
> something nicer.
>
> This way a slow publisher can never block others, and it works around
> the reconnection problem in a smooth way.
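If I read the proposal right, the per-pipe policy boils down to something like this (a minimal sketch; `SubscriptionPipe` and the callbacks are made-up names, not nanomsg API):

```python
class SubscriptionPipe:
    """Sketch of the proposed policy: the per-pipe subscription buffer is
    capped at FACTOR times the current trie size.  A peer slow enough to
    overflow the cap is disconnected rather than allowed to block the
    whole topology; on reconnect the full set is resent from scratch."""

    FACTOR = 2

    def __init__(self, trie_size_fn, send_fn, reconnect_fn):
        self.trie_size = trie_size_fn   # current number of subscriptions
        self.send = send_fn             # push one update; False = pushback
        self.reconnect = reconnect_fn   # drop connection, resend set later
        self.buf = []

    def queue(self, update):
        self.buf.append(update)
        if len(self.buf) > self.FACTOR * self.trie_size():
            # Out of budget: kill the connection instead of blocking.
            self.buf.clear()
            self.reconnect()

    def pump(self):
        # Drain the buffer while the peer keeps accepting updates.
        while self.buf and self.send(self.buf[0]):
            self.buf.pop(0)
```

A hung-up peer then costs at most 2x the trie size in extra memory before it gets cut off, while well-behaved peers keep draining their own buffers undisturbed.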
Interesting idea. I have to digest it before answering...
>> There seems to be no way out. If you see any other solution to the
>> problem, please let me know.
>
> You are too pessimistic :) Can you ask the guys who have millions of
> subscriptions in zeromq a few questions:
>
> 1. Do you use subscription forwarding?
> 2. Does zeromq solve the task well, or are there problems with the
> zeromq implementation?
> 3. What HWMs and SND/RCVBUFs are set?
> 4. How much memory is used by subscriptions (if it's possible to
> estimate)?

OK. Will do.

> One final question, if it's not too late: can pluggable filters make
> the number of subscriptions much lower in their case? (I imagine some
> thousand filters can be replaced by a single regexp, or some other
> kind of rule.)
AFAICS it's not the case. Each subscription is unique and non-predictable. In general, I would say that we should expect some users to use large subscription sets. The question, of course, is whether algorithms for such monster subscription sets shouldn't rather be built on top of nanomsg, using raw PUB/SUB sockets.
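For the record, the kind of collapse you suggest would apply where subscriptions do follow a pattern; something like this (plain Python `re` standing in for a hypothetical regexp filter plugin, with made-up topic names):

```python
import re

# Two thousand literal subscriptions, one per sensor...
literal_subs = ['sensor.%04d.temp' % i for i in range(2000)]

# ...collapsed into a single rule for a hypothetical regexp filter:
rule = re.compile(r'^sensor\.\d{4}\.temp$')

topic = 'sensor.0042.temp'
assert topic in literal_subs       # matched by the literal set
assert rule.match(topic)           # matched by the single rule
assert not rule.match('sensor.0042.humidity')
```

With unique, non-predictable subscriptions like theirs, though, there is no such pattern to exploit, so the set stays large either way.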
Martin