On 28/02/13 10:18, Paul Colomiets wrote:
The difference is the buffer limit. How would you tune TCP buffering to handle 130 million subscriptions? I think if you do, the machine will be open to DoS attacks very easily.
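For a rough sense of scale, here is a back-of-the-envelope calculation. Only the 130 million figure comes from this thread; the 16-byte average subscription size and the 1 MiB send buffer are assumptions picked purely for illustration.

```python
# Back-of-the-envelope: bytes needed to buffer a full subscription set
# for a single connection.
SUBSCRIPTIONS = 130_000_000   # figure from the discussion
AVG_SUB_BYTES = 16            # ASSUMPTION: average encoded subscription size
SNDBUF_BYTES = 1 << 20        # ASSUMPTION: a 1 MiB per-connection send buffer


def buffer_needed(n_subs, avg_bytes):
    """Total bytes required to hold every subscription at once."""
    return n_subs * avg_bytes


total = buffer_needed(SUBSCRIPTIONS, AVG_SUB_BYTES)
ratio = total / SNDBUF_BYTES
print(f"{total / 1e9:.2f} GB per connection, about {ratio:.0f}x the assumed buffer")
```

Under these assumptions the buffer would have to hold gigabytes per peer, which is why sizing it for the worst case looks like an invitation to exhaust memory.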
The question here is whether we should even try to push 130M subscriptions to a TCP connection in one go. Maybe there's a viable way to iterate through the connection map and send subscriptions as bandwidth becomes available.
2. Pushback => hung-up publisher can stop the whole topology
Why would it stop? It will result in "inconsistent message delivery" until the subscriptions are eventually sent. As subscriptions are usually aggregated on intermediaries, I don't think there are use cases where the subscription pipe is in the "pushback state" all the time. So what we need is some signal to sysadmins that pushback is happening (and that's a separate topic).
It may happen in the case of a hung-up application. It doesn't read messages, so TCP pushback is applied. The next message cannot be sent. It cannot be stored either, as we are out of buffer space. It cannot be dropped, as we want the transport to be reliable. So the only thing to do is to stop sending new messages. That means that messages aren't sent even to the well-behaved peers. That way the failure propagates from the hung-up application "sideways".
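The scenario can be reproduced with a toy model. This is only a sketch: the `Peer` class, the `publish` function and the queue capacity of 4 are hypothetical stand-ins for per-connection buffers, not anything taken from an actual implementation.

```python
from collections import deque


class Peer:
    """Hypothetical peer with a bounded outgoing buffer."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self.queue = deque()

    def full(self):
        return len(self.queue) >= self.capacity

    def read_all(self):
        """Drain the buffer, as a well-behaved application would."""
        n = len(self.queue)
        self.queue.clear()
        return n


def publish(peers, msg):
    """Reliable fan-out: no storing beyond capacity, no dropping.

    If any peer's buffer is full, the message is refused for everyone.
    """
    if any(p.full() for p in peers):
        return False
    for p in peers:
        p.queue.append(msg)
    return True


good = Peer("good", capacity=4)
hung = Peer("hung", capacity=4)   # never reads

accepted = 0
delivered_to_good = 0
for i in range(10):
    if publish([good, hung], i):
        accepted += 1
    delivered_to_good += good.read_all()   # only the good peer drains

print(accepted, delivered_to_good, len(hung.queue))
```

Once the hung peer's buffer fills up, `publish` refuses every further message, so the well-behaved peer stops receiving as well even though its own buffer is empty.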
However, it now occurs to me that if we use the iterator approach as outlined above, there's only a constant amount of data to store per peer (the iterator itself). That would prevent the above scenario. This idea seems worth investigating.
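A minimal sketch of what the iterator approach might look like. The `SubscriptionSender` class and its `on_writable` callback are hypothetical names, and the sketch assumes the shared subscription list does not change while a peer is being walked, which a real implementation would have to deal with.

```python
class SubscriptionSender:
    """Per-peer state is one cursor into a shared subscription list."""

    def __init__(self, subscriptions):
        self.subs = subscriptions   # shared between peers, never copied
        self.cursor = 0             # the only per-peer state

    def on_writable(self, budget):
        """Return the subscriptions that fit into `budget` bytes.

        Called whenever the connection reports free bandwidth. Only whole
        subscriptions are sent; the cursor remembers where to resume.
        """
        batch = []
        while self.cursor < len(self.subs):
            sub = self.subs[self.cursor]
            if len(sub) > budget:
                break
            batch.append(sub)
            budget -= len(sub)
            self.cursor += 1
        return batch

    def done(self):
        return self.cursor >= len(self.subs)


subs = [b"a.1", b"a.2", b"b.10", b"c"]
sender = SubscriptionSender(subs)

first = sender.on_writable(7)     # room for the first two entries only
stalled = sender.on_writable(0)   # peer applies pushback: nothing is sent
rest = sender.on_writable(100)    # bandwidth is back: finish the walk
print(first, stalled, rest, sender.done())
```

A peer that stops reading just leaves its cursor where it is. Nothing accumulates on its behalf, so other peers are not affected.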
There's also the "reconnect" option, which is just an evil variation on pushback. Instead of waiting to send the remaining few bytes, it disconnects, reconnects and tries to send the whole subscription set anew.
The reconnect option is needed if there is something wrong with the connection that will be fixed by a reconnect. I don't know networking that well, but it happens in some obscure situations. The evil is in the details, as always. The description "Instead of waiting to send the remaining few bytes, it disconnects" is the opposite of what I propose.
Yes. Agreed. I am not arguing against it. Just saying it doesn't help with the pushback problem.
There seems to be no way out. If you see any other solution to the problem, please let me know.
You are too pessimistic :) Can you ask the guys who have millions of subscriptions in zeromq a few questions:
1. Do you use subscription forwarding?
2. Does zeromq solve the task well, or are there problems with the zeromq implementation?
3. What HWMs and SND/RCVBUFs are set?
4. How much memory is used by subscriptions (if it's possible to estimate)?
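For the SND/RCVBUF part of question 3, the kernel defaults on a given machine can be checked with a few lines of code. This is a sketch using only the standard socket API; `kernel_buffer_sizes` is a hypothetical helper name. It says nothing about zeromq's HWMs, which are set at the library level and are not covered here.

```python
import socket


def kernel_buffer_sizes():
    """Return (SO_SNDBUF, SO_RCVBUF) in bytes for a fresh TCP socket."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        snd = s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
        rcv = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return snd, rcv


snd, rcv = kernel_buffer_sizes()
print(f"SO_SNDBUF={snd} bytes, SO_RCVBUF={rcv} bytes")
```

The values are the operating system's defaults for an unconnected socket. An application may override them, so what a deployed system actually uses still has to be asked for.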
OK. Will do.
This would give us evidence of whether the "zeromq way" of buffering is OK for subscriptions.
Agreed.
Martin