On 26/02/13 20:41, Paul Colomiets wrote:
The problem with that is auto-reconnect. The subscriber sends a lot of subscriptions, more than the producer is able to accept. The producer is not processing them, so TCP pushback happens. The subscriber sees that the subscription stream is stuck and disconnects the peer. The producer tries to reconnect and immediately gets hit by a subscription storm. And so on ad infinitum. That's why I've written "if connection does no progress". I mean: disconnect if no bytes were sent for 10 seconds, or something like that. It means that any number of subscriptions can be sent, even if it would take minutes to upload them. I think it will work in all realistic situations (up to thousands of subscriptions within a few seconds). There is an edge case when you create new subscriptions in a tight loop and the publisher can't keep up with it. But I don't think that's a situation that needs to be taken care of.
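A minimal sketch of the "disconnect only if the connection makes no progress" rule, under stated assumptions: `ProgressWatchdog` is a hypothetical helper (not part of any ZeroMQ API), it is told how many bytes the socket accepted on each write, and the clock can be injected so the behaviour is easy to check. The 10-second threshold is the example figure from the paragraph above.

```python
import time


class ProgressWatchdog:
    """Hypothetical helper: flags a peer as stuck only when no bytes have
    been written for `timeout` seconds, no matter how much is still queued."""

    def __init__(self, timeout=10.0, now=None):
        self.timeout = timeout
        self.last_progress = time.monotonic() if now is None else now

    def on_bytes_sent(self, nbytes, now=None):
        # Any forward progress, however small, resets the timer.
        if nbytes > 0:
            self.last_progress = time.monotonic() if now is None else now

    def should_disconnect(self, now=None):
        now = time.monotonic() if now is None else now
        return now - self.last_progress >= self.timeout


# Slow but steady upload: 100 bytes every 5 s never trips the watchdog.
wd = ProgressWatchdog(timeout=10.0, now=0.0)
for t in (5.0, 10.0, 15.0):
    wd.on_bytes_sent(100, now=t)
print(wd.should_disconnect(now=20.0))   # False: last progress 5 s ago
print(wd.should_disconnect(now=25.0))   # True: 10 s without a single byte
```

Measuring progress rather than backlog size means a large subscription set that takes minutes to upload is never cut off, so the disconnect/reconnect/resend loop described above does not get started.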
It's a single problem IMO. It can be formulated like this: "Given a limited tx buffer (whether in the kernel or in user space), what should be done when it gets full and the user still wants to send a new subscription?"
Btw, speaking of realistic situations, I've just spoken to guys who are handling 130,000,000 subscriptions in ZeroMQ :)
Anyway, the problem can be split into 2 parts:

1. How to manage pushback.
2. What to do when it can't be managed any more.

The options for the first are either relying on TCP (the problem occurs when the tx buffer limit is hit) or building a rate-limiting algorithm on top (the problem occurs when the rate limit is exceeded) -- the latter being basically what you are proposing.
I would say that both are functionally equivalent (i.e. the problem occurs when too much data is sent in too short a time), the only difference being that implementing rate limiting requires more work.
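To make the equivalence concrete, here is a sketch of one possible "rate limiting algorithm on top". The thread does not name an algorithm; a token bucket is swapped in purely as an illustration, and `TokenBucket`, `rate` and `capacity` are hypothetical names. Like a full tx buffer, it accepts bursts up to a fixed size and then refuses further subscriptions until time passes.

```python
class TokenBucket:
    """Token-bucket limiter (illustrative choice; not from the thread)."""

    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate          # tokens (subscriptions) refilled per second
        self.capacity = capacity  # maximum burst size, like a tx buffer limit
        self.tokens = float(capacity)
        self.last = now

    def try_send(self, now, cost=1):
        # Refill according to elapsed time, capped at the burst capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # limit exceeded: same situation as a full tx buffer


bucket = TokenBucket(rate=100, capacity=50)
burst = [bucket.try_send(now=0.0) for _ in range(60)]
print(sum(burst))                 # 50: the remaining 10 hit the limit
print(bucket.try_send(now=0.1))   # True: tokens were refilled after 0.1 s
```

Either way, the open question is what to do with the 10 subscriptions that were refused.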
The interesting part is what happens when the problem occurs (tx buffer full, rate limit exceeded). The options here are:
1. Drop => results in inconsistent message delivery.
2. Pushback => a hung-up publisher can stop the whole topology.

There's also the "reconnect" option, which is just an evil variation on pushback. Instead of waiting to send the remaining few bytes, it disconnects, reconnects and tries to send the whole subscription set anew.
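A toy model of the options above, assuming a hypothetical bounded buffer `TxBuffer` that counts subscriptions rather than bytes (real sockets measure bytes, and a blocking send would sleep rather than return `False`). It shows why drop breaks consistency (the caller believes the subscription went through) and why reconnect is worse than plain pushback (the whole set goes out again).

```python
from collections import deque


class TxBuffer:
    """Hypothetical bounded send buffer for subscription messages.

    policy="drop":  a full buffer silently discards the new subscription.
    policy="block": a full buffer refuses it; the caller is stuck until the
                    peer drains the buffer (modelled as returning False).
    """

    def __init__(self, limit, policy):
        assert policy in ("drop", "block")
        self.limit = limit
        self.policy = policy
        self.queue = deque()
        self.dropped = []

    def subscribe(self, topic):
        if len(self.queue) < self.limit:
            self.queue.append(topic)
            return True
        if self.policy == "drop":
            self.dropped.append(topic)  # publisher never learns of this topic
            return True                 # ...yet the caller thinks it succeeded
        return False                    # pushback: caller has to wait


def resend_volume(total_subs, already_sent, reconnect):
    """Subscriptions still to cross the wire once a stalled peer recovers."""
    # Pushback only needs the unsent tail; a fresh connection starts empty.
    return total_subs if reconnect else total_subs - already_sent


drop = TxBuffer(limit=2, policy="drop")
print([drop.subscribe(t) for t in ("a", "b", "c")])   # [True, True, True]
print(drop.dropped)                                    # ['c']

block = TxBuffer(limit=2, policy="block")
print([block.subscribe(t) for t in ("a", "b", "c")])  # [True, True, False]

print(resend_volume(1000, 990, reconnect=False))       # 10
print(resend_volume(1000, 990, reconnect=True))        # 1000
```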
There seems to be no way out. If you see any other solution to the problem, please let me know.

Martin