[nanomsg] Re: Trying to implement "directory" pattern

  • From: Paul Colomiets <paul@xxxxxxxxxxxxxx>
  • To: Martin Sustrik <sustrik@xxxxxxxxxx>
  • Date: Mon, 4 Mar 2013 03:27:59 +0200

Hi Martin,


>> 3. It is an optimization that can be done later without affecting users.
>> It can be done when its need is demonstrated, and when there is at
>> least one big customer that will use it at real scale.
>>
>
> I am not sure about this one. If it turns out that preventing "sideways"
> failure propagation cannot be realistically done, we'll have to think out
> of the box and possibly adjust the affected patterns in such a way as to
> cope with this scenario. If we do so, it'll affect the users. Let's rather
> not ignore the problem.
>
>
In your statement it's true :) I was talking about the "full buffering" vs.
"smart iteration" approach to subscriptions. In my case it's clearly an
"invisible" optimization.


>
>>              There seems to be no way out.
>>
>>              If you see any other solution to the problem, please let me
>>         know.
>>
>>
>>         You are too pessimistic :) Can you ask the guys who have millions
>>         of subscriptions in zeromq a few questions:
>>
>>         1. Do you use subscription forwarding?
>>         2. Does zeromq solve the task well, or are there problems with
>>         the zeromq implementation?
>>         3. What HWMs and SND/RCVBUFs are set?
>>         4. How much memory is used by subscriptions (if it's possible to
>>         estimate) ?
>>
>>
>>     OK. Will do.
>>
>>
>> One final question, if it's not too late: can pluggable filters make the
>> number of subscriptions much lower in their case? (I imagine a few
>> thousand filters can be replaced by a single regexp, or some other kind
>> of rule)
>>
>
> AFAICS it's not the case. Each subscription is unique and unpredictable.
>

Ack.



> In general, I would say that we should expect some users to use large
> subscription sets. The question, of course, is whether algorithms for such
> monster subscription sets should not be built on top of nanomsg using raw
> PUB/SUB sockets.


At first sight it's a nice idea. However, a raw SUB socket, when forwarding
a subscription, has no way to know which pipes are in the "pushback" state,
and can't react based on that. So it can't reliably deliver subscriptions
upstream.
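To make the failure mode concrete, here is a toy model (not the nanomsg API;
all class and method names are invented for illustration): a raw SUB socket
fans a subscription out to several upstream pipes, each with a bounded send
buffer, but reports only a single aggregate result. The caller cannot learn
which pipe pushed back and silently lost the message:

```python
from collections import deque

class Pipe:
    """Toy pipe with a bounded send buffer (models HWM/pushback)."""
    def __init__(self, hwm):
        self.hwm = hwm
        self.buf = deque()

    def try_send(self, msg):
        if len(self.buf) >= self.hwm:
            return False          # pushback: buffer full, message lost
        self.buf.append(msg)
        return True

class RawSubSocket:
    """Toy raw SUB socket: sending a subscription fans it out to every
    upstream pipe, but yields only one aggregate success/failure flag."""
    def __init__(self, pipes):
        self.pipes = pipes

    def send_subscription(self, sub):
        # Note: even on False the caller doesn't know *which* pipe was
        # in pushback, and the other pipes have already accepted the
        # message, so the fan-out is left in an inconsistent state.
        results = [p.try_send(sub) for p in self.pipes]
        return all(results)

fast, slow = Pipe(hwm=100), Pipe(hwm=1)
sock = RawSubSocket([fast, slow])
ok1 = sock.send_subscription(b"topic.a")   # fits on both pipes
ok2 = sock.send_subscription(b"topic.b")   # slow pipe pushes back
```

After the second send, `fast` holds both subscriptions while `slow` holds
only the first, and the application sees just a single `False` with no way
to tell which upstream peer is now missing a subscription.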


> We can also allocate only a fixed memory amount for holding subscriptions
> (the limit may be set by user). If the limit is exceeded, we can either
> report error or switch the filtering off.


It's not clear whether the limit is the size of the tree or the buffer
size. If it's the size of the tree, then how large can the buffer grow (if
it's filled by unsubscriptions)? If the limit is the buffer size, then
(apart from being what I've proposed in the first place) it may be lower
than the size of the tree. But never mind, we can have two different limits
to cover all cases.
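A minimal sketch of the two-limit idea (purely illustrative; the names,
numbers, and data structures are assumptions, not nanomsg internals): one
limit caps the subscription tree, the other caps the buffer of pending
(un)subscription messages stalled by pushback. It also shows why the
pending buffer can outgrow the tree when it fills with unsubscriptions:

```python
class SubscriptionStore:
    """Toy store with two separate limits: tree size and pending buffer."""
    def __init__(self, max_tree, max_pending):
        self.max_tree = max_tree
        self.max_pending = max_pending
        self.tree = set()       # stands in for the real subscription tree
        self.pending = []       # (un)subscriptions waiting to go upstream

    def subscribe(self, topic):
        if topic in self.tree:
            return True
        if len(self.tree) >= self.max_tree:
            return False        # tree limit exceeded: report error
        if len(self.pending) >= self.max_pending:
            return False        # pending buffer full: also an error
        self.tree.add(topic)
        self.pending.append(("sub", topic))
        return True

    def unsubscribe(self, topic):
        # Unsubscriptions shrink the tree but still occupy the pending
        # buffer, which is how the buffer can grow past the tree size.
        if topic not in self.tree:
            return True
        if len(self.pending) >= self.max_pending:
            return False
        self.tree.discard(topic)
        self.pending.append(("unsub", topic))
        return True

store = SubscriptionStore(max_tree=2, max_pending=3)
store.subscribe(b"a")
store.subscribe(b"b")
tree_full = store.subscribe(b"c")   # False: tree limit hit
store.unsubscribe(b"a")             # tree shrinks, pending buffer grows
buf_full = store.subscribe(b"c")    # False: pending buffer is full now
```

In the final state the tree holds one topic while three messages sit in the
pending buffer, so capping only one of the two quantities is not enough.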

The other thing is not clear either. Reporting an error on overflowing the
size of the tree is OK. However, as indicated above, it's not enough. If we
report an error on buffer overflow, then what can the user do about it?
Never make any subscription from that point on? Close and reopen the whole
socket? Ah, OK, in the case of one publisher it would work (the user knows
which publisher is failing, and reopening the connection to the failing
publisher is not a problem either). Another option: switch filtering off. I
don't even understand how that can work. The publisher has already received
some subscriptions. For it to know that filtering is off, it would have to
receive a message (which is stalled in pushback). We could close the
overloaded connection and start a fresh one with filtering disabled, but I
don't like this idea.

All in all, a limit on the number of subscriptions seems like a nice idea,
but it doesn't solve any problem by itself. Combining it with some
heuristics for buffer size and reconnect mechanics along the lines of what
I've described in the previous mail would work for me.

-- 
Paul
