[nanomsg] Re: Generalizing pubsub distribution

  • From: Carlos Pita <carlosjosepita@xxxxxxxxx>
  • To: Martin Sustrik <sustrik@xxxxxxxxxx>
  • Date: Sun, 13 Dec 2015 10:47:38 -0300

> The real problem here is that applying backpressure to the publisher
> turns a system that works nicely on its own into a system that has to be
> supervised by a human on a 24/7 basis. Once there's a slow/dead consumer
> which causes the backpressure *nobody* is going to get any messages any
> more.
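The failure mode in the quote shows up even in a toy model. In this sketch, plain Python `queue.Queue` objects stand in for the publisher's per-subscriber pipes, and `publish_blocking` is a hypothetical helper, not nanomsg code. A timeout is added only so the demo terminates; with a truly blocking put the publisher would hang forever on the dead subscriber.

```python
import queue


def publish_blocking(pipes, msg, timeout=0.1):
    """Deliver msg to every pipe in order, blocking while a pipe is full.

    Returns False if some pipe stayed full for `timeout` seconds; pipes
    after it in the list never see this message.
    """
    for pipe in pipes:
        try:
            pipe.put(msg, timeout=timeout)
        except queue.Full:
            return False
    return True


# Two subscribers with room for 3 messages each. The first is "dead":
# nothing ever reads from it. The second is a live consumer.
dead = queue.Queue(maxsize=3)
live = queue.Queue(maxsize=3)
received = []

for i in range(10):
    ok = publish_blocking([dead, live], i)
    # The live consumer drains its pipe promptly after each publish.
    while not live.empty():
        received.append(live.get_nowait())
    if not ok:
        break  # publisher is stuck on the dead subscriber

print(received)  # [0, 1, 2]
```

Once the dead subscriber's pipe fills, the healthy subscriber stops receiving too.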

Hi Martin, thanks for answering. I agree 100% with you; that's why I'm
looking for an extensibility mechanism rather than changing the default one
or adding parameters to it, since I find it a very sensible choice.

A concrete example: I'm working on an online statistical learning system.
Most of the time a stream of market events is processed by a graph of
distributed components that ultimately feed a large number of estimators.
Since keeping the estimators up to date matters more to me than processing
every observation, "drop if pipe is full" is appropriate: it acts as a kind
of random sampling with a dynamically adjusted sampling rate.

But sometimes I need to select or train models offline, and for that I
replay the observations from a database. This runs at full speed. I don't
want to find an "optimal" input frequency and put a leaky bucket at the
input of the system, because that frequency is a moving target that I would
have to find by trial and error every time. So in this scenario I prefer
backpressure: let the system figure out that frequency by itself. Every
other part of pubsub is fine in both use cases; only the dist part should
change.

Note that suicidal snail won't fit the bill. Credit-based flow control is
not possible without accessing the pub socket's internal pipes and mapping
them to the subscribers providing the credits. Reimplementing pubsub on top
of, say, pushpull would introduce overhead (this is Python, in my case) and
risk (I would be much more confident in something YOU have implemented).
Finally, "block if pipe is full" is not optimal, but its simplicity makes it
attractive (another simple option is to run a thread for each pipe and set
a safe sndtimeo).

So my point is to let the user reuse most of the pubsub pattern while being
able to replace the distribution routine, that is, to plug in a different
dist. But I think the protocol API could be all that's needed, as I
suggested in my last post. Is that right?
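A minimal sketch of the "swap only the dist" idea in Python, assuming a toy publisher where bounded `queue.Queue` objects stand in for the pub socket's pipes. `Pub`, `drop_if_full` and `block_if_full` are hypothetical names chosen for illustration; this is not nanomsg's protocol API. The per-pipe timeout in `block_if_full` only loosely mimics a send timeout such as sndtimeo.

```python
import queue
import threading
from typing import Callable, List

# A dist strategy fans one message out to the subscriber pipes and
# returns how many pipes accepted it.
DistFn = Callable[[List[queue.Queue], object], int]


def drop_if_full(pipes, msg):
    """Skip any pipe with no room (the sampling-like behaviour)."""
    delivered = 0
    for pipe in pipes:
        try:
            pipe.put_nowait(msg)
            delivered += 1
        except queue.Full:
            pass  # this subscriber simply misses the message
    return delivered


def block_if_full(timeout):
    """Wait up to `timeout` seconds for room in each pipe, then give up
    on that pipe (a rough stand-in for a send timeout)."""
    def dist(pipes, msg):
        delivered = 0
        for pipe in pipes:
            try:
                pipe.put(msg, timeout=timeout)
                delivered += 1
            except queue.Full:
                pass
        return delivered
    return dist


class Pub:
    """Toy publisher: everything except `dist` is shared by both modes."""

    def __init__(self, dist: DistFn):
        self.pipes: List[queue.Queue] = []
        self.dist = dist

    def attach(self, maxsize: int) -> queue.Queue:
        pipe: queue.Queue = queue.Queue(maxsize=maxsize)
        self.pipes.append(pipe)
        return pipe

    def send(self, msg) -> int:
        return self.dist(self.pipes, msg)


# Live mode: a subscriber that never reads just loses messages.
live = Pub(drop_if_full)
live.attach(maxsize=2)
counts = [live.send(i) for i in range(4)]
print(counts)  # [1, 1, 0, 0]

# Replay mode: the publisher is paced by the consumer instead of dropping.
replay = Pub(block_if_full(timeout=1.0))
pipe = replay.attach(maxsize=2)
got = []


def consume():
    for _ in range(5):
        got.append(pipe.get())


worker = threading.Thread(target=consume)
worker.start()
for i in range(5):
    replay.send(i)
worker.join()
print(got)  # [0, 1, 2, 3, 4]
```

In replay mode a dead subscriber still costs up to `timeout` per message, which is exactly the trade-off raised in the quoted objection; the sketch only shows that the policy can live in one swappable function while the rest of the publisher stays the same.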

Cheers
--
Carlos
