[nanomsg] Re: issue with "received" message size

  • From: Paul Colomiets <paul@xxxxxxxxxxxxxx>
  • To: "nanomsg@xxxxxxxxxxxxx" <nanomsg@xxxxxxxxxxxxx>
  • Date: Tue, 11 Mar 2014 21:46:13 +0200

Hi Martin,

On Tue, Mar 11, 2014 at 8:21 PM, Martin Sustrik <sustrik@xxxxxxxxxx> wrote:
>
> On 11/03/14 15:20, Paul Colomiets wrote:
>
>> Yes. That kinda makes sense. But I still have two reasons not to
>> bring nn_recv priorities back:
>>
>> 1. Data can be stalled for ages in a low-priority pipe. That's not
>> the case for priorities on the nn_send side.
>>
>> 2. The way I want nn_recv failover to work is the following:
>>
>> We have two REQ sockets A and B, and a REP socket C. I'd like
>> low-priority connection B-C to be established only if
>> higher-priority A-C is not connected.
>>
>> The latter makes much more sense for low-latency systems, which
>> are usually never under pushback.
>
> Ok, I see where you are heading.
>
> Your goal is to route messages to a failover service immediately
> rather than when TCP buffers are full and pushback is applied, right?
>

Not exactly. I'm OK with nn_send failover driven by pushback. But see
my example below for a description of the nn_recv failover semantics
I'm thinking of.
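The nn_recv failover semantics described above (connect the low-priority B-C link only while the high-priority A-C link is down) could be sketched roughly as follows. This is a toy Python model, not the nanomsg API; the names `Endpoint` and `RecvFailover` are made up for illustration.

```python
# Toy model of recv-side failover: keep only the best-priority
# endpoint connected, fall back to lower priorities when it dies.

class Endpoint:
    def __init__(self, name, priority):
        self.name = name
        self.priority = priority      # lower number = higher priority
        self.connected = False

class RecvFailover:
    """Keep only the highest-priority live endpoint connected."""
    def __init__(self, endpoints):
        self.endpoints = sorted(endpoints, key=lambda e: e.priority)

    def reconcile(self, alive):
        # Connect the best-priority endpoint that is alive,
        # and disconnect everything below it.
        chosen = None
        for ep in self.endpoints:
            if chosen is None and ep.name in alive:
                chosen = ep
                ep.connected = True
            else:
                ep.connected = False
        return chosen.name if chosen else None

a = Endpoint("A-C", priority=1)
b = Endpoint("B-C", priority=2)
fo = RecvFailover([a, b])
print(fo.reconcile(alive={"A-C", "B-C"}))  # "A-C": high priority wins
print(fo.reconcile(alive={"B-C"}))         # "B-C": failover when A-C is down
```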

> Now consider following observations:
>
> 1. If a new high-priority connection arrives and REP is out of its
> connection limit, it has to close one low-priority connection --
> dropping all the messages stored in that connection's TCP buffers.
> That's basically the "stall forever" scenario you want to avoid.
>

Kinda yes. That's why I didn't come up with the proposal before. I
think it should be thought out more. E.g. when a higher-priority
connection arrives, a "shutdown" signal is sent (remember my shutdown
proposal?), which allows the peer to deliver replies but not to
receive new requests.
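The shutdown handshake hinted at above could look something like this toy state machine: once a higher-priority peer connects, the low-priority connection goes into a draining state where pending replies still flow out but new requests are refused. The `Connection` class and its method names are hypothetical, not part of nanomsg.

```python
# Sketch of a "drain on shutdown" connection state, as described above.

class Connection:
    def __init__(self):
        self.state = "active"
        self.pending_replies = []

    def shutdown(self):
        # Triggered when a higher-priority connection arrives.
        self.state = "draining"

    def accept_request(self, req):
        if self.state != "active":
            return False               # new requests refused while draining
        self.pending_replies.append(f"reply-to-{req}")
        return True

    def flush_replies(self):
        out, self.pending_replies = self.pending_replies, []
        return out                      # replies delivered even when draining

conn = Connection()
conn.accept_request("req1")
conn.shutdown()                         # higher-priority peer connected
print(conn.accept_request("req2"))      # False: no new requests accepted
print(conn.flush_replies())             # ['reply-to-req1'] still delivered
```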

> 2. The proposed algorithm doesn't help much in case there's a device
> in the topology.
>

It does. Look at the schematic at: http://bit.ly/1g5T7kT

If "Device 1" crashes, I want requests to be evenly spread
between the workers. I can't do that with nn_send priorities, because
then Worker1 and Worker2 would work only when pushback applies.
So I would do that with nn_recv priorities on the worker side.

And it's not worth connecting every worker to every device with the
same priorities, perhaps because Device1 and Worker[12] are in one
data center and Device2 and Worker[34] are in another data center.
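The topology in the schematic could be modelled roughly like this: each worker prefers its local device and keeps the remote one as a low-priority fallback. All names here are illustrative (the worker/device layout is taken from the discussion, not from nanomsg code).

```python
# Toy sketch of recv priorities in the two-data-center topology:
# Worker1/Worker2 are local to Device1, Worker3/Worker4 to Device2.

PREFS = {
    "Worker1": ["Device1", "Device2"],
    "Worker2": ["Device1", "Device2"],
    "Worker3": ["Device2", "Device1"],
    "Worker4": ["Device2", "Device1"],
}

def active_source(worker, alive_devices):
    # recv-priority rule: consume from the highest-priority
    # device that is still alive.
    for dev in PREFS[worker]:
        if dev in alive_devices:
            return dev
    return None

# Both devices up: each worker sticks to its local device.
print({w: active_source(w, {"Device1", "Device2"}) for w in PREFS})
# Device1 crashes: all four workers now serve Device2, so its
# requests are spread evenly between the workers.
print({w: active_source(w, {"Device2"}) for w in PREFS})
```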

> 3. It won't work for connection-less transports (UDP or similar).
>

I believe pushback doesn't apply to connection-less transports
either, right?

> 4. It creates a tight coupling between SP-level functionality
> (priorities) and transport-level functionality (TCP connections).
>

Well, I consider this the weak point :)

> So, we will have to make do with heuristics, which fail occasionally
> and send requests to workers that are not able to process them at
> the moment (due to overload).
>

Yes. My goal is the same. I.e. both solutions have deficiencies. I
would argue my idea is more useful and can't be done in user code
(without too much hassle with heartbeating and so on).

> The way I've tried to approach this problem was to adopt the
> internet's "dumb network, smart endpoints" principle. In the REQ/REP
> case it means that endpoints assume that the network is dumb and
> mis-schedules requests at times. In such a case the failure to
> receive the reply is considered a failure of the network and the
> request is re-sent.
>

Yes, the problem is that typically in the topology:

A -> B

the request from A to B may hang for a long time only if B makes no
progress at all. But if we have priorities, the request from A can
hang if there is a higher-priority pipe:

A -> B <= C

And do you think the average programmer expects some request to be
executed an hour, or a day, after it was started? Even idempotent (but
not stateless) requests may fail in this case. This probably means
that every request must be timestamped, which in turn means we would
need to get timestamping into the core, which is not good IMO.
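The starvation concern above can be demonstrated with a few lines: under strict recv priorities, a single request on the low-priority pipe from A waits unboundedly while the high-priority pipe from C keeps producing. Purely illustrative, not nanomsg code.

```python
# Toy illustration of low-priority starvation under strict recv priority.
from collections import deque

def recv_strict_priority(pipes):
    # Always drain the highest-priority non-empty pipe first.
    for name, queue in pipes:          # pipes ordered high -> low priority
        if queue:
            return name, queue.popleft()
    return None, None

c = deque(f"c{i}" for i in range(5))   # C: steady high-priority traffic
a = deque(["a0"])                      # A: one low-priority request
pipes = [("C", c), ("A", a)]

served = [recv_strict_priority(pipes)[0] for _ in range(5)]
print(served)        # C is served five times in a row
print(len(a))        # 1 -- A's request has made no progress
```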

-- 
Paul
