[nanomsg] tuning for fast network

From: Marko Vendelin <markov@xxxxxxxxxxxxx>
To: nanomsg@xxxxxxxxxxxxx
Date: Fri, 3 Jul 2015 16:29:31 +0300

Dear NanoMSG developers:

I would like to tune the nanomsg REQ/REP server/client interaction for
a 40GbE network. I am using one server to serve ~30 clients over
REQ/REP sockets. The clients all run on another machine that is
connected to the server by a direct 40GbE card-to-card link, so there
are no switches in between. Each message is about 2MB. The server
should be able to saturate the 40GbE link: with ZeroMQ and its
zero-copy transfers I can reach 36-37Gb/s. Now I am struggling to get
the same rates with nanomsg.
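For scale, a back-of-the-envelope calculation shows how often the server must hand over a message to keep the link busy. This sketch assumes "about 2MB" means 2 MiB and ignores TCP/IP and framing overhead; the helper names are hypothetical.

```c
/* Wire time, in microseconds, for one message of `msg_bytes` bytes on a
   link of `link_gbps` gigabits per second (protocol overhead ignored). */
double usec_per_msg(double link_gbps, double msg_bytes)
{
    double bits = msg_bytes * 8.0;
    return bits / (link_gbps * 1e9) * 1e6;
}

/* Messages per second needed to keep a `link_gbps` link fully busy. */
double msgs_per_sec(double link_gbps, double msg_bytes)
{
    return (link_gbps * 1e9) / (msg_bytes * 8.0);
}
```

With 2 MiB messages this gives roughly 2384 messages/s (about 419 µs per message) at 40Gb/s, and roughly 1490 messages/s at 25Gb/s.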

If I use nn_allocmsg and DON'T fill the buffer with data, I can reach
rates of ~34Gb/s without any problem. As soon as I start using memcpy
in other threads, the rate goes down. At present, I can reach 25Gb/s
when other thread(s) pre-fill the messages. I should be able to
optimize the filling of the allocated buffers, but it's not simple.
With ZMQ, although I do have some other problems, I can use the same
thread for sending and receiving without pre-filling the data (in the
production program the data would be filled separately by other
threads).

When I compare the zero-copy APIs of ZMQ and nanomsg, it seems that
ZMQ allows a user-specified buffer to be used for the transfer. In
contrast, nanomsg seems to require that buffers for sending are
allocated by nn_allocmsg. Or is there a way to pass my own data buffer
to nanomsg and send it without a copy?

From reading earlier posts, I can't tell whether ZMQ 4.1 is fully
zero-copy, but at present it seems to be the faster solution for my
usage pattern, given my very limited knowledge of nanomsg.

Best wishes,

Marko
