[nanomsg] Re: why memcpy data to be sent?

  • From: Garrett D'Amore <garrett@xxxxxxxxxx>
  • To: nanomsg@xxxxxxxxxxxxx
  • Date: Wed, 29 Oct 2014 06:16:47 -0700

Generally, yes: when crossing protection domains, a memcpy is usually where it 
is cheaper and easier to copy.

Within the same address space, using ownership / reference passing is easier.  
In fact, mangos does just this to avoid pointless data copying.  But more than 
that, I found that what *really* helped was optimizing to eliminate pressure on 
the memory allocator and garbage collection.  (In this case maintaining my own 
cache of message objects.)
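As a sketch of that kind of message-object cache (names here are hypothetical, not mangos's actual internals), Go's sync.Pool expresses the idea directly: recycled messages mean steady-state traffic allocates almost nothing, easing pressure on the allocator and GC:

```go
package main

import (
	"fmt"
	"sync"
)

// Message is a stand-in for a transport message object.
type Message struct {
	Body []byte
}

// msgPool recycles Message objects between sends/receives.
var msgPool = sync.Pool{
	New: func() any { return &Message{Body: make([]byte, 0, 1024)} },
}

// getMessage fetches a recycled Message (or allocates a fresh one),
// resetting its body so stale data never leaks between uses.
func getMessage() *Message {
	m := msgPool.Get().(*Message)
	m.Body = m.Body[:0]
	return m
}

// putMessage returns a Message to the cache once its owner is done with it.
func putMessage(m *Message) {
	msgPool.Put(m)
}

func main() {
	m := getMessage()
	m.Body = append(m.Body, []byte("hello")...)
	fmt.Println(string(m.Body))
	putMessage(m) // recycled instead of garbage-collected
}
```

The capacity kept in the recycled slice is what avoids reallocating on every message; a real cache would also cap the retained buffer size.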

Having a version of the send and receive routines that could be passed a 
reference, complete with a deallocation function / destructor, would not be a 
bad way to eliminate the data copies.
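A minimal sketch of such a reference-passing send, with the caller supplying the destructor (the names and signature are illustrative assumptions, not the actual nanomsg or mangos API):

```go
package main

import "fmt"

// SendRef hands the buffer itself, not a copy, to the transport.
// Ownership transfers on the call; the transport invokes free exactly
// once when the data has been written out, letting the caller decide
// how the memory is reclaimed (pool, munmap, plain GC, ...).
func SendRef(buf []byte, free func([]byte)) {
	// A real transport would queue buf for the wire; here we just
	// "transmit" it and then release it via the caller's destructor.
	fmt.Printf("sent %d bytes\n", len(buf))
	free(buf)
}

func main() {
	data := []byte("payload")
	SendRef(data, func(b []byte) {
		// e.g. return b to a message pool here
		fmt.Println("buffer released")
	})
}
```

After calling SendRef the caller must not touch the buffer again; that ownership convention is exactly what makes the copy unnecessary.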

Again, this optimization can be performed relatively painlessly later — I would 
first do some profiling to demonstrate, by measurement, that the change is 
necessary before adding the complexity up front, though.

  - Garrett

> On Oct 29, 2014, at 3:52 AM, Alex Elsayed <eternaleye@xxxxxxxxx 
> <mailto:eternaleye@xxxxxxxxx>> wrote:
> 
> Matthew Hall wrote:
> 
>> On Tue, Oct 28, 2014 at 05:57:10PM -0700, Garrett D'Amore wrote:
>>> Bluntly.  I think you may be suffering from premature optimization.
>>> 
>>> Getting to tens of gigabits per second isn't that hard on modern hardware.
>>> 
>>> Profile your app and check to see where it is spending time.
>>> 
>>> It may be cheaper to throw a little more hardware at the problem and
>>> parallelize than to try extraordinary measures like a user space tcp
>>> stack.
>> 
>> I didn't perform any of the optimizations yet. I was just showing a
>> practical example of the kind of issues I can run into using these
>> different hunks of code together.
>> 
>> I can tell you that on a previous project similar to this one, where all
>> the data was getting memcpy'ed between one half of the TCP/IP stack and
>> the other in a similar environment, removing the unneeded memcpy's gave a
>> 50% boost.
>> 
>> But that environment also memcpy'd a higher percentage of the traffic than
>> this one (necessarily) would.
>> 
>> Regarding user space TCP/IP, I can tell you from past experience there was
>> no way to get close to the top level of performance I eventually want to
>> have without it.
> 
> Linux has recently seen a number of improvements to drastically reduce the 
> overhead of networking; it may be worth looking up the LWN article about 
> 'xmit_more' - one of the tests was, in fact, generating wire-rate traffic on 
> 10gig.
> 
> Also, it seems like what you mean by 'zero-copy' is not _quite_ the same as 
> what's more commonly meant. Usually, zero-copy refers to not making copies 
> when _crossing some sort of protection domain_ - address space (shmem for 
> IPC), network (RDMA), etc.
> 
> Here, you're referring to crossing _layers_ inside of your own address 
> space, which doesn't really have a term I've seen used consistently, but 
> often shows up under the banner of ownership-based handling of data - where 
> rather than giving an API _access_ to a piece of data, you give it 
> _ownership_ of that data.
> 
> The Rust language is basically designed around that idea.
> 
> The points people have been bringing up against it - zero-copy not being 
> worth it under 512K, etc - are mostly from the crossing-protection-domains 
> type.
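The access-versus-ownership distinction in the quoted message can be sketched in Go (mangos's language), where handing a buffer through a channel conventionally transfers ownership between layers; unlike Rust, nothing enforces the convention, so it is an assumption the code must uphold:

```go
package main

import "fmt"

func main() {
	// The channel stands in for the boundary between two layers in
	// the same address space.
	ch := make(chan []byte, 1)

	buf := []byte("owned data")
	ch <- buf // ownership conceptually transfers here; no copy is made
	buf = nil // sender drops its reference and must not touch it again
	_ = buf

	got := <-ch // the receiving layer now owns the buffer
	fmt.Println(string(got))
}
```

The buffer crosses the layer boundary by reference only; the 50% boost mentioned above came from removing exactly this class of intra-address-space copy.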
